Author Topic: DataProblems Maybe  (Read 5225 times)

lucamar

  • Hero Member
  • *****
  • Posts: 1942
Re: DataProblems Maybe
« Reply #15 on: April 24, 2019, 01:05:31 am »
Doing:
Code: [Select]
RCD.ICAO := ExtractWord(3, ED2String, ['['..']']);
is the same as doing:
Code: [Select]
RCD.ICAO := ExtractWord(3, ED2String, ['[', '\', ']']);
which may or may not matter to you...

Anyway, getting an empty string from ExtractWord means that there is no such word in your string, i.e. the word index is out of range, so add:
Code: [Select]
ShowMessage('"' + ED2String + '"');
just before the previous line to see what, if anything, ED2String contains (the double quotes are there so that you see something even if the string is empty).

Most probably ED2String is wrong, because if it were right ExtractWord(3,ED2String, ['[',']']) would return the word 'Nil', as demonstrated by the attached image.
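
For what it's worth, the two points above (the range '['..']' also containing the backslash, and the empty result for an out-of-range word index) can be checked with a small console program. This is only a sketch: the program name is made up and the sample line is an assumption borrowed from the data format shown later in this thread. ExtractWord and WordCount come from the StrUtils unit.

```pascal
program ExtractWordDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

const
  // Sample line in the bracketed format used in this thread (illustrative).
  Line = '[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]';

var
  RangeSet, ExplicitSet, Brackets: TSysCharSet;
begin
  // '['..']' is the range #91..#93, which also contains the backslash (#92).
  RangeSet := ['['..']'];
  ExplicitSet := ['[', '\', ']'];
  WriteLn('Sets are equal: ', RangeSet = ExplicitSet);

  Brackets := ['[', ']'];
  WriteLn('Word count: ', WordCount(Line, Brackets));
  WriteLn('Word 3: "', ExtractWord(3, Line, Brackets), '"');
  // There is no 9th word, so ExtractWord returns an empty string.
  WriteLn('Word 9: "', ExtractWord(9, Line, Brackets), '"');
end.
```

Comparing WordCount with the expected number of fields before calling ExtractWord is a cheap way to spot a malformed line early.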
« Last Edit: April 24, 2019, 01:10:01 am by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 2.0.2/2.0.4  - FPC 3.0.4 on:
(K|L)Ubuntu 12..16, Windows XP SP3, various DOSes.

Thausand

  • Full Member
  • ***
  • Posts: 227
Re: DataProblems Maybe
« Reply #16 on: April 24, 2019, 01:11:00 am »
@JLWest:
Do you have a file with more data to test with? My example also works when the data is not good.

My question: why change the delimiter every time?

josh

  • Hero Member
  • *****
  • Posts: 747
Re: DataProblems Maybe
« Reply #17 on: April 24, 2019, 01:54:07 am »
Attached is a simple application that extracts the encapsulated strings and puts them into an array.
The array data starts at index 1.
It works fine with your data line.
Development Installation Lazarus 1.3, FPC 2.7.1,Windows 7/8 32/64, OSX, *nix

Test Environment Lazarus & FPC Trunk on Windows and OSX (Cocoa Mainly on OSX). Testing also Crosscompile windows to OSX.. 
Any posts made from 2015 will be based on Lazarus Trunk.

lucamar

  • Hero Member
  • *****
  • Posts: 1942
Re: DataProblems Maybe
« Reply #18 on: April 24, 2019, 03:52:25 am »
Attached is a simple application that extracts the encapsulated strings and puts them into an array.
The array data starts at index 1.
It works fine with your data line.

Nice example but having ExtractWord() (which, as demonstrated, works and is more capable), why reinvent the wheel? :)

ETA: Forgot what I came here for! Lack of sleep combined with insomnia...

Anyway, since I couldn't sleep I have added a few niceties to my example. Now you can:
  • Anywhere: Whack Ctrl+O to load and Ctrl+S to save SrcList from/to a file. No more recompiling to add test strings :)
  • In the SrcList (the one on the left):
    • <Insert> Adds a new line (same reason as Load/Save)
    • <Delete> Deletes the current line
    • <Enter> Does the same as a double-click
  • Hmmm... nothing else, I think?

Well, have fun!
« Last Edit: April 24, 2019, 04:24:15 am by lucamar »

JLWest

  • Hero Member
  • *****
  • Posts: 545
Re: DataProblems Maybe
« Reply #19 on: April 24, 2019, 09:37:31 am »
GDrive Links

https://drive.google.com/open?id=1MVwkFVJUImSIJBgW_Sl2ox2Eb2wwi7H0
https://drive.google.com/open?id=1vr77NTnzTVnbbXRn413I6rmWtk3xaMZd
https://drive.google.com/open?id=177Lgy7GeOzgwxrRckR_Vx0Uvzv3WrMgM

Here are the files and the code.

The paths will have to be changed. I should have changed them to the install path.

I cut the data way down.

Just hit the Test button.
On the first record I get 8 fields and then it goes to 7 (which is right).

I posted the code here and on the GDrive.
Thanks.


JLWEST
Lazuras ver 2.0.2 
 FPC 3.0.4, Lazarus IDE v1.8.2 Windows 10 Pro
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
3952 GB (1.5 SSD)

Thausand

  • Full Member
  • ***
  • Posts: 227
Re: DataProblems Maybe
« Reply #20 on: April 24, 2019, 10:00:22 am »
Thanks, JLWest.

I ran a quick test and it wrote:
Code: [Select]
--------------------------------
proc file ByAirport.txt
--------------------------------
Error: not know line 1 and have 8 word
Info: line 2 have bit1 valid float value = 35.35
Info: line 2 have bit2 valid float value = -116.89
Info: line 3 have bit1 valid float value = 33.49
Info: line 3 have bit2 valid float value = -111.64
Info: line 4 have bit1 valid float value = 45.47
Info: line 4 have bit2 valid float value = -105.46
Info: line 5 have bit1 valid float value = 28.85
Info: line 5 have bit2 valid float value = -82.35
Info: line 6 have bit1 valid float value = 27.23
Info: line 6 have bit2 valid float value = -80.97
Info: line 7 have bit1 valid float value = 19.83
Info: line 7 have bit2 valid float value = -155.98
Info: line 8 have bit1 valid float value = 40.29
Info: line 8 have bit2 valid float value = -82.74
Info: line 9 have bit1 valid float value = 41.64
Info: line 9 have bit2 valid float value = -87.12
Info: line 10 have bit1 valid float value = 41.98
Info: line 10 have bit2 valid float value = -89.56
Info: line 11 have bit1 valid float value = 40.03
Info: line 11 have bit2 valid float value = -89.13
Info: line 12 have bit1 valid float value = 38.18
Info: line 12 have bit2 valid float value = -89.81
Info: line 13 have bit1 valid float value = 38.73
Info: line 13 have bit2 valid float value = -94.93
Info: line 14 have bit1 valid float value = 31.95
Info: line 14 have bit2 valid float value = -89.24
Info: line 15 have bit1 valid float value = 43.95
Info: line 15 have bit2 valid float value = -86.42
Info: line 16 have bit1 valid float value = 46.30
Info: line 16 have bit2 valid float value = -95.71
--------------------------------
proc file Composite.txt
--------------------------------
Error: not know line 1 and have 8 word
Info: line 2 have bit1 valid float value = 45.47
Info: line 2 have bit2 valid float value = -105.46
Info: line 3 have bit1 valid float value = 28.85
Info: line 3 have bit2 valid float value = -82.35
Info: line 4 have bit1 valid float value = 27.23
Info: line 4 have bit2 valid float value = -80.97
Info: line 5 have bit1 valid float value = 41.64
Info: line 5 have bit2 valid float value = -87.12
Info: line 6 have bit1 valid float value = 41.98
Info: line 6 have bit2 valid float value = -89.56
Info: line 7 have bit1 valid float value = 40.03
Info: line 7 have bit2 valid float value = -89.13

That output shows an error. I looked at the file in a hex viewer and it starts with:
Code: [Select]
EF BB FF 5B 30 30 .....
That "EF BB FF" causes the error and confuses ExtractWord or TStrings... I do not know why it is there. Is it Unicode?
« Last Edit: April 24, 2019, 10:06:33 am by Thausand »

BrunoK

  • Full Member
  • ***
  • Posts: 157
  • Retired programmer
Re: DataProblems Maybe
« Reply #21 on: April 24, 2019, 12:01:38 pm »
"EF BB FF" seems to be the UTF-8 byte order mark (BOM); see https://en.wikipedia.org/wiki/Byte_order_mark
Lazarus trunk r. 59978/03.01.2019 (+/- patches regarding enabled, TScrollBar, TCursorImage). FPC 3.0.4 32 bits. (+heaptrc with leaked ClassName+Revisited TList) , Windows 10 Pro x64 (v. 1803)

Thausand

  • Full Member
  • ***
  • Posts: 227
Re: DataProblems Maybe
« Reply #22 on: April 24, 2019, 12:49:07 pm »
Thanks, BrunoK.

That is a good read. I did not know that; I thought only FFFE and FEFF were BOMs (there are many more) :-[

Sorry, I do not know a good way to fix the program for JLWest...  :'(

lucamar

  • Hero Member
  • *****
  • Posts: 1942
Re: DataProblems Maybe
« Reply #23 on: April 24, 2019, 03:21:46 pm »
On the first record I get 8 fields and then it goes to 7 (which is right).

As I surmised: not a problem of code but of data.

You're reading a UTF-8 BOM along with the first record, and it gets taken as the first field (after all, it ends at a '['), so all the rest are off by one; when you were trying to read the 6th word it was, correctly from its point of view, returning "K2".

Don't save files with a UTF-8 BOM; it's an absurd convention invented by Microsoft to avoid having to check if a file really contains UTF-8 data.

If you can't avoid having the files with the UTF-8 BOM, you can load them first in a TMemo and assign the Memo.Lines to the listbox items.

One other, unimportant, thing: why are you loading the files by hand (with assign, readln, etc.) instead of using ListBox.Items.LoadFromFile()?

ETA: By the way, the UTF-8 BOM is not:
Code: [Select]
#$EF + #$BB + #$FF
but
Code: [Select]
#$EF + #$BB + #$BF
Let's be precise with this kind of thing :)
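
To see the off-by-one effect described above, here is a small console sketch. It assumes the three BOM bytes end up at the start of the first line exactly as they were read from the file; the program name and the constants are made up for illustration, and the sample line is the one quoted elsewhere in this thread.

```pascal
program BomShiftDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

const
  Utf8Bom = #$EF#$BB#$BF;  // the three bytes of the UTF-8 BOM
  CleanLine = '[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]';
  Brackets: TSysCharSet = ['[', ']'];

var
  BomLine: string;
begin
  // Simulate the first line of a file that was saved with a BOM.
  BomLine := Utf8Bom + CleanLine;

  WriteLn('Without BOM: ', WordCount(CleanLine, Brackets), ' words, word 6 = ',
    ExtractWord(6, CleanLine, Brackets));
  // The BOM bytes in front of the first '[' count as an extra first word.
  WriteLn('With BOM:    ', WordCount(BomLine, Brackets), ' words, word 6 = ',
    ExtractWord(6, BomLine, Brackets));
end.
```

Run as is, it should report 7 words and 35.349333 for the clean line, and 8 words and K2 for the line with the BOM, which matches the symptom reported in this thread.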
« Last Edit: April 24, 2019, 04:34:30 pm by lucamar »

JLWest

  • Hero Member
  • *****
  • Posts: 545
Re: DataProblems Maybe
« Reply #24 on: April 24, 2019, 04:36:54 pm »
@JLWest:
Do you have a file with more data to test with? My example also works when the data is not good.

My question: why change the delimiter every time?

Well I don't need to change the delimiter.

I did change from:

|00CA||7826300||Barstow||United States||K2||35.349333||-116.893333| 

to: [00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]

That was about 3 hours of work.

But I don't need to change it any more, nor while the program is running.



 

JLWest

  • Hero Member
  • *****
  • Posts: 545
Re: DataProblems Maybe
« Reply #25 on: April 24, 2019, 04:42:40 pm »
Attached is a simple application that extracts the encapsulated strings and puts them into an array.
The array data starts at index 1.
It works fine with your data line.

I'll look at the code. I need the data converted from the following line to a record:

 ICAO    Hash       City         Country         Code     Lat           Lon
[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]

To a record:

 TData = record
   ICAO        : String[8];
   Region      : String[3];
   Hash        : Double;
   HashStr     : String[12];
   Lat         : Double;
   LatStr      : String[12];
   Lon         : Double;
   LonStr      : String[12];
   RCDLine     : String[95];
   Distance    : Double;
   DistanceStr : String[18];
 end;


lucamar

  • Hero Member
  • *****
  • Posts: 1942
Re: DataProblems Maybe
« Reply #26 on: April 24, 2019, 04:53:20 pm »
Don't know why but your posts keep sticking in the front of my head ... so here is a five-minute, no-frills, bug-attracting routine to load your UTF-8 files into a listbox:
Code: Pascal  [Select]
// Needs Classes and SysUtils (TFileStream, fmOpenRead) and StdCtrls (TListBox) in the uses clause.
procedure LoadListFromFile(AListBox: TListBox; const AFileName: String);
{NOTE: should be a Boolean function and check whether the file exists,
       the ListBox exists, the file is of the correct type, etc.}
const
  U8BOM: String[3] = #$EF#$BB#$BF;
var
  AFileStream: TFileStream;
  BOMTest: String[3];
begin
  AFileStream := TFileStream.Create(AFileName, fmOpenRead);
  try
    {Read the first three bytes into a short string of length 3}
    BOMTest[0] := #3;
    AFileStream.Read(BOMTest[1], 3);
    if BOMTest <> U8BOM then
      {Rewind if no BOM}
      AFileStream.Seek(0, soFromBeginning);
    {Load from the current position, i.e. after the BOM if there was one}
    AListBox.Items.LoadFromStream(AFileStream);
  finally
    AFileStream.Free;
  end;
end;

I need the data converted from the following line to a record:
Code: [Select]
ICAO    Hash       City         Country         Code     Lat           Lon
[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]

To a record:
Code: [Select]
TData = record
   ICAO        : String[8];
   Region      : String[3];
   Hash        : Double;
   HashStr     : String[12];
   Lat         : Double;
   LatStr      : String[12];
   Lon         : Double;
   LonStr      : String[12];
   RCDLine     : String[95];
   Distance    : Double;
   DistanceStr : String[18];
 end;

Once the problems with loading the files are solved, that should be easy; just a matter of
Code: [Select]
  Data.WhateverField := ExtractWord(X, TheLine, ['[', ']']);
and then generating the other (calculated?) fields.
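
As a rough illustration of that step, here is a minimal sketch that fills a cut-down record from one line. Several things in it are assumptions and not taken from the posted code: the TAirport record is a reduced, hypothetical stand-in for TData, the "Code" column is mapped to Region, and the function name ParseLine is invented. It also fixes the decimal separator to '.', because StrToFloat and friends otherwise follow the locale.

```pascal
program ParseLineDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

type
  // Reduced, hypothetical version of the TData record from the thread.
  TAirport = record
    ICAO: String[8];
    Region: String[3];
    Lat: Double;
    Lon: Double;
  end;

const
  Brackets: TSysCharSet = ['[', ']'];

// Returns False when the line does not have exactly 7 fields
// or when latitude/longitude are not valid numbers.
function ParseLine(const ALine: string; out AData: TAirport): Boolean;
var
  FS: TFormatSettings;
begin
  Result := False;
  AData := Default(TAirport);
  if WordCount(ALine, Brackets) <> 7 then
    Exit;
  FS := DefaultFormatSettings;
  FS.DecimalSeparator := '.';  // the data uses '.', whatever the locale says
  AData.ICAO := ExtractWord(1, ALine, Brackets);
  AData.Region := ExtractWord(5, ALine, Brackets);  // assumption: Code -> Region
  Result := TryStrToFloat(ExtractWord(6, ALine, Brackets), AData.Lat, FS)
        and TryStrToFloat(ExtractWord(7, ALine, Brackets), AData.Lon, FS);
end;

var
  Rec: TAirport;
begin
  if ParseLine('[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]', Rec) then
    WriteLn(Rec.ICAO, ' ', Rec.Region, ' ', Rec.Lat:0:6, ' ', Rec.Lon:0:6)
  else
    WriteLn('Line could not be parsed');
end.
```

Checking the word count first means a line that still carries a BOM (8 words) is rejected instead of silently filling the wrong fields.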
« Last Edit: April 24, 2019, 06:16:46 pm by lucamar »

Thausand

  • Full Member
  • ***
  • Posts: 227
Re: DataProblems Maybe
« Reply #27 on: April 24, 2019, 04:56:21 pm »
ETA: By the way, the UTF-8 BOM is not:
Code: [Select]
#$EF + #$BB + #$FF
but
Code: [Select]
#$EF + #$BB + #$BF
Let's be precise with this kind of thing :)
Sorry, lucamar. I made a copy-paste error  :-[ (or rather a read-write error, because the hex editor has no copy-paste).

Thausand

  • Full Member
  • ***
  • Posts: 227
Re: DataProblems Maybe
« Reply #28 on: April 24, 2019, 04:59:42 pm »
Once the problems with loading the files are solved, that should be easy; just a matter of
Code: [Select]
  Data.WhateverField := ExtractWord(X, TheLine, ['[', ']']);
and then generating the other (calculated?) fields.
I have a question. If the data is UTF-8, then the short strings in the record and ExtractWord do not work, so it has to be made ANSI. Does that throw away the UTF-8 encoding, so that fancy letters (Greek, hieroglyphs, etc.) cannot be written?

Added:

Oh, you have a clever BOM skip  :)

I wrote it the wrong way, always skipping  :D
Code: Pascal  [Select]
  ...
    FileStream := TFileStream.Create(Filename, fmOpenRead);
    try
      // Always skips the first 3 bytes, even when the file has no BOM
      FileStream.Position := 3;
      Lines.Clear;
      Lines.LoadFromStream(FileStream);
    finally
      FileStream.Free;
    end;
  ...
« Last Edit: April 24, 2019, 05:25:20 pm by Thausand »

JLWest

  • Hero Member
  • *****
  • Posts: 545
Re: DataProblems Maybe
« Reply #29 on: April 24, 2019, 05:07:32 pm »
On the first record I get 8 fields and then it goes to 7 (which is right).

As I surmised: not a problem of code but of data.

You're reading a UTF-8 BOM along with the first record, and it gets taken as the first field (after all, it ends at a '['), so all the rest are off by one; when you were trying to read the 6th word it was, correctly from its point of view, returning "K2".

Don't save files with a UTF-8 BOM; it's an absurd convention invented by Microsoft to avoid having to check if a file really contains UTF-8 data.

If you can't avoid having the files with the UTF-8 BOM, you can load them first in a TMemo and assign the Memo.Lines to the listbox items.

One other, unimportant, thing: why are you loading the files by hand (with assign, readln, etc.) instead of using ListBox.Items.LoadFromFile()?

ETA: By the way, the UTF-8 BOM is not:
Code: [Select]
#$EF + #$BB + #$FF
but
Code: [Select]
#$EF + #$BB + #$BF
Let's be precise with this kind of thing :)

UTF8 BOM   <--- No idea what that is.

The data is extracted from a file of 7.9 million records, and I guess the 7.9 million records are UTF-8 with a BOM.

"One other, unimportant, thing: why are you loading the files by hand (with assign, readln, etc.) instead of using ListBox.Items.LoadFromFile()?"

Well, basically I'm reading a text file into a listbox.

1. I know how to do it this way.
2. Habit. Maybe a bad habit.


"If you can't avoid having the files with the UTF-8 BOM, you can load them first in a TMemo and assign the Memo.Lines to the listbox items."

I don't understand the "and assign the Memo.Lines to the listbox items" part.

Listbox1.Items.Add(Line) :=   Memo.Lines ???

There are two data files for the program. One is 38,000 records and the other is about 16,000. I guess both are UTF-8 with a BOM.

If I load the 38,000 records into Memo1, then load them into a listbox, and then save them to a text file, will that get rid of the UTF-8 BOM in the text file?

Well, I can't avoid it, as the data is extracted from a 7.9 million record data set.

I'm not against pre-processing the file into ASCII if there is a way.
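
One possible way to pre-process a file, sketched under stated assumptions: the routine below copies a file and leaves out a leading UTF-8 BOM if there is one. It does not convert anything to ASCII; any non-ASCII characters in the data stay UTF-8 encoded. The names StartsWithUtf8Bom and CopyWithoutBom and the command-line usage are invented for this example.

```pascal
program StripBomDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// True when the buffer starts with the UTF-8 BOM bytes EF BB BF.
function StartsWithUtf8Bom(const ABytes: array of Byte): Boolean;
begin
  Result := (Length(ABytes) >= 3) and (ABytes[0] = $EF)
        and (ABytes[1] = $BB) and (ABytes[2] = $BF);
end;

// Copies ASrcName to ADstName, leaving out a leading UTF-8 BOM if present.
procedure CopyWithoutBom(const ASrcName, ADstName: string);
var
  Src, Dst: TFileStream;
  Head: array[0..2] of Byte;
  Got: LongInt;
begin
  Src := TFileStream.Create(ASrcName, fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create(ADstName, fmCreate);
    try
      Got := Src.Read(Head, SizeOf(Head));
      // No BOM, or a file shorter than 3 bytes: rewind and copy everything.
      if not ((Got = SizeOf(Head)) and StartsWithUtf8Bom(Head)) then
        Src.Position := 0;
      // CopyFrom with a count of 0 would copy the whole stream, so guard it.
      if Src.Size > Src.Position then
        Dst.CopyFrom(Src, Src.Size - Src.Position);
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end;

begin
  if ParamCount = 2 then
    CopyWithoutBom(ParamStr(1), ParamStr(2))
  else
    WriteLn('Usage: stripbom <source file> <target file>');
end.
```

Copying stream to stream avoids loading the data into a visual control first, which may matter for the larger source files.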

 