DataProblems Maybe

lucamar

Hero Member
Posts: 4219

Re: DataProblems Maybe

« Reply #15 on: April 24, 2019, 01:05:31 am »

Doing:

Code: [Select]

RCD.ICAO := ExtractWord(3, ED2String, ['['..']']);

is the same as doing:

Code: [Select]

RCD.ICAO := ExtractWord(3, ED2String, ['[', '\', ']']);

which may or may not matter to you...

Anyway, getting an empty string from ExtractWord means that there is no such word in your string, i.e. the word-index is out of range, so add:

Code: [Select]

ShowMessage('"' + ED2String + '"');

just before the previous line to see what, if anything, ED2String contains (the double quotes are there so that you see something even if the string is empty).

Most probably ED2String is wrong, because if it were right ExtractWord(3,ED2String, ['[',']']) would return the word 'Nil', as demonstrated by the attached image.
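Both points can be checked without the screenshot. The following is a minimal console sketch, assuming FPC's StrUtils unit; the sample line is the one JLWest posts later in the thread, while the program name and the `[AB\CD]` test string are made up for illustration.

```pascal
program ExtractWordDemo;
{$mode objfpc}{$H+}
uses
  StrUtils;

const
  // Sample line taken from later in the thread.
  Line = '[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]';

var
  RangeSet, PairSet, ExplicitSet: set of Char;
begin
  RangeSet := ['['..']'];            // '[' (91) up to ']' (93) includes '\' (92)
  ExplicitSet := ['[', '\', ']'];
  PairSet := ['[', ']'];

  // The range really is the three-character set.
  WriteLn('Range equals explicit set: ', RangeSet = ExplicitSet);

  // A backslash inside a field only matters with the range set.
  WriteLn(ExtractWord(1, '[AB\CD]', RangeSet));  // AB
  WriteLn(ExtractWord(1, '[AB\CD]', PairSet));   // AB\CD

  // On clean data the third word is the city.
  WriteLn(WordCount(Line, PairSet));             // 7
  WriteLn(ExtractWord(3, Line, PairSet));        // Barstow

  // An out-of-range index yields an empty string.
  WriteLn('"', ExtractWord(8, Line, PairSet), '"');
end.
```

Note that ExtractWord treats runs of delimiters as a single separator, so an empty field such as `[]` would not count as a word.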

ExtractWord Test_002.png (31.01 kB, 552x349 - viewed 144 times.)

« Last Edit: April 24, 2019, 01:10:01 am by lucamar »

Logged

Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!)

Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

Thausand

Sr. Member
Posts: 292

Re: DataProblems Maybe

« Reply #16 on: April 24, 2019, 01:11:00 am »

@JLWest:
Do you have a file with a lot of data for testing? My example also works when the data is not good.

I ask: why change the delimiter each time?

Logged

Josh

Hero Member
Posts: 1274

Re: DataProblems Maybe

« Reply #17 on: April 24, 2019, 01:54:07 am »

Attached is a simple application that extracts the encapsulated strings and puts them into an array.
The array data starts at index 1.
Works fine with your data line.

ExtraxtDataToArray.zip (65.6 kB - downloaded 61 times.)

Logged

The best way to get accurate information on the forum is to post something wrong and wait for corrections.

lucamar

Hero Member
Posts: 4219

Re: DataProblems Maybe

« Reply #18 on: April 24, 2019, 03:52:25 am »

Quote from: josh on April 24, 2019, 01:54:07 am

Attached is a simple application that extracts the encapsulated strings and puts them into an array.
The array data starts at index 1.
Works fine with your data line.

Nice example but having ExtractWord() (which, as demonstrated, works and is more capable), why reinvent the wheel?

ETA: Forgot what I came here for! Lack of sleep combined with insomnia...

Anyway, since I couldn't sleep I have added a few niceties to my example. Now you can:

Anywhere: Whack Ctrl+O to load and Ctrl+S to save SrcList from/to a file. No more recompiling to add test strings.
In the SrcList (the one on the left):

<Insert> Adds a new line (same reason as Load/Save)
<Delete> Deletes the current line
<Enter> Does the same as a double-click

Hmmm... nothing else, I think?

Well, have fun!

ExtWordTest.zip (5.4 kB - downloaded 64 times.)

« Last Edit: April 24, 2019, 04:24:15 am by lucamar »

Logged


JLWest

Hero Member
Posts: 1293

Re: DataProblems Maybe

« Reply #19 on: April 24, 2019, 09:37:31 am »

GDrive Links

https://drive.google.com/open?id=1MVwkFVJUImSIJBgW_Sl2ox2Eb2wwi7H0
https://drive.google.com/open?id=1vr77NTnzTVnbbXRn413I6rmWtk3xaMZd
https://drive.google.com/open?id=177Lgy7GeOzgwxrRckR_Vx0Uvzv3WrMgM

Here are the files and code.

The paths will have to be changed. I should have changed them to the install path.

I cut the data way down.

Just hit the test button.
On the first record I get 8 fields and then it goes to 7 (which is right).

I posted the code here and on the Gdrive.
Thanks:

Temp.zip (7.66 kB - downloaded 59 times.)

Logged

FPC 3.2.0, Lazarus IDE v2.0.4
Windows 10 Pro 32-GB
Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

Thausand

Sr. Member
Posts: 292

Re: DataProblems Maybe

« Reply #20 on: April 24, 2019, 10:00:22 am »

Thanks, JLWest.

I ran a quick test and it writes:

Code: [Select]

--------------------------------
proc file ByAirport.txt
--------------------------------
Error: not know line 1 and have 8 word
Info: line 2 have bit1 valid float value = 35.35
Info: line 2 have bit2 valid float value = -116.89
Info: line 3 have bit1 valid float value = 33.49
Info: line 3 have bit2 valid float value = -111.64
Info: line 4 have bit1 valid float value = 45.47
Info: line 4 have bit2 valid float value = -105.46
Info: line 5 have bit1 valid float value = 28.85
Info: line 5 have bit2 valid float value = -82.35
Info: line 6 have bit1 valid float value = 27.23
Info: line 6 have bit2 valid float value = -80.97
Info: line 7 have bit1 valid float value = 19.83
Info: line 7 have bit2 valid float value = -155.98
Info: line 8 have bit1 valid float value = 40.29
Info: line 8 have bit2 valid float value = -82.74
Info: line 9 have bit1 valid float value = 41.64
Info: line 9 have bit2 valid float value = -87.12
Info: line 10 have bit1 valid float value = 41.98
Info: line 10 have bit2 valid float value = -89.56
Info: line 11 have bit1 valid float value = 40.03
Info: line 11 have bit2 valid float value = -89.13
Info: line 12 have bit1 valid float value = 38.18
Info: line 12 have bit2 valid float value = -89.81
Info: line 13 have bit1 valid float value = 38.73
Info: line 13 have bit2 valid float value = -94.93
Info: line 14 have bit1 valid float value = 31.95
Info: line 14 have bit2 valid float value = -89.24
Info: line 15 have bit1 valid float value = 43.95
Info: line 15 have bit2 valid float value = -86.42
Info: line 16 have bit1 valid float value = 46.30
Info: line 16 have bit2 valid float value = -95.71
--------------------------------
proc file Composite.txt
--------------------------------
Error: not know line 1 and have 8 word
Info: line 2 have bit1 valid float value = 45.47
Info: line 2 have bit2 valid float value = -105.46
Info: line 3 have bit1 valid float value = 28.85
Info: line 3 have bit2 valid float value = -82.35
Info: line 4 have bit1 valid float value = 27.23
Info: line 4 have bit2 valid float value = -80.97
Info: line 5 have bit1 valid float value = 41.64
Info: line 5 have bit2 valid float value = -87.12
Info: line 6 have bit1 valid float value = 41.98
Info: line 6 have bit2 valid float value = -89.56
Info: line 7 have bit1 valid float value = 40.03
Info: line 7 have bit2 valid float value = -89.13

That output shows an error. I looked at the file in a hex editor and it starts with:

Code: [Select]

EF BB FF 5B 30 30 .....

That "EF BB FF" causes the error and confuses ExtractWord or TStrings... I do not know why it is there. Is it Unicode?

« Last Edit: April 24, 2019, 10:06:33 am by Thausand »

Logged

BrunoK

Sr. Member
Posts: 452
Retired programmer

Re: DataProblems Maybe

« Reply #21 on: April 24, 2019, 12:01:38 pm »

"EF BB FF" seems to be UTF-8 byte order mark (BOM) see https://en.wikipedia.org/wiki/Byte_order_mark

Logged

Thausand

Sr. Member
Posts: 292

Re: DataProblems Maybe

« Reply #22 on: April 24, 2019, 12:49:07 pm »

Thanks, BrunoK.

That is a good read. I did not know that; I thought only FFFE and FEFF were BOMs (there are many more).

Sorry, I do not know a good way to fix the program for JLWest...

Logged

lucamar

Hero Member
Posts: 4219

Re: DataProblems Maybe

« Reply #23 on: April 24, 2019, 03:21:46 pm »

Quote from: JLWest on April 24, 2019, 09:37:31 am

On the first record I get 8 fields and then it goes to 7 (which is right).

As I surmised: not a problem of code but of data.

You're reading a UTF-8 BOM along with the first record, which gets taken as the first field (after all, it ends in a '['), so all the rest are off by one, and when you were trying to read the 6th word it was, from its point of view correctly, returning "K2".
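The off-by-one effect is easy to reproduce. This is a small sketch, assuming FPC's StrUtils unit and the sample line from JLWest's posts; the program and constant names are invented for the example.

```pascal
program BomShiftDemo;
{$mode objfpc}{$H+}
uses
  StrUtils;

const
  Utf8Bom = #$EF#$BB#$BF;  // the three BOM bytes
  Clean = '[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]';

var
  Dirty: string;
begin
  // Simulate the first line of a file saved with a UTF-8 BOM.
  Dirty := Utf8Bom + Clean;

  WriteLn(WordCount(Clean, ['[', ']']));       // 7
  WriteLn(WordCount(Dirty, ['[', ']']));       // 8: the BOM counts as a word

  WriteLn(ExtractWord(6, Clean, ['[', ']']));  // 35.349333
  WriteLn(ExtractWord(6, Dirty, ['[', ']']));  // K2
end.
```

The BOM bytes are not delimiters, so they form a "word" of their own in front of the first '[' and push every real field one position to the right.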

Don't save files with UTF8 BOM, it's an absurd convention invented by Microsoft to avoid having to check if a file really contains UTF-8 data.

If you can't avoid having the files with the UTF-8 BOM, you can load them first in a TMemo and assign the Memo.Lines to the listbox items.

One other, unimportant, thing: why are you loading the files by hand (with assign, readln, etc.) instead of using ListBox.Items.LoadFromFile()?

ETA By the way, the UTF-8 BOM is not:

Code: [Select]

#$EF + #$BB + #$FF

but

Code: [Select]

#$EF + #$BB + #$BF

Let's be precise with this kind of thing.

« Last Edit: April 24, 2019, 04:34:30 pm by lucamar »

Logged


JLWest

Hero Member
Posts: 1293

Re: DataProblems Maybe

« Reply #24 on: April 24, 2019, 04:36:54 pm »

Quote from: Thausand on April 24, 2019, 01:11:00 am

@JLWest:
Do you have a file with a lot of data for testing? My example also works when the data is not good.

I ask: why change the delimiter each time?

Well I don't need to change the delimiter.

I did change from:

|00CA||7826300||Barstow||United States||K2||35.349333||-116.893333|

to:

[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]

Which was about 3 hours work.

But I don't need to change anymore or during the running of the program.

Logged


JLWest

Hero Member
Posts: 1293

Re: DataProblems Maybe

« Reply #25 on: April 24, 2019, 04:42:40 pm »

Quote from: josh on April 24, 2019, 01:54:07 am

Attached is a simple application that extracts the encapsulated strings and puts them into an array.
The array data starts at index 1.
Works fine with your data line.

I'll look at the code. I need the data converted from the following line to a record:

ICAO Hash City Country Code Lat Lon
[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]

to a record:

TData = record
  ICAO : String[8];
  Region : String[3];
  Hash : Double;
  HashStr : String[12];
  Lat : Double;
  LatStr : String[12];
  Lon : Double;
  LonStr : String[12];
  RCDLine : String[95];
  Distance : Double;
  DistanceStr : String[18];
end;

Logged


lucamar

Hero Member
Posts: 4219

Re: DataProblems Maybe

« Reply #26 on: April 24, 2019, 04:53:20 pm »

Don't know why but your posts keep sticking in the front of my head ... so here is a five-minutes, no-frills, bug-attracting function to load your UTF-8 files into a listbox:

Code: Pascal [Select][+]

procedure LoadListFromFile(AListBox: TListBox; const AFileName: String);
{NOTE: should be a Boolean function and check whether the file exists,
       the ListBox exists, the file is of the correct type, etc.}
const
  U8BOM: String[3] = #$EF#$BB#$BF;
var
  AFileStream: TFileStream;
  BOMTest: String[3];
begin
  AFileStream := TFileStream.Create(AFilename, fmOpenRead);
  try
    BOMTest[0] := #3;
    AFileStream.Read(BOMTest[1], 3);
    if BOMTest <> U8BOM then
      {Rewind if no BOM}
      AFileStream.Seek(0, soFromBeginning);
    AListBox.Items.LoadFromStream(AFileStream);
  finally
    AFileStream.Free;
  end;
end;

Quote from: JLWest on April 24, 2019, 04:42:40 pm

I need the data converted from the following line to a record:
Code: [Select]
ICAO Hash City Country Code Lat Lon
[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]
To: a record:
Code: [Select]
TData = record
  ICAO : String[8];
  Region : String[3];
  Hash : Double;
  HashStr : String[12];
  Lat : Double;
  LatStr : String[12];
  Lon : Double;
  LonStr : String[12];
  RCDLine : String[95];
  Distance : Double;
  DistanceStr : String[18];
end;

Once the problems with loading the files are solved, that should be easy; just a matter of

Code: [Select]

Data.WhateverField := ExtractWord(X, TheLine)

and then generating the other (calculated?) fields.
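As a rough sketch of that step, the program below fills a cut-down version of JLWest's record from one line. The field mapping (word 1 to ICAO, word 2 to Hash, word 5 to Region, words 6 and 7 to Lat and Lon) is an assumption read off the header line in the thread, and `ParseLine` and `Delims` are hypothetical names, not anything from the attached projects.

```pascal
program LineToRecord;
{$mode objfpc}{$H+}
uses
  SysUtils, StrUtils;

type
  // Reduced copy of the TData record from the thread (no Distance fields).
  TData = record
    ICAO: String[8];
    Region: String[3];
    Hash: Double;
    HashStr: String[12];
    Lat: Double;
    LatStr: String[12];
    Lon: Double;
    LonStr: String[12];
    RCDLine: String[95];
  end;

const
  Delims: TSysCharSet = ['[', ']'];

// Returns False if the line does not have exactly 7 words
// or if one of the numeric fields cannot be converted.
function ParseLine(const ALine: string; out AData: TData): Boolean;
var
  FS: TFormatSettings;
begin
  AData := Default(TData);
  Result := WordCount(ALine, Delims) = 7;
  if not Result then
    Exit;

  // The data uses '.' as decimal separator regardless of the locale.
  FS := DefaultFormatSettings;
  FS.DecimalSeparator := '.';

  AData.RCDLine := ALine;
  AData.ICAO := ExtractWord(1, ALine, Delims);
  AData.HashStr := ExtractWord(2, ALine, Delims);
  AData.Region := ExtractWord(5, ALine, Delims);  // assumed mapping: Code -> Region
  AData.LatStr := ExtractWord(6, ALine, Delims);
  AData.LonStr := ExtractWord(7, ALine, Delims);

  Result := TryStrToFloat(AData.HashStr, AData.Hash, FS)
        and TryStrToFloat(AData.LatStr, AData.Lat, FS)
        and TryStrToFloat(AData.LonStr, AData.Lon, FS);
end;

var
  Rec: TData;
begin
  if ParseLine('[00CA][7826300][Barstow][United States][K2][35.349333][-116.893333]', Rec) then
    WriteLn(Rec.ICAO, ' ', Rec.Region, ' ', Rec.Lat:0:6, ' ', Rec.Lon:0:6)
  else
    WriteLn('Line rejected');
end.
```

Checking the word count first means a line that still carries a BOM (8 words) is rejected instead of silently producing shifted fields.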

« Last Edit: April 24, 2019, 06:16:46 pm by lucamar »

Logged


Thausand

Sr. Member
Posts: 292

Re: DataProblems Maybe

« Reply #27 on: April 24, 2019, 04:56:21 pm »

Quote from: lucamar on April 24, 2019, 03:21:46 pm

ETA By the way, the UTF-8 BOM is not:
Code: [Select]
#$EF + #$BB + #$FF
but
Code: [Select]
#$EF + #$BB + #$BF
Let's be precise with this kind of thing.

Sorry, lucamar. I made a copy-paste error

(better to call it a read-write error, because the hex editor has no copy-paste).

Logged

Thausand

Sr. Member
Posts: 292

Re: DataProblems Maybe

« Reply #28 on: April 24, 2019, 04:59:42 pm »

Quote from: lucamar on April 24, 2019, 04:53:20 pm

Once the problems with loading the files are solved, that should be easy; just a matter of
Code: [Select]
Data.WhateverField := ExtractWord(X, TheLine)
and then generating the other (calculated?) fields.

I have a question. If the data is UTF-8, then the short strings in the record and ExtractWord will not work with it, so it has to be converted to ANSI. But then the UTF-8 encoding is gone and you cannot write fancy letters (Greek, hieroglyphs, etc.)?

Added:

Oh, you have a clever BOM skip.

Mine is more wrong, it always skips:

Code: Pascal [Select][+]

  ...
    FileStream:= TFileStream.Create(Filename, fmOpenRead);
    FileStream.Position:= 3;
    Lines.Clear;
    Lines.LoadFromStream(FileStream);
    FileStream.Free;
  ...

« Last Edit: April 24, 2019, 05:25:20 pm by Thausand »

Logged

JLWest

Hero Member
Posts: 1293

Re: DataProblems Maybe

« Reply #29 on: April 24, 2019, 05:07:32 pm »

Quote from: lucamar on April 24, 2019, 03:21:46 pm

Quote from: JLWest on April 24, 2019, 09:37:31 am
On the first record I get 8 fields and then it goes to 7 (which is right).

As I surmised: not a problem of code but of data.

You're reading a UTF-8 BOM along with the first record, which gets taken as the first field (after all, it ends in a '['), so all the rest are off by one, and when you were trying to read the 6th word it was, from its point of view correctly, returning "K2".

Don't save files with UTF8 BOM, it's an absurd convention invented by Microsoft to avoid having to check if a file really contains UTF-8 data.

If you can't avoid having the files with the UTF-8 BOM, you can load them first in a TMemo and assign the Memo.Lines to the listbox items.

One other, unimportant, thing: why are you loading the files by hand (with assign, readln, etc.) instead of using ListBox.Items.LoadFromFile()?

ETA By the way, the UTF-8 BOM is not:
Code: [Select]
#$EF + #$BB + #$FF
but
Code: [Select]
#$EF + #$BB + #$BF
Let's be precise with this kind of thing.

UTF8 BOM <--- No Idea what that is.

The data is extracted from a file of 7.9 million records. And I guess the 7.9 million records are UTF-8 with a BOM.

"One other, unimportant, thing: why are you loading the files by hand (with assign, readln, etc.) instead of using ListBox.Items.LoadFromFile()?"

Well basically I'm reading a text file into a listbox.

1. I know how to do it this way.
2. Habit. Maybe Bad Habit.

"If you can't avoid having the files with the UTF-8 BOM, you can load them first in a TMemo and assign the Memo.Lines to the listbox items."

I don't understand the "and assign the Memo.Lines to the listbox items" part.

Listbox1.Items.Add(Line) := Memo.Lines ???

There are two data files for the program. One is 38,000 records and the other is about 16,000. I guess both are UTF-8 BOM.

If I load the 38,000 records into Memo1, then load them into a listbox and then save them to a text file:

Will that get rid of the UTF-8 BOM in the text file?

Well I can't avoid it as the data is extracted from a 7.9 million data set.

Not against pre-processing the file into ASCII if there is a way.
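One possible pre-processing step is to copy each data file once and drop the BOM on the way. The sketch below assumes FPC's Classes and SysUtils units; `CopyWithoutBom` and the command-line usage are hypothetical, and the result is still UTF-8 (not ASCII) if the data contains non-ASCII characters.

```pascal
program StripBom;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Copies SrcName to DstName, leaving out a leading UTF-8 BOM if present.
procedure CopyWithoutBom(const SrcName, DstName: string);
const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  Src, Dst: TFileStream;
  Head: array[0..2] of Byte;
  Got: LongInt;
begin
  Src := TFileStream.Create(SrcName, fmOpenRead or fmShareDenyWrite);
  try
    // Read the first three bytes; rewind if they are not the BOM.
    Got := Src.Read(Head, SizeOf(Head));
    if not ((Got = 3) and CompareMem(@Head, @Utf8Bom, 3)) then
      Src.Position := 0;

    Dst := TFileStream.Create(DstName, fmCreate);
    try
      // CopyFrom with a count of 0 would copy the whole stream from the
      // start, so only call it when there is something left to copy.
      if Src.Size > Src.Position then
        Dst.CopyFrom(Src, Src.Size - Src.Position);
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end;

begin
  if ParamCount = 2 then
    CopyWithoutBom(ParamStr(1), ParamStr(2))
  else
    WriteLn('Usage: stripbom <input file> <output file>');
end.
```

Run once per data file, this leaves files that can be read with readln or LoadFromFile without the extra first field.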

Logged



Author Topic: DataProblems Maybe (Read 29778 times)
