
Author Topic: DataProblems Maybe  (Read 6279 times)

john horst

  • Jr. Member
  • **
  • Posts: 53
    • JHorst
Re: DataProblems Maybe
« Reply #45 on: April 26, 2019, 12:31:40 am »
@JLWest If you just want to use the file and have access to sed (do it yourself) or dos2unix (fixes it for you), either will do what you want. Run in parallel on the command line, it gets the job done in seconds. Just throwing that out there.

JLWest

  • Hero Member
  • *****
  • Posts: 595
Re: DataProblems Maybe
« Reply #46 on: April 26, 2019, 12:52:33 am »
@john horst

I didn't understand what you said.

Maybe something about online?
FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

john horst

Re: DataProblems Maybe
« Reply #47 on: April 26, 2019, 01:08:22 am »
 :) Basically, dos2unix is a *nix tool to remove the junk some MS tools like to add. https://www.liquidweb.com/kb/dos2unix-removing-hidden-windows-characters-from-files/

Apparently there is a version for Windows that can be run in PowerShell, or whatever it is called on Windows. https://sourceforge.net/projects/dos2unix/

JLWest

Re: DataProblems Maybe
« Reply #48 on: April 26, 2019, 01:16:23 am »
This is pretty funny. Not funny, funny. I have a procedure that @lucamar wrote as an example.

It basically loads a file into a TFileStream and dumps it into a listbox. It strips out the UTF-8 BOM on the load.
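
A minimal sketch of that kind of load-time BOM cleanup, for readers following along. It is not the procedure from this thread (that code is not shown here): `CopyWithoutBom` is a hypothetical name, and the sketch assumes the BOM, if present, sits only at the very start of the file as the bytes EF BB BF.

```pascal
program StripBomDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

{ Hypothetical helper, not the procedure from the thread: copies ASrcName to
  ADestName and drops a leading UTF-8 BOM (EF BB BF) if one is present. }
procedure CopyWithoutBom(const ASrcName, ADestName: String);
const
  Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  Src, Dest: TFileStream;
  Head: array[0..2] of Byte;
  Got: LongInt;
begin
  Src := TFileStream.Create(ASrcName, fmOpenRead or fmShareDenyWrite);
  try
    Dest := TFileStream.Create(ADestName, fmCreate);
    try
      Got := Src.Read(Head, SizeOf(Head));
      { Rewind unless the first three bytes are exactly the BOM. }
      if not ((Got = 3) and CompareMem(@Head[0], @Bom[0], 3)) then
        Src.Position := 0;
      { CopyFrom with a count of 0 copies the whole stream from the start,
        so only call it when there is something left to copy. }
      if Src.Size > Src.Position then
        Dest.CopyFrom(Src, Src.Size - Src.Position);
    finally
      Dest.Free;
    end;
  finally
    Src.Free;
  end;
end;

begin
  if ParamCount = 2 then
    CopyWithoutBom(ParamStr(1), ParamStr(2))
  else
    WriteLn('Usage: StripBomDemo <source> <destination>');
end.
```

Because the data goes stream to stream, nothing is held in a listbox or string list, so the file size should matter much less than it does when everything is loaded into a visual control.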

So I thought I would see if I could run the 7.9 million records through the procedure and then write them out to a clean text file.

Well, I got it running and it loaded about 45% of the 7.9 million records and then quit.

The records are in the list box and the program is still running.

But this will work.

All I need to do is load 2.4 million records, write them out, and repeat until done. Maybe have 3 or 4 files. Then put the 3 or 4 files into one file.

I could even rebuild the 7.9 million.

What I extract from the 7.9 million is only 38,000 records. 
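
As an alternative to loading the records in batches, the file can be read and written one line at a time, so memory use stays flat regardless of the record count. The following is only a sketch under stated assumptions: `FilterFile` and `WantLine` are hypothetical names, and the selection rule inside `WantLine` is a placeholder, since the thread does not say how the 38,000 records are chosen.

```pascal
program FilterLinesDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Placeholder predicate: the real selection rule is not given in the thread.
  Here a line is kept when it starts with the marker '1 '. }
function WantLine(const Line: String): Boolean;
begin
  Result := Copy(Line, 1, 2) = '1 ';
end;

{ Reads ASrcName line by line and writes only the wanted lines to ADestName. }
procedure FilterFile(const ASrcName, ADestName: String);
var
  Src, Dest: TextFile;
  Line: String;
  Kept: Int64;
begin
  Kept := 0;
  AssignFile(Src, ASrcName);
  AssignFile(Dest, ADestName);
  Reset(Src);
  try
    Rewrite(Dest);
    try
      while not EOF(Src) do
      begin
        ReadLn(Src, Line);
        if WantLine(Line) then
        begin
          WriteLn(Dest, Line);
          Inc(Kept);
        end;
      end;
    finally
      CloseFile(Dest);
    end;
  finally
    CloseFile(Src);
  end;
  WriteLn('Lines kept: ', Kept);
end;

begin
  if ParamCount = 2 then
    FilterFile(ParamStr(1), ParamStr(2))
  else
    WriteLn('Usage: FilterLinesDemo <source> <destination>');
end.
```

Only the current line is in memory at any time, so there is no need to split the input into several files and join them afterwards.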

 

JLWest

Re: DataProblems Maybe
« Reply #49 on: April 26, 2019, 01:19:10 am »
Quote from: john horst
:) Basically, dos2unix is a *nix tool to remove the junk some MS tools like to add. https://www.liquidweb.com/kb/dos2unix-removing-hidden-windows-characters-from-files/

Apparently there is a version for Windows that can be run in PowerShell, or whatever it is called on Windows. https://sourceforge.net/projects/dos2unix/

I'll check it out, thanks.

lucamar

  • Hero Member
  • *****
  • Posts: 2081
Re: DataProblems Maybe
« Reply #50 on: April 26, 2019, 01:39:30 am »
Quote from: JLWest
Can I write a program using your procedure CorrectFile(const ASrcName: String; const ADestName: String); to convert the 7.9 million records in the Apt.Dat file to ASCII?

Yes, of course. In fact I made one to test the function before posting, if you want it.

But let me polish it a little before uploading :)

Quote from: JLWest
Well I got it running and it loaded about 45% of the 7.9 mil records and then quit.

The records are in the list box and the program is still running.

I've sometimes run into similar issues: it seems to be a gotcha of TStrings when the strings (or the count) grow too large, as you already know from your other posts about that giant file :)
« Last Edit: April 26, 2019, 01:46:28 am by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus 2.0.2/2.0.4  - FPC 3.0.4 on:
(K|L)Ubuntu 12..16, Windows XP SP3, various DOSes.

JLWest

Re: DataProblems Maybe
« Reply #51 on: April 26, 2019, 01:45:34 am »
@lucamar

If your code gets any smarter it will be able to write itself without our help.

I'll wait.

lucamar

Re: DataProblems Maybe
« Reply #52 on: April 26, 2019, 04:52:27 am »
Quote from: JLWest
If your code gets any smarter it will be able to write itself without our help.

Nah! The thing is that it was just a quick and dirty test of the BOM clean-up function. I had to pretty it up a little to make it almost production-ready.

Note the "almost": it needs another two or three hours of work to be really ready for public consumption. As it is now it just works, without frills. :)

Find it attached.

Oh, and sorry for the delay; I had a couple of personal matters to attend to.

JLWest

Re: DataProblems Maybe
« Reply #53 on: April 28, 2019, 10:31:07 am »


Code: Pascal
  1. function TForm1.Decompose(ARCD : TData) : TData;
  2. var RCD       : TData;
  3.     Count     : Integer;
  4.     AWord     : String;
  5.     AFloat    : Extended;
  6.     Delims    : TSysCharSet;
  7.     RCDString : String[95];
  8.     LatLon    : Extended;
  9. begin
  10.   RCD := ARCD;
  11.   Delims := ['[', ']'];
  12.   {Extract all words}
  13.   RCDString := RCD.RCDLine;
  14.   Count := 0;
  15.   repeat
  16.     Inc(Count);
  17.     RCDString := '[00VA][5890633][Alton][United States][K6][36.576][-78.999166667]'; {raw data pasted in from the debugger}
  18.     AWord := ExtractWord(Count, RCDString, Delims);
  19.     if TryStrToFloat(AWord, AFloat) then LatLon := AFloat;
  20.     case Count of
  21.       0 : ShowMessage('Bad Data  in RCD in Decompose Function'); {never reached: Count starts at 1}
  22.       1 : RCD.ICAO := AWord;
  23.       2 : begin
  24.             RCD.Hash := LatLon;
  25.             RCD.HashStr := FloatToStr(LatLon);
  26.           end;
  27.       3 : RCD.City := AWord;
  28.       4 : RCD.Country := AWord;
  29.       5 : RCD.Region := AWord;
  30.       6 : begin
  31.             RCD.Lat := LatLon;
  32.             RCD.LatStr := FloatToStr(LatLon);
  33.           end;
  34.       7 : begin
  35.             RCD.Lon := LatLon;
  36.             RCD.LonStr := FloatToStr(LatLon);
  37.           end;
  38.     end;
  39.   until AWord.IsEmpty;
  40.   Result := RCD;
  41. end;

I still have data problems. I have redone the data 3 times today and can't get past this problem.

The issue:
Code: Pascal
unit1.pas(724,6) Fatal: illegal character "'ï'" ($EF)
on line 17.

So under the debugger I copied the raw data and pasted it into RCDString just before I try to parse it. I still get the issue. I think the $EF is the UTF-8 BOM.

I'm more than willing to post this on my GDrive if someone is willing to look at it.

It's easy to duplicate. Start the program and click once on the first record in listbox2.

It doesn't happen on any other listbox.

I have five listboxes on the screen and it only happens on that one.

I change the first record and the second record gets the issue.

But if I click on the second record first, it's fine. When I go back to the first record, I get the issue.

Is there a way around this?
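
One possible workaround, sketched here under an assumption: if the $EF really is the first byte of a UTF-8 BOM (EF BB BF) riding along at the start of the first record, it can be stripped from the string before parsing. `StripUtf8Bom` is a hypothetical helper, not part of the RTL or of the program in this thread.

```pascal
program StripBomFromStringDemo;
{$mode objfpc}{$H+}

const
  { The three bytes of a UTF-8 byte order mark. }
  Utf8Bom = #$EF#$BB#$BF;

{ Hypothetical helper: returns S without a leading UTF-8 BOM, if it has one. }
function StripUtf8Bom(const S: String): String;
begin
  if Copy(S, 1, Length(Utf8Bom)) = Utf8Bom then
    Result := Copy(S, Length(Utf8Bom) + 1, MaxInt)
  else
    Result := S;
end;

var
  Rec: String;
begin
  { Simulate a first record that still carries the BOM. }
  Rec := Utf8Bom + '[00VA][5890633][Alton][United States][K6][36.576][-78.999166667]';
  WriteLn('Length with BOM:    ', Length(Rec));
  Rec := StripUtf8Bom(Rec);
  WriteLn('Length without BOM: ', Length(Rec));
  WriteLn(Rec);
end.
```

In the function above, that would mean passing the record line through such a helper before the ExtractWord calls. Note also that "Fatal: illegal character" is a compiler message, so text pasted from the debugger may have carried the same stray bytes into unit1.pas itself; retyping that line by hand is a quick way to rule this out.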



 


lucamar

Re: DataProblems Maybe
« Reply #54 on: April 28, 2019, 11:36:25 am »
Quote from: JLWest
I'm more than willing to post this on my GDrive if someone is willing to look at it.

Yeah, let me have a look. But you have to be quick with the upload. The schedule says I'm not going to sleep this coming week :P

Remember to upload everything: program, data files, ... everything.

And be patient ... although I know you are. ;)
« Last Edit: April 28, 2019, 11:38:31 am by lucamar »

JLWest

Re: DataProblems Maybe
« Reply #55 on: April 28, 2019, 08:33:56 pm »

JLWest

Re: DataProblems Maybe
« Reply #56 on: April 28, 2019, 08:40:08 pm »
One of the data files and the program have the same name. MSTRegions.

So there should be one program file and three data files posted here and on the GDrive.


https://drive.google.com/open?id=1Ui4xrVTBAxtQTuLlnUiHvlrJxpaGTQIV

JLWest

Re: DataProblems Maybe
« Reply #57 on: April 28, 2019, 08:53:04 pm »
To recreate the issue step by step:

Shift-F9
Start it under the debugger.

When the program comes up there are 5 listboxes.

It's a very busy screen.

The far-left listbox is not loaded from a separate file. It's extracted while the airports are loading.

When it loads, double-click the top record in the lower-middle listbox.

It should be:
[00VA][5890633][Alton][United States][K6][36.576][-78.999166667]

The issue will come up at line 628.



« Last Edit: April 28, 2019, 09:09:37 pm by JLWest »

lucamar

Re: DataProblems Maybe
« Reply #58 on: April 28, 2019, 09:21:27 pm »
OK, downloaded all five files.

I'll have a look after supper or tomorrow; you left it a little late in the day. ;) I'll report back afterwards, at least once I've analyzed the data files (that should only take a few minutes).

JLWest

Re: DataProblems Maybe
« Reply #59 on: April 28, 2019, 09:40:49 pm »
Thank you.

I made the post last night at 2:00 in the morning. I was going to upload the data but wanted to include some documentation on your copyright and a readme, and by then it was too late for me.