Author Topic: Large text files, Really large.  (Read 1422 times)

JLWest

Large text files, Really large.
« on: October 04, 2020, 09:29:31 am »
I have written a program to work with really large text files. I started it over a year ago. Some may remember a 9+ million line text file called Apt.Dat.

I only want to edit specific record types (1300, 1301 and 1302). There were questions at the time about how to build a display that would make sense to the user. I think I have solved that problem.

The program is about 80% finished. I think it needs some graphics, which I'm about to start working on.

The program works with listboxes and dynamic arrays. However, I'm wondering whether TStringList would work better than dynamic arrays in all of those places, in a few, or in none.

I have attached a screenshot, but I realize not much can be determined from it, so I have uploaded the program code to my GDrive, along with a small text data file that should allow you to run some of the features.

It does a recursive search for apt.dat files on the system; in a live environment there could be hundreds.


https://drive.google.com/file/d/1CAuXnzCGmEsIz5UhdiokCFVAw43LzxjU/view?usp=sharing

Will accept any and all help/criticism.

Thanks





 

FPC 3.2.0, Lazarus IDE v2.0.4
 Windows 10 Pro 32-GB
 Intel i7 770K CPU 4.2GHz 32702MB Ram
GeForce GTX 1080 Graphics - 8 Gig
4.1 TB

Ñuño_Martínez

Re: Large text files, Really large.
« Reply #1 on: October 04, 2020, 11:27:14 am »
IIRC TStringList stores the strings and the associated objects (if any) in an internal, dynamically allocated array of string/object pairs.

Since your application makes such extensive use of strings, it may be a good idea to extend the TStrings class with a more efficient approach (i.e. load the lines when they are used instead of loading them all into memory).

egsuh

Re: Large text files, Really large.
« Reply #2 on: October 04, 2020, 11:55:16 am »
AFAIK, TStringList is more efficient than dynamic arrays. If you add one more element, a dynamic array may have to copy all the existing data to a new memory block first. I use dynamic arrays heavily, but my applications do not have that many elements, so for me the efficiency does not matter. If you are dealing with a large number of elements, it will matter.
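
To make the growth-cost point concrete, here is a minimal Free Pascal sketch, assuming the number of lines is known (or counted) before loading. `LineCount` and the 'record N' strings are made up for illustration; nothing here is taken from the posted program. It shows both containers being sized once, so neither has to regrow while it is filled.

```pascal
program PreallocDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  LineCount = 100000; // hypothetical size, stands in for the real line count

var
  Lines: array of string;
  List: TStringList;
  i: Integer;
begin
  // Dynamic array: one SetLength call, then fill by index.
  // No element is added later, so nothing is reallocated or copied.
  SetLength(Lines, LineCount);
  for i := 0 to LineCount - 1 do
    Lines[i] := 'record ' + IntToStr(i);

  // TStringList also grows its internal storage as items are added.
  // Setting Capacity first reserves the space in a single step.
  List := TStringList.Create;
  try
    List.Capacity := LineCount;
    for i := 0 to LineCount - 1 do
      List.Add('record ' + IntToStr(i));
    WriteLn(Length(Lines), ' ', List.Count);
  finally
    List.Free;
  end;
end.
```

If the element count never changes after loading, the two approaches should behave much the same with respect to reallocation; the difference shows up mainly when elements are appended one at a time.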

JLWest

Re: Large text files, Really large.
« Reply #3 on: October 04, 2020, 09:02:12 pm »
@Ñuño_Martínez

I more or less already do what you suggest. I load the big file of 9 million records once into a dynamic array and then build an index (hash table) array of the file to speed up searching. I was thinking the hash table could be a TStringList. It wouldn't be destroyed until the program is closed. Its format is:

LELL 3
TN05 168
SNFL 394
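
One way such an index could be held in a TStringList is sketched below. This is only an illustration: it assumes the second column is a line number into the big array, and the names `AptIndex` and `LookupLine` are hypothetical. The list is kept sorted so that `Find` can use a binary search, and the number is stored in the `Objects` slot instead of being parsed out of the string on every lookup.

```pascal
program IcaoIndexDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Returns the number stored for Icao, or -1 when the code is not in the index.
function LookupLine(AptIndex: TStringList; const Icao: string): PtrInt;
var
  Idx: Integer;
begin
  if AptIndex.Find(Icao, Idx) then
    Result := PtrInt(AptIndex.Objects[Idx])
  else
    Result := -1;
end;

var
  AptIndex: TStringList;
begin
  AptIndex := TStringList.Create;
  try
    AptIndex.Sorted := True;         // entries stay ordered, so Find is a binary search
    AptIndex.Duplicates := dupError; // assumption: each airport code appears only once
    // The three sample entries from the post; the integer rides in the Objects slot.
    AptIndex.AddObject('LELL', TObject(PtrInt(3)));
    AptIndex.AddObject('TN05', TObject(PtrInt(168)));
    AptIndex.AddObject('SNFL', TObject(PtrInt(394)));

    WriteLn(LookupLine(AptIndex, 'TN05')); // prints 168
    WriteLn(LookupLine(AptIndex, 'XXXX')); // prints -1
  finally
    AptIndex.Free;
  end;
end.
```

Note that a sorted TStringList is a binary-searched list, not a true hash table, although for a few thousand airport codes the difference is unlikely to be noticeable.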

Then there are the custom airports' apt.dat files, one for each custom airport you have installed. A few of these may have several airports in one apt.dat, but most contain one airport. When there are multiple airports in an apt.dat, I basically build a smaller version of the master Apt.Dat and its hash file. However, that is created and destroyed on demand. I'm thinking the custom airport apt.dat and its hash file, when built, could be TStringLists.

@egsuh

In all the processing I don't add any records to the arrays; I only modify the data in existing array elements.
« Last Edit: October 04, 2020, 09:04:39 pm by JLWest »

Handoko

Re: Large text files, Really large.
« Reply #4 on: October 04, 2020, 09:13:12 pm »
Maybe you need to use a database.

A long time ago, I wrote some code to benchmark the performance of TDBF against SQLite. The result was 11:1 (smaller means faster).

That is, TDBF was 11 times slower than SQLite.

A good database can fully use the power your machine can offer. It is written by professionals for reliability and performance. So I believe if you use a good database, the performance will be better.

jamie

Re: Large text files, Really large.
« Reply #5 on: October 05, 2020, 12:12:45 am »
Why don't you just read the file once at startup and build a hash table of file offsets, so that when you want to access some data you read it straight from the file by setting the file pointer?

This way you don't need to eat up all that memory, and you could also keep a check on the file's size and date so that if it changes you could update automatically.
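
A rough sketch of this idea, under several assumptions: the sample file layout ("CODE rest of line") is invented and is not the real apt.dat format, `BuildOffsets` and `ReadLineAt` are hypothetical names, and a sorted TStringList stands in for the hash table. The file is scanned once in chunks to record where each line starts; afterwards only the index stays in memory and a record is fetched by seeking.

```pascal
program OffsetIndexDemo;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// One pass over the stream: store each line's first word together with the
// byte offset at which that line starts. PtrInt is 32-bit on 32-bit targets,
// so this simple form only suits files smaller than 2 GB there.
procedure BuildOffsets(F: TStream; Offsets: TStringList);
var
  Chunk: array[0..65535] of Byte;
  Got, i: Integer;
  Offset, LineStart: Int64;
  Key: string;
  InKey: Boolean;
begin
  Offset := 0;
  LineStart := 0;
  Key := '';
  InKey := True;
  F.Position := 0;
  repeat
    Got := F.Read(Chunk, SizeOf(Chunk));
    for i := 0 to Got - 1 do
    begin
      case Chunk[i] of
        10: // line feed: the line is complete, the next one starts after it
          begin
            if Key <> '' then
              Offsets.AddObject(Key, TObject(PtrInt(LineStart)));
            LineStart := Offset + 1;
            Key := '';
            InKey := True;
          end;
        13, 32: // carriage return or space ends the key
          InKey := False;
      else
        if InKey then
          Key := Key + Chr(Chunk[i]);
      end;
      Inc(Offset);
    end;
  until Got = 0;
  if Key <> '' then // last line without a trailing line feed
    Offsets.AddObject(Key, TObject(PtrInt(LineStart)));
end;

// Seeks to Offset and reads one line, without loading the rest of the file.
function ReadLineAt(F: TStream; Offset: Int64): string;
var
  B: Byte;
begin
  Result := '';
  F.Position := Offset;
  while (F.Read(B, 1) = 1) and (B <> 10) do
    if B <> 13 then
      Result := Result + Chr(B);
end;

var
  FileName: string;
  Sample, Offsets: TStringList;
  F: TFileStream;
  Idx: Integer;
begin
  // Write a tiny made-up data file so the sketch runs on its own.
  FileName := GetTempFileName;
  Sample := TStringList.Create;
  try
    Sample.Add('LELL first sample record');
    Sample.Add('TN05 second sample record');
    Sample.Add('SNFL third sample record');
    Sample.SaveToFile(FileName);
  finally
    Sample.Free;
  end;

  Offsets := TStringList.Create;
  try
    Offsets.Sorted := True; // keeps keys ordered so Find can binary-search
    F := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      BuildOffsets(F, Offsets);
      if Offsets.Find('TN05', Idx) then
        WriteLn(ReadLineAt(F, PtrInt(Offsets.Objects[Idx]))); // prints the TN05 line
    finally
      F.Free;
    end;
  finally
    Offsets.Free;
    DeleteFile(FileName);
  end;
end.
```

For the change check, one option would be to remember the value returned by `FileAge` when the index is built and rebuild the index if a later call returns something different.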

EDIT:
 I wanted to add: although I really hate generics because I think they are useless in many ways, I guess a generic hash table, where you can define the type of data it points to, would be useful here. At least you could also build a different hash table that points to different stuff.
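
As an illustration of what a generic lookup table might look like in FPC, here is a small sketch using `TFPGMap` from the `fgl` unit. Strictly speaking this is a sorted map searched by binary search, not a hash table, and `TOffsetMap` is a hypothetical name; the keys and numbers are the sample entries quoted earlier in the thread. The point is only that the value type (here `Int64`) is chosen when the type is specialized, so a second map pointing to different data is one more `specialize` line.

```pascal
program GenericMapDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, fgl;

type
  // Key and value types are fixed here; Int64 values also work on 32-bit targets.
  TOffsetMap = specialize TFPGMap<string, Int64>;

var
  Map: TOffsetMap;
  Idx: Integer;
begin
  Map := TOffsetMap.Create;
  try
    Map.Sorted := True; // sorted keys let IndexOf use a binary search
    Map.Add('LELL', 3);
    Map.Add('TN05', 168);
    Map.Add('SNFL', 394);

    Idx := Map.IndexOf('SNFL');
    if Idx >= 0 then
      WriteLn(Map.Data[Idx]); // prints 394
  finally
    Map.Free;
  end;
end.
```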
« Last Edit: October 05, 2020, 12:15:10 am by jamie »

JLWest

Re: Large text files, Really large.
« Reply #6 on: October 05, 2020, 03:06:27 am »
@jamie

Maybe I didn't explain it well. I only read the master apt.dat (9 million records) once, and I build its hash table once. Generics? Maybe; I'm not sure how I would implement that.

It's all the other apt.dat files on the X-Plane 11 system that may benefit from a TStringList approach. Every custom airport you have loaded has its own apt.dat. Each one is anywhere from a few hundred records up to roughly 15,000.

@Handoko

I tried to import the 9 million records into SQLite3, and it would stop at 25% and die.

 
