
Author Topic: Simple file with only one single record, but of HUGE size  (Read 9877 times)

Epand

  • New Member
  • *
  • Posts: 25
Simple file with only one single record, but of HUGE size
« on: December 11, 2018, 11:56:54 am »
Hello all

At the moment I’m not very familiar with FreePascal, but I want to learn as much as possible.
Within the following weeks I want to tackle the realization of the following idea:

One simple file with only one single record, but that “only one single” record shall be of huge size.
To be more specific, it shall have the largest size that can be addressed by FreePascal and the system it's running on.
The “only one single” record itself shall contain only text, its characters consisting only of digits and letters strung together.

The system configuration will be:
CPU: Intel Core i7
OS: Windows 10 Pro 64-bit
Lazarus IDE with FreePascal

Can you please give me a hint about the maximum size – a size as huge as possible – of a simple file containing only one single record?
I want to know the relevant file size limit, but I have not seen it in the docs when looking there; maybe I have missed it.

Have a nice day
Michael
« Last Edit: December 11, 2018, 12:01:15 pm by Epand »

Handoko

  • Hero Member
  • *****
  • Posts: 5131
  • My goal: build my own game engine using Lazarus
Re: Simple file with only one single record, but of HUGE size
« Reply #1 on: December 11, 2018, 12:02:51 pm »
For this case, I will use
TStringList.LoadFromFile

Read more about TStringList:
https://www.freepascal.org/docs-html/rtl/classes/tstringlist.html

Or maybe:
https://www.freepascal.org/docs-html/rtl/system/blockread.html

I haven't tested what their limit is, but I believe it is as much as the OS, file system and free memory allow.

32-bit OSes usually have a memory access limit of about 3-4 GB. FAT32 has a maximum file size of 4 GB. 64-bit OSes and NTFS have larger file size limits.

It is not wise to load all of a big file's content into memory. You should load a small chunk, do some processing, free the memory and continue loading the next chunk. So you may want to consider using BlockRead.
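
For illustration, a minimal sketch of that chunked approach, using an untyped file and BlockRead (the file name and chunk size are just assumptions):
Code: Pascal
program ChunkRead;
{$mode objfpc}
const
  ChunkSize = 20 * 1024 * 1024;    // 20 MB per chunk, tune to taste
var
  F: file;                         // untyped file, 1-byte records
  Buffer: array of Byte;
  BytesRead: LongInt;
  Total: Int64;
begin
  SetLength(Buffer, ChunkSize);
  AssignFile(F, 'huge.dat');       // hypothetical file name
  Reset(F, 1);                     // open with a record size of 1 byte
  try
    Total := 0;
    repeat
      BlockRead(F, Buffer[0], ChunkSize, BytesRead);
      // ... process Buffer[0 .. BytesRead - 1] here, then reuse the buffer ...
      Inc(Total, BytesRead);
    until BytesRead = 0;
    WriteLn('Processed ', Total, ' bytes');
  finally
    CloseFile(F);
  end;
end.

The same buffer is reused for every chunk, so memory use stays constant no matter how big the file is.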
« Last Edit: December 11, 2018, 12:11:02 pm by Handoko »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Simple file with only one single record, but of HUGE size
« Reply #2 on: December 11, 2018, 12:11:22 pm »
TStringList is the easiest, as it gives you line parsing for free, but it is not the best: it uses double the memory the list requires.

Moreover, it has a 32-bit line index, which becomes a limit on 64-bit systems if you have (very) small lines.

Simply a BlockRead (or Stream.Read) into a large array is the best solution. Or better, multiple large arrays (of about 1 GB each or so).

There are ways to get the size of the file (FileSize(), or Stream.Size) before doing the actual read, so that the array can be allocated up front.
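
A minimal sketch of that idea (again with a hypothetical file name): query Size first, allocate the array, then read in one go.
Code: Pascal
program ReadWhole;
{$mode objfpc}
uses
  Classes, SysUtils;
var
  S: TFileStream;
  Data: array of Byte;
  Size: Int64;
begin
  S := TFileStream.Create('huge.dat', fmOpenRead or fmShareDenyWrite);
  try
    Size := S.Size;                          // known before the actual read
    SetLength(Data, Size);                   // allocate the array up front
    if Size > 0 then
      S.ReadBuffer(Data[0], LongInt(Size));  // sketch assumes Size < 2 GB
  finally
    S.Free;
  end;
  WriteLn('Loaded ', Length(Data), ' bytes');
end.

For truly huge files you would allocate several such arrays (of about 1 GB each) and fill them in a loop instead.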

But what to use depends on what you want to achieve, and on how crucial it is to go to the limit (IOW, how much you want to work, learn and experiment).

Note that in Pascal speak, "records" are always of fixed size.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Simple file with only one single record, but of HUGE size
« Reply #3 on: December 11, 2018, 02:59:41 pm »
Quote
Can you please give me a hint about the maximum size – a size as huge as possible – of a simple file containing only one single record?
I want to know the relevant file size limit, but I have not seen it in the docs when looking there; maybe I have missed it.

Generally speaking, to know the file size limits of any file-related process you have to look at the size of the vars used to address it. If it's a 16-bit var it won't be able to address more than 64 KB; if 32 bits, the limit is 2 or 4 GB, and so on. For example, looking at the declaration of Seek:
Code: Pascal
procedure Seek(var f: file; Pos: Int64);
you know it can address a maximum of 8,388,607 TB (an Int64 can address 2^63 − 1 bytes).
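
A quick sanity check of that figure:
Code: Pascal
begin
  // High(Int64) = 2^63 - 1 bytes; dividing by 2^40 expresses it in whole TB
  WriteLn(High(Int64) div (Int64(1) shl 40));  // prints 8388607
end.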

Do note that it also depends on the file-system: some file systems impose a limit due to the size of the vars they use to address the files :)
« Last Edit: December 11, 2018, 03:20:05 pm by lucamar »
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #4 on: December 11, 2018, 04:11:15 pm »
Thanks to you all.



Quote
For this case, I will use
TStringList.LoadFromFile

Read more about TStringList:
https://www.freepascal.org/docs-html/rtl/classes/tstringlist.html

Or maybe:
https://www.freepascal.org/docs-html/rtl/system/blockread.html

Filling the huge file will be done by writing into it only character by character; curious, but a must.

A totally different point is that inspecting the content of the single-record file with BlockRead in certain chunks will probably result in a speedup of the inspection.

Especially keeping in mind the mentioned kind of writing into the file, I suppose you would prefer TStringList for that, wouldn't you?



Quote
...
But what to use depends on what you want to achieve, and on how crucial it is to go to the limit (IOW, how much you want to work, learn and experiment).

It’s very crucial. It will lead to what you mentioned: a large amount of work, learning and experimenting.



Quote
Generally speaking, to know the file size limits of any file-related process you have to look at the size of the vars used to address it. If it's a 16-bit var it won't be able to address more than 64 KB; if 32 bits, the limit is 2 or 4 GB, and so on. For example, looking at the declaration of Seek:
Code: Pascal
procedure Seek(var f: file; Pos: Int64);

Okay, I’ll look into that.
« Last Edit: December 11, 2018, 04:23:25 pm by Epand »

Handoko

  • Hero Member
  • *****
  • Posts: 5131
  • My goal: build my own game engine using Lazarus
Re: Simple file with only one single record, but of HUGE size
« Reply #5 on: December 11, 2018, 04:28:25 pm »
Quote
Especially keeping in mind the mentioned kind of writing into the file, I suppose you would prefer TStringList for that, wouldn't you?

If the file size is < 100 MB and I need some text operations (showing it in a TMemo, searching, etc.) I prefer TStringList

but

If the file size is > 100 MB I prefer BlockRead and BlockWrite

Also, you may want to consider TStream/TFileStream, as mentioned by marcov. You can think of TFileStream as a modern version of BlockRead/BlockWrite; see the sketch below.
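
For illustration, a minimal sketch of TFileStream used in that BlockRead/BlockWrite style, copying a file in chunks (file names and buffer size are just assumptions):
Code: Pascal
program StreamCopy;
{$mode objfpc}
uses
  Classes, SysUtils;
const
  BufSize = 4 * 1024 * 1024;                 // 4 MB chunks
var
  Src, Dst: TFileStream;
  Buffer: array of Byte;
  N: LongInt;
begin
  SetLength(Buffer, BufSize);
  Src := TFileStream.Create('huge.dat', fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create('copy.dat', fmCreate);
    try
      repeat
        N := Src.Read(Buffer[0], BufSize);   // like BlockRead: returns bytes read
        if N > 0 then
          Dst.WriteBuffer(Buffer[0], N);     // like BlockWrite
      until N = 0;
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end.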

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Simple file with only one single record, but of HUGE size
« Reply #6 on: December 11, 2018, 05:27:51 pm »
Quote
To be more specific, it shall have the largest size that can be addressed by FreePascal and the system it's running on.

The requirements you are mentioning are difficult to manage. On Windows, the largest file size (at least theoretically) is 16 TB; that's terabytes, that's over 16,000 gigabytes.

Once your files go over some reasonable size, say about 300 MB, what you want to do is map sections of the file (never the entire file). Depending on what you're doing and how the contents of the file may be manipulated, you may need more than one section. You don't change/manipulate the data in the sections; you accumulate changes in memory buffers that apply to a given section. Once the user is done fiddling with the file, you apply the changes to the mappings as you create a new file (one that includes the changes).

That's the basic concept used to manage large, multi-terabyte databases.

The most important thing when you are dealing with really large files is: avoid doing I/O. Delay doing I/O as long as possible, and when you do need to do I/O, batch it in file-local groups; that often allows you to consolidate a number of I/Os into a smaller number of I/Os (that's the objective).

The important thing to have very clear in mind is that the way you deal with very large files depends on how the files need to be manipulated. There are some basic rules of thumb but, ultimately, the algorithms to use depend on how the data inside the file needs to be managed.

To recap,

1. Use file mappings.
2. Map _sections_ of the file, not the entire file (the OS may not even allow mapping the entire file).
3. Buffer and accumulate any changes that must be made to the sections.
4. Consolidate the changes in memory.
5. Merge the mapped buffers and the changes as the new (changed) file is created.

(It is usually desirable to keep a log of changes; it allows undo and recovery of the user's work in case something bad happens to the user's system. How you do that is yet another ball of wax.)
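
For illustration only, a minimal sketch of points 1 and 2 on Windows, using the raw API (the file name, view size and offset are assumptions; a view offset must be a multiple of the 64 KB allocation granularity, and this sketch assumes the file is at least ViewSize bytes long):
Code: Pascal
program MapSection;
{$mode objfpc}
uses
  Windows, SysUtils;
const
  ViewSize = 64 * 1024 * 1024;     // map a 64 MB section, never the whole file
var
  hFile, hMap: HANDLE;
  View: PByte;
  Offset: Int64;
begin
  hFile := CreateFile('huge.dat', GENERIC_READ, FILE_SHARE_READ, nil,
                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    raise Exception.Create('CreateFile failed');
  try
    // A read-only mapping object covering the whole file (0, 0 = current size)
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      raise Exception.Create('CreateFileMapping failed');
    try
      Offset := 0;                 // must be a multiple of 64 KB
      View := MapViewOfFile(hMap, FILE_MAP_READ,
                            DWORD(Offset shr 32), DWORD(Offset and $FFFFFFFF),
                            ViewSize);
      if View = nil then
        raise Exception.Create('MapViewOfFile failed');
      try
        // ... inspect the mapped section here ...
        WriteLn('First byte of the section: ', View^);
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end.

In a real program you would pick Offset and the view size based on which part of the file you need, and remap as you move through the file.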

HTH.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

ASerge

  • Hero Member
  • *****
  • Posts: 2223
Re: Simple file with only one single record, but of HUGE size
« Reply #7 on: December 11, 2018, 07:50:13 pm »
Quote
On Windows, the largest file size (at least theoretically) is 16 TB; that's terabytes, that's over 16,000 gigabytes.
Just to check, I tried to create large files on my Windows 7 system. Result:
16TB: "Invalid function call"
16TB minus 64KB: OK, but I can't delete the file, or it's only deleted after a long wait. It looks like some system scanning process doesn't understand these sizes.
16TB minus 1MB: OK.
The test is attached, and a disk of any size is suitable for testing 8).

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Simple file with only one single record, but of HUGE size
« Reply #8 on: December 11, 2018, 09:34:27 pm »
Quote
Just to check, I tried to create large files on my Windows 7 system.
You're courageous.  There are some things I don't trust Windows enough to even try.

Quote
Result:
16TB: "Invalid function call"
16TB minus 64KB: OK, but I can't delete the file, or it's only deleted after a long wait. It looks like some system scanning process doesn't understand these sizes.
16TB minus 1MB: OK.
The test is attached, and a disk of any size is suitable for testing 8).
I'm surprised it managed/accepted such file sizes. 16TB minus 1MB is impressive. Of course, that is just for a file "stub" of that size; attempting to actually fill the file would probably be a different experience. (I most definitely wouldn't try that.) ;)

Thank you for trying and reporting the result.  Definitely educational. :)
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #9 on: December 11, 2018, 09:46:32 pm »
Quote
If the file size is < 100 MB and I need some text operations (showing it in a TMemo, searching, etc.) I prefer TStringList
but
If the file size is > 100 MB I prefer BlockRead and BlockWrite

These different conclusions: are they your casual preferences, or is there a technical reason for the 100 MB "crossroads"?
Please excuse me if the answer is obvious and I don't see it.



Quote
Just to check, I tried to create large files on my Windows 7 system. Result: ...

Thank you very much for the "create large files" results. Has anybody done similar checking under Windows 10 Pro?



Quote
The requirements you are mentioning ...

I'll return to your post later.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Simple file with only one single record, but of HUGE size
« Reply #10 on: December 11, 2018, 10:31:13 pm »
Quote
If the file size is < 100 MB and I need some text operations (showing it in a TMemo, searching, etc.) I prefer TStringList
but
If the file size is > 100 MB I prefer BlockRead and BlockWrite

These different conclusions: are they your casual preferences, or is there a technical reason for the 100 MB "crossroads"?
Please excuse me if the answer is obvious and I don't see it.

I'm not Handoko, but I would guess the actual value can be picked out of a hat--or rather, after some careful testing. In my case, for example, I use a hard limit of 1/8 of the RAM size (i.e. 64 MB for 512 MB of RAM, and so on), which is a size that permits extracting and treating TStrings.Text in relative comfort while still being able to multitask without the system thrashing too much. If the file is greater, then I use a buffered stream with a 1 to 4 MB buffer, if it can be set.

But in this, as in most similar things, YMMV :)
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

soerensen3

  • Full Member
  • ***
  • Posts: 213
Re: Simple file with only one single record, but of HUGE size
« Reply #11 on: December 11, 2018, 11:05:05 pm »
Hard to say; could you perhaps attach the file? :P Only kidding!

But since you didn't specify what you want to do: saving a text file is slower and produces bigger files than writing a binary file. Maybe you need to use a text file, but if you don't, then a binary file may be the better option. Anyway, the rule of not reading the whole file at once if you don't have to also applies to binary files. As mentioned before, a stream is probably what you want, because it is easy to use and works with both text and binary files.

Just in case: binary files store the data more or less the way it is stored in memory, but they are not human-readable. You will probably still have to implement your own read and write routines; see the sketch below. Start with small files to practice before you go for the really big ones.
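
For illustration, a minimal sketch of such read and write routines, using TFileStream and a packed record (the record layout and file name are just assumptions):
Code: Pascal
program BinaryDemo;
{$mode objfpc}
uses
  Classes, SysUtils;
type
  TSample = packed record      // packed: stored exactly as laid out, no padding
    Id: Int64;
    Value: Double;
  end;
var
  S: TFileStream;
  Rec: TSample;
begin
  // Write one record in binary form
  Rec.Id := 1;
  Rec.Value := 3.14;
  S := TFileStream.Create('sample.bin', fmCreate);
  try
    S.WriteBuffer(Rec, SizeOf(Rec));
  finally
    S.Free;
  end;
  // Read it back
  S := TFileStream.Create('sample.bin', fmOpenRead);
  try
    S.ReadBuffer(Rec, SizeOf(Rec));
    WriteLn(Rec.Id, ' ', Rec.Value:0:2);
  finally
    S.Free;
  end;
end.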
Lazarus 1.9 with FPC 3.0.4
Target: Manjaro Linux 64 Bit (4.9.68-1-MANJARO)

Handoko

  • Hero Member
  • *****
  • Posts: 5131
  • My goal: build my own game engine using Lazarus
Re: Simple file with only one single record, but of HUGE size
« Reply #12 on: December 12, 2018, 04:18:31 am »
Quote
If the file size is < 100 MB and I need some text operations (showing it in a TMemo, searching, etc.) I prefer TStringList
but
If the file size is > 100 MB I prefer BlockRead and BlockWrite

These different conclusions: are they your casual preferences, or is there a technical reason for the 100 MB "crossroads"?
Please excuse me if the answer is obvious and I don't see it.

The value is based on my experience.

I work as a computer technician and deal with lots of Windows computers. Locally, there are still many home office computers running WinXP on a Core 2 Duo with 1 GB RAM. Although these computers are outdated, they still work. And using System Information, I can see they usually have 300 MB or more of free available memory.

That makes me think I should request less than 300 MB of memory if I want to create applications that can run on low-spec computers. Why did I choose 100 MB? Easy: 100 for the buffer + 100 for the application (as marcov said, the application will request double the memory if you use TStringList) + 100 left free for Windows.

I never really built any memory-hungry programs, but I once created some tests using BlockRead/Write to parse big files with about a 20 MB buffer. It worked smoothly.

So that's why I think 100 MB should be a good limit for a single TStringList or for a BlockRead buffer. But if you're targeting high-end computers, you can set the value higher, maybe 1 GB.
« Last Edit: December 12, 2018, 07:05:31 am by Handoko »

Epand

  • New Member
  • *
  • Posts: 25
Re: Simple file with only one single record, but of HUGE size
« Reply #13 on: December 12, 2018, 10:29:50 am »
Quote
The requirements you are mentioning are difficult to manage. On Windows, the largest file size (at least theoretically) is 16 TB; that's terabytes, that's over 16,000 gigabytes.

Maybe I should have described it better; sorry for any inconvenience.
Regarding the file size limit "as huge as possible", I was thinking of the most fortunate case; most fortunate in the sense that the running system wouldn't eventually run into problems and crash.
That may differ from the logical limit of the OS or from the physical limit.

Quote
That's the basic concept used to manage large, multi-terabyte databases.

Does it differ if it's not a database but only one single file of large/huge size?

Quote
The important thing to have very clear in mind is that the way you deal with very large files depends on how the files need to be manipulated. There are some basic rules of thumb but, ultimately, the algorithms to use depend on how the data inside the file needs to be managed.

The file will be filled sequentially, character by character; no other choice. When full, it will be read sequentially, and that reading can be done in chunks, no problem.
There will be no random access.
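
Character-by-character writing doesn't have to mean one I/O per character, though: here is a minimal sketch that buffers the characters and flushes them in blocks (the AppendChar helper, buffer size and file name are hypothetical):
Code: Pascal
program CharWriter;
{$mode objfpc}
const
  BufSize = 1024 * 1024;            // flush to disk in 1 MB blocks
var
  F: file;                          // untyped file, 1-byte records
  Buf: array[0..BufSize - 1] of Char;
  Used: LongInt = 0;

procedure FlushBuf;
begin
  if Used > 0 then
    BlockWrite(F, Buf[0], Used);    // one I/O for many characters
  Used := 0;
end;

procedure AppendChar(C: Char);
begin
  Buf[Used] := C;
  Inc(Used);
  if Used = BufSize then
    FlushBuf;
end;

var
  I: LongInt;
begin
  AssignFile(F, 'huge.dat');
  Rewrite(F, 1);                    // record size 1 byte
  try
    for I := 1 to 10 * BufSize do   // demo: append characters one by one
      AppendChar(Chr(Ord('0') + I mod 10));
    FlushBuf;                       // don't forget the partial last block
  finally
    CloseFile(F);
  end;
end.

This way each BlockWrite transfers a whole megabyte, even though the program only ever appends one character at a time.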

damieiro

  • Full Member
  • ***
  • Posts: 200
Re: Simple file with only one single record, but of HUGE size
« Reply #14 on: December 12, 2018, 11:28:39 am »
Quote
The file will be filled sequentially, character by character; no other choice. When full, it will be read sequentially, and that reading can be done in chunks, no problem.
There will be no random access.

It seems like a tape problem, not a one-file problem. And it seems that you can write as much tape as you want...

So the maximum size is irrelevant if no random access is needed.
Let's take a 1 GB file, Mybigfile.txt.
You can easily chunk it into 10 files like Mybigfile001.txt, Mybigfile002.txt and so on.
The "file concept" is not tied to only one physical archive. You can have many physical files that together are TheBigFile.

And if you want to use only one file (not needed for this problem), you can append to a single file in sequential writes up to the maximum OS capacity or HD storage.

What is your process?

-> Make a lot of random chars into a big array
-> Write it to a big file
-> Then read it from the start?

But do you need ALL of the file size in memory? Does that make any sense for a sequential read and write?

You can do the following (see the sketch below):
1 -> Make a lot of random chars into a big array
2 -> Write, or append if it is not the first block, to a big file (or make more files)
3 -> Repeat until the file is bigger than the HD capacity, the OS maximum, or any arbitrary maximum you want...
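
A minimal sketch of that loop (block size, target size and file name are arbitrary assumptions):
Code: Pascal
program FillBigFile;
{$mode objfpc}
const
  BlockSize = 64 * 1024 * 1024;                  // 64 MB per block
  TargetSize: Int64 = 10 * 1024 * 1024 * 1024;   // stop at an arbitrary 10 GB
var
  F: file;                                       // untyped file, 1-byte records
  Block: array of Byte;
  Written: Int64 = 0;
  I: LongInt;
begin
  Randomize;
  SetLength(Block, BlockSize);
  AssignFile(F, 'Mybigfile.txt');
  Rewrite(F, 1);
  try
    while Written < TargetSize do
    begin
      for I := 0 to BlockSize - 1 do             // 1 -> fill the array with random letters
        Block[I] := Ord('A') + Random(26);
      BlockWrite(F, Block[0], BlockSize);        // 2 -> append the block to the file
      Inc(Written, BlockSize);                   // 3 -> repeat until the target size
    end;
  finally
    CloseFile(F);
  end;
end.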
« Last Edit: December 12, 2018, 11:31:55 am by damieiro »

 
