
Author Topic: Blockread 32-bit limit  (Read 3232 times)

440bx

  • Hero Member
  • *****
  • Posts: 6138
Re: Blockread 32-bit limit
« Reply #15 on: June 29, 2025, 03:29:31 am »
I might have to go to C++.  :o Though, if it is an underlying OS issue, I guess that won't help.
You mentioned you're running under Windows. 

Under Windows, ReadFile takes a 32-bit value as the number of bytes to read, therefore Windows will force you to read in "small" chunks of at most 4GB.  NtReadFile, which is what ReadFile uses, also takes a 32-bit byte count, therefore even using the native API limits a single read to 4GB.

Since file mapping is page based, I don't see it as a way to circumvent that limitation.  IOW, file mapping will make the O/S break I/Os into chunks of one page, usually 4KB (or a few MB with large pages), not anything greater than that.

It doesn't look like there is a way around performing multiple reads for files larger than 4GB.  BlockRead's Int64 count parameter won't help under Windows: any value greater than High(DWORD) is invalid and will not deliver the expected result.

ETA:

Actually, to be precise, the 32-bit value limits the size of the read buffer.  Consequently it is not possible to read more than 4GB in one call because the maximum buffer size is limited to 4GB.  That limit is imposed by NtReadFile.
« Last Edit: June 29, 2025, 03:38:53 am by 440bx »
FPC v3.2.2 and Lazarus v4.0rc3 on Windows 7 SP1 64bit.

ASerge

  • Hero Member
  • *****
  • Posts: 2477
Re: Blockread 32-bit limit
« Reply #16 on: June 29, 2025, 04:20:41 am »
This file is big enough to where I can only do individual fields, but those are in the tens of gigabytes.
You can use memory mapping in Windows. This example is for a 27 GB file. The reading is very fast (less than 1 second):
Code: Pascal
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, Forms, Controls, Graphics, Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    btnTest: TButton;
    procedure btnTestClick(Sender: TObject);
  private
  public
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

uses RtlConsts, Windows;

type
  TMemMapRead = class(TCustomMemoryStream)
  strict private
    FMapHandle: THandle;
    procedure CloseMapping;
  public
    constructor Create(const FileName: string; ShareWrite: Boolean = False);
    function Write(const Buffer; Count: LongInt): LongInt; override;
    destructor Destroy; override;
  end;

{ TMemMapRead }

constructor TMemMapRead.Create(const FileName: string; ShareWrite: Boolean);
const
  ShareFlag: array[Boolean] of DWORD = (FILE_SHARE_READ, FILE_SHARE_READ or FILE_SHARE_WRITE);
var
  FileSize: LARGE_INTEGER;
  HFile: THandle;
  P: Pointer;
begin
  inherited Create;
  HFile := CreateFileW(PWideChar(UTF8Decode(FileName)), GENERIC_READ,
    ShareFlag[ShareWrite], nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if HFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  FileSize.LowPart := GetFileSize(HFile, @FileSize.HighPart);
  if FileSize.QuadPart > 0 then
  begin
    FMapHandle := CreateFileMapping(HFile, nil, PAGE_READONLY, 0, 0, nil);
    CloseHandle(HFile);
    if FMapHandle = 0 then
      RaiseLastOSError;
    P := MapViewOfFile(FMapHandle, FILE_MAP_READ, 0, 0, 0);
    if P = nil then
      RaiseLastOSError;
    SetPointer(P, FileSize.QuadPart);
  end
  else
  begin
    CloseHandle(HFile);
    SetPointer(nil, 0);
  end;
end;

procedure TMemMapRead.CloseMapping;
begin
  if Memory <> nil then
  begin
    UnmapViewOfFile(Memory);
    SetPointer(nil, 0);
  end;
  if FMapHandle <> 0 then
  begin
    CloseHandle(FMapHandle);
    FMapHandle := 0;
  end;
end;

destructor TMemMapRead.Destroy;
begin
  CloseMapping;
  inherited;
end;

function TMemMapRead.Write(const Buffer; Count: LongInt): LongInt;
begin
  raise EStreamError.CreateRes(@SWriteError);
end;

{ TForm1 }

procedure TForm1.btnTestClick(Sender: TObject);
var
  Mem: TMemMapRead;
begin
  Mem := TMemMapRead.Create('D:\Users\Serge\VirtualBox VMs\Work Stations\MSEdge - Win10\MSEdge - Win10-disk002.vmdk');
  try
    Mem.Position := 4*1024*1024*1024 + 1;
    Caption := Mem.ReadDWord.ToString;
  finally
    Mem.Free;
  end;
end;

end.

Hartmut

  • Hero Member
  • *****
  • Posts: 1100
Re: Blockread 32-bit limit
« Reply #17 on: June 29, 2025, 07:26:38 am »
If you want to read large files, map them into memory instead. On unix you can use mmap (fpmmap). On windows there is CreateFileMapping.

A cross-platform alternative to this is to use package mORMot2 which runs on Windows/Linux/BSD/MacOS. For memory mapping of files use type 'TMemoryMap' in unit mormot.core.os = https://synopse.info/files/doc/api/mormot.core.os.html#TMEMORYMAP
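The same cross-platform idea can also be sketched outside Pascal; Python's `mmap` module, for instance, wraps mmap on Unix and CreateFileMapping on Windows (a minimal illustration of mapped random access, not mORMot's API):

```python
import mmap
import os
import tempfile

# Build a small demo file so the sketch is self-contained
# (two 4 KB "pages": one of A's, one of B's).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"A" * 4096 + b"B" * 4096)

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        # Random access at any offset; no explicit read() calls needed.
        sample = view[4096:4100]

os.remove(path)
print(sample)  # b'BBBB'
```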

For blockread and blockwrite I found out 2 additional important things:

1) If you want to use blockread for files >= 2 GB, you can use a loop and read chunks of up to 2 GB. But do not use a maximum size of 2,147,483,647 = $7FFFFFFF bytes! Instead use a maximum size of 2,147,479,552 = $7FFFF000. Depending on the low 12 bits of your buffer address, bigger values may fail. Same for blockwrite. Tested with FPC 3.2.0 on Linux in a 64-bit program.
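The capped-chunk loop described above can be sketched language-neutrally (Python used for brevity; the cap shown is the $7FFFF000 value, and the tiny demo file with an artificially small cap is just so the sketch runs quickly):

```python
import os
import tempfile

MAX_CHUNK = 0x7FFFF000  # 2,147,479,552: the safe per-call maximum noted above

def read_whole_file(path, cap=MAX_CHUNK):
    """Read a file of any size by issuing reads of at most `cap` bytes."""
    data = bytearray()
    with open(path, "rb") as f:
        remaining = os.path.getsize(path)
        while remaining > 0:
            chunk = f.read(min(cap, remaining))
            if not chunk:          # unexpected EOF; avoid an endless loop
                break
            data += chunk
            remaining -= len(chunk)
    return bytes(data)

# Tiny demo: a 10,000-byte file read with an artificially small 4 KB cap.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 10_000)
result = read_whole_file(path, cap=4096)
os.remove(path)
assert result == b"x" * 10_000
```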

2) For blockread and blockwrite there exist alternative calls with a 'Result':
Code: Pascal
procedure BlockRead(var f: File; var Buf; Count: Int64; var Result: Int64);
procedure BlockWrite(var f: File; const Buf; Count: Int64; var Result: Int64);
Programmers usually use this alternative for reading when they don't know exactly how many bytes are left to read, and if 'Result' is < 'Count' they stop reading, thinking they have reached EOF. But this is wrong if 'Count' is >= 2 GB: in that case 'Result' will never be > $7FFFF000, and if you stop reading then, you miss the entire rest of the file.
Same for blockwrite. Tested with FPC 3.2.0 on Linux in a 64-bit program.
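The pitfall can be demonstrated with a stand-in reader that, like the kernel, silently caps each call (a Python sketch; `ShortReader` and `read_exactly` are made-up names for illustration):

```python
import io

class ShortReader:
    """Wraps a byte buffer but never returns more than `limit` bytes per
    read() call, mimicking a kernel that caps each transfer."""
    def __init__(self, data, limit):
        self._buf = io.BytesIO(data)
        self._limit = limit

    def read(self, count):
        return self._buf.read(min(count, self._limit))

def read_exactly(reader, count):
    """Keep reading until `count` bytes arrive or a zero-length read
    signals genuine EOF; never treat a short read as end-of-file."""
    parts = []
    got = 0
    while got < count:
        chunk = reader.read(count - got)
        if not chunk:
            break  # real EOF
        parts.append(chunk)
        got += len(chunk)
    return b"".join(parts)

r = ShortReader(b"y" * 1000, limit=64)
data_c = read_exactly(r, 1000)
# Stopping after the first short read would have yielded only 64 bytes.
assert len(data_c) == 1000
```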

jamie

  • Hero Member
  • *****
  • Posts: 7596
Re: Blockread 32-bit limit
« Reply #18 on: June 29, 2025, 03:04:26 pm »
Let the poster do their C++ version if they are so confident; they will find the same limitations!

Jamie
The only true wisdom is knowing you know nothing

Thaddy

  • Hero Member
  • *****
  • Posts: 18765
  • To Europe: simply sell USA bonds: dollar collapses
Re: Blockread 32-bit limit
« Reply #19 on: June 29, 2025, 03:45:39 pm »
Yes, pretty funny that he thinks he can do better elsewhere. As you remark, it's the same problem in C++.
If Europe sells their USA bonds the USD will collapse. Europe can afford that given average state debts. The USA can't afford that. Just an advice...

dsbw

  • Newbie
  • Posts: 5
Re: Blockread 32-bit limit
« Reply #20 on: June 30, 2025, 03:39:40 am »
Quote
Let the poster do their C++ version if they are so confident; they will find the same limitations!

Quote
Yes, pretty funny that he thinks he can do better elsewhere. As you remark, it's the same problem in C++.

Guys, it's literally on the same line I wrote:

Quote
I might have to go to C++.  :o Though, if it is an underlying OS issue, I guess that won't help.

The first things I saw implied it was an FPC issue. It appears not to be!

You can use memory mapping in Windows. This example is for a 27 GB file. The reading is very fast (less than 1 second):

Thank you very much for this. That code is fast because you're making a single call to a random position. Which is cool. But if you read the entire file serially, you'll find it's quite slow. (My 36GB file takes over 2 minutes.)

Lacking a solution, I broke down and read the file in 2GB chunks, stitching the stragglers at the end of every block to the beginning of the next, and reading the file 3 times took less than 20 seconds. Then I discovered that was due to a bug that broke strings.   :'( It was sorta funny. If I set my block size to:

Code: Pascal
BlockSize: Int64 = 2048 * 1024 * 1024;


It came back blazingly fast, but for some reason started capping the strings it was finding at 255 bytes. If I do this:

Code: Pascal
BlockSize: Int64 = 2047 * 1024 * 1024; // 1 MB short of 2 GB

It works but alas, is nowhere near as fast.
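A plausible explanation for the 2048-vs-2047 difference (an assumption, not verified against the RTL): 2048*1024*1024 is exactly 2^31, which no longer fits in a signed 32-bit integer, while 2047*1024*1024 still does. If any intermediate value passes through a 32-bit type, the former wraps negative (illustration of the arithmetic only):

```python
def to_int32(v):
    """Interpret the low 32 bits of v as a signed 32-bit integer."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v >= 0x80000000 else v

print(to_int32(2048 * 1024 * 1024))  # -2147483648: wraps negative
print(to_int32(2047 * 1024 * 1024))  # 2146435072: still positive
```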

Final test: Reading the file in 2GB chunks is only about 20% faster (120 seconds to 100 seconds).

I have to decide whether that's worth the code overhead.

440bx

  • Hero Member
  • *****
  • Posts: 6138
Re: Blockread 32-bit limit
« Reply #21 on: June 30, 2025, 05:04:17 am »
Thank you very much for this. That code is fast because you're making a single call to a random position. Which is cool. But if you read the entire file serially, you'll find it's quite slow. (My 36GB file takes over 2 minutes.)
You can combine @ASerge's solution with ReadFile.

Basically, using ReadFile you can read as much as 4GB in one shot into the file mapping (at least in theory.)  IOW, you don't need to read one page at a time (which will have a tendency to be slow, because every first access to a page causes a page fault, handled in ring-0, to associate physical memory with the page.)

Doing it that way, the only thing you have to bother yourself with is calculating the correct offset in the mapping for every read.   Quite simple actually and extremely quick.

If you want to maximize speed, you could use ReadFileEx and asynchronously read blocks into the mapping, processing already-read blocks while new ones are being read.  Disclaimer: I know it can be done but I've never indulged in it, because I never ran into a case where the performance gain would be worth the additional code complexity.

ETA:

I forgot to mention one thing.

The first time you allocate a large block of memory, Windows normally allocates NO memory at all; it only gives the program address space.  If the space is committed, then Windows will automatically associate a page of real memory with an address whenever that address is touched (read or written.)

The net effect is that the first time a page is accessed, a transition to ring-0 takes place for the O/S to map physical memory to that virtual address.  This has one very noticeable side effect: the first read of the file will take noticeably longer than subsequent ones, because the first pass incurs all those transitions for the O/S to associate real memory with each virtual address.

You want to make it as fast as ring-3 code can make it? ... use VirtualAllocEx to map a fairly large buffer (multi-gigabyte) while specifying MEM_LARGE_PAGES.  That causes the O/S to associate physical memory with the entire range of address space requested.  It also means the allocation will take significantly longer than a normal 4K page allocation, because the O/S is actually managing memory to satisfy the request.

If that sounds good to you, be aware there are significant downsides to that method.  The first is that enough physical memory in contiguous blocks of 4MB needs to be available to satisfy the request; IOW, it is a very real possibility that the request may fail due to insufficient memory because there aren't enough 4MB contiguous blocks to cover the requested range.  The other significant problem is that the memory used is NOT paged; it is locked and used exclusively by your process.  IOW, you can very easily make one of today's fastest machines slower than a 64KB original IBM PC by allocating memory that way.  Succinctly, it is the easiest way to starve the system of memory.  Fortunately, in addition to all that, you need SeLockMemoryPrivilege to do it.

« Last Edit: June 30, 2025, 05:36:00 am by 440bx »

dsbw

  • Newbie
  • Posts: 5
Re: Blockread 32-bit limit
« Reply #22 on: June 30, 2025, 08:41:40 pm »
That's interesting. Yeah, I did see the whole issue of Windows (particularly 11?) being sort of haphazard with regard to memory and making it hard to allocate huge chunks.

Thanks for the feedback!

Thaddy

  • Hero Member
  • *****
  • Posts: 18765
  • To Europe: simply sell USA bonds: dollar collapses
Re: Blockread 32-bit limit
« Reply #23 on: July 01, 2025, 10:32:55 am »
It is not a Windows-only issue, although 440bx's answer is Windows-centric: the same thing happens on Unixes.
For such big allocations one usually uses windowed streams for - sequential - access, or otherwise an SQL or NoSQL database, which has planar access. These can be allocated in memory.
« Last Edit: July 01, 2025, 10:46:04 am by Thaddy »

creaothceann

  • Sr. Member
  • ****
  • Posts: 278
Re: Blockread 32-bit limit
« Reply #24 on: August 04, 2025, 07:07:48 pm »

Thaddy

  • Hero Member
  • *****
  • Posts: 18765
  • To Europe: simply sell USA bonds: dollar collapses
Re: Blockread 32-bit limit
« Reply #25 on: August 04, 2025, 09:54:21 pm »
 ;) ;) ;D

Warfley

  • Hero Member
  • *****
  • Posts: 2040
Re: Blockread 32-bit limit
« Reply #26 on: August 04, 2025, 11:00:51 pm »
Final test: Reading the file in 2GB chunks is only about 20% faster (120 seconds to 100 seconds).

I have to decide whether that's worth the code overhead.
This should not be the case. The slowest part of reading a file is of course the IO; if you have an SSD, there should be almost no difference between one huge read and many smaller reads.
The only differences between small and big reads are
1. the context switch when calling a kernel function (around a few microseconds), which in terms of computer speeds is a lot (normal function calls are in the order of nanoseconds)
2. the speed of your code when doing the read.

Both of which are negligible compared to the IO speed once you hit a certain threshold.

I quickly wrote a chunked read function, which just takes a file name and reads the whole contents into a contiguous block of memory (a byte array for convenience):
Code: Pascal
uses
  Classes, SysUtils, Math;

function ReadFile(const FName: String; ChunkSize: Integer): TBytes;
var
  fs: TFileStream;
  head: SizeInt;
begin
  Result := nil;
  fs := TFileStream.Create(FName, fmOpenRead);
  try
    SetLength(Result, fs.Size);
    head := 0;
    while head < Length(Result) do
    begin
      fs.ReadBuffer(Result[head], Min(ChunkSize, Length(Result) - head));
      Inc(head, ChunkSize);
    end;
  finally
    fs.Free;
  end;
end;

And tested it with a 17GB file:
Code: Pascal
var
  Start: QWord;
  cs: Integer;
  data: TBytes;
begin
  cs := 1024;
  while cs < 1024*1024*1024 do
  begin
    data := nil; // Dealloc before doing the measurement
    Start := GetTickCount64;
    data := ReadFile(ParamStr(1), cs);
    WriteLn('Chunksize ', cs, ': ', GetTickCount64 - Start);
    cs := cs * 2;
  end;
  ReadLn;
end.

Here are the results:
Code: Text
Chunksize 1024: 891
Chunksize 2048: 562
Chunksize 4096: 438
Chunksize 8192: 297
Chunksize 16384: 234
Chunksize 32768: 203  <-- No improvement from this point on
Chunksize 65536: 219
Chunksize 131072: 187
Chunksize 262144: 204
Chunksize 524288: 187
Chunksize 1048576: 219
Chunksize 2097152: 172
Chunksize 4194304: 218
Chunksize 8388608: 204
Chunksize 16777216: 203
Chunksize 33554432: 172
Chunksize 67108864: 187
Chunksize 134217728: 172
Chunksize 268435456: 187
Chunksize 536870912: 204
Time in milliseconds. Basically, once you are in the order of multiple page sizes, there is no difference anymore. It doesn't matter if you read chunks in the order of gigabytes or megabytes; as soon as you are at a few page sizes (a few tens of kilobytes) it's the same speed.

The page size on my system (the Windows default) should be 4 KB, so why it hits its peak at 4-8 pages I don't know; I would have assumed it would after 2 pages, but it's still the expected result. Once you go magnitudes over the page size it just normalizes.
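The same experiment translates directly to other languages; here is a rough Python sketch of the chunk-size sweep (a 1 MiB file so it runs instantly; meaningful timings need a file much larger than the OS page cache):

```python
import os
import tempfile
import time

def read_chunked(path, chunk_size):
    """Read the whole file in fixed-size chunks, like the Pascal ReadFile above."""
    out = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            out += chunk
    return bytes(out)

original = os.urandom(1 << 20)  # 1 MiB of random data
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(original)

cs = 1024
while cs <= 1 << 20:
    t0 = time.perf_counter()
    data = read_chunked(path, cs)
    print(f"Chunksize {cs}: {time.perf_counter() - t0:.4f}s")
    cs *= 2

os.remove(path)
assert data == original
```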
« Last Edit: August 04, 2025, 11:06:02 pm by Warfley »

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 881
Re: Blockread 32-bit limit
« Reply #27 on: August 04, 2025, 11:34:54 pm »
It's a Windows limitation. It's easy to overcome via simple code, if you don't care about the tiny overhead caused by the extra system calls:
Code: Pascal
// Schematic: split one big read into sub-4GB ReadFile calls.
// HFile, Buffer (a PByte), Size and BytesRead are assumed declared;
// MaxFileSize32 is the per-call cap (at most High(DWORD)).
while Size > 0 do begin
  if Size > MaxFileSize32 then
    CurrentSize := MaxFileSize32
  else
    CurrentSize := Size;
  if not ReadFile(HFile, Buffer^, CurrentSize, BytesRead, nil) then
    RaiseLastOSError;
  Size -= BytesRead;
  Buffer += BytesRead;
end;

File mapping would only be more effective in case of a lack of memory. Why? Because a mapped file isn't swapped to the pagefile; the file itself acts as its backing store.
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

440bx

  • Hero Member
  • *****
  • Posts: 6138
Re: Blockread 32-bit limit
« Reply #28 on: August 05, 2025, 12:25:12 am »
Another thing that greatly affects read speed is file fragmentation. 

If the file is in a single contiguous block in adjacent tracks that will provide good performance.  If the file sectors are all over the place, that's a recipe for poor performance no matter the buffer size.  Of course, SSDs are not nearly as sensitive to the equivalent problem.

Regardless of O/S the underlying drive hardware also has a significant impact on read/write performance and, both the hardware and the algorithms used are optimized for the common case, which doesn't include reading 4GB (or much less for that matter) of data in a single I/O.

PascalDragon

  • Hero Member
  • *****
  • Posts: 6353
  • Compiler Developer
Re: Blockread 32-bit limit
« Reply #29 on: August 05, 2025, 09:43:12 pm »
Regardless of O/S the underlying drive hardware also has a significant impact on read/write performance and, both the hardware and the algorithms used are optimized for the common case, which doesn't include reading 4GB (or much less for that matter) of data in a single I/O.

And most hardware will have to split up such transfers anyway. E.g. many USB sticks and bridges only really support up to 2 MiB per transfer and above that might even become unstable or throw errors.

 
