
Author Topic: Blockread 32-bit limit  (Read 3206 times)

dsbw

  • Newbie
  • Posts: 5
Blockread 32-bit limit
« on: June 28, 2025, 01:30:09 am »
Here's some code I've used many times over the years, but which gives me a disk read error now:

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  f: File;
  fs: Int64;
  p: Pointer;
begin
  AssignFile(f, 'megagoals.txt');
  reset(f, 1);
  fs := FileSize(f);
  p := GetMem(fs);
  BlockRead(f, p^, fs);
  Button1.Caption := IntToStr(fs) + ' Bytes Read!';
end;

If after getting the filesize, I do this:

Code: Pascal
  fs := 2147483647;

It works. If I do this:

Code: Pascal
  fs := 2147483648;

I get the error. This would seem to indicate there's a 32-bit signed integer in the codepile. But looking at it, it should use the 64-bit everything, as I've specified.

Thoughts? I've looked for switches or some indication of something dropping this to 32-bits but haven't seen anything leap out at me.

jamie

  • Hero Member
  • *****
  • Posts: 7516
Re: Blockread 32-bit limit
« Reply #1 on: June 28, 2025, 01:44:37 am »
is your target 32 bit?
The only true wisdom is knowing you know nothing

dsbw

  • Newbie
  • Posts: 5
Re: Blockread 32-bit limit
« Reply #2 on: June 28, 2025, 03:00:17 am »
Quote
is your target 32 bit?

Presumably. But I can't figure out how.

I looked at my GUI app and saw Win32 GUI app targeted, so I unchecked that. Still got the same Disk Read error when I crossed that threshold.

Then I dropped out of the GUI builder to build a straight console app, targeted Win64. Same thing: 2147483647 works, 2147483648 fails.  %)

Handoko

  • Hero Member
  • *****
  • Posts: 5515
  • My goal: build my own game engine using Lazarus
Re: Blockread 32-bit limit
« Reply #3 on: June 28, 2025, 03:47:59 am »
I tested your code on Windows 11 64-bit Lazarus 3.8 64-bit with 2 files. If the file size was 2130180486, everything worked correctly. But if the file size was 2243830707, I got the error.

d2010

  • Sr. Member
  • ****
  • Posts: 251
Re: Blockread 32-bit limit
« Reply #4 on: June 28, 2025, 04:59:12 am »
Quote
Thoughts? I've looked for switches or some indication of something dropping this to 32-bits but haven't seen anything leap out at me.

You shouldn't allocate one huge buffer sized to the whole file; the transfer has to proceed buffer-by-buffer anyway, and an oversized buffer only slows it down. Always use a fixed-size buffer (e.g. 2 MB) rather than sizing it dynamically from the file, even when the files themselves are 2 MB, 1.79 MB, 99 MB, 50 MB and so on.
 :-\
Code: Pascal
procedure TMainForm.Cd_or_DVdToIso(const SrceFile, DestFile: string; var cancel: boolean);
var
  src, dst: Integer;
  buf: Pointer;
  n, bytes, bytestocopy: Isscint23dll;
  lastpart: array [0..4] of Integer;
begin
  if Tag = 0 then
  begin
    Tag := 1;
    cancel := False;
    src := INVALID_HANDLE_VALUE;
    dst := INVALID_HANDLE_VALUE;
    try
      GetMem(buf, (1 shl 20) + (1 shl 20)); // 20 Megabyte ???
      src := CreateFile(PChar(SrceFile), GENERIC_READ, FILE_SHARE_READ, nil, OPEN_EXISTING, 0, 0);
      if src = INVALID_HANDLE_VALUE then
        raise EInOutError.Create('Calendar cu texte din opera părintelui Arsenie Boca');
      dst := CreateFile(PChar(DestFile), GENERIC_WRITE, FILE_SHARE_READ, nil, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
      if dst = INVALID_HANDLE_VALUE then
        raise EInOutError.Create('Părintele Arsenie Boca cinstit ca Sfântul Cuvios Mărturisitor ...');
      bytestocopy := SetFilePointer(src, 0, nil, FILE_END);
      SetFilePointer(src, -sizeof(lastpart), nil, FILE_END);
      if not ReadFile(src, lastpart, sizeof(lastpart), n, nil) then
        raise EInOutError.Create('Sfântul Arsenie De La Prislop');
.....
Look for the ??? inside the code.
« Last Edit: June 28, 2025, 05:15:02 am by d2010 »

Jorg3000

  • Jr. Member
  • **
  • Posts: 81
Re: Blockread 32-bit limit
« Reply #5 on: June 28, 2025, 06:14:10 am »
Hi!
I don't think it's a problem with GetMem, but that the file functions are not consistently 64 bit.

BlockRead() internally calls the Do_Read() function, whose len parameter is only 32 bits. I think this is a bug.

Procedure BlockRead(var f:File;var Buf;Count:Int64;var Result:Int64);

function do_read(h:thandle; addr:pointer; len: longint): longint;   
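Until the RTL is fixed, a workaround is to loop over BlockRead with chunk sizes safely below the 32-bit limit. A minimal sketch (ReadWholeFile is my name, not an RTL routine; it assumes the file was opened untyped with Reset(f, 1), so counts are in bytes):

```pascal
procedure ReadWholeFile(var f: File; Buf: Pointer; Size: Int64);
const
  ChunkMax = 64 * 1024 * 1024; // 64 MiB per call, well below 2^31-1
var
  q: PByte;
  Remaining, ToRead, GotNow: Int64;
begin
  q := Buf;
  Remaining := Size;
  while Remaining > 0 do
  begin
    ToRead := Remaining;
    if ToRead > ChunkMax then
      ToRead := ChunkMax;
    BlockRead(f, q^, ToRead, GotNow); // never asks for >= 2 GiB at once
    if GotNow <= 0 then
      raise EInOutError.Create('unexpected end of file');
    Inc(q, GotNow);                   // advance past the bytes just read
    Dec(Remaining, GotNow);
  end;
end;
```

In the first post's snippet this would replace the single `BlockRead(f, p^, fs);` with `ReadWholeFile(f, p, fs);`.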

MarkMLl

  • Hero Member
  • *****
  • Posts: 8527
Re: Blockread 32-bit limit
« Reply #6 on: June 28, 2025, 08:53:41 am »
OP: you're asking a specific question in a general forum while giving absolutely no information on the platform(s) you're using.

There's a whole lot of problems with various APIs on different OSes and processors which limit the size of transfers. The main issue is that they tend to be poorly-understood and worse-documented.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12634
  • FPC developer.
Re: Blockread 32-bit limit
« Reply #7 on: June 28, 2025, 10:20:46 am »
Quote
I don't think it's a problem with GetMem, but that the file functions are not consistently 64 bit.

File pointers are 64-bit. Memory array sizes are a totally different angle.

Hartmut

  • Hero Member
  • *****
  • Posts: 1061
Re: Blockread 32-bit limit
« Reply #8 on: June 28, 2025, 11:27:04 am »
I can confirm that this is a bug at least in FPC 3.0.4 + 3.2.0. It occurs also on Linux in a 64-bit program.

And blockwrite has the same bug!

If you use {$I+} then an exception occurs. If you use {$I-} then function ioresult will return 100 or 101.
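The {$I-} behaviour can be observed with a small check right after the oversized read, using the variables from the first post (a sketch only):

```pascal
{$I-}                     // disable run-time I/O checking
BlockRead(f, p^, fs);
{$I+}                     // restore it
if IOResult <> 0 then     // 100 = disk read error, 101 = disk write error
  Writeln('BlockRead failed, IOResult = ', IOResult);
```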

Please file a bug report in https://gitlab.com/freepascal.org/fpc/source/-/issues and post its link here, so that it can be found easily by people who have reached this topic. Thanks.

jamie

  • Hero Member
  • *****
  • Posts: 7516
Re: Blockread 32-bit limit
« Reply #9 on: June 28, 2025, 11:59:50 am »
ReadFile in Windows uses a DWORD for the count, not an Int64.

Don't know if that helps at all but I do know that FPC likes to convert things to signed integers when it shouldn't.

Jamie
The only true wisdom is knowing you know nothing

jamie

  • Hero Member
  • *****
  • Posts: 7516
Re: Blockread 32-bit limit
« Reply #10 on: June 28, 2025, 12:18:24 pm »
Has anyone tried using a "Cardinal" instead for "fs"?

The limit is still DWORD, and last time I checked, Cardinal was supposed to work for that; or maybe it doesn't in FPC.
The only true wisdom is knowing you know nothing

Warfley

  • Hero Member
  • *****
  • Posts: 2038
Re: Blockread 32-bit limit
« Reply #11 on: June 28, 2025, 12:48:49 pm »
If you want to read large files, map them into memory instead. On Unix you can use mmap (FpMmap); on Windows there is CreateFileMapping.

With both of those you get a pointer to the raw data of the file, which you can access as if you had read the whole file with a FileStream. There are two main advantages:
1. Multiple processes that map the same file read-only share the same memory for that file, meaning the data is only read from disk once.
2. The file is not read all at once; whenever you access a part of the file that has not been loaded yet, the OS loads it on demand.

If you insist on reading a large file in one go, you should split the read into multiple smaller reads, preferably no larger than a page (4 KiB). That way each OS-level call blocks for less time, which reduces the chance of a signal arriving during the read; and if a signal does interrupt a read, you only have to redo that block rather than the whole transfer.
« Last Edit: June 28, 2025, 12:51:29 pm by Warfley »
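On Windows, the mapping Warfley describes looks roughly like this (a sketch only: MapWholeFile is my name, error handling is abbreviated, and mapping tens of GB requires a 64-bit target):

```pascal
uses Windows;

function MapWholeFile(const FileName: string; out Size: Int64): Pointer;
var
  hFile, hMap: THandle;
  sizeLow, sizeHigh: DWORD;
begin
  Result := nil;
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    Exit;
  sizeLow := GetFileSize(hFile, @sizeHigh);
  Size := (Int64(sizeHigh) shl 32) or sizeLow;
  hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
  CloseHandle(hFile);               // the mapping keeps the file open
  if hMap = 0 then
    Exit;
  Result := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0); // 0 = map the whole file
  CloseHandle(hMap);                // the view keeps the mapping alive
end;
```

Release the view with UnmapViewOfFile when done; on Unix the equivalent pair is FpMmap/FpMunmap from the BaseUnix unit.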

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12634
  • FPC developer.
Re: Blockread 32-bit limit
« Reply #12 on: June 28, 2025, 02:47:49 pm »
Quote
If you want to read large files, map them into memory instead. On Unix you can use mmap (FpMmap); on Windows there is CreateFileMapping.

With both of those you get a pointer to the raw data of the file, which you can access as if you had read the whole file with a FileStream. There are two main advantages:
1. Multiple processes that map the same file read-only share the same memory for that file, meaning the data is only read from disk once.
2. The file is not read all at once; whenever you access a part of the file that has not been loaded yet, the OS loads it on demand.

That point is often made, but there are also downsides to memory mapping. I once tried to build a MUD-like game on top of memory mapping (admittedly at a time when 64 MB was a lot of memory), but reverted to more traditional caching, with a fixed maximum amount of memory dedicated to the cache, to keep my main data structures from being paged out and hurting responsiveness for the other users (other than the one who ran the massive query).


PascalDragon

  • Hero Member
  • *****
  • Posts: 6311
  • Compiler Developer
Re: Blockread 32-bit limit
« Reply #13 on: June 28, 2025, 04:08:54 pm »
Quote
Here's some code I've used many times over the years, but which gives me a disk read error now:

Code: Pascal
procedure TForm1.Button1Click(Sender: TObject);
var
  f: File;
  fs: Int64;
  p: Pointer;
begin
  AssignFile(f, 'megagoals.txt');
  reset(f, 1);
  fs := FileSize(f);
  p := GetMem(fs);
  BlockRead(f, p^, fs);
  Button1.Caption := IntToStr(fs) + ' Bytes Read!';
end;

If after getting the filesize, I do this:

Code: Pascal
  fs := 2147483647;

Why would you even want to read such a big file at once? Even as storage gets faster (~300 MB/s for fast HDDs, ~600 MB/s for SATA SSDs, and over 3 GB/s for current NVMe drives with PCIe 4.0 or newer), do you really not want to provide feedback to the user, or give them the chance to abort the operation (depending on the use case)? The operating system has to split the operation up anyway, because the devices cannot handle such sizes in a single transfer (e.g. USB devices are often limited to 1 or 2 MB per transfer, and even NVMe drives cap out at around 8 MB per transfer).
It would therefore be better to use smaller transfer sizes (e.g. 2 or 4 MB works rather nicely) and read the file in multiple parts. That way you can not only provide feedback to the user (especially on slower drives), but also avoid such size limits entirely.

dsbw

  • Newbie
  • Posts: 5
Re: Blockread 32-bit limit
« Reply #14 on: June 29, 2025, 01:47:58 am »
Thanks for the feedback, guys. It's much appreciated. Let me address some of the points raised and clarify what I'm doing at the bottom:

Quote
OP: you're asking a specific question in a general forum while giving absolutely no information on the platform(s) you're using.

So, pro forma, I should say I'm running on Windows 10 or 11. On the other hand, shouldn't a call with 64-bit integers be reliably callable across platforms? (Or it should raise an error where it can't be done.)

Quote
d2010

I'm sorry, I don't really understand what you're getting at. The mission is: read the entire file into memory in one call. 25 years ago I could read a 1 GB file into memory on a machine with 1 GB, no problem; I feel like 35 GB should be well within reach now on a machine with 64 GB. (It's not, of course, which is why I'm here.)

Quote
BlockRead() internally calls the Do_Read() function, whose len parameter is only 32 bits. I think this is a bug.

I think it's a bug, too! I mean, maybe for whatever reason it can't be done, but I think it should be indicated somehow, somewhere.

Quote
has anyone tried using a "Cardinal" instead for "fs" ?

I just did. It does not error out. However, it manages this by being completely wrong about the filesize, returning 1 billion something.

Quote
If you want to read large files, map them into memory instead.
No! Or, more politely, no, thank you!  :D (I'll explain at the bottom.)

Quote
With both of those you get a pointer to the raw data of the file, which you can access as if you had read the whole file with a FileStream. There are two main advantages:
1. Multiple processes that map the same file read-only share the same memory for that file, meaning the data is only read from disk once.
2. The file is not read all at once; whenever you access a part of the file that has not been loaded yet, the OS loads it on demand.

1. There are no multiple processes.
2. I am accessing it all at once. Serially, from top-to-bottom every time.

Quote
If you insist on reading a large file in one go, you should split the read into multiple smaller reads, preferably no larger than a page (4 KiB). That way each OS-level call blocks for less time, which reduces the chance of a signal arriving during the read; and if a signal does interrupt a read, you only have to redo that block rather than the whole transfer.
I mean, I can stream the stupid thing in. I can break it into chunks. I'm coming from the modern dynamic language world which would love nothing more than to put it in a big bloated tree of some kind. I will elaborate further:

Quote
Why would you even want to read such a big file at once?

I'm glad you asked that!  :D

Quote
do you really not want to provide feedback to the user, or give them the chance to abort the operation (depending on the use case)?
There is no user. There is no operation. There is only analysis.

Quote
It would therefore be better to use smaller transfer sizes (e.g. 2 or 4 MB works rather nicely) and read the file in multiple parts. That way you can not only provide feedback to the user (especially on slower drives), but also avoid such size limits entirely.

So, here's the situation: I'm slicing up a database table to learn about its contents. I say to my program "Here's a file. Figure out what's REALLY in it—because DB specs are lazy and sloppy and abused—and give me a report." Generally, I'm able to go through an entire table at once. This file is big enough that I can only do individual fields, but even those are in the tens of gigabytes.

By far the most time consuming part of this process is reading from the disk. It's the difference between seconds or minutes (depending on filesize) and instantaneous. It's the difference between being able to run something dozens of times very quickly to figure out how to tweak parameters and filter out garbage, and having to wait ten minutes and then pick up context.

I might have to go to C++.  :o Though, if it is an underlying OS issue, I guess that won't help.

(I've been doing this so long, by the way, that I first had to read in 64KB blocks because THAT was the largest value you could blockread/blockwrite. It was needlessly complicated because the data generally comes in as a text file, and the block boundaries don't fall neatly on line ends. I swore, with God As My Witness, that I would never go back to reading chunked files once 32-bit addressing came around.)
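For what it's worth, the old chunk-boundary problem (blocks not ending on line ends) is usually handled by carrying the trailing partial line of one chunk over to the next. A sketch (ProcessLine is a hypothetical per-line handler; LF line ends assumed, trailing CRs not stripped):

```pascal
procedure ScanFileByLines(const FileName: string);
const
  ChunkSize = 4 * 1024 * 1024; // 4 MiB chunks
var
  f: File;
  buf: array of Byte;
  carry, line: AnsiString;
  got: Int64;
  i, start: Integer;
begin
  SetLength(buf, ChunkSize);
  AssignFile(f, FileName);
  Reset(f, 1);
  carry := '';
  repeat
    BlockRead(f, buf[0], ChunkSize, got);
    start := 0;
    for i := 0 to got - 1 do
      if buf[i] = 10 then // LF: a complete line ends here
      begin
        SetString(line, PAnsiChar(@buf[start]), i - start);
        // ProcessLine(carry + line); // hypothetical per-line handler
        carry := '';
        start := i + 1;
      end;
    // keep the trailing partial line for the next chunk
    SetString(line, PAnsiChar(@buf[start]), got - start);
    carry := carry + line;
  until got < ChunkSize;
  // if carry <> '' then ProcessLine(carry); // last line lacked a trailing LF
  CloseFile(f);
end;
```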

 
