Lazarus

Free Pascal => FPC development => Topic started by: lagprogramming on October 04, 2021, 10:45:30 am

Title: TFileStream file size limit
Post by: lagprogramming on October 04, 2021, 10:45:30 am
   Hi!
   When trying to save a file in a linux-x86_64 environment, an application returns an error. I've noticed that the error appears when the file size exceeds the maximum value of a longint. Looking at the assembler window of Lazarus, I assumed that the error was related to the procedure TStream.WriteBuffer(const Buffer; Count: Longint), which is found in rtl/objpas/classes/streams.inc.
Because the situation appears to be similar for reading, I've modified both TStream.WriteBuffer(const Buffer; Count: Longint) and TStream.ReadBuffer(var Buffer; Count: Longint)
from
Code: Pascal
procedure TStream.ReadBuffer(var Buffer; Count: Longint);

Var
  r,t : longint;

begin
  t:=0;
  repeat
    r:=Read(PByte(@Buffer)[t],Count-t);
    inc(t,r);
  until (t=Count) or (r<=0);
  if (t<Count) then
    Raise EReadError.Create(SReadError);
end;

procedure TStream.WriteBuffer(const Buffer; Count: Longint);

var
  r,t : Longint;

  begin
    T:=0;
    Repeat
       r:=Write(PByte(@Buffer)[t],Count-t);
       inc(t,r);
    Until (t=count) or (r<=0);
    if (t<Count) then
       Raise EWriteError.Create(SWriteError);
  end;
to
Code: Pascal
procedure TStream.ReadBuffer(var Buffer; Count: NativeInt);

Var
  r,t : NativeInt;

begin
  t:=0;
  repeat
    r:=Count-t;
    if r>maxlongint then r:=maxlongint;
    r:=Read(PByte(@Buffer)[t],r);
    inc(t,r);
  until (t=Count) or (r<=0);
  if (t<Count) then
    Raise EReadError.Create(SReadError);
end;

procedure TStream.WriteBuffer(const Buffer; Count: NativeInt);

var
  r,t : NativeInt;

  begin
    T:=0;
    Repeat
       r:=Count-t;
       if r>maxlongint then r:=maxlongint;
       r:=Write(PByte(@Buffer)[t],r);
       inc(t,r);
    Until (t=count) or (r<=0);
    if (t<Count) then
       Raise EWriteError.Create(SWriteError);
  end;
   I've built fpc clean and after that I've rebuilt Lazarus clean but the application returned the same error when trying to write files with a size greater than the maximum value of a longint. I've looked at the assembler window of Lazarus again and to my surprise I've seen a "CLASSES$_$TSTREAM_$__$$_WRITEBUFFER$formal$LONGINT" line followed by the same assembly code. It's like I haven't changed anything in the streams.inc file.

   What am I doing wrong? How come the assembly code remained the same? Remember that it's an x86_64 target. I'm using relatively new fpc and Lazarus sources, and the error in the application appears at a TFileStream.WriteBuffer call. Could it be that an optimized assembly procedure is used instead of the generic Pascal-written procedure!? By the way, I think there is or was a bug report related to this situation; the new interface of the bug tracking system is...new to me :-[.
Title: Re: TFileStream file size limit
Post by: marcov on October 04, 2021, 11:05:59 am
There are two issues here:


The first should be ok, the second is not yet supported. Afaik the variable declaration was considered not worth it, and many 64-bit targets also don't support it. So since the safest blocksize is probably 2GB-1, easiest is to write in one GB increments.

P.S. You might have hit some overload issue somewhere with your recompile.
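
The "write in one GB increments" idea could look roughly like the sketch below. WriteInChunks is a hypothetical helper, not part of the RTL, and the chunk size is a parameter only so that the demo can run with a small buffer; in real use one would presumably pass something like 1024*1024*1024.

```pascal
program ChunkedWriteDemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Hypothetical helper (not an RTL routine): writes Count bytes from Buffer
// by calling WriteBuffer repeatedly, never passing more than ChunkSize bytes.
procedure WriteInChunks(Stream: TStream; const Buffer; Count: Int64;
  ChunkSize: Longint);
var
  Done: Int64;
  Part: Longint;
begin
  Done := 0;
  while Done < Count do
  begin
    if Count - Done > ChunkSize then
      Part := ChunkSize
    else
      Part := Longint(Count - Done);
    // WriteBuffer raises EWriteError if the chunk cannot be written completely
    Stream.WriteBuffer(PByte(@Buffer)[Done], Part);
    Inc(Done, Part);
  end;
end;

var
  Data: array[0..9999] of Byte;
  M: TMemoryStream;
begin
  FillChar(Data, SizeOf(Data), $AB);
  M := TMemoryStream.Create;
  try
    // Small chunk size so the loop runs several times in this demo
    WriteInChunks(M, Data, SizeOf(Data), 4096);
    WriteLn(M.Size); // prints 10000
  finally
    M.Free;
  end;
end.
```

Each individual call stays below the Longint limit, so the helper should work regardless of which Count type the installed RTL declares for WriteBuffer.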
Title: Re: TFileStream file size limit
Post by: lagprogramming on October 04, 2021, 02:55:54 pm
   The following code fails to write to file
Code: Pascal
procedure writefail;
var
  x:TFilestream;
  p:pointer;
begin
  p:=getmem(maxlongint+1);
  x:=TFilestream.Create('/home/user/bin.txt',fmcreate);
  x.WriteBuffer(p^,maxlongint+1);//Fails to write garbage in linux-x86_64
  x.Free;
end;

   If procedure TStream.ReadBuffer(var Buffer; Count: Longint) and procedure TStream.WriteBuffer(const Buffer; Count: Longint) should be left intact, then wouldn't it be a good idea to modify the following two procedures?

Code: Pascal
procedure TStream.ReadBuffer(var Buffer: TBytes; Offset, Count: NativeInt);
begin
  ReadBuffer(Buffer[OffSet],Count);
end;

procedure TStream.WriteBuffer(const Buffer: TBytes; Offset, Count: NativeInt);
begin
  WriteBuffer(Buffer[Offset],Count);
end;

   If the above code were modified by adding a repeat/while loop, fpc would have two routines that work as expected. In my view that would be better than writing new procedures within local units of individual projects, procedures that would try to do what a developer expects from the above procedure declarations.
Title: Re: TFileStream file size limit
Post by: korba812 on October 04, 2021, 03:28:12 pm
Quote
   The following code fails to write to file
Code: Pascal
procedure writefail;
var
  x:TFilestream;
  p:pointer;
begin
  p:=getmem(maxlongint+1);
  x:=TFilestream.Create('/home/user/bin.txt',fmcreate);
  x.WriteBuffer(p^,maxlongint+1);//Fails to write garbage in linux-x86_64
  x.Free;
end;

The result of "MaxLongint + 1" is Int64. Try "MaxLongint - 1".
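
To illustrate the point, here is a small sketch of what happens to MaxLongint + 1 if it is squeezed into a Longint. The variable names are made up for the example, and the explicit cast only stands in for what a Longint Count parameter could end up holding.

```pascal
program LongintLimitDemo;
{$mode objfpc}
var
  Big: Int64;
  Truncated: Longint;
begin
  Big := MaxLongint;
  Inc(Big);                   // one past the largest Longint value
  Truncated := Longint(Big);  // explicit cast keeps only the low 32 bits
  WriteLn(Big);               // prints 2147483648
  WriteLn(Truncated);         // prints -2147483648
end.
```

A negative or wrapped Count is one plausible reason a routine declared with a Longint parameter cannot handle such a request.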
Title: Re: TFileStream file size limit
Post by: marcov on October 04, 2021, 04:00:40 pm
If the function has a parameter type that is limited to 2GB-1, why he would try 2GB or more is a mystery to me.

Note that the upper limit is not 64-bit, but how large a block an application can allocate, which depends on a lot of things.

Anyway, afaik there are reports for them; search for them and comment on them.

Keep in mind that adding a loop to those functions adds code for people using this for short writes, just to benefit the few people creating a disk/SSD exerciser/benchmark or so doing gigantic writes.
Title: Re: TFileStream file size limit
Post by: Thaddy on October 04, 2021, 04:31:14 pm
AFAIK address space is limited to High(QWord), a lot, but single reads/writes are limited to High(NativeInt), which is indeed 2G.
Note that in practice it is limited by available memory; otherwise an EOutOfMemory is thrown.
Note that for 32 bit Windows there is a PE flag - since Windows 7 - that extends available memory to 4G, but still with the limitation for single reads/writes.
Title: Re: TFileStream file size limit
Post by: Thaddy on October 04, 2021, 04:39:33 pm
Quote
The result of "MaxLongint + 1" is Int64. Try "MaxLongint - 1".

In Freepascal simply use High(<type>);
Title: Re: TFileStream file size limit
Post by: marcov on October 04, 2021, 04:53:00 pm
Quote
AFAIK address space is limited to High(QWord), a lot, but single reads/writes are limited to High(NativeInt), which is indeed 2G.

size_t does not need to cover the whole address space. If the kernel doesn't hand out such large consecutive blocks to applications, it is only a waste of bits.

Title: Re: TFileStream file size limit
Post by: Thaddy on October 04, 2021, 05:06:44 pm
True. By the way, I should have written High(NativeInt). And on 32 bit the flag for Windows is IMAGE_FILE_LARGE_ADDRESS_AWARE. I don't know if you defined those consts in windows.pas.
See MSDN or https://docwiki.embarcadero.com/RADStudio/Alexandria/en/PE_(portable_executable)_header_flags_(Delphi)

The value of the flag is $20
Title: Re: TFileStream file size limit
Post by: marcov on October 04, 2021, 05:08:44 pm
Image header structs and constants have been reworked one or two years ago.

Doesn't mean it will be perfect, but most should be there, including fairly recent stuff.
Title: Re: TFileStream file size limit
Post by: Thaddy on October 04, 2021, 05:15:11 pm
Yes, you are doing a good job there. Tnx
Title: Re: TFileStream file size limit
Post by: lagprogramming on October 04, 2021, 05:51:10 pm
   Targeting linux-x86_64 we have:
sizeof(nativeint)=sizeof(sizeuint)=sizeof(int64)=8
high(nativeint)=high(int64)
sizeof(longint)=sizeof(longword)=4
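
These values can be checked on any target with a few lines like the following (a minimal sketch; the program name is arbitrary, and the output naturally differs between 32-bit and 64-bit targets):

```pascal
program TypeSizes;
{$mode objfpc}
begin
  // Sizes are in bytes and depend on the target CPU
  WriteLn('SizeOf(NativeInt) = ', SizeOf(NativeInt));
  WriteLn('SizeOf(SizeUInt)  = ', SizeOf(SizeUInt));
  WriteLn('SizeOf(Int64)     = ', SizeOf(Int64));
  WriteLn('SizeOf(Longint)   = ', SizeOf(Longint));
  WriteLn('SizeOf(LongWord)  = ', SizeOf(LongWord));
  // Upper bounds of the two types discussed in this topic
  WriteLn('High(NativeInt)   = ', High(NativeInt));
  WriteLn('High(Longint)     = ', High(Longint));
end.
```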

   In streams.inc we have:
procedure TStream.ReadBuffer(var Buffer: TBytes; Offset, Count: NativeInt);
procedure TStream.WriteBuffer(const Buffer: TBytes; Offset, Count: NativeInt);

   The purpose of having these two procedures declared in the rtl, knowing that a failure is guaranteed if you pass a Count value greater than maxlongint, remains a mystery to me.
   Thank you for your patience.
Title: Re: TFileStream file size limit
Post by: marcov on October 04, 2021, 05:57:22 pm
Quote
   In streams.inc we have:
procedure TStream.ReadBuffer(var Buffer: TBytes; Offset, Count: NativeInt);
procedure TStream.WriteBuffer(const Buffer: TBytes; Offset, Count: NativeInt);

I missed that, it is still longint in 3.2.x.  I don't know who changed that (michael?) Please file a bug for the trunk issue.
Title: Re: TFileStream file size limit
Post by: PascalDragon on October 05, 2021, 09:15:18 am
Quote
AFAIK address space is limited to High(QWord), a lot, but single reads/writes are limited to High(NativeInt), which is indeed 2G.

High(NativeInt) on a 64-bit system is equal to High(Int64).
Title: Re: TFileStream file size limit
Post by: Thaddy on October 05, 2021, 09:47:15 am
In the context of 32 bit, Sarah. That's why I suggested it to express address space.
Title: Re: TFileStream file size limit
Post by: PascalDragon on October 05, 2021, 01:32:03 pm
Quote
In the context of 32 bit, Sarah. That's why I suggested it to express address space.

You don't mention 32-bit here anywhere except for the PE flag (and the other messages before yours hadn't mentioned 32-bit either):

Quote
AFAIK address space is limited to High(QWord), a lot, but single reads/writes are limited to High(NativeInt), which is indeed 2G.
Note that in practice it is limited by available memory; otherwise an EOutOfMemory is thrown.
Note that for 32 bit Windows there is a PE flag - since Windows 7 - that extends available memory to 4G, but still with the limitation for single reads/writes.
Title: Re: TFileStream file size limit
Post by: lagprogramming on October 17, 2021, 04:54:01 pm
   I've stopped using the file related classes like TFileStream and went to a lower level: FileOpen, FileCreate, FileClose, FileSeek and so on.
   The following function returns a negative number when passing a 3GB file. The problem is at FileSeek.

Code: Pascal
Function GetFileSizeUsingFileSeek(FilePath:string):Int64;
var
  FileH:THandle;
begin
  FileH:=FileOpen(FilePath,fmOpenRead);
  Result:=FileSeek(FileH,0,fsFromEnd);//Also FileSeek returns a wrong result for files greater than 2GB. In linux-x86_64 it may return negative values.
  FileClose(FileH);
end;

   In order to avoid such problems I've decided to modify the file format that is used by the application and avoid file related classes by using low level routines, which means that for me the problem will be solved soon. To be frank, I've never expected such a problem. Probably I'm one of the few who uses files so large in size. In the future I expect other developers to have the same problem.
Title: Re: TFileStream file size limit
Post by: AlexTP on October 17, 2021, 05:04:10 pm
Quote
The following function returns a negative number when passing a 3GB file. The problem is at FileSeek.
Code: Pascal
Function FileSeek (Handle : THandle; FOffset, Origin: Longint) : Longint;
Function FileSeek (Handle : THandle; FOffset: Int64; Origin: Longint) : Int64;

Your code calls the 1st overload, so it returns a negative result.
Try calling the 2nd overload:

Result:=FileSeek(FileH,Int64(0),fsFromEnd);
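
Put together, the function might look like the sketch below. The handle check and the -1 error result are additions of mine for illustration, and the demo uses a tiny temporary file only to show the call; it obviously does not exercise the >2GB case.

```pascal
program FileSizeDemo;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Returns the file size, or -1 if the file cannot be opened
// (the -1 convention is an assumption made for this sketch).
function GetFileSizeUsingFileSeek(const FilePath: string): Int64;
var
  FileH: THandle;
begin
  Result := -1;
  FileH := FileOpen(FilePath, fmOpenRead);
  if FileH = THandle(-1) then
    Exit;
  try
    // Int64(0) selects the overload that takes and returns Int64
    Result := FileSeek(FileH, Int64(0), fsFromEnd);
  finally
    FileClose(FileH);
  end;
end;

var
  TmpName: string;
  H: THandle;
  Data: array[0..4] of Byte = (1, 2, 3, 4, 5);
begin
  // Create a 5-byte temporary file just to have something to measure
  TmpName := GetTempFileName;
  H := FileCreate(TmpName);
  FileWrite(H, Data, SizeOf(Data));
  FileClose(H);
  WriteLn(GetFileSizeUsingFileSeek(TmpName)); // prints 5
  DeleteFile(TmpName);
end.
```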
Title: Re: TFileStream file size limit
Post by: Thaddy on October 17, 2021, 05:05:39 pm
The problem is merely the declaration as a signed type instead of unsigned.
Title: Re: TFileStream file size limit
Post by: SymbolicFrank on October 26, 2021, 11:30:21 pm
What is the speed increase of using the largest buffer the OS will give you over that 1 GB buffer? Is that measurable?

I mean, you need at least an SSD, which writes in blocks the size of megabytes. Many random reads will overflow their cache memory, because the individual blocks are too small.

But that's just the thing SSDs excel at: a very large amount of random writes. Because there is no mechanical movement. No need to wait for the head to slowly travel to the right location.

For the fastest possible, sustained sequential write of a large file, the main thing is to kill all other applications that read or write files, or even use the PCI bus. Kill all other running programs.

And at that point, it doesn't matter how large your buffer is, because the tiny overhead is washed away by having all available bandwidth for yourself. Operating systems are quite good at optimizing the available resources.
Title: Re: TFileStream file size limit
Post by: PascalDragon on October 27, 2021, 09:33:08 am
Quote
What is the speed increase of using the largest buffer the OS will give you over that 1 GB buffer? Is that measurable?

Please note that this will depend on the hardware. If the device is connected by USB, for example, the transfer size is capped at 1 MiB or at most 2 MiB, because larger sizes tend not to be supported that well by hardware. ATA, SCSI and NVMe also have some limits. The OS will handle that transparently for you of course, but this might result in copying and thus decreased performance compared to manually splitting the transfers (of course one needs to know the best transfer size then...).
Title: Re: TFileStream file size limit
Post by: marcov on October 27, 2021, 10:12:13 am
Quote
What is the speed increase of using the largest buffer the OS will give you over that 1 GB buffer? Is that measurable?

I mean, you need at least an SSD, which writes in blocks the size of megabytes. Many random reads will overflow their cache memory, because the individual blocks are too small.

Most SSD DRAM caches are in the single gigabyte order too, at best. And some of those many small writes might vacate the cache faster than one large block (depending on the firmware).

Post edited: one gigabyte as magnitude, not megabyte, of course.
Title: Re: TFileStream file size limit
Post by: SymbolicFrank on October 27, 2021, 11:13:58 am
So, the difference between one and many IOPS? Then again, the OS will try to use all free memory as disk cache as well. What is faster, a single, large buffer, managed by the application and only a small cache, managed by the OS, or the other way around? Does the OS limit the cache size for a single IOP?
Title: Re: TFileStream file size limit
Post by: marcov on October 27, 2021, 02:34:13 pm
Quote
So, the difference between one and many IOPS?

IOPS where? Between application and kernel (iow syscall), between kernel and device controller, or between

Quote
Then again, the OS will try to use all free memory as disk cache as well.

But if it must hold a part of a large buffer till the controller is ready for it, that is waste too, since the already written part could have been returned to the OS.

Quote
What is faster, a single, large buffer, managed by the application and only a small cache, managed by the OS, or the other way around? Does the OS limit the cache size for a single IOP?

For the last bit: seems win7 at least had a 2GB limit https://community.osr.com/discussion/193831/limitations-on-dma-transfer-size

In general, the rule of thumb is that reducing the number of IOPS only improves performance logarithmically, while extremely large buffers might hit all kinds of small inefficiencies. Note also that many DMA controllers (even on embedded ARMs and MIPS like PIC32) have scatter/gather DMA, which allows writing multiple buffers in one go (IOPS).

Also, most IOPS benchmarks are for large (high core/socket count) transactional systems with multiple threads doing independent (unrelated) I/O, and such benchmarks scale badly to synchronous (one thread) I/O systems, because all time spent waiting for the device to complete a transaction is lost IOPS.
Title: Re: TFileStream file size limit
Post by: SymbolicFrank on October 27, 2021, 03:06:59 pm
In other words: do use a buffer and don't write each byte individually, but leave the rest to the OS and hardware?
Title: Re: TFileStream file size limit
Post by: marcov on October 27, 2021, 03:19:26 pm
Quote
In other words: do use a buffer and don't write each byte individually, but leave the rest to the OS and hardware?

Do use a buffer, but 32 KB to 1 MB is quite alright for most things. Maybe if you write disk cloning software larger buffers are more worthwhile, but even that will have diminishing returns as the buffer gets larger. I can't really imagine 1GB buffers adding much.
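
As a rough sketch of that advice, a copy loop with a modest fixed buffer could look like this. CopyBuffered and the 64 KB figure are just picks for the example (TStream.CopyFrom in the RTL serves the same purpose), and the memory streams stand in for real files.

```pascal
program BufferedCopyDemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

const
  BufSize = 64 * 1024; // within the 32 KB to 1 MB range mentioned above

// Hypothetical helper: copies everything from Src's current position to Dst
// through a fixed-size buffer.
procedure CopyBuffered(Src, Dst: TStream);
var
  Buf: array of Byte;
  N: Longint;
begin
  SetLength(Buf, BufSize);
  repeat
    N := Src.Read(Buf[0], BufSize);
    if N > 0 then
      Dst.WriteBuffer(Buf[0], N);
  until N <= 0;
end;

var
  A, B: TMemoryStream;
begin
  A := TMemoryStream.Create;
  B := TMemoryStream.Create;
  try
    A.Size := 200000; // contents do not matter here, only the length
    A.Position := 0;
    CopyBuffered(A, B);
    WriteLn(B.Size); // prints 200000
  finally
    B.Free;
    A.Free;
  end;
end.
```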
Title: Re: TFileStream file size limit
Post by: SymbolicFrank on October 27, 2021, 05:40:35 pm
Agreed.
Title: Re: TFileStream file size limit
Post by: PascalDragon on October 28, 2021, 09:21:29 am
Quote
Maybe if you write disk cloning software larger buffers are more worthwhile, but even that will have diminishing returns as the buffer gets larger.

Our main product at work is disk cloning software and we see a decrease already with 4 or 8 MiB buffers.
Title: Re: TFileStream file size limit
Post by: lagprogramming on January 22, 2022, 02:57:43 pm
   I expect this to be related to the subject.
   TMemoryStream.SaveToFile silently writes an empty file if the stream size is greater than the maximum value of a longint, at least in a linux-x86_64 environment.

Code: Pascal
procedure WriteGarbage(const FileAddr:string; const GarbageSize:longword);
var s:TMemoryStream;
begin
  s:=TMemoryStream.Create;
  s.SetSize(GarbageSize);
  s.SaveToFile(FileAddr);
  s.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  writegarbage('garbagelongint.bin', high(longint));//Works OK!
  writegarbage('garbagelongword.bin', high(longint)+1024*1024*1024);//FAILURE!!!
end;
Title: Re: TFileStream file size limit
Post by: ASerge on January 22, 2022, 03:20:47 pm
I think this is a question of a separate topic. In Windows, a zero file of 3 GB is written.
Title: Re: TFileStream file size limit
Post by: lagprogramming on January 22, 2022, 05:27:30 pm
Quote
I think this is a question of a separate topic. In Windows, a zero file of 3 GB is written.

I've double checked. It's not a linux file system limitation. Development fpc sources (trunk) have been used, but it's an old problem.
If SaveToFile calls something like "WriteBuffer(Memory^,Size);" where Size is int64 on x86_64 targets then the rtl devs might have hit the same problem as I did, meaning that they might have used a routine limited to longints. In this topic I've suggested adding some overloaded functions for 64bit integers. Anyway, the forum moderators and the rtl developers should know about this situation. Maybe this situation is encountered only when targeting linux-x86_64, I don't know.
Title: Re: TFileStream file size limit
Post by: AlexTP on January 22, 2022, 05:29:52 pm
Reported to https://gitlab.com/freepascal.org/fpc/source/-/issues/39540
Title: Re: TFileStream file size limit
Post by: ASerge on January 22, 2022, 08:59:58 pm
Quote
If SaveToFile calls something like "WriteBuffer(Memory^,Size);" where Size is int64 on x86_64 targets then the rtl devs might have hit the same problem as I did

The WriteBuffer function has a limitation, but the SaveToFile function does not.
Code: Pascal
{$MODE OBJFPC}

uses Classes;

var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    M.Size := LongWord(3 * 1024 * 1024 * 1024); // Even as LongWord
    M.SaveToFile('3Gb.bin');
  finally
    M.Free;
  end;
end.
File 3Gb.bin has a size of 3 GB. FPC 3.2.2 x64, Windows.
Title: Re: TFileStream file size limit
Post by: AlexTP on January 22, 2022, 09:42:01 pm
@ASerge
It's strange, because I have the same test code, and I got 0-sized file.
Code: Pascal
uses SysUtils, Classes;

procedure WriteGarbage(const fn: string; const aSize: Int64);
var
  s:TMemoryStream;
begin
  s:=TMemoryStream.Create;
  s.SetSize(aSize);
  s.SaveToFile(fn);
  s.Free;
end;

begin
  WriteGarbage('3g.bin', Int64(3)*1024*1024*1024);
end.
Title: Re: TFileStream file size limit
Post by: lagprogramming on January 24, 2022, 07:25:45 pm
Can you take a look at the "Small safety precaution" (042eb7e8) commit.
https://gitlab.com/freepascal.org/fpc/source/-/commit/042eb7e8c196b6a0c8f6016af9985b8b3062703b
The first silent error that remains undetected is when more bytes than intended are read or written. This one is very nasty because, if you write data backwards, it may overwrite data that has been written previously. The second silent error is when the Read/Write functions return errors (the r<=0 and w<=0 conditions).

Code: Pascal
procedure TStream.ReadBuffer(var Buffer; Count: NativeInt);
var
  r,t: NativeInt;
begin
  t:=0;
  repeat
    r:=Count-t;
    if r>High(Longint) then r:=High(Longint);
    r:=Read(PByte(@Buffer)[t],r);
    inc(t,r);
  until (t>=Count) or (r<=0);
  if (t<Count) then
    raise EReadError.Create(SReadError);
end;

procedure TStream.WriteBuffer(const Buffer; Count: NativeInt);
var
  w,t: NativeInt;
begin
  t:=0;
  repeat
    w:=Count-t;
    if w>High(Longint) then w:=High(Longint);
    w:=Write(PByte(@Buffer)[t],w);
    inc(t,w);
  until (t>=count) or (w<=0);
  if (t<Count) then
    raise EWriteError.Create(SWriteError);
end;
In the rtl documentation I have, regarding the Write and Read functions of TStream and its descendants, I can't find anything written about the situation when the TStream descendant reads/writes more than the Count parameter value. When looking at TStream.ReadBuffer and TStream.WriteBuffer I've found the following texts: "ReadBuffer reads Count bytes of the stream into Buffer. If the stream does not contain Count bytes, then an exception is raised." and "WriteBuffer writes Count bytes to the stream from Buffer. If the stream does not allow Count bytes to be written, then an exception is raised.". Do the quoted texts mean that ReadBuffer and WriteBuffer should raise exceptions if we try to read/write a less than or equal to zero amount of bytes!?
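
A small experiment along these lines might help answer that question. TRefusingStream is a hypothetical class made up for the test; its Write always reports 0 bytes written. Reading the loops above, I would expect Count=0 to pass silently (t equals Count after the first pass) and Count=1 to end in EWriteError, but I have not checked this against every FPC version.

```pascal
program RefusingStreamDemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

type
  // Hypothetical stream, for illustration only: it never accepts any data
  TRefusingStream = class(TStream)
  public
    function Write(const Buffer; Count: Longint): Longint; override;
  end;

function TRefusingStream.Write(const Buffer; Count: Longint): Longint;
begin
  Result := 0; // report that nothing was written
end;

// Calls WriteBuffer with the given Count and reports whether it raised
procedure TryWrite(S: TStream; Count: Longint);
var
  B: Byte;
begin
  B := 0;
  try
    S.WriteBuffer(B, Count);
    WriteLn('Count=', Count, ': no exception');
  except
    on E: EWriteError do
      WriteLn('Count=', Count, ': EWriteError');
  end;
end;

var
  S: TRefusingStream;
begin
  S := TRefusingStream.Create;
  try
    TryWrite(S, 0);
    TryWrite(S, 1);
  finally
    S.Free;
  end;
end.
```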
Title: Re: TFileStream file size limit
Post by: ASerge on January 24, 2022, 07:29:06 pm
Quote
It's strange, because I have the same test code, and I got 0-sized file.

Your code also successfully creates a 3 GB file. Maybe there is no disk space, or you did not wait for the end of the program?
Title: Re: TFileStream file size limit
Post by: AlexTP on January 24, 2022, 07:45:21 pm
@ASerge, with the latest FPC modification it works well (I tested before this, with the old Write(), which failed on a 3 GB size).
Title: Re: TFileStream file size limit
Post by: lagprogramming on January 25, 2022, 06:30:50 pm
@Alextp & @ASerge
I've updated the FPC development sources and now stream data can be saved in files larger than high(longint), at least on linux-x86_64. So, to me it appears like the problem has been solved.
Good job!
Title: Re: TFileStream file size limit
Post by: Thaddy on January 25, 2022, 06:57:25 pm
Hmm....
A stream is always looking forward (any look-backs are an aberration in theory). A windowed stream on top of a file has unlimited size (up to the ordinal for which it is declared). That means that overlapped files applied to a windowed stream are not limited by any size and can be as large as the disk size allows. Strange that nobody mentioned that. It is just basic computer science, very basic. A windowed stream can be implemented as a descendant of TStream or TFileStream.

Just my 2 cents. And OS independent. I am sure this is already implemented somewhere by someone in Pascal.