* * *

Author Topic: Bug in TReadBufStream and proposed fix  (Read 3124 times)

wp

  • Hero Member
  • *****
  • Posts: 3956
Re: Bug in TReadBufStream and proposed fix
« Reply #15 on: August 02, 2017, 01:05:21 pm »
No, Blestan, the buffering by the OS is very ineffective (at least on Windows). This is the output of the following test program on Win10:

Quote
Creating test file with 1000000 bytes... Done.
Reading byte by byte using a TMemorystream: 0.015 sec
Reading byte by byte using a TFileStream: 2.157 sec
Reading byte by byte using a TReadBufStream: 0.031 sec

Code: Pascal  [Select]
  1. program Project1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   Classes, Bufstream, SysUtils;
  7.  
  8. var
  9.   stream: TStream;
  10.   bstream: TBufStream;
  11.   data: array of byte;
  12.   t: TDateTime;
  13.   i: Integer;
  14.  
  15. const
  16.   FILENAME = 'testdata.dat';
  17.   N = 1000000;
  18.  
  19. begin
  20.   Write('Creating test file with ', N, ' bytes...');
  21.   stream := TFileStream.Create(FILENAME, fmCreate);
  22.   try
  23.     setLength(data, N);  // 100 MB test file
  24.     Stream.WriteBuffer(data[0], Length(data));
  25.   finally
  26.     stream.Free;
  27.   end;
  28.   WriteLn(' Done.');
  29.  
  30.   stream := TMemoryStream.Create;
  31.   try
  32.     Write('Reading byte by byte using a TMemorystream: ');
  33.     t := Now;
  34.     TMemorystream(stream).LoadFromFile(FILENAME);
  35.     for i := 0 to stream.Size-1 do
  36.       stream.ReadByte;
  37.     t := Now - t;
  38.     writeLn(FormatDateTime('s.zzz" sec"', t));
  39.   finally
  40.     stream.Free;
  41.   end;
  42.  
  43.   stream := TFileStream.Create(FILENAME, fmOpenRead);
  44.   try
  45.     Write('Reading byte by byte using a TFileStream: ');
  46.     t := Now;
  47.     for i := 0 to stream.Size-1 do
  48.       stream.ReadByte;
  49.     t := Now - t;
  50.     WriteLn(FormatDateTime('s.zzz" sec"', t));
  51.  
  52.     stream.Position := 0;
  53.     bstream := TReadBufStream.Create(stream, 1024);
  54.     try
  55.       Write('Reading byte by byte using a TReadBufStream: ');
  56.       t := Now;
  57.       for i := 0 to stream.Size-1 do
  58.         bstream.ReadByte;
  59.       t := Now - t;
  60.       WriteLn(FormatDateTime('s.zzz" sec"', t));
  61.     finally
  62.       bstream.Free;
  63.     end;
  64.   finally
  65.     stream.Free;
  66.     DeleteFile(FILENAME);
  67.   end;
  68. end.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

howardpc

  • Hero Member
  • *****
  • Posts: 2422
Re: Bug in TReadBufStream and proposed fix
« Reply #16 on: August 02, 2017, 01:16:54 pm »
Linux file buffering is noticeably better in this particular test.
Results for Mint 18.0

Quote
Creating test file with 1000000 bytes... Done.
Reading byte by byte using a TMemorystream: 0.017 sec
Reading byte by byte using a TFileStream: 0.177 sec
Reading byte by byte using a TReadBufStream: 0.013 sec

Thaddy

  • Hero Member
  • *****
  • Posts: 4779
Re: Bug in TReadBufStream and proposed fix
« Reply #17 on: August 02, 2017, 01:21:53 pm »
@wp
@howardpc

That's the reason there is a TBufStream. Also note, old hands, that buffers are only efficient when aligned and preferably either file system default allocation size or a multitude of that. An arbitrary buffer size can lead to performance loss instead of performance gain.
In the olden days I kept thinking 512,412,no 512, but I learned to think 4096 already (omg) or worse... 8-)
« Last Edit: August 02, 2017, 01:25:48 pm by Thaddy »
"Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive."

Blestan

  • Sr. Member
  • ****
  • Posts: 456
Re: Bug in TReadBufStream and proposed fix
« Reply #18 on: August 02, 2017, 01:49:49 pm »
the memorystrem example is not suitable for this case because it's basicly only one big read from the disk and the just some move operations
you sould aleast divide the timing s 2 one after the load from file and one after the byte per byte operations.
btw a better approach in this case is to use memory mapped  io ... to map the file and then just doing pointer operations
Speak postscript or die!
Translate to pdf and live!

wp

  • Hero Member
  • *****
  • Posts: 3956
Re: Bug in TReadBufStream and proposed fix
« Reply #19 on: August 02, 2017, 02:00:20 pm »
Memory stream is mentioned for no reason - forget it.

I don't understand the rest of the message (why don't you read it again after sending it?). If you mean the time to open the filestream - this is zero.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

glorfin

  • New member
  • *
  • Posts: 22
Re: Bug in TReadBufStream and proposed fix
« Reply #20 on: August 02, 2017, 02:01:54 pm »
@marcov: msnd :"  File buffering is usually handled by the system behind the scenes and is considered part of file caching within the Windows operating system unless otherwise specified. "
Nevertheless, use of TReadBufStream for buffering of file read sped up file reading 10 times in my case (from 34 seconds to 3.5 seconds). So, it is definitely worthwhile.


Blestan

  • Sr. Member
  • ****
  • Posts: 456
Re: Bug in TReadBufStream and proposed fix
« Reply #21 on: August 02, 2017, 02:15:19 pm »
@wp:  TMemorystream(stream).LoadFromFile(FILENAME);
does the actual reading from disk and costs the most of the time ...
readbyte calls are just move() operations
Speak postscript or die!
Translate to pdf and live!

wp

  • Hero Member
  • *****
  • Posts: 3956
Re: Bug in TReadBufStream and proposed fix
« Reply #22 on: August 02, 2017, 02:34:30 pm »
Yes, certainly. As I said, forget the MemoryStream data.

The main statement of the speed comparison test that I wanted to make is that TBufStream is 1-2 orders of magnitude faster than "out-of-the-box" TFileStream when reading small chunks of data (on Windows). Of course, we don't know how slow TFileStream would be if the OS would not buffer file access. But even then, TBufStream (or TMemoryStream if the file is not too large) is worth considering.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

Blestan

  • Sr. Member
  • ****
  • Posts: 456
Re: Bug in TReadBufStream and proposed fix
« Reply #23 on: August 02, 2017, 02:37:51 pm »
and ofcourse if the file is <2GB memory mapping will be the fastest access because is at the speed of the kernel io operations
Speak postscript or die!
Translate to pdf and live!

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 5875
Re: Bug in TReadBufStream and proposed fix
« Reply #24 on: August 02, 2017, 04:31:49 pm »
Which means they implement a stream filter. The rest of your explanation is as always correct. Files are seekable, streams are not.
It is just that somebody decided not to document it as such and introduced a misnominer that is hard to get rid of.
In this case a TBufStream is a stream filter, not the actual stream.

TFileStream is already seekable.

ASerge

  • Sr. Member
  • ****
  • Posts: 472
Re: Bug in TReadBufStream and proposed fix
« Reply #25 on: August 03, 2017, 03:17:37 am »
and ofcourse if the file is <2GB memory mapping will be the fastest access because is at the speed of the kernel io operations
Yes. This test shows that BufStream is not so good, but the mem map is the fastest (Window 7, x64)
Code: Pascal  [Select]
  1. program Project1;
  2.  
  3. {$mode objfpc}{$H+}
  4. {$APPTYPE CONSOLE}
  5.  
  6. uses
  7.   Windows, Classes, Bufstream, SysUtils;
  8.  
  9. type
  10.   TFileMapRead = class(TCustomMemoryStream)
  11.   private
  12.     FMapHandle: THandle;
  13.     procedure CloseMapping;
  14.   public
  15.     constructor Create(const FileName: string);
  16.     destructor Destroy; override;
  17.   end;
  18.  
  19.  
  20. procedure DoReadByByte(Stream: TStream);
  21. var
  22.   i: Integer;
  23. begin
  24.   for i := 0 to Stream.Size - 1 do
  25.     Stream.ReadByte;
  26. end;
  27.  
  28. procedure WriteTimeDiff(From: TDateTime);
  29. begin
  30.   WriteLn(FormatDateTime('s.zzz" sec"', Now - From));
  31. end;
  32.  
  33. var
  34.   Stream: TStream;
  35.   BStream: TBufStream;
  36.   Data: array of byte;
  37.   t: TDateTime;
  38.  
  39. const
  40.   FILENAME = 'testdata.dat';
  41.   N = 1000000;
  42.  
  43. constructor TFileMapRead.Create(const FileName: string);
  44. var
  45.   FileSize: Integer;
  46.   HFile: THandle;
  47.   P: Pointer;
  48. begin
  49.   inherited Create;
  50.   HFile := CreateFile(PChar(FileName), GENERIC_READ,
  51.     FILE_SHARE_READ, nil, OPEN_EXISTING,
  52.     FILE_ATTRIBUTE_NORMAL or FILE_FLAG_SEQUENTIAL_SCAN, 0);
  53.   if HFile = INVALID_HANDLE_VALUE then
  54.     RaiseLastOSError;
  55.   FileSize := GetFileSize(HFile, nil);
  56.   if FileSize > 0 then
  57.   begin
  58.     FMapHandle := CreateFileMapping(HFile, nil, PAGE_READONLY, 0, 0, nil);
  59.     CloseHandle(HFile);
  60.     if FMapHandle = 0 then
  61.       RaiseLastOSError;
  62.     P := MapViewOfFile(FMapHandle, FILE_MAP_READ, 0, 0, 0);
  63.     if P = nil then
  64.       RaiseLastOSError;
  65.     SetPointer(P, FileSize);
  66.   end
  67.   else
  68.   begin
  69.     CloseHandle(HFile);
  70.     SetPointer(nil, 0);
  71.   end;
  72. end;
  73.  
  74.  
  75. procedure TFileMapRead.CloseMapping;
  76. begin
  77.   if Memory <> nil then
  78.   begin
  79.     UnmapViewOfFile(Memory);
  80.     SetPointer(nil, 0);
  81.   end;
  82.   if FMapHandle <> 0 then
  83.   begin
  84.     CloseHandle(FMapHandle);
  85.     FMapHandle := 0;
  86.   end;
  87. end;
  88.  
  89. destructor TFileMapRead.Destroy;
  90. begin
  91.   CloseMapping;
  92.   inherited;
  93. end;
  94.  
  95. begin
  96.   Write('Creating test file with ', N, ' bytes...');
  97.   Stream := TFileStream.Create(FILENAME, fmCreate);
  98.   try
  99.     SetLength(Data, N);  // 100 MB test file
  100.     Stream.WriteBuffer(Data[0], Length(Data));
  101.   finally
  102.     Stream.Free;
  103.   end;
  104.   WriteLn(' Done.');
  105.   Stream := TMemoryStream.Create;
  106.   try
  107.     Write('Reading byte by byte using a TMemorystream: ');
  108.     t := Now;
  109.     TMemorystream(Stream).LoadFromFile(FILENAME);
  110.     DoReadByByte(Stream);
  111.     WriteTimeDiff(t);
  112.   finally
  113.     Stream.Free;
  114.   end;
  115.   Stream := TFileStream.Create(FILENAME, fmOpenRead);
  116.   try
  117.     Write('Reading byte by byte using a TFileStream: ');
  118.     t := Now;
  119.     DoReadByByte(Stream);
  120.     WriteTimeDiff(t);
  121.     Stream.Position := 0;
  122.     BStream := TReadBufStream.Create(Stream, 1024);
  123.     try
  124.       Write('Reading byte by byte using a TReadBufStream: ');
  125.       t := Now;
  126.       DoReadByByte(Stream);
  127.       WriteTimeDiff(t);
  128.     finally
  129.       BStream.Free;
  130.     end;
  131.   finally
  132.     Stream.Free;
  133.   end;
  134.   Stream := TFileMapRead.Create(FILENAME);
  135.   try
  136.     Write('Reading byte by byte using a TFileMapRead: ');
  137.     t := Now;
  138.     DoReadByByte(Stream);
  139.     WriteTimeDiff(t);
  140.   finally
  141.     Stream.Free;
  142.   end;
  143.   DeleteFile(FILENAME);
  144.   Readln;
  145. end.
Quote
Creating test file with 1000000 bytes... Done.
Reading byte by byte using a TMemorystream: 0.016 sec
Reading byte by byte using a TFileStream: 0.956 sec
Reading byte by byte using a TReadBufStream: 0.944 sec
Reading byte by byte using a TFileMapRead: 0.015 sec

glorfin

  • New member
  • *
  • Posts: 22
Re: Bug in TReadBufStream and proposed fix
« Reply #26 on: August 03, 2017, 04:26:28 am »
No, it does not. Because when you say that you read with the help of TBufStream, you actually read with TFilestream. Here is part of your code:
Code: Pascal  [Select]
  1.    
  2. BStream := TReadBufStream.Create(Stream, 1024);
  3.     try
  4.       Write('Reading byte by byte using a TReadBufStream: ');
  5.       t := Now;
  6.       DoReadByByte(Stream);  // <------------------- Here must be DoReadByByte(BStream)
  7.       WriteTimeDiff(t);
  8.     finally
  9.       BStream.Free;
  10.     end;
  11.  

SymbolicFrank

  • Sr. Member
  • ****
  • Posts: 319
Re: Bug in TReadBufStream and proposed fix
« Reply #27 on: August 03, 2017, 09:05:38 am »
I agree with Blestran, just memory-map it. Especially because most CPUs today are 64-bit.

Streams are so last century.

Thaddy

  • Hero Member
  • *****
  • Posts: 4779
Re: Bug in TReadBufStream and proposed fix
« Reply #28 on: August 03, 2017, 10:04:24 am »
I agree with Blestran, just memory-map it. Especially because most CPUs today are 64-bit.

Streams are so last century.
That is simply not true. Streams are core to modern computing. It is the thorough misunderstanding (stream vs stream filter) of what a proper stream is that also leads to a thorough misunderstanding of its application.
And this misunderstanding only seems the case amongst very average Pascal programmers.... >:D E.g.: A Pipe is also a stream and not seekable. An UDP or TCP transfer is a stream and not seekable. A stream of water from your tap is a stream and not seekable. A river is a stream and not seekable. You can collect an amount of it and release it back later at the correct spot if you employ a boey... That's a filter, in this case a buffered stream... Get it?

Streams are 1) stateless, 2) unidirectional 3) sequential and 4) not seekable. They can be made seekable by applying a filter: and that's a different beast.

The problem is that somebody in the pascal community introduced the misnomer. Which leads to expectations about a stream that can only be achieved by a stream filter. Which can manifest itself, but not necessarily -  as a buffer. A buffer is where you perform the seeks on, not the stream. A buffer is stateful. Note memory mapping is a buffer manifestation..... BASICS.
« Last Edit: August 03, 2017, 10:34:21 am by Thaddy »
"Logically, no number of positive outcomes at the level of experimental testing can confirm a scientific theory, but a single counterexample is logically decisive."

SymbolicFrank

  • Sr. Member
  • ****
  • Posts: 319
Re: Bug in TReadBufStream and proposed fix
« Reply #29 on: August 03, 2017, 11:00:36 am »
Stream buffers reside in main memory, so you would want to keep them as small as possible if they have to fit into cache memory (L1 cache for an i7 is 32kB).

Streams would be a good model if you skip the cache memory and only use DMA (like a Cell SPE CPU). In that case, it works as intended.

In most other cases, you might want to use paging to move data around between processes/threads, but that doesn't work on Windows (the OS won't let you do it).

For sockets, I would simply limit the max size of a message and read/write it in one go, bypassing the need for smaller buffers.

 

Recent

Get Lazarus at SourceForge.net. Fast, secure and Free Open Source software downloads Open Hub project report for Lazarus