
Author Topic: Slow speed reading a binary file  (Read 2492 times)

DMStevenson

  • Newbie
  • Posts: 5
Slow speed reading a binary file
« on: June 24, 2024, 08:00:45 pm »
I hope this is the right area to ask this. I have noticed that reading binary files is very slow compared to other languages, at least when using older Pascal verbiage. As an example, this:

while not EOF(subject_file) do
    begin
      read(subject_file, b);
      TotalCount := TotalCount + 1;
    end;

is more than 10x slower than the equivalent C code. I have wondered about this for years, and finally decided to ask for some input. Does anyone have any ideas about this?

Thaddy

  • Hero Member
  • *****
  • Posts: 15217
  • Censorship about opinions does not belong here.
Re: Slow speed reading a binary file
« Reply #1 on: June 24, 2024, 08:48:58 pm »
If you read in the {$I-} state, the code becomes much faster. That is because C does not check I/O errors by default and Pascal does.
But it is also a matter of buffering.
With equal buffers and equal I/O checking, the speed difference should be negligible.
Of course the national anthem of the U.S.A. was written by Jimi Hendrix, didn't you know that?

Thaddy

  • Hero Member
  • *****
  • Posts: 15217
  • Censorship about opinions does not belong here.
Re: Slow speed reading a binary file
« Reply #2 on: June 24, 2024, 09:26:14 pm »
Also note that in Lazarus debugging is on by default, and FPC has rather conservative optimization settings. Both affect code speed, but there should be no difference between C and Free Pascal when properly compiled.

DMStevenson

  • Newbie
  • Posts: 5
Re: Slow speed reading a binary file
« Reply #3 on: June 24, 2024, 10:39:06 pm »
Thanks Thaddy - Perhaps I am not compiling properly, as I am still having trouble. It takes about 54 seconds to read a 13 GB file, but in C this is only about 5-10 seconds. I am sure there is a simple way around this that I am simply missing...

jamie

  • Hero Member
  • *****
  • Posts: 6394
Re: Slow speed reading a binary file
« Reply #4 on: June 24, 2024, 11:33:31 pm »
Looks like you are using Text style files? If so, I believe you can change the buffer size of the incoming text handler to point to a very large chunk of memory, so that it can then be read in fast.

Look up the TextRec type in the help.

You can also use OS-level reads yourself via the APIs etc.

The only true wisdom is knowing you know nothing

TRon

  • Hero Member
  • *****
  • Posts: 2922
Re: Slow speed reading a binary file
« Reply #5 on: June 25, 2024, 02:52:15 am »
Quote
Looks like you are using Text style files?
Nah, TS is referring to binary.

Quote
If so, I believe you can change the buffer size of the incoming text handler to point to a very large chunk of memory, so that it can then be read in fast.

Lookup TextRec type in the help.
For text files there is SetTextBuf

Quote
You can also use OS-level reads yourself via the APIs etc.
Using plain old file-style access to read 13 GB files byte by byte is imho just plain stupid, but since TS seems to persist in comparing apples to 'something'....  :)

@TS: FileSize from the System unit or FileSize from FileUtil (Lazarus) can be used to provide an instant answer as to what TotalCount is supposed to be.
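
A minimal sketch of the System route, for anyone who wants to try it. Taking the file name from the first command-line argument and the program name ShowFileSize are assumptions made for this example, not something from the posts above.

```pascal
program ShowFileSize;
{$I-}  // check IOResult by hand instead of getting run-time errors

var
  F: file of Byte;
begin
  Assign(F, ParamStr(1));  // assumption: file name is the first argument
  FileMode := 0;           // open read-only
  Reset(F);
  if IOResult <> 0 then
  begin
    WriteLn('cannot open ', ParamStr(1));
    Halt(1);
  end;
  // FileSize counts records; with "file of Byte" one record is one byte
  WriteLn(FileSize(F));
  Close(F);
end.
```

This only opens the file and asks for its size, so it should return immediately even for a multi-gigabyte file.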

Thaddy

  • Hero Member
  • *****
  • Posts: 15217
  • Censorship about opinions does not belong here.
Re: Slow speed reading a binary file
« Reply #6 on: June 25, 2024, 06:50:32 am »
If you read byte for byte instead of in blocks and you are compiling in the {$I+} state, every byte you read causes an I/O check call! You *must* use the {$I-} state for speed. That is the same as what C defaults to.
If sysutils is included it gets worse, because then the I/O checking is wrapped in exceptions and that makes the code even slower. If you want to compare with plain C, all these things need to be considered. Once you level the playing field, there is hardly any speed difference. Make sure the buffer sizes in FPC and C are the same.
C is usually very bare-bones code, and that can also be done in FPC.
This:
Code: C
#include <stdio.h>

int main(void) {
    FILE *file;
    int ch; // must be int, not char: fgetc returns EOF or a byte value 0..255
    file = fopen("filename.txt", "rb"); // binary mode; file must exist
    if (file == NULL) {
        return 1;
    }
    ch = fgetc(file); // returns EOF at end of file
    while (ch != EOF) {
        ch = fgetc(file);
    }
    fclose(file);
    return 0;
}
Has the same speed as this:
Code: Pascal
program inpascal;
{$I-} // no automatic I/O checks; test IOResult manually
var
  f: file of byte;
  b: byte;
begin
  Assign(f, 'filename.txt'); // file must exist
  Reset(f);
  if IOResult <> 0 then
    Halt(1);
  while not eof(f) do // eof is tested before each read
    Read(f, b);
  Close(f);
end.
Even the generated assembler is similar (GNU C and FPC trunk on win64, -O1).
« Last Edit: June 25, 2024, 07:05:30 am by Thaddy »

Jorg3000

  • Jr. Member
  • **
  • Posts: 66
Re: Slow speed reading a binary file
« Reply #7 on: June 25, 2024, 07:05:07 am »
Hi!
To read a binary file quickly, you can manage a buffer yourself, in this case 32 KB per read access.

Code: Pascal
// requires SysUtils in the uses clause
function LoadBinFile(const FileName: String): Boolean;
const
  BufferSize = 32768;  // 32 KB
var
  FileHandle: SysUtils.THandle;
  Buffer: array[0..BufferSize-1] of Byte;
  BytesRead, i: Integer;
begin
  Result:=true;
  FileHandle := SysUtils.FileOpen(FileName, fmOpenRead);  // fmOpenRead or fmShareDenyWrite
  if FileHandle = feInvalidHandle then Exit(false);

  try
    repeat
      BytesRead := SysUtils.FileRead(FileHandle, Buffer, BufferSize);
      if BytesRead=-1 then begin Result:=false; Break; end;

      for i := 0 to BytesRead-1 do
      begin
        // *** do something with Buffer[i]
      end;

    until BytesRead < BufferSize;
  finally
    SysUtils.FileClose(FileHandle);
  end;
end;
« Last Edit: June 25, 2024, 07:11:29 am by Jorg3000 »

Thaddy

  • Hero Member
  • *****
  • Posts: 15217
  • Censorship about opinions does not belong here.
Re: Slow speed reading a binary file
« Reply #8 on: June 25, 2024, 07:10:14 am »
You do not need sysutils; you can simply do block reads. But he wondered why C is faster, and with a fair comparison like the one above, it isn't.
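
For reference, a block-read version using only the System unit might look roughly like the sketch below. The 64 KB buffer size, the program name, and taking the file name from the command line are assumptions chosen for illustration; it counts bytes the way the original loop did.

```pascal
program CountBytesBlockRead;
{$I-}  // no automatic I/O checks; IOResult is tested manually

const
  BufSize = 64 * 1024;  // 64 KB per read; an arbitrary choice for this sketch

var
  F: file;                              // untyped file
  Buf: array[0..BufSize - 1] of Byte;
  Got: LongInt;                         // bytes actually read by BlockRead
  Total: Int64;
begin
  Assign(F, ParamStr(1));  // assumption: file name is the first argument
  FileMode := 0;           // open read-only
  Reset(F, 1);             // record size 1, so counts are in bytes
  if IOResult <> 0 then
  begin
    WriteLn('cannot open ', ParamStr(1));
    Halt(1);
  end;

  Total := 0;
  repeat
    BlockRead(F, Buf, BufSize, Got);
    if IOResult <> 0 then
    begin
      WriteLn('read error');
      Break;
    end;
    Inc(Total, Got);       // process Buf[0..Got-1] here if needed
  until Got = 0;

  Close(F);
  WriteLn(Total);
end.
```

The point of the design is that the run-time library is called once per 64 KB instead of once per byte, so the per-call overhead discussed earlier in the thread mostly disappears.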

TRon

  • Hero Member
  • *****
  • Posts: 2922
Re: Slow speed reading a binary file
« Reply #9 on: June 25, 2024, 07:15:28 am »
@Thaddy:
Initially TS stated "compared to other languages" and later specified it with c as an example or to sum it up "hot air"  :)

Thaddy

  • Hero Member
  • *****
  • Posts: 15217
  • Censorship about opinions does not belong here.
Re: Slow speed reading a binary file
« Reply #10 on: June 25, 2024, 07:33:30 am »
The point I actually wanted to make is that when you analyze the underlying techniques that a C compiler uses and apply the same techniques to Pascal, the resulting code is similar. That comes at the cost of giving up the advantages of inherently safe language features.

TRon

  • Hero Member
  • *****
  • Posts: 2922
Re: Slow speed reading a binary file
« Reply #11 on: June 25, 2024, 07:37:55 am »
Yes, I got that Thaddy.

My point being that TS did not show any example code in "other languages" or his specified c-example. Thus for now all we can consider is that we are comparing apples with "hot air"  :) (except for your comparison example ofc)
« Last Edit: June 25, 2024, 07:45:50 am by TRon »

loaded

  • Hero Member
  • *****
  • Posts: 846
Re: Slow speed reading a binary file
« Reply #12 on: June 25, 2024, 08:50:56 am »
The fastest method for reading binary files is TReadBufStream.
If you use it, you will find that it is faster than many other languages.
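
A rough sketch of how TReadBufStream (unit bufstream) can be used to count bytes is shown below. The 64 KB buffer size, the program name, and the command-line file name are assumptions for this example; no timing claims are made for it.

```pascal
program CountBytesBufStream;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, bufstream;

var
  Source: TFileStream;
  Buffered: TReadBufStream;
  B: Byte;
  Total: Int64;
begin
  // assumption: file name is the first command-line argument
  Source := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  Buffered := TReadBufStream.Create(Source, 64 * 1024);  // 64 KB buffer
  try
    Buffered.SourceOwner := True;  // freeing Buffered also frees Source
    Total := 0;
    // each Read is served from the in-memory buffer; the file itself
    // is only touched when the buffer runs empty
    while Buffered.Read(B, 1) = 1 do
      Inc(Total);
    WriteLn(Total);
  finally
    Buffered.Free;
  end;
end.
```

The calling code still reads one byte at a time, but the operating system only sees large reads.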
Check out  loaded on Strava
https://www.strava.com/athletes/109391137

Thaddy

  • Hero Member
  • *****
  • Posts: 15217
  • Censorship about opinions does not belong here.
Re: Slow speed reading a binary file
« Reply #13 on: June 25, 2024, 12:07:58 pm »
Class based approaches are usually slower than solving the same task in a procedural way.

DMStevenson

  • Newbie
  • Posts: 5
Re: Slow speed reading a binary file
« Reply #14 on: June 25, 2024, 05:45:50 pm »
Hi - This is all very interesting. Just for fun I ran Thaddy's code (reply #6) with a counter added (just to give me some output). I used a 7.5 MB text file. As before, the Pascal version took about 32 seconds, while the C version was almost instantaneous (~5 sec). I am using older-style Pascal on purpose, as I am simply interested in why it is slower, since I see no reason why it should be (as stated by others)...

 
