
Author Topic: [SOLVED] Qword to BlockRead: Unexplained, Reversed, or Garbled Output  (Read 1898 times)

pazkal

  • Newbie
  • Posts: 2
I am trying to get BlockRead to read file bytes accurately into Dwords and Qwords for a hash algorithm. With some test files, parts of the integer output did not match the values in the files: some Qword outputs were reversed, some were garbled, and some appeared to be in the wrong order. When I reconstructed them in order, some tests gave me the original file in mixed or garbled order.

How do I remedy this? I need to read blocks of bytes exactly into arrays of DWORDS and QWORDS. To me this seems the fastest way to rip through a file and feed the values to some XOR and shift operations that operate on QWORDS instead of single bytes.

I'm not sure if I'm asking the right questions. I may need to edit the question as I get feedback ...

Here is the test code:

Code: Pascal
PROGRAM read_qwords;

// puzzled ... why are some QWORDS reversed or garbled on output?

VAR
    FileInput   : File ;
    QwordBytes  : array[0..256] of QWORD ;
    BytesRead   : Int64 ;
    i           : Int64 ;

BEGIN

Assign (FileInput, Paramstr(1)) ;
Reset (FileInput, 8) ;

while not EOF (FileInput) do
    begin
        BlockRead (FileInput, QwordBytes, SizeOf(QwordBytes), BytesRead) ;
        for i := 0 to BytesRead do
            begin
                write(QwordBytes[i]) ;
                write(' ') ;
            end ;
    end;

close(FileInput) ;

END.

Compilation and test setup:

Code: Bash
$ fpc -O4 read.qwords.pas

$ echo "Pascal is pretty neat. The proof of the pudding is under the meat." > junk.txt

$ ./read.qwords junk.txt

Integer output in Gnome terminal:

Code: Bash
7575173738773307728 8751747954948186227 6061896175825808928 7381240851581658472 2334386829830549280 2334956330749818224 8243105118545802089 7018135580234511392 667252

Converted output (to hex) then decoded:

Code: Bash
i lacsaP
ytterp s
T .taen
foorp eh
 eht fo
 gniddup
rednu si
aem eht
��@

I am uncertain as to why everything is reversed and slightly garbled. What am I missing? Is something in my test code reversing the bits, or is there some feature (like endian, signing) I need to account for?

Be gentle, this is my first rodeo with Pascal. I'll report you to Guido if you are too unkind ;)
« Last Edit: December 07, 2019, 11:57:41 am by pazkal »

Thaddy

  • Hero Member
  • *****
  • Posts: 14198
  • Probably until I exterminate Putin.
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #1 on: December 07, 2019, 10:57:12 am »
Simple. For storage and i/o the array needs to be packed.
Code: Pascal
QwordBytes  : packed array[0..256] of QWORD ;
Reason: memory alignment can differ from storage alignment.

Also note: 0..255 has 256 elements. Why do you use 0..256, which has 257 elements?
« Last Edit: December 07, 2019, 11:01:32 am by Thaddy »
Specialize a type, not a var.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #2 on: December 07, 2019, 11:10:54 am »
Yes, it is indeed the endianness that is biting you here. Take a look at the first QWord:

7575173738773307728

In hex this is:

69 20 6C 61 63 73 61 50

That is ASCII for

i lacsaP

The file originally contained this:

50 61 73 63 61 6C 20 69

Which is the little endian representation of that. Hash algorithms usually work on Byte streams however. So either read the file Byte by Byte or convert the endianness using LEtoN or SwapEndian (https://www.freepascal.org/docs-html/rtl/system/swapendian.html). (I'd need to test which one is the correct one so that it would also work on a Big Endian system.)
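To make the byte-order point concrete, here is a minimal sketch in Free Pascal. It copies the eight file bytes quoted above into a QWord, which has the same effect as BlockRead on that memory, and prints the value before and after BEtoN. BEtoN is used on the assumption that the first byte of the file should end up as the most significant byte; whether that is the convention a given hash expects is not stated in the thread.

```pascal
program endian_demo;
{$mode objfpc}

const
  // The first eight bytes of the test file: 'Pascal i'
  FileBytes: array[0..7] of Byte = ($50, $61, $73, $63, $61, $6C, $20, $69);

var
  q: QWord;

begin
  // Raw copy into the QWord, as BlockRead would do
  Move(FileBytes, q, SizeOf(q));

  // Host byte order; on a little-endian host this prints 69206C6163736150
  WriteLn('native: ', HexStr(q, 16));

  // Interpret the raw bytes as big-endian: first file byte becomes the most
  // significant one, on little-endian and big-endian hosts alike.
  // Prints 50617363616C2069
  WriteLn('BEtoN : ', HexStr(BEtoN(q), 16));
end.
```

On a little-endian machine BEtoN swaps the bytes, and on a big-endian machine it returns the value unchanged, so the second line is the same on both.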

Also there is another small mistake in your example source:
Code: Pascal
while not EOF (FileInput) do
    begin
        BlockRead (FileInput, QwordBytes, SizeOf(QwordBytes), BytesRead) ;
        for i := 0 to BytesRead do
            begin
                write(QwordBytes[i]) ;
                write(' ') ;
            end ;
    end;
Aside from the variable BytesRead being better named BlocksRead, your for-loop extends one element too far. The upper bound should be BytesRead - 1 instead.

Quote from: Thaddy
Simple. For storage and i/o the array needs to be packed.
Code: Pascal
QwordBytes  : packed array[0..256] of QWORD ;
Reason: memory alignment can differ from storage alignment.

FPC ignores the packed modifier for arrays unless {$BITPACKING ON} is set, in which case it behaves as if the array had been declared as bitpacked. There is never any padding or alignment gap between array elements.
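A quick way to check this for the array in question is to compare the sizes of the two declarations. This is only a sketch; the type names TPlain and TPacked are made up for the illustration.

```pascal
program packed_demo;

type
  TPlain  = array[0..255] of QWord;
  TPacked = packed array[0..255] of QWord;

begin
  // 256 elements of 8 bytes each, with nothing in between
  WriteLn(SizeOf(TPlain));   // prints 2048
  WriteLn(SizeOf(TPacked));  // prints 2048
end.
```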

wp

  • Hero Member
  • *****
  • Posts: 11854
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #3 on: December 07, 2019, 11:12:46 am »
You must understand what a zero-based array means. Suppose you have an array with four elements. If you set the index of the first element to 0, then the array elements can be accessed by the indexes 0, 1, 2, 3. Note: the last index is 1 less than the count of elements!

So, looking at your variable declarations, I see "QwordBytes  : array[0..256] of QWORD ". Index 0 up to 256 means that there are 257 elements -- I guess this is not what you want, you probably want 256 elements, then the index range must go up to 255 only.

But I think this is not the problem in your code.

Further down your code there is, in the while loop, "for i := 0 to BytesRead do". The same issue again: you read n (=BytesRead) bytes, then i can only run from 0 to n-1, not to n.
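The two off-by-one points can be checked with Length, Low and High, which avoid hard-coding the bounds. A small sketch, with array names A and B chosen only for the illustration:

```pascal
program bounds_demo;

var
  A: array[0..256] of QWord;
  B: array[0..255] of QWord;
  i, n: Integer;

begin
  WriteLn(Length(A));  // prints 257
  WriteLn(Length(B));  // prints 256

  // Low(B)..High(B) visits every element exactly once
  n := 0;
  for i := Low(B) to High(B) do
    Inc(n);
  WriteLn(n);          // prints 256
end.
```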

Even more important, now not related with the zero-based arrays: BytesRead is the number of bytes read from the file. You read them into an array of QWord numbers. In the interior of the for loop you access the individual elements of the array, but the index runs over the array bytes. This is a factor 8 too much! Divide the BytesRead by 8 (or more general: SizeOf(QWord)) to obtain the number of QWords read from the file:

There is a final issue: what if the size of the file is not a multiple of SizeOf(QwordBytes)? Then the last chunk of data read will fill the QwordBytes array only partially. Since the data from the previous read cycle are still in the array, they may appear in the output of the last cycle. Therefore, erase QwordBytes before each BlockRead.

Code: Pascal
var
  NumQWords: Integer;
....
  while not EOF (FileInput) do
  begin
    FillChar(QwordBytes, SizeOf(QwordBytes), 0);   // <-- erase QwordBytes (FillChar takes the count first, then the fill value)
    BlockRead (FileInput, QwordBytes, SizeOf(QwordBytes), BytesRead) ;
    NumQWords := BytesRead div SizeOf(QWord);   // <-- calculates the number of QwordBytes elements read
    if BytesRead mod SizeOf(QWord) <> 0 then inc(NumQWords);  // <-- catch case that BytesRead is not divisible by 8
    for i := 0 to NumQWords-1 do
    begin
      write(QwordBytes[i]) ;
      write(' ') ;
    end ;
  end;
« Last Edit: December 07, 2019, 12:06:05 pm by wp »

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #4 on: December 07, 2019, 11:36:33 am »
Quote from: wp
Even more important, now not related with the zero-based arrays: BytesRead is the number of bytes read from the file. You read them into an array of QWord numbers. In the interior of the for loop you access the individual elements of the array, but the index runs over the array bytes. This is a factor 4 too much! Divide the BytesRead by 4 (or more general: SizeOf(QWord)) to obtain the number of QWords read from the file:
He opened the file with Reset(xxx, 8), so the record size is 8 and the result parameter of BlockRead returns the number of 8-byte blocks read. Your point about the incomplete last element, when the file size is not a multiple of SizeOf(QWord), does indeed apply though.

So, pazkal, you should read the file as a byte stream. That's safer and you won't need to deal with endianness. Alternatively look up TFileStream and stream based reading.
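Pulling the corrections from the replies so far together, the test program could look roughly like the sketch below. It is one possible arrangement, not the only one: it opens the file with a record size of 1 so that all counts are in bytes, zero-fills the buffer so a partial last QWord is padded with zeros, and rounds the element count up. Note also that with Reset(FileInput, 8) the count argument of BlockRead is in 8-byte records, so passing SizeOf(QwordBytes) there asks for far more data than the array can hold. The name Buf is an arbitrary choice, and the values are still printed in host byte order.

```pascal
program read_qwords_fixed;
{$mode objfpc}

var
  FileInput : File;
  Buf       : array[0..255] of QWord;  // 256 elements = 2048 bytes
  BytesRead : Int64;
  NumQWords : Integer;
  i         : Integer;

begin
  Assign(FileInput, ParamStr(1));
  Reset(FileInput, 1);                 // record size 1: counts below are bytes

  repeat
    // Clear leftovers so a partial last QWord is zero-padded
    FillChar(Buf, SizeOf(Buf), 0);

    BlockRead(FileInput, Buf, SizeOf(Buf), BytesRead);

    // Number of QWords touched by this read, rounded up
    NumQWords := Integer((BytesRead + SizeOf(QWord) - 1) div SizeOf(QWord));

    for i := 0 to NumQWords - 1 do
      Write(Buf[i], ' ');              // host byte order, not file order
  until BytesRead < SizeOf(Buf);       // a short read means end of file

  Close(FileInput);
  WriteLn;
end.
```

Run it the same way as the original, for example `./read_qwords_fixed junk.txt`.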

wp

  • Hero Member
  • *****
  • Posts: 11854
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #5 on: December 07, 2019, 11:47:48 am »
Quote from: PascalDragon
So, pazkal, you should read the file as a byte stream.
Right - I did not see that

pazkal

  • Newbie
  • Posts: 2
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #6 on: December 07, 2019, 11:51:17 am »
Quote from: PascalDragon
So, pazkal, you should read the file as a byte stream. That's safer and you won't need to deal with endianness. Alternatively look up TFileStream and stream based reading.

Thanks for pointing out the errors. It is solidifying and making sense. As a Python refugee it is taking a bit of re-wiring the synapses to grok how Pascal approaches the machine.

In this experiment speed is concern #1 above all other considerations. From a Python way of looking at it, operating on larger integer values is much cheaper, especially if pumping them from a large file into a hash machine. If I fed a file one byte at a time into the Python version of the algorithm, any file over a few megabytes would take a long, long time to digest. So that spawns a new question: which is faster for sustained read operations on a huge file, TFileStream or BlockRead?

I marked this thread as solved since the errors and concerns in my code have been explained to me quite well. However if you want to throw down some more options for me to study please don't hesitate.
« Last Edit: December 07, 2019, 12:00:00 pm by pazkal »

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #7 on: December 07, 2019, 12:52:30 pm »
Quote from: pazkal
[...] which is faster for sustained read operations on a huge file--TFileStream or BlockRead?

It depends a little on the kind of reading you're doing but both methods are roughly the same, speed-wise. Using a TFileStream incurs a little overhead due to it being an object but it's usually negligible; it's just the price one pays for ease of use (and very cheap at that).
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5446
  • Compiler Developer
Re: Qword to BlockRead: Unexplained, Reversed, or Garbled Output and Questions
« Reply #8 on: December 08, 2019, 11:44:08 am »
Quote from: pazkal
In this experiment speed is concern #1 above all other considerations.

The #1 concern should be "it works correctly", shouldn't it ;)

There is a saying among developers:
Quote
Make it work,
make it right,
make it fast

Also known as: premature optimization is the root of all evil.

First implement it in a way that works, and only then check whether there are any performance problems at all. It could very well be that, because FPC compiles to native code instead of running in an interpreter, it already meets your requirements.

Quote from: pazkal
If I fed a file one byte at a time into the Python version of algorithm, any file over a few megs would take a long, long time to digest out.

You should simply use bigger blocks then. And that is where TFileStream is better suited than the old Pascal I/O:
Code: Pascal
{$mode objfpc}
uses
  Classes, SysUtils; // TFileStream is in Classes, fmOpenRead in SysUtils
const
  BufferSize = $100000; // 1 MB is a nice size
var
  buf: array of Byte;
  fs: TFileStream;
  r: LongInt;
begin
  // PathToFile and DoSomethingWithBuf are placeholders to fill in
  fs := TFileStream.Create(PathToFile, fmOpenRead);
  try
    SetLength(buf, BufferSize);
    repeat
      r := fs.Read(buf[0], BufferSize);
      if r > 0 then
        DoSomethingWithBuf(buf, r);
    until r < BufferSize; // a short read means the end of the file was reached
  finally
    fs.Free;
  end;
end.

Not tested (and it's not complete anyway), but it should give you a hint.

Quote from: pazkal
So that spawns a new question: which is faster for sustained read operations on a huge file--TFileStream or BlockRead?

For small reads there should be no real difference, but if you want to use large reads like shown above then you should use TFileStream (or FileRead and friends if you want to use a procedural approach).
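For the procedural route mentioned here, a rough sketch using FileOpen, FileRead and FileClose from SysUtils might look as follows. It only counts the bytes it reads; the counter `total` is a stand-in for whatever the hash would do with each chunk, and the 1 MB buffer size is carried over from the example above rather than measured.

```pascal
program fileread_demo;
{$mode objfpc}

uses
  SysUtils;

const
  BufferSize = $100000;  // 1 MB, same as in the TFileStream example

var
  buf   : array of Byte;
  h     : THandle;
  r     : LongInt;
  total : Int64;

begin
  h := FileOpen(ParamStr(1), fmOpenRead);
  if h = THandle(-1) then              // FileOpen signals failure this way
  begin
    WriteLn('cannot open ', ParamStr(1));
    Halt(1);
  end;

  SetLength(buf, BufferSize);
  total := 0;
  try
    repeat
      r := FileRead(h, buf[0], BufferSize);
      if r > 0 then
        Inc(total, r);                 // stand-in for hashing buf[0..r-1]
    until r <= 0;                      // 0 = end of file, -1 = read error
  finally
    FileClose(h);
  end;

  WriteLn(total, ' bytes read');
end.
```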

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: [SOLVED] Qword to BlockRead: Unexplained, Reversed, or Garbled Output
« Reply #9 on: December 08, 2019, 01:31:08 pm »
Another approach when speed is of concern is to link the TFileStream to a TBufStream, setting the buffer length appropriately.

That allows you to do even byte-sized reads, if needed, with a considerable speed-up over just reading a TFileStream.
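As a sketch of what that could look like: the example below assumes that the read-side class TReadBufStream from the bufstream unit is what is meant, and wraps a TFileStream in it with a 1 MB buffer. The XOR over all bytes is only a placeholder workload to show byte-sized reads; it is not taken from the thread.

```pascal
program bufstream_demo;
{$mode objfpc}

uses
  Classes, SysUtils, bufstream;

var
  fs : TFileStream;
  bs : TReadBufStream;
  b  : Byte;
  x  : Byte;

begin
  fs := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    // The buffered stream reads from fs in 1 MB chunks behind the scenes
    bs := TReadBufStream.Create(fs, $100000);
    try
      x := 0;
      // Byte-sized reads are served from the buffer, not from the OS
      while bs.Read(b, 1) = 1 do
        x := x xor b;
      WriteLn('xor of all bytes: ', x);
    finally
      bs.Free;   // does not free fs unless SourceOwner is set
    end;
  finally
    fs.Free;
  end;
end.
```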
