Recent

Author Topic: [Code Review] SHA-256 of a file (hash verification tool for downloads)  (Read 2649 times)

Hi im Pascal

  • New member
  • *
  • Posts: 24
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #15 on: December 30, 2018, 09:23:07 pm »
@engkin,
thanks for the advice with the timer, I implemented it that way and it seems to be working. Is accessing the Thread object thread-safe? I mean, I could be writing the progress variable from inside the thread and reading it to update my progress bar from the main GUI thread at the same time.

For example, if I create a function 'GetProcess' which returns fileStream.position / Real(fileStream.Size), is it guaranteed that reading fileStream.position is thread safe e.g. what if I read from inside the thread at the same time and position is updated. I don't think updating fileStream.position would be an atomic operation by nature, unless TFileStream makes sure of it.

As far as the caching goes, are you referring to caching the operating system would do? Because I can't find any caching code inside HashLib4Pascal's IHash.TransformStream. Additionally, since some hashing algorithms run much faster than others, I wonder how you could effectively cache the reads. But I guess I'm not complaining, I used the following code to parallelize and everything seems to be working nicely:

Code: Pascal  [Select]
  1. //================================================================================================//
  2. //                                      TCalcFileHash                                             //                        
  3. //================================================================================================//
  4. type
  5.   TCalcFileHash = class(TThread)
  6.   public
  7.     constructor Create(createSuspended: Boolean; fileName: String; hashFn: IHash);
  8.    
  9.     function GetProgress: Real;
  10.    
  11.   protected
  12.     procedure Execute; override;
  13.    
  14.   private
  15.     fileStream: TFileStream;
  16.     filePath: String;
  17.     hash: IHash;
  18.     hashStr: String;
  19.    
  20.     procedure OnFinished;
  21.   end;
  22.  
  23. constructor TCalcFileHash.Create(createSuspended: Boolean; fileName: String; hashFn: IHash);
  24. begin
  25.   inherited Create(createSuspended);
  26.   FreeOnTerminate := true;
  27.   filePath := fileName;
  28.   hash := hashFn;
  29. end;
  30.  
  31. procedure TCalcFileHash.Execute;
  32. var
  33.   bytesRead: Integer;
  34.   buffer: TFileBuffer;
  35. begin
  36.   if FileExists(filePath) then begin
  37.     fileStream := TFileStream.Create(filePath, fmOpenRead or fmShareDenyWrite);
  38.     try
  39.       hash.Initialize();
  40.       while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  41.         bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  42.         hash.TransformUntyped(buffer, bytesRead);
  43.       end;
  44.       hashStr := LowerCase(hash.TransformFinal.ToString());
  45.     finally
  46.       FreeAndNil(fileStream);
  47.     end;
  48.   end;
  49.   Synchronize(@OnFinished);
  50. end;
  51.  
  52. function TCalcFileHash.GetProgress: Real;
  53. begin
  54.   Result := 1;
  55.   if fileStream <> nil then
  56.     Result := fileStream.Position / Real(fileStream.Size);
  57. end;
« Last Edit: December 30, 2018, 09:28:43 pm by Hi im Pascal »

engkin

  • Hero Member
  • *****
  • Posts: 2299
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #16 on: January 01, 2019, 02:21:00 am »
Is accessing the Thread object thread-safe?
If you design it right, then it should be thread-safe.

I mean, I could be writing the progress variable from inside the thread and reading it to update my progress bar from the main GUI thread at the same time.

For example, if I create a function 'GetProcess' which returns fileStream.position / Real(fileStream.Size), is it guaranteed that reading fileStream.position is thread safe e.g. what if I read from inside the thread at the same time and position is updated. I don't think updating fileStream.position would be an atomic operation by nature, unless TFileStream makes sure of it.
It would be thread-safe to call GetProgress from within the thread loop itself to update a variable.  Simply retrieve the value using InterLockedExchange or a similar approach. There is an example of this here by ASerge. It is easy to expand it to get the overall progress if you have more than one thread, like one thread for sha1 and the other for sha256, or more than one file on a system with a few cores.

There is another method using QueueAsyncCall as in this example by Taazz.

As far as the caching goes, are you referring to caching the operating system would do? Because I can't find any caching code inside HashLib4Pascal's IHash.TransformStream.
Yes, the OS and the HD itself.

Additionally, since some hashing algorithms run much faster than others, I wonder how you could effectively cache the reads.
I doubt that you need to cache any reads in this specific case. To confirm, measure how long it takes to:
1-read one file without any processing.
2-process the data from memory.

Based on the results you can decide if you really benefit from caching.

Hi im Pascal

  • New member
  • *
  • Posts: 24
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #17 on: January 01, 2019, 02:47:10 am »
@engkin,

thanks once more for your reply. I will look into synchronizing my access to GetProgress(), because as of now I just read TFileStream.Position without any sort of "protection". I already know that reading the file is much faster than the actual hashing, so I guess I won't worry about it. It's probably not worth the effort.

P.S.:
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.). The only thing I think is better about Qt is the Horizontal/Vertical Layout System in QDesigner makes it really easy to lay out the controls, compared to specifying all the anchors manually. Basically creating the layout in Qt: 30 seconds, creating the layout with Lazarus: 5 minutes. lol.

Lazarus follows the GTK theme set perfectly, whereas Qt just draws its own thing and looks completely out of place... Another thing I really like about LCL so far.

engkin

  • Hero Member
  • *****
  • Posts: 2299
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #18 on: January 01, 2019, 03:09:02 am »
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.).
My guess it runs slow intentionally to make timing attacks difficult. You need to use non-cryptographic hashing if you seek speed.

Hi im Pascal

  • New member
  • *
  • Posts: 24
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #19 on: January 01, 2019, 03:24:28 am »
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.).
My guess it runs slow intentionally to make timing attacks difficult. You need to use non-cryptographic hashing if you seek speed.

Mmh that might be, but it's not that much slower, but about 20%-30% maybe. I thought that's what bcrypt, scrypt, argon2 and the likes are for. Usually the hashes are provided to verify file downloads of a certain hash kind, so the choice isn't really mine :)

Looking at the source I can't find any direct code causing slow up, but there seem to be a few memcopies and actually they overwrite the local buffer and variables with zeros after computing a portion of the hash, that might add some overhead (not sure how much). In case anyone is interested, here is the source I viewed: https://code.woboq.org/qt5/qtbase/src/corelib/tools/qcryptographichash.cpp.html
Code: C  [Select]
  1. void QCryptographicHash::addData(const char *data, int length)

engkin

  • Hero Member
  • *****
  • Posts: 2299
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #20 on: January 01, 2019, 04:48:40 am »
Yes, you are right. Like in sha1ProcessChunk. It repeats the process of copying to a local variable at the begining and zeroing at the end for every 64 bytes of the hashed data. So you really need to compare FPC with non-cryptographic lib.

Hashing, in addition to your usage to verify downloads, is also used in a step to process passwords before saving the result to a database. So I was wrong about assuming protection against timing attack, but still being a cryptographic lib it tries to not leave memory trace.

Xor-el

  • Sr. Member
  • ****
  • Posts: 265
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #21 on: January 01, 2019, 09:34:36 am »
Yes, you are right. Like in sha1ProcessChunk. It repeats the process of copying to a local variable at the begining and zeroing at the end for every 64 bytes of the hashed data. So you really need to compare FPC with non-cryptographic lib.

Hashing, in addition to your usage to verify downloads, is also used in a step to process passwords before saving the result to a database. So I was wrong about assuming protection against timing attack, but still being a cryptographic lib it tries to not leave memory trace.

Yes you are right. Cryptographic Lib try not to leave traces as seen there.
HashLib4Pascal does same as you can see here https://github.com/Xor-el/HashLib4Pascal/blob/master/HashLib/src/Crypto/HlpSHA2_256Base.pas#L944

Xor-el

  • Sr. Member
  • ****
  • Posts: 265
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #22 on: January 01, 2019, 09:42:08 am »

P.S.:
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.). The only thing I think is better about Qt is the Horizontal/Vertical Layout System in QDesigner makes it really easy to lay out the controls, compared to specifying all the anchors manually. Basically creating the layout in Qt: 30 seconds, creating the layout with Lazarus: 5 minutes. lol.

Lazarus follows the GTK theme set perfectly, whereas Qt just draws its own thing and looks completely out of place... Another thing I really like about LCL so far.

Am not really surprised to be sincere because HashLib4Pascal employes some few tricks up it's sleeves

1. Keep unnecessary copies to a minimum
2. Perform loop unrolling for some functions (rolled variants are still available and can be enabled using a switch)
3. Makes use of compiler intrinsics whenever possible (glad FPC has a number of these  :-* )

SideHint: the speed of the MD5 implementation in HashLib4Pascal still blows my mind till date  :)

Hi im Pascal

  • New member
  • *
  • Posts: 24
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #23 on: January 01, 2019, 06:07:52 pm »
Yeah I'm happy I made the switch. I don't think DCPcrypt is mainted much anymore, is it?

P.S.: I tried making GetProgress thread safe, unfortunately fileStream.Position won't work with InterlockedExchange, since it is a property. So my idea was to store it into a member variable after I read and hashed a block (inside the Execute main loop), and then in GetProgress I can retrieve it:

Code: Pascal  [Select]
  1. // Inside GetProgress
  2. Result := InterLockedExchangeAdd64(streamPos, 0) / Real(fileStream.Size);
I don't think fileStream.Size is an issue, since it should stay constant while the thread is running. I'm not really sure if I have to use the InterlockedExchange function when writing to streamPos, too, so it works correctly, or if it's enough to use it in GetProgress:
Code: Pascal  [Select]
  1. while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  2.   bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  3.   hash.TransformUntyped(buffer, bytesRead);
  4.   InterLockedExchange64(streamPos, fileStream.Position);
  5.   // v.s.
  6.   // streamPos := fileStream.Position;
  7. end;

engkin

  • Hero Member
  • *****
  • Posts: 2299
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #24 on: January 01, 2019, 09:20:21 pm »
The code that runs inside the thread does not need to use interlocked function, but it needs to keep track of its progress:
Code: Pascal  [Select]
  1. while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  2. ...
  3.   FProgress := round(100*fileStream.Position/fileStream.size); or similar code

The thread should provide a procedure to retrieve the progress. This procedure is called from the main thread, so you need to use interlocked call:
Code: Pascal  [Select]
  1. type
  2.   TCalcFileHash = class(TThread)
  3.   private
  4.     FProgress: Cardinal;
  5. ...  
  6.   public
  7.     procedure GetProgress(out AProgress: Cardinal);
  8.  
  9. ...
  10. procedure TCalcFileHash.GetProgress(out AProgress: Cardinal);
  11. begin
  12.   InterLockedExchange(AProgress, FProgress);
  13. end;

This code is written using the forum editor so you get the idea.

Hi im Pascal

  • New member
  • *
  • Posts: 24
Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #25 on: January 02, 2019, 08:28:50 am »
@engkin ah OK I just changed progress from being a real [0;1] to being an Integer [0;100], then I can use InterLockedExchange. Thanks, got it!