Lazarus

Free Pascal => Beginners => Topic started by: Hi im Pascal on December 29, 2018, 01:24:32 am

Title: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 29, 2018, 01:24:32 am
Hello,

I'm wrote some code to compute the sha-256 hash of a file with some help from this StackOverflow question: https://stackoverflow.com/questions/553310/delphi-how-to-calculate-the-sha-hash-of-a-large-file

I was wondering if anyone had some improvements I could make. Please state the obvious, I am basically a Pascal noob. I'm interested if my code is solid and could be done more elegantly. How would you write this piece of code as a seasoned professional?

Code: Pascal  [Select][+][-]
  1. function HashFileSHA256(const fileName: String): String;
  2. var
  3.   sha256: TDCP_sha256;
  4.   buffer: array[0..1024*1024] of byte;
  5.   i, bytesRead: Integer;
  6.   streamIn: TFileStream;
  7.   hashBuf: array[0..31] of byte;
  8. begin
  9.   // Initialization
  10.   Result := '';
  11.   streamIn := TFileStream.Create(fileName, fmOpenRead);
  12.   sha256 := TDCP_sha256.Create(nil);
  13.   for i:=0 to Sizeof(buffer) - 1 do
  14.     buffer[i] := 0;
  15.   for i:=0 to Sizeof(hashBuf) - 1 do
  16.     hashBuf[i] := 0;
  17.   bytesRead := -1;
  18.  
  19.   // Compute
  20.   try
  21.     sha256.Init;
  22.     while bytesRead <> 0 do
  23.     begin
  24.       bytesRead := streamIn.Read(buffer[0], Sizeof(buffer));
  25.       sha256.Update(buffer[0], bytesRead);
  26.     end;
  27.     sha256.Final(hashBuf);
  28.     for I := 0 to Sizeof(hashBuf) - 1 do
  29.       Result := Result + IntToHex(hashBuf[i], 2);
  30.   finally
  31.     streamIn.Free;
  32.     sha256.Free;
  33.   end;
  34.  
  35.   Result := LowerCase(Result);
  36. end;

Thanks in advance!

Bonus Question: How to best generalize this to different hashing algorithms (e. g. SHA1) / different hash lengths?
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: lucamar on December 29, 2018, 03:56:46 am
One small change: Instead of
Code: Pascal  [Select][+][-]
  1.   for i:=0 to Sizeof(buffer) - 1 do
  2.     buffer[i] := 0;
  3.   for i:=0 to Sizeof(hashBuf) - 1 do
  4.     hashBuf[i] := 0;

you should use:

Code: Pascal  [Select][+][-]
  1.   FillChar(buffer, Sizeof(buffer), #00);
  2.   FillChar(hashBuf, Sizeof(hashBuf), #00);

FillChar() is quicker than iterating in a loop. IIRC there are other similar functions but I don't remember their name atm.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Xor-el on December 29, 2018, 10:04:19 am
One thing I find odd here is the buffer of 1kb you used, that is too small even for low end devices.
I suggest you use something like 32 or 64kb buffers at worst.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Thaddy on December 29, 2018, 10:20:38 am
For the bonus: you can use generics. May be Xor-el can add such ones to his excellent library which is already properly abstracted anyway?
Code: Pascal  [Select][+][-]
  1. //pseudo-code
  2.  result:= hashme<Tsha256>(const value:string):string;
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 29, 2018, 02:56:42 pm
you should use:

Code: Pascal  [Select][+][-]
  1.   FillChar(buffer, Sizeof(buffer), #00);
  2.   FillChar(hashBuf, Sizeof(hashBuf), #00);

FillChar() is quicker than iterating in a loop. IIRC there are other similar functions but I don't remember their name atm.
Thanks, I changed it. Is there any benefit to using the control string #00 as opposed to simply writing 0?

One thing I find odd here is the buffer of 1kb you used, that is too small even for low end devices.
I suggest you use something like 32 or 64kb buffers at worst.
Actually I'm using the size 1024², which should be around 1MB, or am I mistaking something? I do feel my code could be faster though, it takes a while to hash a few GB (several seconds). Not sure how fast that should actually be running.

As far as generics goes, I tried compiling the following:
Code: Pascal  [Select][+][-]
  1. generic function f<T>(a: T): T;
  2. begin
  3.   Result := a + a;
  4. end;
Which yields the error message
Code: [Select]
unit1.pas(85,1) Fatal: Syntax error, "BEGIN" expected but "identifier GENERIC" foundSo I think the compiler doesn't support that yet, since I copied that from another forum post.

I have an idea where I pass in an array of strings naming hash functions and it returns the hashes, like so:
Code: Pascal  [Select][+][-]
  1. var
  2.     hashes: Array of String[1..2];
  3. begin
  4.   hashes[1] = 'SHA256';
  5.   hashes[2] = 'SHA1';
  6.   ComputeFileHashes(EdFileName.FileName, hashes);
  7.   EdSHA256.Text := hashes[1];
  8.   EdSHA1.Text := hashes[2];
  9. end;
  10.  
Do you think this is bad design (because a function parameter is used as input as well as output)?

P.S.: Would you recommend HashLib4Pascal over what I am currently using (DCPCrypt)?
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: lucamar on December 29, 2018, 03:17:19 pm
Thanks, I changed it. Is there any benefit to using the control string #00 as opposed to simply writing 0?

No benefit ... except that the signature of FillChar() requires a char there. You could as well write:
  FillChar(Buffer, SizeOf(Buffer), chr(00));

Remember: #XX ~= Chr(XX)

ETA: Well, it so hapens I was wrong. There is an overload with a byte value and another with a boolean. So yes, you can write:
  FillChar(Buffer, SizeOf(Buffer), 0);
if you prefer it; or you can even use:
  FillByte(Buffer, SizeOf(Buffer), 0);
or if your buffer size is evenly divisible by two:
  FillWord(Buffer, SizeOf(Buffer) div 2, 0);
or if it's evenly divisible by four:
  FillFWord(Buffer, SizeOf(Buffer) div 4, 0);

So many options ...  :)


Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Xor-el on December 29, 2018, 03:28:32 pm
Sorry about the size issue, my bad.  ;)
As to whether HashLib4Pascal is more suitable for you want, that is left for you to decide but one secret I can let you in on is that HashLib4Pascal is designed with a fluent/fluid interface that helps you achieve what you want in one simple line of code.

An example below

Code: Pascal  [Select][+][-]
  1. uses
  2. HlpHashFactory;
  3.  
  4. ......
  5. ......
  6. ......
  7.  
  8. begin
  9.  result := THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
  10. end;
  11.  
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 29, 2018, 04:25:04 pm
@lucamar, thanks, I stuck with this, as I find it semantically most fitting:
Code: Pascal  [Select][+][-]
  1. FillByte(buffer, Sizeof(buffer), 0);

Code: Pascal  [Select][+][-]
  1. uses
  2. HlpHashFactory;
  3.  
  4. begin
  5.  result := THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
  6. end;
  7.  
Well that certainly looks much more attractive than rolling my own code... But how can I compute several hash functions simultaneously, without having to stream the file each time? So in my code I would do something like this:
Code: Pascal  [Select][+][-]
  1. while bytesRead <> 0 do
  2.     begin
  3.       bytesRead := streamIn.Read(buffer[0], Sizeof(buffer));
  4.       sha256.Update(buffer[0], bytesRead);
  5.       sha512.Update(buffer[0], bytesRead);
  6.       sha1.Update(buffer[0], bytesRead);
  7.       md5.Update(buffer[0], bytesRead);
  8.       // ...
  9.     end;
Does that make sense? Is there a way to compute multiple hash functions from a file with HashLib4Pascal? I assume with the following code, the file is read twice:
Code: Pascal  [Select][+][-]
  1.     THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
  2.     THashFactory.TCrypto.CreateSHA1().ComputeFile('filename.txt').ToString();
  3.  

Thanks!
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Xor-el on December 29, 2018, 05:21:07 pm

Well that certainly looks much more attractive than rolling my own code... But how can I compute several hash functions simultaneously, without having to stream the file each time? So in my code I would do something like this:
Code: Pascal  [Select][+][-]
  1. while bytesRead <> 0 do
  2.     begin
  3.       bytesRead := streamIn.Read(buffer[0], Sizeof(buffer));
  4.       sha256.Update(buffer[0], bytesRead);
  5.       sha512.Update(buffer[0], bytesRead);
  6.       sha1.Update(buffer[0], bytesRead);
  7.       md5.Update(buffer[0], bytesRead);
  8.       // ...
  9.     end;
Does that make sense? Is there a way to compute multiple hash functions from a file with HashLib4Pascal? I assume with the following code, the file is read twice:
Code: Pascal  [Select][+][-]
  1.     THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
  2.     THashFactory.TCrypto.CreateSHA1().ComputeFile('filename.txt').ToString();
  3.  

Thanks!

Here is a quick console code sample that does what you want.

Code: Pascal  [Select][+][-]
  1. program HashFile;
  2.  
  3. {$MODE DELPHI}
  4.  
  5. uses
  6.   Classes,
  7.   SysUtils,
  8.   HlpHashFactory,
  9.   HlpIHash;
  10.  
  11.   procedure DoHashFile(const AFileName: string);
  12.   var
  13.     LMD5, LSHA1, LSHA2_256, LSHA2_512: IHash;
  14.     LFileStream: TFileStream;
  15.     LBuffer: array[0 .. 1024 * 1024] of Byte;
  16.     LBytesRead: Int32;
  17.   begin
  18.     if FileExists(AFileName) then
  19.     begin
  20.       LFileStream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  21.       LFileStream.Position := 0;
  22.       System.FillChar(LBuffer, System.SizeOf(LBuffer), Byte(0));
  23.       LMD5 := THashFactory.TCrypto.CreateMD5();
  24.       LSHA1 := THashFactory.TCrypto.CreateSHA1();
  25.       LSHA2_256 := THashFactory.TCrypto.CreateSHA2_256();
  26.       LSHA2_512 := THashFactory.TCrypto.CreateSHA2_512();
  27.  
  28.       LMD5.Initialize();
  29.       LSHA1.Initialize();
  30.       LSHA2_256.Initialize();
  31.       LSHA2_512.Initialize();
  32.       try
  33.         LBytesRead := -1;
  34.         while LBytesRead <> 0 do
  35.         begin
  36.           LBytesRead := LFileStream.Read(LBuffer[0], SizeOf(LBuffer));
  37.  
  38.           LMD5.TransformUntyped(LBuffer, LBytesRead);
  39.           LSHA1.TransformUntyped(LBuffer, LBytesRead);
  40.           LSHA2_256.TransformUntyped(LBuffer, LBytesRead);
  41.           LSHA2_512.TransformUntyped(LBuffer, LBytesRead);
  42.         end;
  43.  
  44.         WriteLn(Format('MD5 Hash of "%s" is "%s" "%s"',
  45.           [AFileName, LMD5.TransformFinal().ToString(), SLineBreak]));
  46.  
  47.         WriteLn(Format('SHA1 Hash of "%s" is "%s" "%s"',
  48.           [AFileName, LSHA1.TransformFinal().ToString(), SLineBreak]));
  49.  
  50.         WriteLn(Format('SHA2_256 Hash of "%s" is "%s" "%s"',
  51.           [AFileName, LSHA2_256.TransformFinal().ToString(), SLineBreak]));
  52.  
  53.         WriteLn(Format('SHA2_512 Hash of "%s" is "%s" "%s"',
  54.           [AFileName, LSHA2_512.TransformFinal().ToString(), SLineBreak]));
  55.  
  56.       finally
  57.         LFileStream.Free;
  58.       end;
  59.     end
  60.     else
  61.     begin
  62.       WriteLn(Format('"%s" Not Found', [AFileName]));
  63.     end;
  64.   end;
  65.  
  66.  
  67.  
  68. begin
  69.   DoHashFile(ParamStr(0));
  70.   ReadLn;
  71. end.
  72.  
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 29, 2018, 08:51:08 pm
@ Xor-el, thanks very much, I can really use that code.

I played around a bit and realized that reading the file is only a tiny portion, most time is spent calculating the hashes. However, with the manual loop I can also add a progress bar, which would be hard to do with the one-liner solution.

I really should split each hash calculation into its own thread, but I'll leave that as an exercise to myself.

P.S.: I stumbled upon this: https://software.intel.com/en-us/articles/intel-sha-extensions, would it be possible to use inline assembly and apply those instructions to calculate the hashes (if supported)? Just an idea, this would probably be too complicated for me with my limited Pascal experience.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Xor-el on December 29, 2018, 10:08:19 pm

P.S.: I stumbled upon this: https://software.intel.com/en-us/articles/intel-sha-extensions, would it be possible to use inline assembly and apply those instructions to calculate the hashes (if supported)? Just an idea, this would probably be too complicated for me with my limited Pascal experience.

while this is possible (compiling the C/C++ Compiler Intrinsics and and embedding the "db" instructions or assembler in our code), this is not a field I want to venture in at the moment.
Also while you can get optimized implementations of some hashes using inline ASM from Mormot SynCrypto.pas, this will only benefit your program on platforms which support ASM codes (in this case, Intel).
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 30, 2018, 02:10:36 am
I think the instructions are available on AMD Ryzen as well.

I tried to gain performance by multi-threading, but it didn't work (worse performance). Too much you can do wrong, and I'm not really good with Pascal yet. So now I just do the computation in a single thread, and update a progress bar. It's not a big deal waiting several seconds to hash a several GB file IMHO. I am very happy with the end result. In case anyone is interested, here is the code:

Code: Pascal  [Select][+][-]
  1. const
  2.   BUFFER_SIZE = 1024 * 1024;
  3.  
  4. type
  5.   TFileBuffer = array[0 .. BUFFER_SIZE - 1] of byte;
  6.  
  7. { TCalcFileHashesThread }
  8. type
  9.   TCalcFileHashesThread = class(TThread)
  10.   public
  11.     constructor Create(createSuspended: boolean; const fileName: string);
  12.  
  13.   protected
  14.     procedure Execute; override;
  15.  
  16.   private
  17.     filePath: string;
  18.     fileStream: TFileStream;
  19.     sha1: IHash;
  20.     sha256: IHash;
  21.  
  22.     procedure UpdateProgress;
  23.     procedure ResetStatus;
  24.     procedure ExportHashes;
  25.   end;
  26.  
  27. { TCalcFileHashesThread }
  28.  
  29. constructor TCalcFileHashesThread.Create(createSuspended: boolean; const fileName: string);
  30. begin
  31.   inherited Create(createSuspended);
  32.   FreeOnTerminate := True;
  33.  
  34.   filePath := fileName;
  35.   fileStream := nil;
  36.   sha1 := nil;
  37.   sha256 := nil;
  38. end;
  39.  
  40. procedure TCalcFileHashesThread.Execute;
  41. var
  42.   buffer: TFileBuffer;
  43.   bytesRead: integer;
  44. begin
  45.   if FileExists(filePath) then begin
  46.     fileStream := TFileStream.Create(filePath, fmOpenRead or fmShareDenyWrite);
  47.     try
  48.       sha1 := THashFactory.TCrypto.CreateSHA1();
  49.       sha256 := THashFactory.TCrypto.CreateSHA2_256();
  50.       sha1.Initialize();
  51.       sha256.Initialize();
  52.       while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  53.         bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  54.         sha1.TransformUntyped(buffer, bytesRead);
  55.         sha256.TransformUntyped(buffer, bytesRead);
  56.         Synchronize(@UpdateProgress);
  57.       end;
  58.       if not Terminated then
  59.         Synchronize(@ExportHashes);
  60.     finally
  61.       fileStream.Free;
  62.     end;
  63.   end;
  64.   Synchronize(@ResetStatus);
  65. end;
  66.  
  67. procedure TCalcFileHashesThread.ExportHashes;
  68. begin
  69.   Form1.EdSHA1.Text := LowerCase(sha1.TransformFinal.ToString());
  70.   Form1.EdSHA256.Text := LowerCase(sha256.TransformFinal.ToString());
  71. end;
  72.  
  73. procedure TCalcFileHashesThread.UpdateProgress;
  74. begin
  75.   Form1.ProgressBar.Position :=
  76.     Round(Form1.ProgressBar.Max * fileStream.Position / Real(fileStream.Size));
  77. end;
  78.  
  79. procedure TCalcFileHashesThread.ResetStatus;
  80. begin
  81.   Form1.Button1.Caption := 'Compute && Verify';
  82. end;
  83.  
  84. procedure TForm1.Button1Click(Sender: TObject);
  85. begin
  86.   if Button1.Caption = 'Cancel' then begin
  87.     thread.Terminate;
  88.     thread.WaitFor;
  89.     Progressbar.Position := 0;
  90.   end else begin
  91.     thread := TCalcFileHashesThread.Create(False, EdFile.FileName);
  92.     Button1.Caption := 'Cancel';
  93.   end;
  94. end;

Thanks for all the help!
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: engkin on December 30, 2018, 05:28:56 am
I think you are calling UpdateProgress too much. Also, using Synchronize defeats the purpose as the thread will wait for the main GUI thread to finish running UpdateProgress before it handles the next amount. Measure hashing performance without calling UpdateProgress.

I would also try each hash in a separate thread, since modern computers have a few cores.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 30, 2018, 02:01:49 pm
I think you are calling UpdateProgress too much. Also, using Synchronize defeats the purpose as the thread will wait for the main GUI thread to finish running UpdateProgress before it handles the next amount. Measure hashing performance without calling UpdateProgress.

I would also try each hash in a separate thread, since modern computers have a few cores.
The thread is just to keep the GUI responsive while the computation is running (Cancel Button). So I wouldn't say that it defeats the purpose, but I agree it might cost a little performance, which I am willing to sacrifice for a progress bar (I could not measure any significant difference, so I wouldn't say it's a major bottleneck). I don't know a way to measure the progress from within the main GUI thread, if the computation is happening in a different thread.

I tried creating a Thread for the calls to TransformUntyped, but if anything it ran slower, I guess creating and destroying the threads all the time might be the problem. I hope I wasn't copying any buffers, but then again I haven't really understood the copy/reference/pointer semantics in Pascal as I do in C++. It looked like this:

Code: Pascal  [Select][+][-]
  1.         while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  2.           bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  3.           thread_SHA1 := TTransformHashThread.Create(sha1, buffer, bytesRead);  // starts unsuspended
  4.           thread_SHA256 := TTransformHashThread.Create(sha256, buffer, bytesRead);
  5.           thread_SHA1.WaitFor;
  6.           thread_SHA256.WaitFor;
  7.           //Synchronize(@UpdateProgress);
  8.         end;
  9.  
  10. { TTransformHashThread }
  11.  
  12. type
  13.  
  14.   TTransformHashThread = class(TThread)
  15.   public
  16.     constructor Create(hash: IHash, var buffer: TFileBuffer, count: integer);
  17.  
  18.   protected
  19.     procedure Execute; override;  // called hashFn.TransformUntyped(buf, numBytes);
  20.  
  21.   private
  22.     hashFn: IHash;
  23.     var buf: TFileBuffer;
  24.     numBytes: integer;
  25.  

What did gain some performance is doing the whole
Code: [Select]
ComputeFile('').ToString() in separate threads, albeit the file is read multiple times and I don't have my progress bar. I'm suprised anyway that's faster, but it is (Before 17 seconds for both hashes, now 11 seconds, which is how long the SHA256 takes).
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: engkin on December 30, 2018, 02:52:06 pm
The thread is just to keep the GUI responsive while the computation is running (Cancel Button).
So I wouldn't say that it defeats the purpose, but I agree it might cost a little performance, which I am willing to sacrifice for a progress bar (I could not measure any significant difference, so I wouldn't say it's a major bottleneck). I don't know a way to measure the progress from within the main GUI thread, if the computation is happening in a different thread.
I would add a property to the thread to hold the progress. The main thread can access the value using a timer.

I tried creating a Thread for the calls to TransformUntyped, but if anything it ran slower, I guess creating and destroying the threads all the time might be the problem.
Correct, creating/destroying threads takes a lot of time. Use one thread for each hash. Create it only once at the beginning.

I hope I wasn't copying any buffers, but then again I haven't really understood the copy/reference/pointer semantics in Pascal as I do in C++. It looked like this:

Code: Pascal  [Select][+][-]
  1.         while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  2.           bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  3.           thread_SHA1 := TTransformHashThread.Create(sha1, buffer, bytesRead);  // starts unsuspended
  4.           thread_SHA256 := TTransformHashThread.Create(sha256, buffer, bytesRead);
  5.           thread_SHA1.WaitFor;
  6.           thread_SHA256.WaitFor;
  7.           //Synchronize(@UpdateProgress);
  8.         end;
  9.  
  10. { TTransformHashThread }
  11.  
  12. type
  13.  
  14.   TTransformHashThread = class(TThread)
  15.   public
  16.     constructor Create(hash: IHash, var buffer: TFileBuffer, count: integer);
  17.  
  18.   protected
  19.     procedure Execute; override;  // called hashFn.TransformUntyped(buf, numBytes);
  20.  
  21.   private
  22.     hashFn: IHash;
  23.     var buf: TFileBuffer;
  24.     numBytes: integer;
  25.  
Yes, creating the thread so many times, and also calling WaitFor makes it slow because the faster thread now has to wait for the slower thread before it process the next amount.

What did gain some performance is doing the whole
Code: [Select]
ComputeFile('').ToString() in separate threads, albeit the file is read multiple times and I don't have my progress bar. I'm suprised anyway that's faster, but it is (Before 17 seconds for both hashes, now 11 seconds, which is how long the SHA256 takes).
It is read one time, the second time it will be provided from the cache.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on December 30, 2018, 09:23:07 pm
@engkin,
thanks for the advice with the timer, I implemented it that way and it seems to be working. Is accessing the Thread object thread-safe? I mean, I could be writing the progress variable from inside the thread and reading it to update my progress bar from the main GUI thread at the same time.

For example, if I create a function 'GetProcess' which returns fileStream.position / Real(fileStream.Size), is it guaranteed that reading fileStream.position is thread safe e.g. what if I read from inside the thread at the same time and position is updated. I don't think updating fileStream.position would be an atomic operation by nature, unless TFileStream makes sure of it.

As far as the caching goes, are you referring to caching the operating system would do? Because I can't find any caching code inside HashLib4Pascal's IHash.TransformStream. Additionally, since some hashing algorithms run much faster than others, I wonder how you could effectively cache the reads. But I guess I'm not complaining, I used the following code to parallelize and everything seems to be working nicely:

Code: Pascal  [Select][+][-]
  1. //================================================================================================//
  2. //                                      TCalcFileHash                                             //                        
  3. //================================================================================================//
  4. type
  5.   TCalcFileHash = class(TThread)
  6.   public
  7.     constructor Create(createSuspended: Boolean; fileName: String; hashFn: IHash);
  8.    
  9.     function GetProgress: Real;
  10.    
  11.   protected
  12.     procedure Execute; override;
  13.    
  14.   private
  15.     fileStream: TFileStream;
  16.     filePath: String;
  17.     hash: IHash;
  18.     hashStr: String;
  19.    
  20.     procedure OnFinished;
  21.   end;
  22.  
  23. constructor TCalcFileHash.Create(createSuspended: Boolean; fileName: String; hashFn: IHash);
  24. begin
  25.   inherited Create(createSuspended);
  26.   FreeOnTerminate := true;
  27.   filePath := fileName;
  28.   hash := hashFn;
  29. end;
  30.  
  31. procedure TCalcFileHash.Execute;
  32. var
  33.   bytesRead: Integer;
  34.   buffer: TFileBuffer;
  35. begin
  36.   if FileExists(filePath) then begin
  37.     fileStream := TFileStream.Create(filePath, fmOpenRead or fmShareDenyWrite);
  38.     try
  39.       hash.Initialize();
  40.       while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  41.         bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  42.         hash.TransformUntyped(buffer, bytesRead);
  43.       end;
  44.       hashStr := LowerCase(hash.TransformFinal.ToString());
  45.     finally
  46.       FreeAndNil(fileStream);
  47.     end;
  48.   end;
  49.   Synchronize(@OnFinished);
  50. end;
  51.  
  52. function TCalcFileHash.GetProgress: Real;
  53. begin
  54.   Result := 1;
  55.   if fileStream <> nil then
  56.     Result := fileStream.Position / Real(fileStream.Size);
  57. end;
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: engkin on January 01, 2019, 02:21:00 am
Is accessing the Thread object thread-safe?
If you design it right, then it should be thread-safe.

I mean, I could be writing the progress variable from inside the thread and reading it to update my progress bar from the main GUI thread at the same time.

For example, if I create a function 'GetProcess' which returns fileStream.position / Real(fileStream.Size), is it guaranteed that reading fileStream.position is thread safe e.g. what if I read from inside the thread at the same time and position is updated. I don't think updating fileStream.position would be an atomic operation by nature, unless TFileStream makes sure of it.
It would be thread-safe to call GetProgress from within the thread loop itself to update a variable.  Simply retrieve the value using InterLockedExchange or a similar approach. There is an example of this here (https://forum.lazarus.freepascal.org/index.php/topic,40163.msg277445.html#msg277445) by ASerge. It is easy to expand it to get the overall progress if you have more than one thread, like one thread for sha1 and the other for sha256, or more than one file on a system with a few cores.

There is another method using QueueAsyncCall as in this example (https://forum.lazarus.freepascal.org/index.php/topic,40163.msg277236.html#msg277236) by Taazz.

As far as the caching goes, are you referring to caching the operating system would do? Because I can't find any caching code inside HashLib4Pascal's IHash.TransformStream.
Yes, the OS and the HD itself.

Additionally, since some hashing algorithms run much faster than others, I wonder how you could effectively cache the reads.
I doubt that you need to cache any reads in this specific case. To confirm, measure how long it takes to:
1-read one file without any processing.
2-process the data from memory.

Based on the results you can decide if you really benefit from caching.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on January 01, 2019, 02:47:10 am
@engkin,

thanks once more for your reply. I will look into synchronizing my access to GetProgress(), because as of now I just read TFileStream.Position without any sort of "protection". I already know that reading the file is much faster than the actual hashing, so I guess I won't worry about it. It's probably not worth the effort.

P.S.:
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.). The only thing I think is better about Qt is the Horizontal/Vertical Layout System in QDesigner makes it really easy to lay out the controls, compared to specifying all the anchors manually. Basically creating the layout in Qt: 30 seconds, creating the layout with Lazarus: 5 minutes. lol.

Lazarus follows the GTK theme set perfectly, whereas Qt just draws its own thing and looks completely out of place... Another thing I really like about LCL so far.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: engkin on January 01, 2019, 03:09:02 am
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.).
My guess it runs slow intentionally to make timing attacks difficult. You need to use non-cryptographic hashing if you seek speed.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on January 01, 2019, 03:24:28 am
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.).
My guess it runs slow intentionally to make timing attacks difficult. You need to use non-cryptographic hashing if you seek speed.

Mmh that might be, but it's not that much slower, but about 20%-30% maybe. I thought that's what bcrypt, scrypt, argon2 and the likes are for. Usually the hashes are provided to verify file downloads of a certain hash kind, so the choice isn't really mine :)

Looking at the source I can't find any direct code causing slow up, but there seem to be a few memcopies and actually they overwrite the local buffer and variables with zeros after computing a portion of the hash, that might add some overhead (not sure how much). In case anyone is interested, here is the source I viewed: https://code.woboq.org/qt5/qtbase/src/corelib/tools/qcryptographichash.cpp.html
Code: C  [Select][+][-]
  1. void QCryptographicHash::addData(const char *data, int length)
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: engkin on January 01, 2019, 04:48:40 am
Yes, you are right. Like in sha1ProcessChunk (https://code.woboq.org/qt5/qtbase/src/3rdparty/sha1/sha1.cpp.html#141). It repeats the process of copying to a local variable at the begining and zeroing at the end for every 64 bytes of the hashed data. So you really need to compare FPC with non-cryptographic lib.

Hashing, in addition to your usage to verify downloads, is also used in a step to process passwords before saving the result to a database. So I was wrong about assuming protection against timing attack, but still being a cryptographic lib it tries to not leave memory trace.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Xor-el on January 01, 2019, 09:34:36 am
Yes, you are right. Like in sha1ProcessChunk (https://code.woboq.org/qt5/qtbase/src/3rdparty/sha1/sha1.cpp.html#141). It repeats the process of copying to a local variable at the begining and zeroing at the end for every 64 bytes of the hashed data. So you really need to compare FPC with non-cryptographic lib.

Hashing, in addition to your usage to verify downloads, is also used in a step to process passwords before saving the result to a database. So I was wrong about assuming protection against timing attack, but still being a cryptographic lib it tries to not leave memory trace.

Yes you are right. Cryptographic Lib try not to leave traces as seen there.
HashLib4Pascal does same as you can see here https://github.com/Xor-el/HashLib4Pascal/blob/master/HashLib/src/Crypto/HlpSHA2_256Base.pas#L944 (https://github.com/Xor-el/HashLib4Pascal/blob/master/HashLib/src/Crypto/HlpSHA2_256Base.pas#L944)
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Xor-el on January 01, 2019, 09:42:08 am

P.S.:
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except using QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation, I doub't there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in concurrency etc.). The only thing I think is better about Qt is the Horizontal/Vertical Layout System in QDesigner makes it really easy to lay out the controls, compared to specifying all the anchors manually. Basically creating the layout in Qt: 30 seconds, creating the layout with Lazarus: 5 minutes. lol.

Lazarus follows the GTK theme set perfectly, whereas Qt just draws its own thing and looks completely out of place... Another thing I really like about LCL so far.

Am not really surprised to be sincere because HashLib4Pascal employes some few tricks up it's sleeves

1. Keep unnecessary copies to a minimum
2. Perform loop unrolling for some functions (rolled variants are still available and can be enabled using a switch)
3. Makes use of compiler intrinsics whenever possible (glad FPC has a number of these  :-* )

SideHint: the speed of the MD5 implementation in HashLib4Pascal still blows my mind till date  :)
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on January 01, 2019, 06:07:52 pm
Yeah I'm happy I made the switch. I don't think DCPcrypt is mainted much anymore, is it?

P.S.: I tried making GetProgress thread safe, unfortunately fileStream.Position won't work with InterlockedExchange, since it is a property. So my idea was to store it into a member variable after I read and hashed a block (inside the Execute main loop), and then in GetProgress I can retrieve it:

Code: Pascal  [Select][+][-]
  1. // Inside GetProgress
  2. Result := InterLockedExchangeAdd64(streamPos, 0) / Real(fileStream.Size);
I don't think fileStream.Size is an issue, since it should stay constant while the thread is running. I'm not really sure if I have to use the InterlockedExchange function when writing to streamPos, too, so it works correctly, or if it's enough to use it in GetProgress:
Code: Pascal  [Select][+][-]
  1. while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  2.   bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  3.   hash.TransformUntyped(buffer, bytesRead);
  4.   InterLockedExchange64(streamPos, fileStream.Position);
  5.   // v.s.
  6.   // streamPos := fileStream.Position;
  7. end;
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: engkin on January 01, 2019, 09:20:21 pm
The code that runs inside the thread does not need to use interlocked function, but it needs to keep track of its progress:
Code: Pascal  [Select][+][-]
  1. while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  2. ...
  3.   FProgress := round(100*fileStream.Position/fileStream.size); or similar code

The thread should provide a procedure to retrieve the progress. This procedure is called from the main thread, so you need to use interlocked call:
Code: Pascal  [Select][+][-]
  1. type
  2.   TCalcFileHash = class(TThread)
  3.   private
  4.     FProgress: Cardinal;
  5. ...  
  6.   public
  7.     procedure GetProgress(out AProgress: Cardinal);
  8.  
  9. ...
  10. procedure TCalcFileHash.GetProgress(out AProgress: Cardinal);
  11. begin
  12.   InterLockedExchange(AProgress, FProgress);
  13. end;

This code is written using the forum editor so you get the idea.
Title: Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
Post by: Hi im Pascal on January 02, 2019, 08:28:50 am
@engkin ah OK I just changed progress from being a real [0;1] to being an Integer [0;100], then I can use InterLockedExchange. Thanks, got it!
TinyPortal © 2005-2018