### [Code Review] SHA-256 of a file (hash verification tool for downloads)

#### Hi im Pascal

• New member
• Posts: 29
##### [Code Review] SHA-256 of a file (hash verification tool for downloads)
« on: December 29, 2018, 01:24:32 am »
Hello,

I wrote some code to compute the SHA-256 hash of a file, with some help from this StackOverflow question: https://stackoverflow.com/questions/553310/delphi-how-to-calculate-the-sha-hash-of-a-large-file

I was wondering if anyone could suggest improvements. Please state the obvious, I am basically a Pascal noob. I'm interested in whether my code is solid and whether it could be done more elegantly. How would you write this piece of code as a seasoned professional?

```pascal
// Needs Classes, SysUtils and DCPsha256 (DCPCrypt) in the uses clause.
function HashFileSHA256(const fileName: String): String;
var
  sha256: TDCP_sha256;
  buffer: array[0..1024*1024] of byte;
  i, bytesRead: Integer;
  streamIn: TFileStream;
  hashBuf: array[0..31] of byte;
begin
  // Initialization
  Result := '';
  streamIn := TFileStream.Create(fileName, fmOpenRead);
  sha256 := TDCP_sha256.Create(nil);
  for i := 0 to Sizeof(buffer) - 1 do
    buffer[i] := 0;
  for i := 0 to Sizeof(hashBuf) - 1 do
    hashBuf[i] := 0;
  bytesRead := -1;

  // Compute
  try
    sha256.Init;
    while bytesRead <> 0 do
    begin
      // Read the next chunk and feed it to the hash
      bytesRead := streamIn.Read(buffer[0], Sizeof(buffer));
      sha256.Update(buffer[0], bytesRead);
    end;
    sha256.Final(hashBuf);
    for i := 0 to Sizeof(hashBuf) - 1 do
      Result := Result + IntToHex(hashBuf[i], 2);
  finally
    streamIn.Free;
    sha256.Free;
  end;

  Result := LowerCase(Result);
end;
```

Bonus question: How best to generalize this to different hashing algorithms (e.g. SHA-1) and different hash lengths?
« Last Edit: December 29, 2018, 02:01:02 am by Hi im Pascal »
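
On the bonus question, one possible shape is a single streaming loop that dispatches on an algorithm selector. The sketch below is not DCPCrypt code: it assumes the `md5` and `sha1` units that ship with FPC (SHA-256 is left out here), and `THashKind` and `HashStream` are hypothetical names introduced only for illustration.

```pascal
program HashKinds;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, md5, sha1;

type
  THashKind = (hkMD5, hkSHA1);  // hypothetical selector, not part of any library

// Streams S through the chosen algorithm in fixed-size chunks and
// returns the digest as lowercase hex.
function HashStream(Kind: THashKind; S: TStream): string;
var
  Buf: array[0..65535] of Byte;
  N: LongInt;
  M: TMD5Context;
  MD: TMD5Digest;
  C: TSHA1Context;
  CD: TSHA1Digest;
begin
  Result := '';
  case Kind of
    hkMD5:  MD5Init(M);
    hkSHA1: SHA1Init(C);
  end;
  repeat
    N := S.Read(Buf, SizeOf(Buf));
    if N > 0 then
      case Kind of
        hkMD5:  MD5Update(M, Buf, N);
        hkSHA1: SHA1Update(C, Buf, N);
      end;
  until N <= 0;
  case Kind of
    hkMD5:
      begin
        MD5Final(M, MD);
        Result := LowerCase(MD5Print(MD));
      end;
    hkSHA1:
      begin
        SHA1Final(C, CD);
        Result := LowerCase(SHA1Print(CD));
      end;
  end;
end;

var
  Src: TStringStream;
begin
  Src := TStringStream.Create('abc');
  try
    WriteLn('MD5:  ', HashStream(hkMD5, Src));
    Src.Position := 0;  // rewind before hashing the same data again
    WriteLn('SHA1: ', HashStream(hkSHA1, Src));
  finally
    Src.Free;
  end;
end.
```

The digest length never appears in the caller: each algorithm's digest type carries its own size, so adding an algorithm means adding one enum value and one branch per `case`.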

#### lucamar

• Hero Member
• Posts: 1814
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #1 on: December 29, 2018, 03:56:46 am »
Instead of:

```pascal
for i:=0 to Sizeof(buffer) - 1 do
  buffer[i] := 0;
for i:=0 to Sizeof(hashBuf) - 1 do
  hashBuf[i] := 0;
```

you should use:

```pascal
FillChar(buffer, Sizeof(buffer), #00);
FillChar(hashBuf, Sizeof(hashBuf), #00);
```

FillChar() is quicker than iterating in a loop. IIRC there are other similar functions but I don't remember their names atm.
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!)
Lazarus 1.8.4 & 2.0.2 w/FPC 3.0.4 on:
(K|L)Ubuntu 12..16, Windows XP SP3, various DOSes.

#### Xor-el

• Sr. Member
• Posts: 338
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #2 on: December 29, 2018, 10:04:19 am »
One thing I find odd here is the 1 KB buffer you used; that is too small even for low-end devices.
I suggest using buffers of at least 32 or 64 KB.

#### Thaddy

• Hero Member
• Posts: 8182
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #3 on: December 29, 2018, 10:20:38 am »
For the bonus: you can use generics. Maybe Xor-el can add such ones to his excellent library, which is already properly abstracted anyway?

```pascal
// pseudo-code
Result := HashMe<TSHA256>(value);  // with HashMe<T>(const value: string): string
```
« Last Edit: December 29, 2018, 10:23:42 am by Thaddy »
Read the manuals and if you are a professional get a proper education in computer science. Makes the forum a lot cleaner.

#### Hi im Pascal

• New member
• Posts: 29
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #4 on: December 29, 2018, 02:56:42 pm »
> you should use:
>
> ```pascal
> FillChar(buffer, Sizeof(buffer), #00);
> FillChar(hashBuf, Sizeof(hashBuf), #00);
> ```
>
> FillChar() is quicker than iterating in a loop. IIRC there are other similar functions but I don't remember their name atm.

Thanks, I changed it. Is there any benefit to using the control string #00 as opposed to simply writing 0?

> One thing I find odd here is the buffer of 1kb you used, that is too small even for low end devices.
> I suggest you use something like 32 or 64kb buffers at worst.

Actually I'm using a size of 1024², which should be around 1 MB, or am I mistaken? I do feel my code could be faster though; it takes a while (several seconds) to hash a few GB. I'm not sure how fast that should actually run.
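
A quick way to check the buffer size is to ask the compiler. This is a minimal sketch with made-up type names (`TInclusive`, `TExact`). Pascal array bounds are inclusive, so the declaration in the first post holds one byte more than 1024².

```pascal
program BufferSize;
{$mode objfpc}

type
  TInclusive = array[0..1024 * 1024] of Byte;      // as declared in the first post
  TExact     = array[0..1024 * 1024 - 1] of Byte;  // exactly 1024 * 1024 elements

begin
  WriteLn(SizeOf(TInclusive));  // 1048577
  WriteLn(SizeOf(TExact));      // 1048576
end.
```

Either way the buffer is about 1 MiB, not 1 KB.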

As far as generics go, I tried compiling the following:

```pascal
generic function f<T>(a: T): T;
begin
  Result := a + a;
end;
```

which yields the error message

```
unit1.pas(85,1) Fatal: Syntax error, "BEGIN" expected but "identifier GENERIC" found
```

So I think the compiler doesn't support that yet, since I copied that from another forum post.
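
The error is consistent with an FPC 3.0.x compiler: generic routines (as opposed to generic types) arrived with FPC 3.2.0. As a hedged sketch, assuming FPC 3.2.0 or later and `{$mode objfpc}`, the same idea compiles once the routine is specialized explicitly. `Twice` is just an illustrative name.

```pascal
program GenericRoutine;
{$mode objfpc}{$H+}

// A generic routine; T is fixed at each call site via "specialize".
generic function Twice<T>(a: T): T;
begin
  Result := a + a;
end;

begin
  WriteLn(specialize Twice<Integer>(21));   // integer addition
  WriteLn(specialize Twice<String>('ab'));  // string concatenation
end.
```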

I have an idea where I pass in an array of strings naming hash functions and it returns the hashes, like so:
```pascal
var
  hashes: array[1..2] of String;
begin
  hashes[1] := 'SHA256';
  hashes[2] := 'SHA1';
  // On return, each algorithm name has been replaced by the matching hash
  ComputeFileHashes(EdFileName.FileName, hashes);
  EdSHA256.Text := hashes[1];
  EdSHA1.Text := hashes[2];
end;
```
Do you think this is bad design (because a function parameter is used as input as well as output)?

P.S.: Would you recommend HashLib4Pascal over what I am currently using (DCPCrypt)?

#### lucamar

• Hero Member
• Posts: 1814
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #5 on: December 29, 2018, 03:17:19 pm »
> Thanks, I changed it. Is there any benefit to using the control string #00 as opposed to simply writing 0?

No benefit ... except that the signature of FillChar() requires a char there. You could as well write:

```pascal
FillChar(Buffer, SizeOf(Buffer), chr(00));
```

Remember: #XX ~= Chr(XX)

ETA: Well, it so happens I was wrong. There is an overload with a byte value and another with a boolean. So yes, you can write:

```pascal
FillChar(Buffer, SizeOf(Buffer), 0);
```

if you prefer it; or you can even use:

```pascal
FillByte(Buffer, SizeOf(Buffer), 0);
```

or, if your buffer size is evenly divisible by two:

```pascal
FillWord(Buffer, SizeOf(Buffer) div 2, 0);
```

or, if it's evenly divisible by four:

```pascal
FillDWord(Buffer, SizeOf(Buffer) div 4, 0);
```

So many options ...
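
As a small illustration of the point about counts, the following sketch (a toy 16-byte buffer, not the poster's code) runs the System-unit fill routines in turn. The count is in bytes for `FillChar` and `FillByte`, but in 2-byte and 4-byte elements for `FillWord` and `FillDWord`.

```pascal
program FillDemo;
{$mode objfpc}

var
  Buffer: array[0..15] of Byte;
  i: Integer;
  AllZero: Boolean;
begin
  // FillChar and FillByte take a count in bytes.
  FillChar(Buffer, SizeOf(Buffer), $FF);
  FillByte(Buffer, SizeOf(Buffer), 0);

  // FillWord and FillDWord take a count in elements (2 and 4 bytes each),
  // hence the "div 2" and "div 4".
  FillWord(Buffer, SizeOf(Buffer) div 2, $FFFF);
  FillDWord(Buffer, SizeOf(Buffer) div 4, 0);

  // Verify that the last fill covered the whole buffer.
  AllZero := True;
  for i := Low(Buffer) to High(Buffer) do
    if Buffer[i] <> 0 then
      AllZero := False;
  WriteLn(AllZero);
end.
```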

« Last Edit: December 29, 2018, 03:28:07 pm by lucamar »

#### Xor-el

• Sr. Member
• Posts: 338
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #6 on: December 29, 2018, 03:28:32 pm »
As to whether HashLib4Pascal is more suitable for what you want, that is for you to decide. One secret I can let you in on, though, is that HashLib4Pascal is designed with a fluent interface that lets you achieve what you want in one simple line of code.

An example:

```pascal
uses
  HlpHashFactory;

// ...

begin
  Result := THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
end;
```

#### Hi im Pascal

• New member
• Posts: 29
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #7 on: December 29, 2018, 04:25:04 pm »
@lucamar, thanks, I stuck with this, as I find it semantically most fitting:
```pascal
FillByte(buffer, Sizeof(buffer), 0);
```

> ```pascal
> result := THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
> ```

Well, that certainly looks much more attractive than rolling my own code... But how can I compute several hash functions simultaneously, without having to stream the file each time? In my code I would do something like this:

```pascal
while bytesRead <> 0 do
begin
  bytesRead := streamIn.Read(buffer[0], Sizeof(buffer));
  sha256.Update(buffer[0], bytesRead);
  sha1.Update(buffer[0], bytesRead);  // sha1: a second hash object (assumed)
  // ...
end;
```

Does that make sense? Is there a way to compute multiple hash functions from a file with HashLib4Pascal? I assume that with the following code, the file is read twice:

```pascal
THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
THashFactory.TCrypto.CreateSHA1().ComputeFile('filename.txt').ToString();
```

Thanks!
« Last Edit: December 29, 2018, 04:26:54 pm by Hi im Pascal »

#### Xor-el

• Sr. Member
• Posts: 338
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #8 on: December 29, 2018, 05:21:07 pm »

> But how can I compute several hash functions simultaneously, without having to stream the file each time? [...] Is there a way to compute multiple hash functions from a file with HashLib4Pascal?

Here is a quick console code sample that does what you want.

```pascal
program HashFile;

{$MODE DELPHI}

uses
  Classes,
  SysUtils,
  HlpHashFactory,
  HlpIHash;

  procedure DoHashFile(const AFileName: string);
  var
    LMD5, LSHA1, LSHA2_256, LSHA2_512: IHash;
    LFileStream: TFileStream;
    LBuffer: array[0 .. 1024 * 1024] of Byte;
    LBytesRead: Integer;
  begin
    if FileExists(AFileName) then
    begin
      LFileStream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
      LFileStream.Position := 0;
      System.FillChar(LBuffer, System.SizeOf(LBuffer), Byte(0));
      LMD5 := THashFactory.TCrypto.CreateMD5();
      LSHA1 := THashFactory.TCrypto.CreateSHA1();
      LSHA2_256 := THashFactory.TCrypto.CreateSHA2_256();
      LSHA2_512 := THashFactory.TCrypto.CreateSHA2_512();

      LMD5.Initialize();
      LSHA1.Initialize();
      LSHA2_256.Initialize();
      LSHA2_512.Initialize();
      try
        LBytesRead := -1;
        while LBytesRead <> 0 do
        begin
          LBytesRead := LFileStream.Read(LBuffer, System.SizeOf(LBuffer));

          // Feed the same chunk to every hash, so the file is streamed only once
          LMD5.TransformUntyped(LBuffer, LBytesRead);
          LSHA1.TransformUntyped(LBuffer, LBytesRead);
          LSHA2_256.TransformUntyped(LBuffer, LBytesRead);
          LSHA2_512.TransformUntyped(LBuffer, LBytesRead);
        end;

        WriteLn(Format('MD5 Hash of "%s" is "%s" "%s"',
          [AFileName, LMD5.TransformFinal().ToString(), SLineBreak]));

        WriteLn(Format('SHA1 Hash of "%s" is "%s" "%s"',
          [AFileName, LSHA1.TransformFinal().ToString(), SLineBreak]));

        WriteLn(Format('SHA2_256 Hash of "%s" is "%s" "%s"',
          [AFileName, LSHA2_256.TransformFinal().ToString(), SLineBreak]));

        WriteLn(Format('SHA2_512 Hash of "%s" is "%s" "%s"',
          [AFileName, LSHA2_512.TransformFinal().ToString(), SLineBreak]));

      finally
        LFileStream.Free;
      end;
    end
    else
    begin
      WriteLn('File does not exist.');
    end;
  end;

begin
  DoHashFile(ParamStr(0));  // hashes the running executable
  ReadLn;
end.
```

#### Hi im Pascal

• New member
• Posts: 29
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #9 on: December 29, 2018, 08:51:08 pm »
@Xor-el, thanks very much, I can really use that code.

I played around a bit and realized that reading the file accounts for only a tiny portion of the runtime; most of the time is spent calculating the hashes. However, with the manual loop I can also add a progress bar, which would be hard to do with the one-liner solution.
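
To illustrate why the manual loop makes progress reporting easy, here is a minimal console sketch. It does not use HashLib4Pascal: it assumes the `sha1` unit that ships with FPC, and `TProgressProc`, `PrintProgress` and `SHA1OfFile` are hypothetical names. The callback is invoked once per chunk, which is exactly the hook the one-liner does not expose.

```pascal
program HashWithProgress;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, sha1;

type
  // Hypothetical callback type: bytes processed so far and total bytes.
  TProgressProc = procedure(Done, Total: Int64);

procedure PrintProgress(Done, Total: Int64);
begin
  if Total > 0 then
    WriteLn((Done * 100) div Total, '%');
end;

// Hashes a file chunk by chunk and reports progress after each chunk.
function SHA1OfFile(const FileName: string; OnProgress: TProgressProc): string;
var
  Stream: TFileStream;
  Buf: array[0..65535] of Byte;
  N: LongInt;
  Ctx: TSHA1Context;
  Digest: TSHA1Digest;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SHA1Init(Ctx);
    repeat
      N := Stream.Read(Buf, SizeOf(Buf));
      if N > 0 then
      begin
        SHA1Update(Ctx, Buf, N);
        if Assigned(OnProgress) then
          OnProgress(Stream.Position, Stream.Size);
      end;
    until N <= 0;
    SHA1Final(Ctx, Digest);
    Result := LowerCase(SHA1Print(Digest));
  finally
    Stream.Free;
  end;
end;

begin
  // Hashes the running executable, like the console sample earlier in the thread.
  WriteLn(SHA1OfFile(ParamStr(0), @PrintProgress));
end.
```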

I really should split each hash calculation into its own thread, but I'll leave that as an exercise to myself.

P.S.: I stumbled upon this: https://software.intel.com/en-us/articles/intel-sha-extensions, would it be possible to use inline assembly and apply those instructions to calculate the hashes (if supported)? Just an idea, this would probably be too complicated for me with my limited Pascal experience.

#### Xor-el

• Sr. Member
• Posts: 338
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #10 on: December 29, 2018, 10:08:19 pm »

> P.S.: I stumbled upon this: https://software.intel.com/en-us/articles/intel-sha-extensions, would it be possible to use inline assembly and apply those instructions to calculate the hashes (if supported)? Just an idea, this would probably be too complicated for me with my limited Pascal experience.

While this is possible (by compiling the C/C++ compiler intrinsics and embedding the resulting "db" instructions or assembler in our code), it is not a field I want to venture into at the moment.
Also, while you can get optimized implementations of some hashes using inline ASM from mORMot's SynCrypto.pas, this will only benefit your program on platforms that support that ASM code (in this case, Intel).

#### Hi im Pascal

• New member
• Posts: 29
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #11 on: December 30, 2018, 02:10:36 am »
I think the instructions are available on AMD Ryzen as well.

I tried to gain performance by multi-threading, but it didn't work (performance got worse). There is too much you can do wrong, and I'm not really good with Pascal yet. So now I just do the computation in a single worker thread and update a progress bar. Waiting several seconds to hash a file of several GB is not a big deal IMHO. I am very happy with the end result. In case anyone is interested, here is the code:

```pascal
const
  BUFFER_SIZE = 1024 * 1024;

type
  TFileBuffer = array[0 .. BUFFER_SIZE - 1] of byte;

type
  TCalcFileHashesThread = class(TThread)
  public
    constructor Create(createSuspended: boolean; const fileName: string);

  protected
    procedure Execute; override;

  private
    filePath: string;
    fileStream: TFileStream;
    sha1: IHash;
    sha256: IHash;

    procedure UpdateProgress;
    procedure ResetStatus;
    procedure ExportHashes;
  end;

constructor TCalcFileHashesThread.Create(createSuspended: boolean; const fileName: string);
begin
  inherited Create(createSuspended);
  FreeOnTerminate := True;

  filePath := fileName;
  fileStream := nil;
  sha1 := nil;
  sha256 := nil;
end;

procedure TCalcFileHashesThread.Execute;
var
  buffer: TFileBuffer;
  bytesRead: integer;
begin
  if FileExists(filePath) then begin
    fileStream := TFileStream.Create(filePath, fmOpenRead or fmShareDenyWrite);
    try
      sha1 := THashFactory.TCrypto.CreateSHA1();
      sha256 := THashFactory.TCrypto.CreateSHA2_256();
      sha1.Initialize();
      sha256.Initialize();
      while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
        // Read one chunk and feed it to both hashes
        bytesRead := fileStream.Read(buffer, SizeOf(buffer));
        sha1.TransformUntyped(buffer, bytesRead);
        sha256.TransformUntyped(buffer, bytesRead);
        Synchronize(@UpdateProgress);
      end;
      if not Terminated then
        Synchronize(@ExportHashes);
    finally
      fileStream.Free;
    end;
  end;
  Synchronize(@ResetStatus);
end;

procedure TCalcFileHashesThread.ExportHashes;
begin
  Form1.EdSHA1.Text := LowerCase(sha1.TransformFinal.ToString());
  Form1.EdSHA256.Text := LowerCase(sha256.TransformFinal.ToString());
end;

procedure TCalcFileHashesThread.UpdateProgress;
begin
  Form1.ProgressBar.Position :=
    Round(Form1.ProgressBar.Max * fileStream.Position / Real(fileStream.Size));
end;

procedure TCalcFileHashesThread.ResetStatus;
begin
  Form1.Button1.Caption := 'Compute && Verify';
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  if Button1.Caption = 'Cancel' then begin
    // hashThread: assumed name for the form field that holds the worker
    hashThread.Terminate;
    Progressbar.Position := 0;
  end else begin
    hashThread := TCalcFileHashesThread.Create(False, EdFileName.FileName);
    Button1.Caption := 'Cancel';
  end;
end;
```

Thanks for all the help!

#### engkin

• Hero Member
• Posts: 2513
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #12 on: December 30, 2018, 05:28:56 am »
I think you are calling UpdateProgress too much. Also, using Synchronize defeats the purpose as the thread will wait for the main GUI thread to finish running UpdateProgress before it handles the next amount. Measure hashing performance without calling UpdateProgress.

I would also try each hash in a separate thread, since modern computers have a few cores.

#### Hi im Pascal

• New member
• Posts: 29
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #13 on: December 30, 2018, 02:01:49 pm »
> I think you are calling UpdateProgress too much. Also, using Synchronize defeats the purpose as the thread will wait for the main GUI thread to finish running UpdateProgress before it handles the next amount. Measure hashing performance without calling UpdateProgress.
>
> I would also try each hash in a separate thread, since modern computers have a few cores.

The thread is just to keep the GUI responsive while the computation is running (Cancel Button). So I wouldn't say that it defeats the purpose, but I agree it might cost a little performance, which I am willing to sacrifice for a progress bar (I could not measure any significant difference, so I wouldn't say it's a major bottleneck). I don't know a way to measure the progress from within the main GUI thread, if the computation is happening in a different thread.

I tried creating a thread for each call to TransformUntyped, but if anything it ran slower; I guess creating and destroying the threads all the time might be the problem. I hope I wasn't copying any buffers, but then again I don't understand the copy/reference/pointer semantics in Pascal as well as I do in C++. It looked like this:

```pascal
while (not Terminated) and (fileStream.Position < fileStream.Size) do begin
  bytesRead := fileStream.Read(buffer, SizeOf(buffer));
  // THashChunkThread, t1 and t2 are assumed names
  t1 := THashChunkThread.Create(sha1, buffer, bytesRead);
  t2 := THashChunkThread.Create(sha256, buffer, bytesRead);
  t1.WaitFor;
  t2.WaitFor;
  //Synchronize(@UpdateProgress);
end;

type
  THashChunkThread = class(TThread)
  public
    constructor Create(hash: IHash; var buffer: TFileBuffer; count: integer);

  protected
    procedure Execute; override;  // called hashFn.TransformUntyped(buf, numBytes);

  private
    hashFn: IHash;
    buf: TFileBuffer;  // value field: "buf := buffer" copies the whole array
    numBytes: integer;
  end;
```

What did gain some performance was doing the whole `ComputeFile('').ToString()` in separate threads, although the file is then read multiple times and I don't have my progress bar. I'm surprised that's faster, but it is (before: 17 seconds for both hashes; now: 11 seconds, which is how long the SHA-256 takes).
« Last Edit: December 30, 2018, 02:15:49 pm by Hi im Pascal »

#### engkin

• Hero Member
• Posts: 2513
##### Re: [Code Review] SHA-256 of a file (hash verification tool for downloads)
« Reply #14 on: December 30, 2018, 02:52:06 pm »
> The thread is just to keep the GUI responsive while the computation is running (Cancel Button).
> So I wouldn't say that it defeats the purpose, but I agree it might cost a little performance, which I am willing to sacrifice for a progress bar (I could not measure any significant difference, so I wouldn't say it's a major bottleneck). I don't know a way to measure the progress from within the main GUI thread, if the computation is happening in a different thread.

I would add a property to the thread to hold the progress. The main thread can then read the value from a timer.
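
A minimal console sketch of that suggestion, under stated assumptions: `TWorker` is a hypothetical stand-in for the hashing thread, `Sleep` stands in for hashing one chunk, and the polling loop plays the role that a `TTimer.OnTimer` handler would play in the GUI. The worker never calls `Synchronize`, so it never waits for the main thread.

```pascal
program PollProgress;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

type
  // Hypothetical worker: stands in for the hashing thread.
  TWorker = class(TThread)
  private
    FProgress: LongInt;  // percent done; written only by the worker
  protected
    procedure Execute; override;
  public
    property Progress: LongInt read FProgress;
  end;

procedure TWorker.Execute;
var
  i: Integer;
begin
  for i := 1 to 100 do
  begin
    if Terminated then
      Exit;
    Sleep(5);  // placeholder for hashing one chunk
    InterLockedExchange(FProgress, i);
  end;
end;

var
  Worker: TWorker;
begin
  Worker := TWorker.Create(False);
  try
    // In a GUI this loop body would live in a TTimer.OnTimer handler.
    while not Worker.Finished do
    begin
      WriteLn(Worker.Progress, '%');
      Sleep(50);
    end;
    Worker.WaitFor;
    WriteLn(Worker.Progress, '%');
  finally
    Worker.Free;
  end;
end.
```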

> I tried creating a Thread for the calls to TransformUntyped, but if anything it ran slower, I guess creating and destroying the threads all the time might be the problem.

Correct, creating/destroying threads takes a lot of time. Use one thread for each hash. Create it only once at the beginning.

> I hope I wasn't copying any buffers, but then again I haven't really understood the copy/reference/pointer semantics in Pascal as I do in C++. It looked like this: [...]

Yes, creating the thread so many times and also calling WaitFor makes it slow, because the faster thread now has to wait for the slower thread before it can process the next amount.

> What did gain some performance is doing the whole `ComputeFile('').ToString()` in separate threads, albeit the file is read multiple times and I don't have my progress bar. I'm suprised anyway that's faster, but it is (Before 17 seconds for both hashes, now 11 seconds, which is how long the SHA256 takes).

It is read from disk only once; the second time it will be provided from the cache.
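
The "one thread per hash, created once" arrangement can be sketched as follows. This is an illustration under assumptions rather than the poster's code: it uses the `md5` and `sha1` units that ship with FPC instead of HashLib4Pascal, and `THashKind` and `TFileHashThread` are hypothetical names. Each thread opens its own read-only stream and hashes the whole file.

```pascal
program ParallelHashes;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils, md5, sha1;

type
  THashKind = (hkMD5, hkSHA1);  // hypothetical selector

  // Hypothetical worker: created once, hashes the whole file on its own.
  TFileHashThread = class(TThread)
  private
    FKind: THashKind;
    FFileName: string;
    FHex: string;
  protected
    procedure Execute; override;
  public
    constructor Create(AKind: THashKind; const AFileName: string);
    property Hex: string read FHex;  // valid after WaitFor returns
  end;

constructor TFileHashThread.Create(AKind: THashKind; const AFileName: string);
begin
  FKind := AKind;
  FFileName := AFileName;
  inherited Create(False);  // start running immediately
end;

procedure TFileHashThread.Execute;
var
  Stream: TFileStream;
  Buf: array[0..65535] of Byte;
  N: LongInt;
  M: TMD5Context;
  MD: TMD5Digest;
  C: TSHA1Context;
  CD: TSHA1Digest;
begin
  Stream := TFileStream.Create(FFileName, fmOpenRead or fmShareDenyWrite);
  try
    if FKind = hkMD5 then MD5Init(M) else SHA1Init(C);
    repeat
      N := Stream.Read(Buf, SizeOf(Buf));
      if N > 0 then
      begin
        if FKind = hkMD5 then MD5Update(M, Buf, N) else SHA1Update(C, Buf, N);
      end;
    until N <= 0;
    if FKind = hkMD5 then
    begin
      MD5Final(M, MD);
      FHex := LowerCase(MD5Print(MD));
    end
    else
    begin
      SHA1Final(C, CD);
      FHex := LowerCase(SHA1Print(CD));
    end;
  finally
    Stream.Free;
  end;
end;

var
  A, B: TFileHashThread;
  FileName: string;
begin
  if ParamCount >= 1 then FileName := ParamStr(1) else FileName := ParamStr(0);
  A := TFileHashThread.Create(hkMD5, FileName);
  B := TFileHashThread.Create(hkSHA1, FileName);
  try
    A.WaitFor;  // each thread is waited for once, at the very end
    B.WaitFor;
    WriteLn('MD5:  ', A.Hex);
    WriteLn('SHA1: ', B.Hex);
  finally
    A.Free;
    B.Free;
  end;
end.
```

With this layout the total time should approach that of the slowest hash, which matches the 17-second versus 11-second measurement reported above, at the cost of reading the file once per thread.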