You should use:
FillChar(buffer, Sizeof(buffer), #00);
FillChar(hashBuf, Sizeof(hashBuf), #00);
FillChar() is quicker than iterating in a loop. IIRC there are other similar functions, but I don't remember their names atm.
One thing I find odd here is the 1kb buffer you used; that is too small even for low-end devices. I suggest you use something like 32 or 64kb buffers at worst.

Actually I'm using a size of 1024², which should be around 1MB, or am I mistaken? I do feel my code could be faster though. It takes a while to hash a few GB (several seconds), and I'm not sure how fast it should actually be running.
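If you want to experiment with the buffer size, here is a minimal sketch of a chunked read loop. It is my own illustration, not code from the thread. It assumes a dynamic array allocated with SetLength, which lives on the heap. If the original 1 MB buffer is a static array declared as a local variable, it sits on the stack instead, which is worth keeping in mind at that size. `ChunkSize` and `ReadAllInChunks` are illustrative names.

```pascal
program ChunkedRead;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // 64 KiB; try 1024 * 1024 as well and compare the timings.
  ChunkSize = 64 * 1024;

// Reads AStream to the end in ChunkSize pieces and returns the byte count.
function ReadAllInChunks(AStream: TStream): Int64;
var
  buffer: array of Byte;   // dynamic array: allocated on the heap
  bytesRead: LongInt;
begin
  Result := 0;
  SetLength(buffer, ChunkSize);
  repeat
    // Pass buffer[0], not buffer: a dynamic array variable is only a
    // pointer, so Read(buffer, ...) would overwrite that pointer.
    bytesRead := AStream.Read(buffer[0], Length(buffer));
    Inc(Result, bytesRead);
    // ... feed buffer[0..bytesRead - 1] to the hash functions here ...
  until bytesRead = 0;
end;

var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    ms.Size := 200 * 1024;   // 200 KiB of (uninitialised) test data
    ms.Position := 0;
    WriteLn(ReadAllInChunks(ms));   // 204800
  finally
    ms.Free;
  end;
end.
```

The `buffer[0]` detail matters whenever a dynamic array is passed to an untyped `var` parameter such as `TStream.Read`. With a static array, passing `buffer` or `buffer[0]` both work.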
unit1.pas(85,1) Fatal: Syntax error, "BEGIN" expected but "identifier GENERIC" found
So I think the compiler doesn't support that yet, since I copied that from another forum post.

Thanks, I changed it. Is there any benefit to using the control string #00 as opposed to simply writing 0?
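On the #00 vs 0 question: as far as I know, FPC's FillChar is overloaded for Byte, Boolean and Char fill values. #0 (the character with ordinal 0, which #00 also denotes) and the integer 0 therefore both write the same zero byte, so the choice is purely stylistic. The sketch below checks this against a manual loop and FillByte. `TBuf`, `AllZero` and `Zeroed` are illustrative names.

```pascal
program FillCharDemo;
{$mode objfpc}{$H+}

type
  TBuf = array[0..1023] of Byte;

// True if every byte of B is zero.
function AllZero(const B: TBuf): Boolean;
var
  i: Integer;
begin
  for i := Low(B) to High(B) do
    if B[i] <> 0 then
      Exit(False);
  Result := True;
end;

// Fills a buffer with $FF first (so the check is meaningful), then zeroes
// it using one of four methods.
function Zeroed(Method: Integer): TBuf;
var
  i: Integer;
begin
  FillChar(Result, SizeOf(Result), $FF);
  case Method of
    0: for i := Low(Result) to High(Result) do Result[i] := 0;  // manual loop
    1: FillChar(Result, SizeOf(Result), #0);   // char literal, same as #00
    2: FillChar(Result, SizeOf(Result), 0);    // integer literal
    3: FillByte(Result, SizeOf(Result), 0);    // byte-specific variant
  end;
end;

begin
  WriteLn(AllZero(Zeroed(0)), ' ', AllZero(Zeroed(1)), ' ',
          AllZero(Zeroed(2)), ' ', AllZero(Zeroed(3)));
end.
```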
uses HlpHashFactory;
begin
  result := THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
end;
Well, that certainly looks much more attractive than rolling my own code... But how can I compute several hash functions simultaneously, without having to stream the file each time? In my code I would do something like this:

repeat
  bytesRead := streamIn.Read(buffer[0], Sizeof(buffer));
  sha256.Update(buffer[0], bytesRead);
  sha512.Update(buffer[0], bytesRead);
  sha1.Update(buffer[0], bytesRead);
  md5.Update(buffer[0], bytesRead);
  // ...
until bytesRead = 0;

Does that make sense? Is there a way to compute multiple hash functions from a file with HashLib4Pascal? I assume that with the following code, the file is read twice:
THashFactory.TCrypto.CreateSHA2_256().ComputeFile('filename.txt').ToString();
THashFactory.TCrypto.CreateSHA1().ComputeFile('filename.txt').ToString();
Thanks!
P.S.: I stumbled upon this: https://software.intel.com/en-us/articles/intel-sha-extensions. Would it be possible to use inline assembly to apply those instructions when calculating the hashes (if the CPU supports them)? Just an idea; this would probably be too complicated for me with my limited Pascal experience.
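To compute several hashes while reading the file only once, each chunk can be fed to every hash context inside the same read loop. That is the shape of the loop in the question. So that the sketch compiles on its own, it uses FPC's bundled `md5` and `sha1` units (MD5Init/MD5Update/MD5Final, SHA1Init/SHA1Update/SHA1Final) instead of HashLib4Pascal. With HashLib4Pascal the same structure would presumably call TransformUntyped on each IHash per chunk. `THashPair` and `HashStream` are illustrative names.

```pascal
program OnePassHashes;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, md5, sha1;

type
  THashPair = record
    MD5Hex, SHA1Hex: string;
  end;

// Reads AStream exactly once; every chunk goes to both the MD5 and the
// SHA-1 context, so the data is streamed a single time for both digests.
function HashStream(AStream: TStream): THashPair;
var
  buffer: array of Byte;
  bytesRead: LongInt;
  md5Ctx: TMD5Context;
  sha1Ctx: TSHA1Context;
  md5Dig: TMD5Digest;
  sha1Dig: TSHA1Digest;
begin
  SetLength(buffer, 64 * 1024);
  MD5Init(md5Ctx);
  SHA1Init(sha1Ctx);
  repeat
    bytesRead := AStream.Read(buffer[0], Length(buffer));
    if bytesRead > 0 then
    begin
      MD5Update(md5Ctx, buffer[0], bytesRead);
      SHA1Update(sha1Ctx, buffer[0], bytesRead);
    end;
  until bytesRead = 0;
  MD5Final(md5Ctx, md5Dig);
  SHA1Final(sha1Ctx, sha1Dig);
  Result.MD5Hex := LowerCase(MD5Print(md5Dig));
  Result.SHA1Hex := LowerCase(SHA1Print(sha1Dig));
end;

var
  ss: TStringStream;
  r: THashPair;
begin
  ss := TStringStream.Create('abc');
  try
    r := HashStream(ss);
    WriteLn('MD5:  ', r.MD5Hex);    // 900150983cd24fb0d6963f7d28e17f72
    WriteLn('SHA1: ', r.SHA1Hex);   // a9993e364706816aba3e25717850c26c9cd0d89d
  finally
    ss.Free;
  end;
end.
```

Adding a third hash means adding one more context and one more Update call in the loop. The file is still read only once.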
I think you are calling UpdateProgress too much. Also, using Synchronize defeats the purpose, as the thread will wait for the main GUI thread to finish running UpdateProgress before it handles the next chunk. Measure hashing performance without calling UpdateProgress.

I would also try each hash in a separate thread, since modern computers have a few cores.

The thread is just there to keep the GUI responsive while the computation is running (Cancel button). So I wouldn't say that it defeats the purpose, but I agree it might cost a little performance, which I am willing to sacrifice for a progress bar. I could not measure any significant difference, so I don't think it's a major bottleneck. I don't know a way to measure the progress from within the main GUI thread if the computation is happening in a different thread.
I would add a property to the thread to hold the progress. The main thread can access the value using a timer.
I tried creating a Thread for the calls to TransformUntyped, but if anything it ran slower. I guess creating and destroying the threads all the time might be the problem. I hope I wasn't copying any buffers, but then again I haven't understood the copy/reference/pointer semantics in Pascal as well as I understand them in C++. It looked like this:

{ TTransformHashThread }
type
  TTransformHashThread = class(TThread)
  private
    hashFn: IHash;
    buf: TFileBuffer;    // if TFileBuffer is a static array, assigning to this field copies the whole buffer
    numBytes: Integer;
  protected
    procedure Execute; override;   // calls hashFn.TransformUntyped(buf, numBytes)
  public
    constructor Create(hash: IHash; var buffer: TFileBuffer; count: Integer);
  end;

while (not Terminated) and (fileStream.Position < fileStream.Size) do
begin
  bytesRead := fileStream.Read(buffer, Sizeof(buffer));
  thread_SHA1 := TTransformHashThread.Create(sha1, buffer, bytesRead);     // starts unsuspended
  thread_SHA256 := TTransformHashThread.Create(sha256, buffer, bytesRead);
  thread_SHA1.WaitFor;
  thread_SHA256.WaitFor;
  thread_SHA1.Free;      // without these, every iteration leaks two thread objects
  thread_SHA256.Free;
  //Synchronize(@UpdateProgress);
end;

Correct, creating/destroying threads takes a lot of time. Use one thread for each hash. Create it only once at the beginning.

Yes, creating the thread so many times, and also calling WaitFor, makes it slow, because the faster thread now has to wait for the slower thread before it processes the next chunk.
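Here is a sketch of the "one thread per hash, created once" advice. Each thread is started a single time, reads the whole file through its own TFileStream and buffer (so nothing is shared or copied between threads), and the caller waits only once, at the end. This is essentially the structure that turned out faster when the whole hash was computed in a separate thread. It relies on the OS cache to serve the second read of the same file. FPC's `md5`/`sha1` units stand in for HashLib4Pascal. `THashFileThread` and `HashFileTwoThreads` are illustrative names.

```pascal
program HashThreads;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils, md5, sha1;

type
  THashKind = (hkMD5, hkSHA1);

  // One thread per hash function: created once, reads the whole file through
  // its own stream and buffer (nothing is shared), then stores the hex digest.
  THashFileThread = class(TThread)
  private
    FFileName: string;
    FKind: THashKind;
    FDigest: string;
  protected
    procedure Execute; override;
  public
    constructor Create(const AFileName: string; AKind: THashKind);
    property Digest: string read FDigest;  // only read it after WaitFor
  end;

constructor THashFileThread.Create(const AFileName: string; AKind: THashKind);
begin
  FFileName := AFileName;
  FKind := AKind;
  inherited Create(False);  // not suspended: starts running right away
end;

procedure THashFileThread.Execute;
var
  fs: TFileStream;
  buffer: array of Byte;
  bytesRead: LongInt;
  md5Ctx: TMD5Context;
  sha1Ctx: TSHA1Context;
  md5Dig: TMD5Digest;
  sha1Dig: TSHA1Digest;
begin
  SetLength(buffer, 64 * 1024);
  MD5Init(md5Ctx);
  SHA1Init(sha1Ctx);
  fs := TFileStream.Create(FFileName, fmOpenRead or fmShareDenyNone);
  try
    repeat
      bytesRead := fs.Read(buffer[0], Length(buffer));
      if bytesRead > 0 then
        case FKind of
          hkMD5:  MD5Update(md5Ctx, buffer[0], bytesRead);
          hkSHA1: SHA1Update(sha1Ctx, buffer[0], bytesRead);
        end;
    until (bytesRead = 0) or Terminated;
  finally
    fs.Free;
  end;
  if FKind = hkMD5 then
  begin
    MD5Final(md5Ctx, md5Dig);
    FDigest := LowerCase(MD5Print(md5Dig));
  end
  else
  begin
    SHA1Final(sha1Ctx, sha1Dig);
    FDigest := LowerCase(SHA1Print(sha1Dig));
  end;
end;

// Starts both threads, then waits once per thread at the very end
// (instead of creating threads and calling WaitFor for every chunk).
procedure HashFileTwoThreads(const AFileName: string; out AMD5, ASHA1: string);
var
  tMD5, tSHA1: THashFileThread;
begin
  tMD5 := THashFileThread.Create(AFileName, hkMD5);
  tSHA1 := THashFileThread.Create(AFileName, hkSHA1);
  try
    tMD5.WaitFor;
    tSHA1.WaitFor;
    AMD5 := tMD5.Digest;
    ASHA1 := tSHA1.Digest;
  finally
    tMD5.Free;
    tSHA1.Free;
  end;
end;

var
  fn, m, s: string;
begin
  // Hashes the file named on the command line, or this executable itself.
  if ParamCount >= 1 then fn := ParamStr(1) else fn := ParamStr(0);
  HashFileTwoThreads(fn, m, s);
  WriteLn('MD5:  ', m);
  WriteLn('SHA1: ', s);
end.
```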
What did gain some performance is doing the whole ComputeFile('').ToString() in separate threads, albeit the file is then read multiple times and I don't have my progress bar. I'm surprised that's faster anyway, but it is (before: 17 seconds for both hashes; now: 11 seconds, which is how long SHA256 alone takes).

It is read one time; the second time it will be provided from the cache.
Is accessing the Thread object thread-safe? I mean, I could be writing the progress variable from inside the thread and reading it from the main GUI thread to update my progress bar at the same time. For example, if I create a function 'GetProgress' which returns fileStream.Position / Real(fileStream.Size), is it guaranteed that reading fileStream.Position is thread-safe? What if I read it at the same moment the thread updates Position? I don't think updating fileStream.Position is an atomic operation by nature, unless TFileStream makes sure of it.

If you design it right, then it should be thread-safe. It would be thread-safe to call GetProgress from within the thread loop itself to update a variable. Then simply retrieve the value using InterLockedExchange or a similar approach. There is an example of this here (https://forum.lazarus.freepascal.org/index.php/topic,40163.msg277445.html#msg277445) by ASerge. It is easy to expand it to get the overall progress if you have more than one thread, like one thread for SHA1 and another for SHA256, or more than one file on a system with a few cores.
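Here is a minimal sketch of that approach. It is not ASerge's linked example. The worker computes the progress from its own byte count, so fileStream.Position is never read from another thread. It publishes the result with InterLockedExchange, and the main thread reads it with InterLockedCompareExchange(x, 0, 0), which returns the current value atomically without changing it. In a Lazarus app, the polling loop would be a TTimer.OnTimer handler. `TProgressThread`, `Progress` and `HashWithPolling` are illustrative names.

```pascal
program ProgressPolling;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils, sha1;

type
  // Hashes a stream with SHA-1 and publishes its progress as 0..1000.
  TProgressThread = class(TThread)
  private
    FStream: TStream;     // used only by the worker thread
    FPermille: LongInt;   // shared: written only via InterLockedExchange
    FDigest: string;      // read by the main thread only after WaitFor
  protected
    procedure Execute; override;
  public
    constructor Create(AStream: TStream);
    function Progress: LongInt;
    property Digest: string read FDigest;
  end;

constructor TProgressThread.Create(AStream: TStream);
begin
  FStream := AStream;
  inherited Create(False);
end;

procedure TProgressThread.Execute;
var
  buffer: array of Byte;
  bytesRead: LongInt;
  done, total: Int64;
  ctx: TSHA1Context;
  dig: TSHA1Digest;
begin
  try
    SetLength(buffer, 64 * 1024);
    total := FStream.Size;   // Size/Position are only touched by this thread
    done := 0;
    SHA1Init(ctx);
    repeat
      bytesRead := FStream.Read(buffer[0], Length(buffer));
      if bytesRead > 0 then
        SHA1Update(ctx, buffer[0], bytesRead);
      Inc(done, bytesRead);
      if total > 0 then
        InterLockedExchange(FPermille, LongInt(done * 1000 div total));
    until (bytesRead = 0) or Terminated;
    SHA1Final(ctx, dig);
    FDigest := LowerCase(SHA1Print(dig));
  finally
    InterLockedExchange(FPermille, 1000);  // always signals "done"
  end;
end;

// Safe to call from the main thread, e.g. in a TTimer.OnTimer handler.
function TProgressThread.Progress: LongInt;
begin
  Result := InterLockedCompareExchange(FPermille, 0, 0);
end;

// Console stand-in for a GUI timer: poll until the worker reports 1000.
function HashWithPolling(AStream: TStream; AVerbose: Boolean): string;
var
  t: TProgressThread;
begin
  t := TProgressThread.Create(AStream);
  try
    while t.Progress < 1000 do
    begin
      if AVerbose then
        WriteLn('progress: ', t.Progress / 10:0:1, ' %');
      Sleep(50);
    end;
    t.WaitFor;
    Result := t.Digest;
  finally
    t.Free;
  end;
end;

var
  ms: TMemoryStream;
begin
  ms := TMemoryStream.Create;
  try
    ms.Size := 16 * 1024 * 1024;   // 16 MiB of zero bytes
    FillChar(ms.Memory^, ms.Size, 0);
    ms.Position := 0;
    WriteLn('SHA1: ', HashWithPolling(ms, True));
  finally
    ms.Free;
  end;
end.
```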
As far as the caching goes, are you referring to the caching the operating system does? Because I can't find any caching code inside HashLib4Pascal's IHash.TransformStream.

Yes, the OS and the HD itself.
Additionally, since some hashing algorithms run much faster than others, I wonder how you could effectively cache the reads.

I doubt that you need to cache any reads in this specific case. To confirm, measure how long it takes to:
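One way to run such a measurement is to time a read-only pass against a read-plus-hash pass over the same file. This sketch is not necessarily what the reply went on to list. If the second read-only pass is much faster than the first, the OS cache is doing its job. If read-plus-hash is much slower than read-only, the hash itself is the bottleneck rather than the disk. GetTickCount64 gives millisecond wall-clock time, and SHA-1 from FPC's `sha1` unit stands in for whichever hash is being tested. `StreamFile` and `TimePass` are illustrative names.

```pascal
program ReadVsHash;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, sha1;

// Streams the file once in 64 KiB chunks and returns the number of bytes
// read; if AHash is True, each chunk is also fed to SHA-1.
function StreamFile(const AFileName: string; AHash: Boolean): Int64;
var
  fs: TFileStream;
  buffer: array of Byte;
  bytesRead: LongInt;
  ctx: TSHA1Context;
  dig: TSHA1Digest;
begin
  Result := 0;
  SetLength(buffer, 64 * 1024);
  SHA1Init(ctx);
  fs := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyNone);
  try
    repeat
      bytesRead := fs.Read(buffer[0], Length(buffer));
      Inc(Result, bytesRead);
      if AHash and (bytesRead > 0) then
        SHA1Update(ctx, buffer[0], bytesRead);
    until bytesRead = 0;
  finally
    fs.Free;
  end;
  SHA1Final(ctx, dig);   // digest unused: only the timing matters here
end;

// Wall-clock milliseconds for one StreamFile pass.
function TimePass(const AFileName: string; AHash: Boolean): QWord;
var
  start: QWord;
begin
  start := GetTickCount64;
  StreamFile(AFileName, AHash);
  Result := GetTickCount64 - start;
end;

var
  fn: string;
begin
  // Uses the file named on the command line, or this executable itself.
  if ParamCount >= 1 then fn := ParamStr(1) else fn := ParamStr(0);
  // The first pass may hit the disk; later passes usually come from the
  // OS cache, so the differences hint at I/O time versus hashing time.
  WriteLn('read only, 1st pass: ', TimePass(fn, False), ' ms');
  WriteLn('read only, 2nd pass: ', TimePass(fn, False), ' ms');
  WriteLn('read + SHA-1:        ', TimePass(fn, True), ' ms');
end.
```

Use a file of a few GB to get meaningful numbers; with a small file, all three passes may round to 0 ms.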
Since I have experience with C++ and Qt, I threw together the same application using QCryptographicHash with QtCreator and C++... It runs slower than the FPC/Lazarus app :o) I used the same block size (1024*1024) and my loop looks basically the same, except that it uses QFile::read and QCryptographicHash. I'm guessing the difference comes from the hash function implementation. I doubt there is a significant difference in the file reading or the concurrency (std::async vs TThread), but perhaps I overlooked something (could be a mistake in the concurrency code etc.).

My guess is that it runs slowly on purpose, to make timing attacks difficult. You need to use non-cryptographic hashing if you seek speed.
Yes, you are right. Like in sha1ProcessChunk (https://code.woboq.org/qt5/qtbase/src/3rdparty/sha1/sha1.cpp.html#141). It repeats the process of copying to a local variable at the beginning and zeroing it at the end for every 64 bytes of hashed data. So you really need to compare FPC with a non-cryptographic lib.
Besides your use of verifying downloads, hashing is also used as a step in processing passwords before the result is saved to a database. So I was wrong in assuming it was meant to protect against timing attacks. Still, being a cryptographic lib, it tries not to leave traces in memory.
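For a speed baseline with a non-cryptographic hash, FNV-1a is one of the simplest options. It is my choice for illustration; the thread doesn't name a specific algorithm. Below is 64-bit FNV-1a with the standard offset basis ($cbf29ce484222325) and prime ($100000001b3). It is fine for spotting accidental corruption, but it offers no protection against deliberate tampering, so it does not replace SHA-256 for verifying downloads. `FNV1a64` and `FNV1a64Str` are illustrative names.

```pascal
program Fnv1aDemo;
{$mode objfpc}{$H+}
{$Q-}{$R-}  // FNV-1a relies on 64-bit wrap-around multiplication

uses
  SysUtils;

const
  FNV64Offset = QWord($cbf29ce484222325);
  FNV64Prime  = QWord($100000001b3);

// 64-bit FNV-1a over Len bytes of Data: XOR in each byte, then multiply by
// the prime. Pass the previous result as AHash to continue with the next chunk.
function FNV1a64(const Data; Len: SizeInt; AHash: QWord = FNV64Offset): QWord;
var
  p: PByte;
  i: SizeInt;
begin
  Result := AHash;
  p := @Data;
  for i := 0 to Len - 1 do
    Result := (Result xor p[i]) * FNV64Prime;
end;

function FNV1a64Str(const S: AnsiString): QWord;
begin
  if S = '' then
    Exit(FNV64Offset);
  Result := FNV1a64(S[1], Length(S));
end;

begin
  WriteLn(IntToHex(FNV1a64Str('a'), 16));   // AF63DC4C8601EC8C
end.
```

Because the function takes the previous hash as a parameter, it can be driven chunk by chunk from the same kind of read loop used for the cryptographic hashes, which makes the timing comparison like for like.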
P.S.: The only thing I think is better about Qt is the Horizontal/Vertical Layout System in QDesigner, which makes it really easy to lay out the controls compared to specifying all the anchors manually. Basically, creating the layout in Qt: 30 seconds; creating the layout with Lazarus: 5 minutes. lol.
Lazarus follows the GTK theme set perfectly, whereas Qt just draws its own thing and looks completely out of place... Another thing I really like about LCL so far.