Topic: [SOLVED] Comparing two TFPHashLists

Gizmo

[SOLVED] Comparing two TFPHashLists
« on: October 23, 2017, 06:19:37 pm »
Hi

One feature of a program I make is that it allows the user to compare the content of two folders based on hashes. Currently it does this by using stringlists of filenames and hashes. It works OK but I want to make it better - faster, more efficient.

I've looked at TFPHashList, which I know is famed for being very fast, and its shortstring limitation is fine for hash algorithms, with even SHA512 being 128 hex chars. But I can't quite work out how I can very quickly check if HashListA = HashListB. Using my current implementation of stringlists, slHashListA contains all the hashes of FolderA, and slHashListB contains all the hashes of FolderB. So to compare if one equals the other, I am simply passing slHashListA.Text and slHashListB.Text to the FPC SHA hasher and comparing the returned hash values. But TFPHashList has no such Text property, so I'm curious to know how, or if, I can ask "Is HashListA the same as HashListB?". And although syntactically it seems that the following is OK

Code: Pascal
HashListA.Add(HashValA, Pointer(HashValA));
HashListB.Add(HashValB, Pointer(HashValB));
if HashListA = HashListB then...

I'm not so sure this is quite the same thing, because in a quick test of 3 files all having the same hash in FolderA and FolderB, it returned false even though all the hashes were the same. I'm guessing this is because each list has a key system, so each value is called by a key. So I guess the keys used in ListA will be different to the keys in ListB? That being the case, how can I quickly check if one matches the other?

There seems to be very little documentation or examples for TFPHashList. There are a few threads like this one (http://forum.lazarus.freepascal.org/index.php/topic,17433.15.html) and there are the official docs (https://www.freepascal.org/docs-html/fcl/contnrs/tfphashlist.html), but examples are very few. It's hard to get my head round what values are stored in them, in what form, and how you iterate over and compare values in them without some basic examples. Does anyone know of any?
« Last Edit: October 24, 2017, 04:52:37 pm by Gizmo »

howardpc

Re: Comparing two TFPHashLists
« Reply #1 on: October 23, 2017, 09:20:10 pm »
You have to write your own function to test for equality between two such lists. e.g.

Code: Pascal
// Returns True when both lists hold the same number of entries and every
// name in aHashList1 is also found in aHashList2.
function ListsHaveIdenticalStringsAndAssignedData(aHashList1, aHashList2: TFPHashList): Boolean;
var
  i: Integer;
begin
  Result := False;
  // Lists of different sizes cannot be identical
  if (aHashList1.Count <> aHashList2.Count) then
    Exit;
  // Look up each name from the first list in the second
  for i := 0 to aHashList1.Count-1 do
    if (aHashList2.FindIndexOf(aHashList1.NameOfIndex(i)) < 0) then
      Exit;
  Result := True;
end;

Note that this will only return True if the string contents of the lists are identical, and the Data pointers associated with each string are not Nil.
In your case I think you may be adding (aFilename, Nil) to the list for each of the files in your folders?
If so, you need to add a dummy Data value in the Add() procedure. It does not matter what it is, provided it is not Nil (remember to free it afterwards).
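As a rough illustration of the points above, here is a minimal, self-contained sketch. The names SameNames, Dummy, ListA and ListB and the hash strings 'aaaa' and 'bbbb' are placeholders made up for this example; the check itself is the same as in the function above. It uses the address of a global variable as the non-Nil dummy Data value, so in this particular case there is nothing to free afterwards. It also shows why `ListA = ListB` is not a content comparison: the two variables refer to two distinct objects.

```pascal
program HashListDemo;

{$mode objfpc}{$H+}

uses
  contnrs;

// Same check as the function above: equal counts, and every name
// stored in A can be found in B.
function SameNames(A, B: TFPHashList): Boolean;
var
  i: Integer;
begin
  Result := False;
  if A.Count <> B.Count then
    Exit;
  for i := 0 to A.Count - 1 do
    if B.FindIndexOf(A.NameOfIndex(i)) < 0 then
      Exit;
  Result := True;
end;

var
  ListA, ListB: TFPHashList;
  Dummy: Integer;  // only its address is used, as a non-Nil Data value
  i: Integer;
begin
  ListA := TFPHashList.Create;
  ListB := TFPHashList.Create;
  try
    // Placeholder strings standing in for real hash values
    ListA.Add('aaaa', @Dummy);
    ListA.Add('bbbb', @Dummy);
    ListB.Add('bbbb', @Dummy);
    ListB.Add('aaaa', @Dummy);

    // Iterating: NameOfIndex(i) returns the string stored at index i
    for i := 0 to ListA.Count - 1 do
      WriteLn(i, ': ', ListA.NameOfIndex(i));

    // Compares the two object references, not the list contents
    WriteLn('Same object:   ', ListA = ListB);
    // Compares the stored names, regardless of insertion order
    WriteLn('Same contents: ', SameNames(ListA, ListB));
  finally
    // The lists do not own the Data pointers, so only the lists are freed
    ListA.Free;
    ListB.Free;
  end;
end.
```

One caveat with this kind of one-directional check: if the same hash can be added more than once (for example, duplicate files in a folder), equal counts plus "every name of A is in B" does not prove that both lists contain each hash the same number of times.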

engkin

Re: Comparing two TFPHashLists
« Reply #2 on: October 23, 2017, 11:49:04 pm »
Quote from: Gizmo on October 23, 2017, 06:19:37 pm
One feature of a program I make is that it allows the user to compare the content of two folders based on hashes. Currently it does this by using stringlists of filenames and hashes. It works OK but I want to make it better - faster, more efficient.

Don't use string representation of the hashes, use the digests themselves. As mentioned a few years ago.
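One possible reading of this suggestion, sketched under assumptions: keep each hash as the raw 20-byte TSHA1Digest from FPC's sha1 unit instead of a 40-character hex string, collect the digests of each folder in an array, sort both arrays, and compare them element by element. The names TDigestArray, CompareDigests, SortDigests and SameDigests are hypothetical helpers invented for this sketch, SHA1 is only an example algorithm, and SHA1String on placeholder text stands in for hashing real files (for which SHA1File could be used).

```pascal
program DigestCompareDemo;

{$mode objfpc}{$H+}

uses
  sha1;

type
  TDigestArray = array of TSHA1Digest;

// Byte-wise ordering of two digests: negative, zero or positive
function CompareDigests(const A, B: TSHA1Digest): SizeInt;
begin
  Result := CompareByte(A, B, SizeOf(TSHA1Digest));
end;

// Simple insertion sort; fine for a sketch, slow for large folders
procedure SortDigests(var D: TDigestArray);
var
  i, j: Integer;
  Tmp: TSHA1Digest;
begin
  for i := 1 to High(D) do
  begin
    Tmp := D[i];
    j := i - 1;
    while (j >= 0) and (CompareDigests(D[j], Tmp) > 0) do
    begin
      D[j + 1] := D[j];
      Dec(j);
    end;
    D[j + 1] := Tmp;
  end;
end;

// True when both arrays hold the same digests, duplicates included.
// Note: sorts both arrays in place.
function SameDigests(var A, B: TDigestArray): Boolean;
var
  i: Integer;
begin
  Result := False;
  if Length(A) <> Length(B) then
    Exit;
  SortDigests(A);
  SortDigests(B);
  for i := 0 to High(A) do
    if not SHA1Match(A[i], B[i]) then
      Exit;
  Result := True;
end;

var
  FolderA, FolderB: TDigestArray;
begin
  SetLength(FolderA, 2);
  SetLength(FolderB, 2);
  // Placeholder inputs; real code would hash each file instead
  FolderA[0] := SHA1String('file one');
  FolderA[1] := SHA1String('file two');
  FolderB[0] := SHA1String('file two');
  FolderB[1] := SHA1String('file one');
  WriteLn('Folders match: ', SameDigests(FolderA, FolderB));
end.
```

Comparing sorted arrays also distinguishes folders that differ only in how many copies of the same file they contain. For display purposes, SHA1Print converts a digest back to its hex string.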

Gizmo

Re: Comparing two TFPHashLists
« Reply #3 on: October 24, 2017, 08:45:01 am »
You're right, Engkin. Perhaps I need to take the bull by the horns and do as you originally suggested!

howardpc: incidentally, your suggestion worked perfectly, thanks. I just need to do some tests as to whether this is quicker than my current StringList implementation and, if it is, whether I should then look into using the digests as Engkin has suggested more than once. I'm just not entirely sure how to do that yet.
« Last Edit: October 24, 2017, 01:57:10 pm by Gizmo »

 
