Recent

Author Topic: How to get hash of entire TStringList?  (Read 1178 times)

AlexTP

  • Hero Member
  • *****
  • Posts: 2519
    • UVviewsoft
How to get hash of entire TStringList?
« on: December 07, 2024, 09:15:25 pm »
In CudaText, I want to calc some hash (any good algo, stringlist usually has 10...1000 items) for entire StringList. If hash is the same on the next call of menu-like-dialog, editor can keep old filter text in the menu-like-dialog. If hash is different, dialog will clear last filter-text.

Pls show an example how to calc the hash?
THashedStringList is not ok here - it calcs hash per each stringlist line.
Getting of hash of List.Text is not nice, better not to calculate List.Text.
« Last Edit: December 07, 2024, 09:17:40 pm by AlexTP »

Warfley

  • Hero Member
  • *****
  • Posts: 1850
Re: How to get hash of entire TStringList?
« Reply #1 on: December 07, 2024, 09:30:24 pm »
Using the hash function that java uses for strings:
Code: Pascal  [Select][+][-]
  1. function AddToHash(AHash: Integer; AValue: Integer): Integer; inline;
  2. begin
  3.   Result := AHash*31 + AValue;
  4. end;
  5.  
  6. function HashString(const AString: String): Integer;
  7. var
  8.   i: Integer;
  9. begin
  10.   Result := 0;
  11.   for i := Length(AString) downto 1 do
  12.     Result := AddToHash(Result, Ord(AString[i]));
  13. end;
  14.  
  15. function HashStrings(AStrings: TStrings): Integer;
  16. var
  17.   i: Integer;
  18. begin
  19.   Result := 0;
  20.   for i := AStrings.Count-1 downto 0 do
  21.     Result := AddToHash(Result, HashString(AStrings[i]));
  22. end;

AlexTP

  • Hero Member
  • *****
  • Posts: 2519
    • UVviewsoft
Re: How to get hash of entire TStringList?
« Reply #2 on: December 08, 2024, 09:58:38 am »
Thanks, it is good.
I only need to add AddToHash(#10) to respect EOLs.
Are you sure the algo is so simple? x*31+y.

And, it is 32-bit int. How to use Int64?

Thaddy

  • Hero Member
  • *****
  • Posts: 16349
  • Censorship about opinions does not belong here.
Re: How to get hash of entire TStringList?
« Reply #3 on: December 08, 2024, 10:54:10 am »
because it is a multiplication it is likely that it gets expanded to 64 bit anyway.
Otherwise it would raise a rangecheck error and/or overflow.
There are other simple hashes, like shift/mods, but this is indeed used in java.
Here is a simple hash without multiplication, stays 32 bit:
Code: Pascal  [Select][+][-]
  1. function SimpleHash(const S: string): Integer;inline;
  2. var
  3.   i: Integer;
  4. begin
  5.   result := 0;
  6.   for i := 1 to Length(S) do
  7.   begin
  8.     result := (result shl 5) or (result shr 27);
  9.     result := result + Ord(S[i]);
  10.   end;
  11. end;
In most cases this is also faster and has similar characteristics.
I wonder why you do not want to use the text property, because that may still be faster than per line hashing + addition.
« Last Edit: December 08, 2024, 12:10:34 pm by Thaddy »
There is nothing wrong with being blunt. At a minimum it is also honest.

AlexTP

  • Hero Member
  • *****
  • Posts: 2519
    • UVviewsoft
Re: How to get hash of entire TStringList?
« Reply #4 on: December 08, 2024, 11:49:32 am »
@Thaddy,
thanks. I like that I won't have overflow. I can adapt your code to stringlist.

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: How to get hash of entire TStringList?
« Reply #5 on: December 08, 2024, 12:07:42 pm »
This may be even better, and similar speed

Code: Pascal  [Select][+][-]
  1. function fletcher32(data: pointer; len: dword): dword;
  2. var
  3.   s0, s1: word;
  4.   i: integer;
  5. begin
  6.   s0 := 0;
  7.   s1 := 0;
  8.   i := 0;
  9.   while i < len do begin
  10.     if i+2 > len then s0 := (s0+pbyte(data+i)^) mod 65535
  11.     else s0 := (s0+pword(data+i)^) mod 65535;
  12.     s1 := (s0+s1) mod 65535;
  13.     inc(i, 2);
  14.   end;
  15.   result := s1 shl 16 or s0;
  16. end;

Thaddy

  • Hero Member
  • *****
  • Posts: 16349
  • Censorship about opinions does not belong here.
Re: How to get hash of entire TStringList?
« Reply #6 on: December 08, 2024, 12:12:00 pm »
No, because of the modulo operation it is always slower than 2 shifts + 1 or + 1 addition. These are all i cycle operations.
Mod is extremely costly and depending on cpu can take between 7 and 14 cycles. If you see a mod, it means it is slow.
Btw, forgot to add inline. Done.
« Last Edit: December 08, 2024, 12:17:36 pm by Thaddy »
There is nothing wrong with being blunt. At a minimum it is also honest.

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: How to get hash of entire TStringList?
« Reply #7 on: December 08, 2024, 12:16:47 pm »
No, because of the modulo operation it is always slower than 2 shifts + 1 or + 1 addition.
Mod is extremely costly.

Code: Pascal  [Select][+][-]
  1.   message := 'test';
  2.  
  3.   q := GetTickCount64;
  4.   for i := 1 to 100000000 do d := fletcher32(@message[1], length(message));
  5.   writeln('fletcher32 took ms ', GetTickCount64-q);
  6.  
  7.   q := GetTickCount64;
  8.   for i := 1 to 100000000 do d := SimpleHash(message);
  9.   writeln('SimpleHash took ms ', GetTickCount64-q);
  10.  
  11.   writeln;
  12.  
  13.   message := 'a little longer test string';
  14.  
  15.   q := GetTickCount64;
  16.   for i := 1 to 100000000 do d := fletcher32(@message[1], length(message));
  17.   writeln('fletcher32 took ms ', GetTickCount64-q);
  18.  
  19.   q := GetTickCount64;
  20.   for i := 1 to 100000000 do d := SimpleHash(message);
  21.   writeln('SimpleHash took ms ', GetTickCount64-q);

fletcher32 took ms 672
SimpleHash took ms 625

fletcher32 took ms 4359
SimpleHash took ms 7078



Btw, forgot to add inline. Done.

Both inlined

fletcher32 took ms 625
SimpleHash took ms 328

fletcher32 took ms 4219
SimpleHash took ms 3812

Warfley

  • Hero Member
  • *****
  • Posts: 1850
Re: How to get hash of entire TStringList?
« Reply #8 on: December 08, 2024, 12:22:11 pm »
This may be even better, and similar speed
And you get a free buffer overflow if len is not even  :D

because it is a multiplication it is likely that it gets expanded to 64 bit anyway.
Otherwise it would raise a rangecheck error and/or overflow.
There are other simple hashes, like shift/mods, but this is indeed used in java.
Here is a simple hash without multiplication, stays 32 bit:
Code: Pascal  [Select][+][-]
  1. function SimpleHash(const S: string): Integer;inline;
  2. var
  3.   i: Integer;
  4. begin
  5.   result := 0;
  6.   for i := 1 to Length(S) do
  7.   begin
  8.     result := (result shl 5) or (result shr 27);
  9.     result := result + Ord(S[i]);
  10.   end;
  11. end;
In most cases this is also faster and has similar characteristics.
I wonder why you do not want to use the text property, because that may still be faster than per line hashing + addition.
Well you also get an overflow here (or over shift to be precise). But yeah this is also a very common and easy hashing algorithm, I like the java one because it's easy to remember. In both cases overflows are expected (it's a hashing algorithm so information loss is inevitable)
« Last Edit: December 08, 2024, 12:25:25 pm by Warfley »

Thaddy

  • Hero Member
  • *****
  • Posts: 16349
  • Censorship about opinions does not belong here.
Re: How to get hash of entire TStringList?
« Reply #9 on: December 08, 2024, 12:27:33 pm »
I misread. The mod is not per line.
There is nothing wrong with being blunt. At a minimum it is also honest.

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: How to get hash of entire TStringList?
« Reply #10 on: December 08, 2024, 12:28:06 pm »
And you get a free buffer overflow if len is not even  :D

Where? :o

Warfley

  • Hero Member
  • *****
  • Posts: 1850
Re: How to get hash of entire TStringList?
« Reply #11 on: December 08, 2024, 12:29:43 pm »
Oh mistead the if statement, my bad 😅

Fibonacci

  • Hero Member
  • *****
  • Posts: 643
  • Internal Error Hunter
Re: How to get hash of entire TStringList?
« Reply #12 on: December 08, 2024, 12:30:11 pm »
 >:D

bytebites

  • Hero Member
  • *****
  • Posts: 694
Re: How to get hash of entire TStringList?
« Reply #13 on: December 08, 2024, 12:32:56 pm »
Replace mod with and?

Thaddy

  • Hero Member
  • *****
  • Posts: 16349
  • Censorship about opinions does not belong here.
Re: How to get hash of entire TStringList?
« Reply #14 on: December 08, 2024, 12:35:39 pm »
From the literature I see that Fletcher is a checksum and not a hash. I will check out the collsion probabilities as compared to simplehash, which is a hash and not a checksum.
There is nothing wrong with being blunt. At a minimum it is also honest.

 

TinyPortal © 2005-2018