Recent

Author Topic: Generics.Collections.TDictionary  (Read 3114 times)

Tony Stone

  • Sr. Member
  • ****
  • Posts: 269
Generics.Collections.TDictionary
« on: December 30, 2024, 02:11:35 am »

Hello all,

I'm working on a project using latest trunk version of Lazaraus anf FPC and Generics.Collections.TDictionary on Linux for hash-based lookups. While benchmarking, I noticed some counter-intuitive performance behavior that I’d like clarification on.  ChatGPT gave me some explanations that really don't have me convinced such as fewer buckets on larger datasets which may make sense but I would like to hear from developers here.

Setup:
I'm loading datasets of different sizes into a TDictionary<string, integer> to simulate lookups.
My test repeatedly searches for a constant key (HashMapLookup function).


Observations:
With 18 million items in the dictionary, I achieve 37 million searches per second.
With 39,000 items, the search speed drops significantly to around 12 million searches per second.

Additional Notes:
I'm using a multi-threaded test loop for benchmarking.
The search key does not exist in the dictionary.
Increasing the dictionary's initial capacity with
Code: Pascal  [Select][+][-]
  1. SearchItemsDict := TStringIntDictionary.Create(1024 * 1024);
does not seem to have an impact on the results.

The search function (HashMapLookup) relies on TDictionary.TryGetValue:

Code: Pascal  [Select][+][-]
  1. if Map.TryGetValue(Key, FoundIndex) then
  2.   Result := FoundIndex
  3. else
  4.   Result := -1;
  5.  

Questions:
  • Why would a smaller dataset result in slower search speeds?
  • Is there something about TDictionary's internal hash function or bucket allocation that could explain this?
  • Are there any best practices or alternative hash table implementations in Free Pascal for high-performance lookups?
  • Any insights would be greatly appreciated. Thanks in advance!

ALLIGATOR

  • Full Member
  • ***
  • Posts: 148
Re: Generics.Collections.TDictionary
« Reply #1 on: December 30, 2024, 03:54:26 am »
It would be handy - if you prepared a small test project, to test what you are talking about

cdbc

  • Hero Member
  • *****
  • Posts: 2105
    • http://www.cdbc.dk
Re: Generics.Collections.TDictionary
« Reply #2 on: December 30, 2024, 09:36:40 am »
Hi
Did you take into account, the fact that hash-tables LOVE 'Prime-numbers' as a /capacity-high/...?!?
Maybe the quick one is a prime and the slow one isn't...  %)
Also, there's 'LGenerics' library which can be found on 'Github' and the /not-generics/ one to be found in 'contnrs' unit...
By the way, see if you can buy/lend a copy of Julian Bucknall's excellent book: "Tomes of Delphi - Algorithms and Data-structures", great stuff in there, about hashing amo.
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE5 -> FPC 3.2.2 -> Lazarus 3.6 up until Jan 2024 from then on it's both above &: KDE5/QT5 -> FPC 3.3.1 -> Lazarus 4.99

Tony Stone

  • Sr. Member
  • ****
  • Posts: 269
Re: Generics.Collections.TDictionary
« Reply #3 on: December 30, 2024, 03:21:53 pm »
So I am trying to make sense of your suggestion on using a prime number as a capacity.  That is really interesting.  So there is less chances of collisions if it is a prime number is what I have gathered?  I will experiment with that tonight.
Hi
Did you take into account, the fact that hash-tables LOVE 'Prime-numbers' as a /capacity-high/...?!?
Maybe the quick one is a prime and the slow one isn't...  %)
Also, there's 'LGenerics' library which can be found on 'Github' and the /not-generics/ one to be found in 'contnrs' unit...
By the way, see if you can buy/lend a copy of Julian Bucknall's excellent book: "Tomes of Delphi - Algorithms and Data-structures", great stuff in there, about hashing amo.
Regards Benny

cdbc

  • Hero Member
  • *****
  • Posts: 2105
    • http://www.cdbc.dk
Re: Generics.Collections.TDictionary
« Reply #4 on: December 30, 2024, 04:54:26 pm »
Hi
The hash-implementations in 'contnrs' unit, has this modus operandi:
When you have to select an upper capacity, you take the user's wish and find the nearest higher number in this table:
Code: Pascal  [Select][+][-]
  1. const
  2.   NPRIMES = 28;
  3.  
  4.   PRIMELIST: array[0 .. NPRIMES-1] of Longword =
  5.   ( 53,         97,         193,       389,       769,
  6.     1543,       3079,       6151,      12289,     24593,
  7.     49157,      98317,      196613,    393241,    786433,
  8.     1572869,    3145739,    6291469,   12582917,  25165843,
  9.     50331653,   100663319,  201326611, 402653189, 805306457,
  10.     1610612741, 3221225473, 4294967291 );  
...and use that as your capacity...
The above table is an excerpt from the 'contnrs' unit!
I use the classes from 'contnrs' and they're pretty fast.
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE5 -> FPC 3.2.2 -> Lazarus 3.6 up until Jan 2024 from then on it's both above &: KDE5/QT5 -> FPC 3.3.1 -> Lazarus 4.99

 

TinyPortal © 2005-2018