Lazarus

Free Pascal => General => Topic started by: Dibo on December 12, 2014, 07:34:23 pm

Title: [SOLVED] Hash list with integer as key
Post by: Dibo on December 12, 2014, 07:34:23 pm
Hi,

I need a list where I add some integers. Size of list can be very
small but also very big (>10k). I need fast solution to quickly check
if integer already exists. TList is too slow. Im using now TFPHashList
with key as integer converted to string. Is exist something with key
as integer? Because converting integer to string cost CPU, not much if
used periodically but I also need compare two big hashed lists.

Regards
Title: Re: Hash list with integer as key
Post by: Blaazen on December 12, 2014, 08:27:39 pm
Maybe TFPGMap from unit FGL?
Title: Re: Hash list with integer as key
Post by: Dibo on December 12, 2014, 08:54:12 pm
Yes I thought about this too but I saw fight on news group that it is not hashed, just sequence search like in TList. Didn't check speed in my own tests yet
Title: Re: Hash list with integer as key
Post by: MathMan on December 12, 2014, 09:29:03 pm
Hi,

I need a list where I add some integers. Size of list can be very
small but also very big (>10k). I need fast solution to quickly check
if integer already exists. TList is too slow. Im using now TFPHashList
with key as integer converted to string. Is exist something with key
as integer? Because converting integer to string cost CPU, not much if
used periodically but I also need compare two big hashed lists.

Regards

Hi Dibo,

Just to check that I got it right - the items in the list are integers (32 / 64 bit?), you need to check if an integer is in the list and you need to compare two lists. And all this is time critical so you are looking for the fastest solution.

First I have a question regarding the comparison you mention. Shall the comparison only state if two lists are equal or do you need a comparison that shows if elements are in one list but not the other?

Regarding your approach - calculation of a hash value (at least a reasonable one that does not generate too many collisions) is time consuming. The values you maintain in the list can easily be compared directly - so you have a fast comparison of items. In this case I would guess that a balanced binary tree is faster - even for the sizes you mention (>10k entries). The comparison of two balanced binary trees is slower than comparing two hash lists - so the decision which way to go also depends on how often you need to compare full lists.

Kind regards,
MathMan
Title: Re: Hash list with integer as key
Post by: Windsurfer on December 12, 2014, 10:34:11 pm
If the list is sorted, a binary search is fastest.

If this is the case, I have a routine to search a sorted array of strings TDateTime values that could be modified.
Title: Re: Hash list with integer as key
Post by: Leledumbo on December 13, 2014, 08:37:32 am
Maybe TFPGMap from unit FGL?
It's a list with map-like behavior, not a hash based map.

Take a look at fcl-stl's ghashmap unit (need fpc trunk for best result, latest stable is not fully supported).
Title: Re: Hash list with integer as key
Post by: mse on December 13, 2014, 01:12:39 pm
MSEgui has tintegerhashdatalist, lib/common/kernel/msehash.pas.
Title: Re: Hash list with integer as key
Post by: Blaazen on December 13, 2014, 01:37:13 pm
Quote
It's a list with map-like behavior, not a hash based map.

I guess the only difference is that you have to hash keys yourself (and sort them, otherwise binary search fails).
Title: Re: Hash list with integer as key
Post by: marcov on December 13, 2014, 07:38:34 pm
Quote
It's a list with map-like behavior, not a hash based map.

I guess the only difference is that you have to hash keys yourself (and sort them, otherwise binary search fails).

And that makes insertion performance going down the drain.
Title: Re: Hash list with integer as key
Post by: Blaazen on December 13, 2014, 08:03:18 pm
OK, instead of
Code: [Select]
Add();
Sort();
you should do
Code: [Select]
Find();
InsertKeyData();
The hashmap has to do it too (internally).
But yes, if fcl-stl has implemented hash-map, then it's the choice.
Title: Re: Hash list with integer as key
Post by: MathMan on December 13, 2014, 08:06:32 pm
Quote
It's a list with map-like behavior, not a hash based map.

I guess the only difference is that you have to hash keys yourself (and sort them, otherwise binary search fails).

And that makes insertion performance going down the drain.

Exact - and that's why I initially talked about a "balanced binary tree" as alternative. Seems this has somehow magically turned to "sorted list with binary search" on the way - yeah I love chinese whispers even on a written forum  ;)
Title: Re: Hash list with integer as key
Post by: Dibo on December 16, 2014, 10:40:48 pm
Sorry for delay. I finally used TFPGMap<Int64, Byte> . I compared speed and memory use with TList, TStringList and TFPHashList. TFPGMap wins in all tests. Filling 1 million items is same quick as TList and check if exists beat all other lists. Memory use also low. Very handy class. Problem solved
Title: Re: Hash list with integer as key
Post by: Leledumbo on December 16, 2014, 11:08:47 pm
Sorry for delay. I finally used TFPGMap<Int64, Byte> . I compared speed and memory use with TList, TStringList and TFPHashList. TFPGMap wins in all tests. Filling 1 million items is same quick as TList and check if exists beat all other lists. Memory use also low. Very handy class. Problem solved
Interesting! Would you mind sharing how you compare them? I see you didn't put THashMap to the list, any reason?
Title: Re: [SOLVED] Hash list with integer as key
Post by: Dibo on December 17, 2014, 01:12:24 am
Well, I just used EpickTimer. This is my whole unit, nothing amazing, just connect events to buttons:
Code: Pascal  [Select][+][-]
  1. unit Unit1;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. interface
  6.  
  7. uses
  8.   Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls, EpikTimer,
  9.   contnrs, fgl;
  10.  
  11. type
  12.  
  13.   TMyMap = specialize TFPGMap<Int64, Byte>;
  14.  
  15.   { TForm1 }
  16.  
  17.   TForm1 = class(TForm)
  18.     Button1: TButton;
  19.     Button2: TButton;
  20.     Button3: TButton;
  21.     Button4: TButton;
  22.     procedure Button1Click(Sender: TObject);
  23.     procedure Button2Click(Sender: TObject);
  24.     procedure Button3Click(Sender: TObject);
  25.     procedure Button4Click(Sender: TObject);
  26.   private
  27.     { private declarations }
  28.     tm: TEpikTimer;
  29.     l1: TList;
  30.     l2: TStringList;
  31.     l3: TFPHashList;
  32.     l4: TMyMap;
  33.   public
  34.     { public declarations }
  35.     procedure AfterConstruction; override;
  36.     procedure BeforeDestruction; override;
  37.   end;
  38.  
  39. var
  40.   Form1: TForm1;
  41.  
  42. implementation
  43.  
  44. {$R *.lfm}
  45.  
  46. const
  47.   MAX_ITEMS = 1000000;
  48.   FIND_ITEM = 500000;
  49.  
  50. { TForm1 }
  51.  
  52. procedure TForm1.Button1Click(Sender: TObject);
  53. var
  54.   i: Integer;
  55. begin
  56.   tm.Clear();
  57.   tm.Start();
  58.   for i:=1 to MAX_ITEMS do
  59.     l1.Add(Pointer(i));
  60.   tm.Stop();
  61.   WriteLn('List fill time: ', tm.ElapsedStr);
  62.   tm.Clear();
  63.   tm.Start();
  64.   l1.IndexOf(pointer(FIND_ITEM));
  65.   tm.Stop();
  66.   WriteLn('List find time: ', tm.ElapsedStr);
  67. end;
  68.  
  69. procedure TForm1.Button2Click(Sender: TObject);
  70. var
  71.   i: Integer;
  72. begin
  73.   tm.Clear();
  74.   tm.Start();
  75.   for i:=1 to MAX_ITEMS do
  76.     l2.Add(IntToStr(i));
  77.   tm.Stop();
  78.   WriteLn('String fill time: ', tm.ElapsedStr);
  79.   tm.Clear();
  80.   tm.Start();
  81.   l2.Find(IntToStr(FIND_ITEM), i);
  82.   tm.Stop();
  83.   WriteLn('String find time: ', tm.ElapsedStr);
  84. end;
  85.  
  86. procedure TForm1.Button3Click(Sender: TObject);
  87. var
  88.   i: Integer;
  89. begin
  90.   tm.Clear();
  91.   tm.Start();
  92.   for i:=1 to MAX_ITEMS do
  93.     l3.Add(IntToStr(i), nil);
  94.   tm.Stop();
  95.   WriteLn('Hash fill time: ', tm.ElapsedStr);
  96.   tm.Clear();
  97.   tm.Start();
  98.   l3.Find(IntToStr(FIND_ITEM));
  99.   tm.Stop();
  100.   WriteLn('Hash find time: ', tm.ElapsedStr);
  101. end;
  102.  
  103. procedure TForm1.Button4Click(Sender: TObject);
  104. var
  105.   i: Integer;
  106. begin
  107.   tm.Clear();
  108.   tm.Start();
  109.   for i:=1 to MAX_ITEMS do
  110.     l4.Add(i, 0);
  111.   tm.Stop();
  112.   WriteLn('Map fill time: ', tm.ElapsedStr);
  113.   tm.Clear();
  114.   tm.Start();
  115.   Writeln(l4.Find(FIND_ITEM+1, i));
  116.   writeln(i);
  117.   tm.Stop();
  118.   WriteLn('Map find time: ', tm.ElapsedStr);
  119. end;
  120.  
  121. procedure TForm1.AfterConstruction;
  122. begin
  123.   inherited AfterConstruction;
  124.   tm := TEpikTimer.Create(nil);
  125.   l1 := TList.Create;
  126.   l2 := TStringList.Create;
  127.   l3 := TFPHashList.Create;
  128.   l4 := TMyMap.Create;
  129. end;
  130.  
  131. procedure TForm1.BeforeDestruction;
  132. begin
  133.   l1.Free;
  134.   l2.Free;
  135.   l3.Free;
  136.   l4.Free;
  137.   tm.Free;
  138.   inherited BeforeDestruction;
  139. end;
  140.  
  141. end.
  142.  

For RAM, I just checked task manager after button click
Title: Re: [SOLVED] Hash list with integer as key
Post by: Leledumbo on December 17, 2014, 04:56:50 am
Well, I just used EpickTimer. This is my whole unit, nothing amazing, just connect events to buttons:
Tested myself, it's amazing to know how TFPGMap can beat all other maps though this might be key type specific. THashmap depends much on the hash function, but even a simple `a mod n` is not enough to beat TFPGMap. The final result is a tight gap on insertion, on searching both are equal.
Title: Re: [SOLVED] Hash list with integer as key
Post by: MathMan on December 17, 2014, 11:50:08 pm
Well, I just used EpickTimer. This is my whole unit, nothing amazing, just connect events to buttons:
Tested myself, it's amazing to know how TFPGMap can beat all other maps though this might be key type specific. THashmap depends much on the hash function, but even a simple `a mod n` is not enough to beat TFPGMap. The final result is a tight gap on insertion, on searching both are equal.

Hi Leledumbo,

I got intriqued and wanted to take a look. Only found the sources in fgl.pp which are unfortunately "reluctantly commented" and an old post from 2010 when there wasn't one available. Has that changed since?

Regards,
MathMan
Title: Re: [SOLVED] Hash list with integer as key
Post by: Leledumbo on December 18, 2014, 05:41:28 am
I got intriqued and wanted to take a look. Only found the sources in fgl.pp which are unfortunately "reluctantly commented" and an old post from 2010 when there wasn't one available. Has that changed since?
Nope I guess. The latest changes to fgl unit seems to be the addition of enumerator to some classes but that's all AFAIR. No documentation yet, and the unit status seems still the same: testing unit for fpc generics support
Title: Re: [SOLVED] Hash list with integer as key
Post by: Khumarahn on August 06, 2015, 10:50:18 pm
Hi. I am a novice.

I need a reasonably fast map<string, integer>. I tried TFPGMap on a very simple example:
Code: [Select]
Program Lesson1_Program1;   

uses fgl, sysutils;

type
  TMap = Specialize TFPGMap<string, integer>;

var
  Map: TMap;
  i: longint;
Begin
 Map := TMap.Create;
 for i:=0 to 100000 do
   Map['item #' + IntToStr(i)] := i;
End.

and running it takes 1 minute and 30 seconds on a relatively new i5 processor. It looks very slow. Did I make a mistake somewhere? Or I should look for some other data structure?
Title: Re: [SOLVED] Hash list with integer as key
Post by: Leledumbo on August 07, 2015, 03:56:17 am
Did I make a mistake somewhere?
Nope, it's just not a map in the sense of hash map. TFPGMap is a list with map-like behavior. ghashmap from fcl-stl package should give what you want, but currently only available and working correctly in either 3.0 fixes branch or trunk. 2.6.4 won't do.
Title: Re: [SOLVED] Hash list with integer as key
Post by: Khumarahn on August 07, 2015, 10:25:36 am
ok, thank you much. It is very weird that map is still not implemented - pascal is an old language...
Title: Re: [SOLVED] Hash list with integer as key
Post by: marcov on August 07, 2015, 10:40:41 am
Well, it is specially the scripting languages that are big on maps because stringly typed code is the norm there.

But even considering that Free Pascal does have a back log there. As said there are various options, but none made default state yet.
Title: Re: [SOLVED] Hash list with integer as key
Post by: derek.john.evans on August 07, 2015, 10:48:13 am
There are a few <string, pointer> maps you can try.

TStringHashList in StringHashList
TFPObjectHashTable in contnrs (must set AOwnsObjects to False to store integers)

lazfglhash has this which looks interesting.
generic TLazFPGHashTable<T> = Class(TFPCustomHashTable) 

A search for 'hash' of the lazarus libraries brings up lots of options. I'd stick with the FP stuff.

This seems super fast, but allocates an object per integer.
TMap = specialize TLazFPGHashTable<Integer>;

Typecasting to a Pointer would be better/faster but then may not be classed as 100% correct coding.
Title: Re: [SOLVED] Hash list with integer as key
Post by: marcov on August 07, 2015, 11:21:12 am

and running it takes 1 minute and 30 seconds on a relatively new i5 processor. It looks very slow. Did I make a mistake somewhere? Or I should look for some other data structure?

That's slower than needed btw, even for non hashes, I do 400000 in .4s

(output of my unit test)
Code: [Select]
using blocksize: 256
Generate 400000 integers of unique random data
1 creating lightstrmap with 400000 elements containing x=random  <x as string,x as pointer> mappings      0.4070 seconds

Check: items in map:400000
2 Testing contents of randomly filled string map     0.4840 seconds
3 Removing all even elements, and checking result     0.7660 seconds
Check: items in map:200000
99 destroying lightmap

Test for 400000 items took     1.6880 seconds

(this code is an optimalization for code that used a tstringlist before, and being ordered was very important)
Title: Re: [SOLVED] Hash list with integer as key
Post by: derek.john.evans on August 07, 2015, 11:34:38 am
This speeds up the element adding.

Map.Sorted:=True;     

But, TLazFPGHashTable is still faster. TFPDataHashTable works if you typecast to Pointer.
Title: Re: [SOLVED] Hash list with integer as key
Post by: Khumarahn on August 14, 2015, 10:39:55 am
Thank you all for your advice. I found an implementation of red-black trees for pascal here:
http://www.vanwal.nl/rbtree/
This is a translation from c++ stl implementation, as far as I understood.

I used it to write a custom version of map, which works for my purposes. So far everything is very fast on tens of millions of elements.
TinyPortal © 2005-2018