Recent

Author Topic: (hash)maps: the final benchmark  (Read 13202 times)

circular

  • Hero Member
  • *****
  • Posts: 3058
    • Personal webpage
Re: (hash)maps: the final benchmark
« Reply #15 on: October 21, 2016, 03:29:24 pm »
Of course there can be multiple entries with the same hashcode even if it is rare. I suppose you misunderstood what I said.

What I am saying is that hash can be used to sort strings in order to have a list of all strings.

See: https://en.wikipedia.org/wiki/Hash_function
Quote
Typically, the domain of a hash function (the set of possible keys) is larger than its range (the number of different table indices), and so it will map several different keys to the same index. Therefore, each slot of a hash table is associated with (implicitly or explicitly) a set of records, rather than a single record. For this reason, each slot of a hash table is often called a bucket, and hash values are also called bucket indices.
« Last Edit: October 21, 2016, 03:36:07 pm by circular »
Conscience is the debugger of the mind

Thaddy

  • Hero Member
  • *****
  • Posts: 9303
Re: (hash)maps: the final benchmark
« Reply #16 on: October 21, 2016, 04:04:52 pm »
@Circular
That is correct. The FPC compiler uses that mechanism already indeed ;) when it generates code for immutables.
I merely want to refer (and warn) that a hash is not capable of precise uniqueness.
That is a mistake that loads of programmers make and simply misguided. If you want, consider It is an approximation of uniqueness.
also related to equus asinus.

circular

  • Hero Member
  • *****
  • Posts: 3058
    • Personal webpage
Re: (hash)maps: the final benchmark
« Reply #17 on: October 21, 2016, 07:31:34 pm »
Yes, it is obvious to me that a hash is not unique.

I hear you're worried about programmers considering hashcodes to be unique.
Conscience is the debugger of the mind

BeniBela

  • Hero Member
  • *****
  • Posts: 689
    • homepage
Re: (hash)maps: the final benchmark
« Reply #18 on: October 22, 2016, 01:19:31 am »

I can't remember you suggested a better algorithm to solve that Issue?.....
I seem to remember you just compared some?

http://bugs.freepascal.org/view.php?id=30664

Quote from: circular
In fact, having a hashmap for all strings could be a way to ensure a single address for each string.

I was thinking about using these hashmaps just for that. But it seems hard to make it thread safe, and to expire the strings, i.e. remove old unused strings

circular

  • Hero Member
  • *****
  • Posts: 3058
    • Personal webpage
Re: (hash)maps: the final benchmark
« Reply #19 on: October 22, 2016, 08:37:06 am »
Hmm.. maybe using critical sections for reference counting and garbage collecting?
Conscience is the debugger of the mind

heX

  • New member
  • *
  • Posts: 6
Re: (hash)maps: the final benchmark
« Reply #20 on: August 22, 2018, 12:05:05 pm »
My result of compare HashMap/Map.
Testing: fcl-stl: THashMap, TMap. fcl-base: TFPGMap, TFPHashList.

Result:
Code: [Select]
Test key: string
Count: 100000

THashMap:
Create.   Used mem: 640
Adding. Time: 141  Used mem: 23634176
Read.   Time: 780
Delete. Time: 125  Used mem: 17234336
Free.   Time: 31  Used mem: 64

TMap:
Create.   Used mem: 0
Adding. Time: 421  Used mem: 12799936
Read.   Time: 3479
Delete. Time: 1139  Used mem: 0
Free.   Time: 0  Used mem: -64

TFPGMap:
Create.   Used mem: 64
Adding. Time: 2480  Used mem: 8076384
Read.   Time: 1607
Delete. Time: 5179  Used mem: 3392
Free.   Time: 0  Used mem: -64

TFPHashList:
Create.   Used mem: 32
Adding. Time: 47  Used mem: 5211456
Read.   Time: 406
Delete. Time: 32620  Used mem: 2239200
Free.   Time: 0  Used mem: -64

Code: [Select]
Test key: int64 (or string[8] for TFPHashList)
Count: 100000

THashMap:
Create.   Used mem: 640
Adding. Time: 47  Used mem: 16832096
Read.   Time: 156
Delete. Time: 31  Used mem: 16832192
Free.   Time: 47  Used mem: 64

TMap:
Create.   Used mem: 0
Adding. Time: 47  Used mem: 6400000
Read.   Time: 312
Delete. Time: 125  Used mem: 0
Free.   Time: 0  Used mem: -64

TFPGMap:
Create.   Used mem: 64
Adding. Time: 2324  Used mem: 1676448
Read.   Time: 359
Delete. Time: 4992  Used mem: 3392
Free.   Time: 0  Used mem: -64

TFPHashList:
Create.   Used mem: 32
Adding. Time: 15  Used mem: 4063296
Read.   Time: 141
Delete. Time: 32760  Used mem: 1091040
Free.   Time: 0  Used mem: -64
End.

Code: Pascal  [Select]
  1. program project1;
  2.  
  3. {.$DEFINE TestStr}
  4.  
  5. {$mode objfpc}{$H+}
  6.  
  7. uses
  8.   {$IFDEF UNIX}{$IFDEF UseCThreads}
  9.   cthreads,
  10.   {$ENDIF}{$ENDIF}
  11.   Classes, SysUtils, CustApp,
  12.   { you can add units after this }
  13.   CRC,
  14.   fgl,
  15.   gmap,
  16.   ghashmap,
  17.   contnrs;
  18.  
  19. const
  20.   DefValue = pointer(1);
  21.  
  22. type
  23.  
  24.   { TMyApplication }
  25.  
  26.   TValue = pointer;
  27.  
  28.   {$IFDEF TestStr}
  29.   TKey = ansistring;
  30.   {$ELSE}
  31.   TKey = int64;
  32.   {$ENDIF}
  33.  
  34.   { THashDirectFunc }
  35.  
  36.   THashDirectInt64 = class
  37.     // return uniformly distributed i value in range <0,n-1> base only on arguments,
  38.     // n will be always power of 2
  39.     class function hash(a: int64; n: SizeUInt): SizeUInt;
  40.  
  41.     // [when STL_INTERFACE_EXT is defined]
  42.     // return the boolean test for equality of the two keys.  Typically this is operator=,
  43.     // but it doesn't have to be (e.g. case-insensitive string comparison)
  44.     class function equal(const AKey1, AKey2: int64): Boolean;
  45.   end;
  46.  
  47.   THashFuncString = class
  48.     // return uniformly distributed i value in range <0,n-1> base only on arguments,
  49.     // n will be always power of 2
  50.     class function hash(a: string; n: SizeUInt): SizeUInt;
  51.  
  52.     // [when STL_INTERFACE_EXT is defined]
  53.     // return the boolean test for equality of the two keys.  Typically this is operator=,
  54.     // but it doesn't have to be (e.g. case-insensitive string comparison)
  55.     class function equal(const AKey1, AKey2: string): Boolean;
  56.   end;
  57.  
  58.   { TMapCompare }
  59.  
  60.   {$IFDEF TestStr}
  61.   TTestGHashMap = specialize THashMap<TKey, TValue, THashFuncString>;
  62.   {$ELSE}
  63.   TTestGHashMap = specialize THashMap<TKey, TValue, THashDirectInt64>;
  64.   {$ENDIF}
  65.  
  66.   TMapCompare=class
  67.     class function c(a,b :TKey):boolean;
  68.   end;
  69.  
  70.   TTestGMap = specialize TMap<TKey, TValue, TMapCompare>;
  71.  
  72.   TTestFPGMap = specialize TFPGMap<TKey, TValue>;
  73.  
  74.   TMyApplication = class(TCustomApplication)
  75.   protected
  76.     procedure DoRun; override;
  77.   public
  78.     list1: TFPHashList;
  79.     map1: TTestGHashMap;
  80.     map2: TTestFPGMap;
  81.     map3: TTestGMap;
  82.     constructor Create(TheOwner: TComponent); override;
  83.     destructor Destroy; override;
  84.     procedure WriteHelp; virtual;
  85.   end;
  86.  
  87. //// //// //// //// //// //// //// //// //// //// //// //// //// //// //// ////
  88.  
  89. function RandomOf(RandKey: LongInt): integer;
  90. Begin
  91.  Result := RandKey * 1103515245 + 12345;
  92. enD;
  93.  
  94. {$IFDEF TestStr}
  95. function RndKeyStr(i: LongInt): ansistring;
  96. begin
  97.   Result := IntToStr(Int64( Abs(RandomOf(i)) or $1000000000000000) );
  98. end;
  99. {$ELSE}
  100. function RndKeyStr(i: LongInt): ansistring;
  101. begin
  102.   // create strong 8 byte string
  103.   Result := IntToHex(RandomOf(i), 8);
  104. end;
  105. {$ENDIF}
  106.  
  107. {$IFDEF TestStr}
  108. function RndKey(i: LongInt): ansistring;
  109. begin
  110.   Result := RndKeyStr(i);
  111. end;
  112. {$ELSE}
  113. function RndKey(i: LongInt): int64;
  114. begin
  115.   Result := RandomOf(i);
  116. end;
  117. {$ENDIF}
  118.  
  119. { TMapCompare }
  120.  
  121. class function TMapCompare.c(a, b: TKey): boolean;
  122. begin
  123.   {$IFDEF TestStr}
  124.   Result := CompareStr(a, b) > 0;
  125.   {$ELSE}
  126.   Result := a > b;
  127.   {$ENDIF}
  128. end;
  129.  
  130.  
  131. //// //// //// //// //// //// //// //// //// //// //// //// //// //// //// ////
  132.  
  133. class function THashFuncString.hash(a: string; n: SizeUInt): SizeUInt;
  134. begin
  135.   Result := CRC.CRC32(n, @a[1], Length(a)) mod n;
  136. end;
  137.  
  138. class function THashFuncString.equal(const AKey1, AKey2: string): Boolean;
  139. begin
  140.   Result := AKey1 = AKey2;
  141. end;
  142.  
  143. class function THashDirectInt64.hash(a: int64; n: SizeUInt): SizeUInt;
  144. begin
  145.   Result := SizeUInt(RandomOf(a)) mod n;
  146. end;
  147.  
  148. class function THashDirectInt64.equal(const AKey1, AKey2: int64): Boolean;
  149. begin
  150.   Result := AKey1 = AKey2;
  151. end;
  152.  
  153. //// //// //// //// //// //// //// //// //// //// //// //// //// //// //// ////
  154.  
  155. function UsedMem(): integer;
  156. var mem: TFPCHeapStatus;
  157. begin
  158.   mem := SysGetFPCHeapStatus();
  159.   Result := mem.CurrHeapUsed;
  160. end;
  161.  
  162. { TMyApplication }
  163.  
  164. procedure TMyApplication.DoRun;
  165. var
  166.   i, i2, itmp: integer;
  167.   t: int64;
  168.   c: int64;
  169.   value: TValue;
  170.   p: pointer;
  171.   basemem: integer;
  172. begin
  173.   c := StrToInt64Def(self.GetOptionValue('c', 'count'), 100000);
  174.  
  175.   {$IFDEF TestStr}
  176.   writeln('Test key: string');
  177.   {$ELSE}
  178.   writeln('Test key: int64 (or string[8] for TFPHashList)');
  179.   {$ENDIF}
  180.   writeln('Count: ' + IntToStr(c));
  181.  
  182.   writeln('');
  183.   writeln('THashMap:');
  184.   basemem := UsedMem();
  185.   map1 := TTestGHashMap.create();
  186.   writeln('Create. ' + '  Used mem: ' + IntToStr(UsedMem()-basemem));
  187.  
  188.   t := GetTickCount64();
  189.  
  190.   for i:=1 to c do
  191.   begin
  192.     map1.insert(RndKey(i), DefValue);
  193.   end;
  194.   writeln('Adding. Time: ' + IntToStr(GetTickCount64() - t) +
  195.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  196.   t := GetTickCount64();
  197.  
  198.   for i2:=1 to 10 do
  199.     for i:=1 to c do
  200.     begin
  201.       if not map1.GetValue(RndKey(i), value) then
  202.         raise EKeyNotFound.Create('THashMap find error') else
  203.         if value <> DefValue then
  204.           raise EKeyNotFound.Create('THashMap find error 2') else
  205.     end;
  206.   for i:=1 to c do
  207.   begin
  208.     map1.GetValue(RndKey(c+i), value);
  209.   end;
  210.   writeln('Read.   Time: ' + IntToStr(GetTickCount64() - t));
  211.   t := GetTickCount64();
  212.  
  213.   for i:=1 to c do
  214.   begin
  215.     if map1.contains(RndKey(i)) then
  216.       map1.delete(RndKey(i));
  217.   end;
  218.   writeln('Delete. Time: ' + IntToStr(GetTickCount64() - t) +
  219.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  220.   t := GetTickCount64();
  221.  
  222.   FreeAndNil(map1);
  223.   writeln('Free.   Time: ' + IntToStr(GetTickCount64() - t) +
  224.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  225.   t := GetTickCount64();
  226.  
  227.  
  228.  
  229.   writeln('');
  230.   writeln('TMap:');
  231.   basemem := UsedMem();
  232.   map3 := TTestGMap.create();
  233.   writeln('Create. ' + '  Used mem: ' + IntToStr(UsedMem()-basemem));
  234.  
  235.   t := GetTickCount64();
  236.  
  237.   for i:=1 to c do
  238.   begin
  239.     map3.insert(RndKey(i), DefValue);
  240.   end;
  241.   writeln('Adding. Time: ' + IntToStr(GetTickCount64() - t) +
  242.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  243.   t := GetTickCount64();
  244.  
  245.   for i2:=1 to 10 do
  246.     for i:=1 to c do
  247.     begin
  248.       if not map3.TryGetValue(RndKey(i), value) then
  249.         raise EKeyNotFound.Create('THashMap find error') else
  250.         if value <> DefValue then
  251.           raise EKeyNotFound.Create('THashMap find error 2') else
  252.     end;
  253.   for i:=1 to c do
  254.   begin
  255.     map3.TryGetValue(RndKey(c+i), value);
  256.   end;
  257.   writeln('Read.   Time: ' + IntToStr(GetTickCount64() - t));
  258.   t := GetTickCount64();
  259.  
  260.   for i:=1 to c do
  261.   begin
  262.     if map3.TryGetValue(RndKey(i), value) then
  263.       map3.delete(RndKey(i));
  264.   end;
  265.   writeln('Delete. Time: ' + IntToStr(GetTickCount64() - t) +
  266.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  267.   t := GetTickCount64();
  268.  
  269.   FreeAndNil(map3);
  270.   writeln('Free.   Time: ' + IntToStr(GetTickCount64() - t) +
  271.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  272.   t := GetTickCount64();
  273.  
  274.  
  275.  
  276.   writeln('');
  277.   writeln('TFPGMap:');
  278.   basemem := UsedMem();
  279.   map2 := TTestFPGMap.Create();
  280.   map2.Sorted:=true;
  281.   writeln('Create. ' + '  Used mem: ' + IntToStr(UsedMem()-basemem));
  282.  
  283.   t := GetTickCount64();
  284.  
  285.   for i:=1 to c do
  286.   begin
  287.     map2.Add(RndKey(i), DefValue);
  288.   end;
  289.   writeln('Adding. Time: ' + IntToStr(GetTickCount64() - t) +
  290.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  291.   t := GetTickCount64();
  292.  
  293.   for i2:=1 to 10 do
  294.     for i:=1 to c do
  295.     begin
  296.       map2.Find(RndKey(i), itmp);
  297.       if itmp = -1 then
  298.         raise EKeyNotFound.Create('TFPGMap find error');
  299.     end;
  300.   for i:=1 to c do
  301.   begin
  302.     map2.Find(RndKey(c+i), itmp);
  303.   end;
  304.   writeln('Read.   Time: ' + IntToStr(GetTickCount64() - t));
  305.   t := GetTickCount64();
  306.  
  307.   for i:=1 to c do
  308.   begin
  309.     map2.Find(RndKey(i), itmp);
  310.     if itmp<>-1 then
  311.       map2.Delete(itmp);
  312.   end;
  313.   writeln('Delete. Time: ' + IntToStr(GetTickCount64() - t) +
  314.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  315.   t := GetTickCount64();
  316.  
  317.   FreeAndNil(map2);
  318.   writeln('Free.   Time: ' + IntToStr(GetTickCount64() - t) +
  319.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  320.   t := GetTickCount64();
  321.  
  322.  
  323.  
  324.   writeln('');
  325.   writeln('TFPHashList:');
  326.   basemem := UsedMem();
  327.   list1 := TFPHashList.Create();
  328.   writeln('Create. ' + '  Used mem: ' + IntToStr(UsedMem()-basemem));
  329.  
  330.   t := GetTickCount64();
  331.  
  332.   for i:=1 to c do
  333.   begin
  334.     list1.Add(RndKeyStr(i), DefValue);
  335.   end;
  336.   writeln('Adding. Time: ' + IntToStr(GetTickCount64() - t) +
  337.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  338.   t := GetTickCount64();
  339.  
  340.   for i2:=1 to 10 do
  341.     for i:=1 to c do
  342.     begin
  343.       if list1.Find(RndKeyStr(i)) = nil then
  344.         raise EKeyNotFound.Create('TFPHashList find error');
  345.     end;
  346.   for i:=1 to c do
  347.   begin
  348.     list1.Find(RndKeyStr(c+i));
  349.   end;
  350.   writeln('Read.   Time: ' + IntToStr(GetTickCount64() - t));
  351.   t := GetTickCount64();
  352.  
  353.   for i:=1 to c do
  354.   begin
  355.     itmp := list1.FindIndexOf(RndKeyStr(i));
  356.     if itmp <> -1 then
  357.       list1.Delete(itmp);
  358.   end;
  359.   writeln('Delete. Time: ' + IntToStr(GetTickCount64() - t) +
  360.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  361.   t := GetTickCount64();
  362.  
  363.   FreeAndNil(list1);
  364.   writeln('Free.   Time: ' + IntToStr(GetTickCount64() - t) +
  365.     '  Used mem: ' + IntToStr(UsedMem()-basemem));
  366.   t := GetTickCount64();
  367.  
  368.  
  369.  
  370.   writeln('End.');
  371.   ReadLn();
  372.   // stop program loop
  373.   Terminate;
  374. end;
  375.  
  376. constructor TMyApplication.Create(TheOwner: TComponent);
  377. begin
  378.   inherited Create(TheOwner);
  379.   StopOnException:=True;
  380. end;
  381.  
  382. destructor TMyApplication.Destroy;
  383. begin
  384.   inherited Destroy;
  385. end;
  386.  
  387. procedure TMyApplication.WriteHelp;
  388. begin
  389.   { add your help code here }
  390.   writeln('Usage: ', ExeName, ' -h');
  391. end;
  392.  
  393. var
  394.   Application: TMyApplication;
  395. begin
  396.   Application:=TMyApplication.Create(nil);
  397.   Application.Title:='My Application';
  398.   Application.Run;
  399.   Application.Free;
  400. end.
  401.