Recent

Author Topic: Using generics  (Read 8255 times)

jmm72

  • Jr. Member
  • **
  • Posts: 79
  • Very experienced in being a beginner...
Using generics
« on: January 08, 2015, 11:03:50 am »
Greetings

I want to move a bit deeper in FPC with maps and hashlists. I've seen this thing of generics but I don't see much documentation around, and reading the sources doesn't really help me much at my stage, probably because I can't grasp certain concepts specifically. Is a map something like a hashed array of elements where each element has another element associated with it?

My intention for now is to use something as a dictionary, an associative array of key elements containing other value elements. I think the best way would be fgl and:
TMyMap = specialize TFPGMap<string,integer>

But the scarce examples I find around don't set the light on all the things I want to do:
* Add a new element with its value (well actually I think I have this one covered).
* Check whether a certain element is in the map (important), and if so, get its value.
* Change the value of a certain element to another element.
* Delete a certain element from the map.
* Iterate through all the elements in the map.

I'm not really sure if I can do some of these, like maybe I check for a key 'whatever' in the map and I don't know if it was there or not, all I get is a 0. And maybe I can't delete an element, just set its value to 0 and know that 0 means 'key not set'? And maybe I'm just not getting all these things about maps and hashsets and voodoo chickens, because I have no idea if I can iterate through the elements. I would do with a real example showing these five actions, if you can point me to one, because I feel really lost.

Also, is there some kind of fast & easy & painless "double map"? Something like I have the key element, I can get the value element; and if I have the value element, I can get the key element. I guess I could do it using two synchronized TFPGMap where one stores key,value and the other stores value,key, but that would consume double the memory and the overhead of keeping them in sync. Maybe the fgl unit has something like this but I'm not seeing it?

And one last question: the lack of documentation is because the fgl is not fully completed yet and it's still being finshed?

Regards, JMM.
« Last Edit: January 08, 2015, 11:11:01 am by jmm72 »
Lazarus 1.6.4 + FPC 3.0.2 64bits under Windows 7 64bits
Only as a hobby nowadays
Current proyect release: TBA

Leledumbo

  • Hero Member
  • *****
  • Posts: 8111
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Using generics
« Reply #1 on: January 08, 2015, 11:48:21 am »
For an undocumented unit without examples, certainly reading the source is the only way to find out how to use. I know this is difficult for most people, but the unit status itself is unclear.

Is a map something like a hashed array of elements where each element has another element associated with it?
For TFPGMap, it's just a list with map-like behavior.

* Add a new element with its value (well actually I think I have this one covered).
Using your TMyMap definition:
Code: [Select]
ATMyMapInstance['a key'] := 127;
* Check whether a certain element is in the map (important), and if so, get its value.
to check:
Code: [Select]
function Find(const AKey: TKey; out Index: Integer): Boolean; {$ifdef CLASSESINLINE} inline; {$endif}
function IndexOf(const AKey: TKey): Integer; {$ifdef CLASSESINLINE} inline; {$endif}
to get:
Code: [Select]
property Data[Index: Integer]: TData read GetData write PutData;

* Change the value of a certain element to another element.
No different than adding.

* Delete a certain element from the map.
Code: [Select]
function Remove(const AKey: TKey): Integer;

* Iterate through all the elements in the map.
Code: [Select]
for i := 0 to ATMyMapInstance.Count - 1 do
// get key with ATMyMapInstance.Keys[i]
// get data with ATMyMapInstance.Data[i]

jmm72

  • Jr. Member
  • **
  • Posts: 79
  • Very experienced in being a beginner...
Re: Using generics
« Reply #2 on: January 08, 2015, 12:06:24 pm »
I was doing my tests and got most of it, but your explanation made it all fit nicely. Specially I was not being aware that a map should have something like Find or IndexOf. Thank you very much.

I'm going to do some tests with IndexOfData to see if it fits my need for a "reverse dictionary". However I'm unsure whether the data is also searched quickly or sequentially. Would that usage make it too slow when the number of keys is too large?
Lazarus 1.6.4 + FPC 3.0.2 64bits under Windows 7 64bits
Only as a hobby nowadays
Current proyect release: TBA

Leledumbo

  • Hero Member
  • *****
  • Posts: 8111
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Using generics
« Reply #3 on: January 08, 2015, 12:15:55 pm »
Would that usage make it too slow when the number of keys is too large?
I believe so. If you don't set Sorted to true, then sequential search is performed. Otherwise, it's binary search.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7493
Re: Using generics
« Reply #4 on: January 08, 2015, 12:20:56 pm »
Would that usage make it too slow when the number of keys is too large?
I believe so. If you don't set Sorted to true, then sequential search is performed. Otherwise, it's binary search.

The ordered insertion of a large number of items will be slowing though. Just like TStringlist the fgl types are basically wrappers around an array.

The only exceptions are the classes hash list (used in various FPC parts) and the fcl-base binary tree implementation, used a lot by Lazarus.

jmm72

  • Jr. Member
  • **
  • Posts: 79
  • Very experienced in being a beginner...
Re: Using generics
« Reply #5 on: January 08, 2015, 01:24:04 pm »
I think I got it already, and even without being optimized it's already faster than the implementation I did with TStringList. (And forget the initial implementation "let's-make-it-work" which was like 2 minutes long, now it takes 3 seconds). I think I would have bumped my head against the wall too many times without your help.

However I'm unsure about ordered insertion of a lot of items will be slow. If it behaves like TStringList in that aspect, adding lots of items and then sorting is much more slower than adding items while the list is already sorted. I might do my test to assert this, tho.

The question about being slow is about TFPGMap.IndexOfData. I believe that even when the map is sorted, the data is not, thus it's always a sequential search when using IndexOfData. Some looks at the source code seems to follow this. If I'm right, data iteration for "reverse lookup" would be awfully slow. I don't mean doing an iteration through all the elements, but just searching a set of all the data. Then how could I implement a two-way dictionary using fgl?
Lazarus 1.6.4 + FPC 3.0.2 64bits under Windows 7 64bits
Only as a hobby nowadays
Current proyect release: TBA

howardpc

  • Hero Member
  • *****
  • Posts: 3176
Re: Using generics
« Reply #6 on: January 08, 2015, 01:33:36 pm »
Why not share the code you have so far? There are forum members who enjoy suggesting ways of enhancing code (whether to make it faster, or more reliable, or simpler etc.). It might also be instructive for people who are unfamiliar with generics, and who could learn about the usefulness and widespread applicability of an emerging FCL library, for which good examples are still rather scarce.
In answering your question, it would be easier to comment on specific visible code, than to talk in generalities.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8111
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Using generics
« Reply #7 on: January 08, 2015, 02:22:19 pm »
However I'm unsure about ordered insertion of a lot of items will be slow. If it behaves like TStringList in that aspect, adding lots of items and then sorting is much more slower than adding items while the list is already sorted. I might do my test to assert this, tho.
Let me help you with that:
Code: [Select]
{$mode objfpc}{$H+}

uses
  Classes,SysUtils,DateUtils,fgl;

var
  c1,c2,c3,c4,c5: Integer;

function GetKey(i: Integer): String;

  function CharOf(c: Integer): Char; inline;
  begin
    Result := Chr(c + Ord('A'));
  end;

  procedure UpdateCValues; inline;
  const
    MaxC = 26;
  begin
    Inc(c5);
    if c5 >= MaxC then begin
      c5 := 0;
      Inc(c4);
      if c4 >= MaxC then begin
        c4 := 0;
        Inc(c3);
        if c3 >= MaxC then begin
          c3 := 0;
          Inc(c2);
          if c2 >= MaxC then begin
            c2 := 0;
            Inc(c1);
            if c1 >= MaxC then begin
              c1 := 0;
            end;
          end;
        end;
      end;
    end;
  end;

begin
  Result := CharOf(c1) + CharOf(c2) + CharOf(c3) + CharOf(c4) + CharOf(c5);
  UpdateCValues;
end;

procedure ResetCValues;
begin
  c1 := 0;
  c2 := 0;
  c3 := 0;
  c4 := 0;
  c5 := 0;
end;

const
  MAX = 10000;
type
  TStringIntMap = specialize TFPGMap<String,Integer>;
var
  m1,m2: TStringIntMap;
  st,et: TTime;
  i: Integer;
  Same: Boolean;
begin
  ResetCValues;
  m1 := TStringIntMap.Create;
  m1.Sorted := false;
  st := Now;
  for i := 0 to MAX - 1 do
    m1[GetKey(i)] := 0;
  m1.Sort;
  et := Now;
  WriteLn('Time elapsed: ',SecondSpan(st,et):1:6,' second(s)');

  ResetCValues;
  m2 := TStringIntMap.Create;
  m2.Sorted := true;
  st := Now;
  for i := 0 to MAX - 1 do
    m2[GetKey(i)] := 0;
  et := Now;
  WriteLn('Time elapsed: ',SecondSpan(st,et):1:6,' second(s)');

  Same := true;
  for i := 0 to m1.Count - 1 do
    if m1.Keys[i] <> m2.Keys[i] then begin
      Same := false;
      WriteLn(i,' ',m1.Keys[i] + ' ' + m2.Keys[i]);
      WriteLn('Different output, invalid benchmark');
      break;
    end;
  if Same then WriteLn('Same output, valid benchmark');
end.
The question about being slow is about TFPGMap.IndexOfData. I believe that even when the map is sorted, the data is not, thus it's always a sequential search when using IndexOfData. Some looks at the source code seems to follow this. If I'm right, data iteration for "reverse lookup" would be awfully slow. I don't mean doing an iteration through all the elements, but just searching a set of all the data. Then how could I implement a two-way dictionary using fgl?
Create a descendant with nested type specialized in the reverse order from outer map. e.g.:
Code: [Select]
generic TTwoWayMap<TKey,TData> = class(specialize TFPGMap<TKey,TData>)
private type
  TMirrorMap = specialize TFPGMap<TData,TKey>;
private var
  FMirrorMap: TMirrorMap;
...
end;
then override all required methods (including constructor and destructor) to make it, perhaps PutKeyData and IndexOfData are enough. Well, it's up to your implementation.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7493
Re: Using generics
« Reply #8 on: January 08, 2015, 02:41:36 pm »
(Note that in 2.7/3.x versions it might also be worthwhile to have a look at gmap,ghash etc in packages/fcl-stl )

jmm72

  • Jr. Member
  • **
  • Posts: 79
  • Very experienced in being a beginner...
Re: Using generics
« Reply #9 on: January 08, 2015, 04:14:42 pm »
I did a quick benchmark of making ordered and unordered insertion in a TFPGMap, both sorted and unsorted, and while ordered insertion is reasonably slow (IIRC a sorted list is the worst scenario for quicksort), both ordered and unordered insertions are so much faster when sorted.

@ Leledumbo: That TwoWayMap is such a great approach of what I'd need. I don't think I could write a custom two way map where both keys and data are hashed or sorted in some way, but that usage of nested maps and overriding the needed methods is genius. I think I can live with the extra memory requirement, because once the implementation is done, using it will be so easy.

I didn't want to put any code because the way I use it is not exactly <string, integer>, and the code is actually spread in several functions. What Leledumbo said is enough for anyone to start using a TFPGMap, so putting my code as an example is not needed IMHO (and it's a PITA to translate all the identifiers). However I think I could write down a quick example and post it, so if anyone searching for "how to use fgl TFPGMap" hits this thread can have real code to read.

And I don't think I can optimize it anymore without making it unreadable/uncomprehensible. I got it down to 3 seconds and I think I got the TFPGMap thing right and straight. There is another place where I can change a TStringListEx with this TFPGMap thingy, so when my code advances to make heavy use of that part, it'll be faster than it could have been without the map.

If anyone is curious, it's a custom file renamer to maintain a large collection of files, using certain words in the filename to categorize and rename accordingly in a certain way. For now it's 26,000 files, so making this as a pet/personal project was not a bad idea. Last time I counted there were like 13,000 of those keywords , and probably have some thousands more to find and categorize. The renamer also detects uncategorized keywords to help in maintaining the collection with the best possible names using that information.

Thanks everyone for all the help.

edit: Here you have some test code using the most common features of TFPGMap.

Code: [Select]
program testmap;

uses
  fgl;

type
  TDictionary = specialize TFPGMap<string, integer>;

var
  i: integer;
  dict: TDictionary;

begin
  dict := TDictionary.Create;
  dict.Sorted := True; // for fast lookup of keys
  // assign initial values
  dict['one'] := 1;
  dict['two'] := 2;
  dict['three'] := 3;

  // get data back
  writeln('The value of ''one'' is ', dict['one']);

  // if it's not known if the key is in there
  // use Find for sorted maps, but use IndexOf otherwise
  if dict.Find('four', i) then
  begin
    writeln('The value of ''four'' is ', dict['four']);
    writeln('The value of ''four'' is ', dict.Data[i]);
  end
  else
  begin
    writeln('Can''t find key four, adding it');
    dict['four'] := 4;
  end;

  // change one of the numbers
  dict['one'] := 5;
  writeln('The value of ''one'' is now ', dict['one']);

  // iterate with the keys and/or data
  writeln('All the keys and their values from the current map');
  for i := 0 to dict.Count - 1 do
    writeln(dict.Keys[i], '=', dict.Data[i]);

  // delete one key (with its value) from the map
  writeln('Removing ''two''');
  dict.Remove('two');
  writeln('All the keys and their values from the current map');
  for i := 0 to dict.Count - 1 do
    writeln(dict.Keys[i], '=', dict.Data[i]);

  // find which key has a certain data (always sequential search, slow)
  i := dict.IndexOfData(3);
  writeln('Value 3 was found to have the keyname ', dict.Keys[i]);
  dict.Free;
end.
« Last Edit: January 08, 2015, 06:06:22 pm by jmm72 »
Lazarus 1.6.4 + FPC 3.0.2 64bits under Windows 7 64bits
Only as a hobby nowadays
Current proyect release: TBA

Ocye

  • Hero Member
  • *****
  • Posts: 518
    • Scrabble3D
Re: Using generics
« Reply #10 on: January 09, 2015, 01:37:36 pm »
Out of curiosity is the TFPGMap faster than key/value or rather string/objects properties of a TStringList? And if so to what extend for ~100k items?
Lazarus 1.7 (SVN) FPC 3.0.0

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: Using generics
« Reply #11 on: January 09, 2015, 04:30:42 pm »
Maybe TDictionaryStringList can help you, it is on LazUtils folder and has an example project. It is optimized for lookup, IndexOf was overriden to use a public Contains function, which uses the map directly instead of a linear search.

http://wiki.lazarus.freepascal.org/LazUtils
« Last Edit: January 09, 2015, 05:11:21 pm by typo »