Recent

Author Topic: TStrings.IndexOfName / TStringList  (Read 17441 times)

marcodor

  • New Member
  • *
  • Posts: 17
TStrings.IndexOfName / TStringList
« on: July 04, 2015, 01:37:35 pm »
I think speed of IndexOfName function can be greatly improved in TStringList descendant if list is sorted.
Instead doing a natural scan of all items we may go with a quick find in a sorted list.

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: TStrings.IndexOfName / TStringList
« Reply #1 on: July 04, 2015, 03:02:38 pm »
I think speed of IndexOfName function can be greatly improved in TStringList descendant if list is sorted.
Instead doing a natural scan of all items we may go with a quick find in a sorted list.
I think for this you could use other descendents of TStrings.
Like ... ? (I think there was something like TSortedList or TBucketlist but I could'nt find any of these in the Wiki )
But I remember a Demo, comparing these three (Maybe it's something tied to Delphi)
But each of this has it Up's and Down's (mem-usage, speed of specific activity e.G: inserting, rearranging, generating, reading, ...  )

The TStringList itself should be left alone.
e.G: if you want to load some source-code into TStringlist, it's definitly NOT a good Idea to have this sorted.
but using 'IndexOfName' is normaly no good option either on that.
« Last Edit: July 04, 2015, 03:41:26 pm by jc99 »
OS: Win XP x64, Win 7, Win 7 x64, Win 10, Win 10 x64, Suse Linux 13.2
Laz: 1.4 - 1.8.4, 2.0
https://github.com/joecare99/public
'~|    /''
,_|oe \_,are
If you want to do something for the environment: Twitter: #reduceCO2 or
https://www.betterplace.me/klimawandel-stoppen-co-ueber-preis-reduzieren

typo

  • Hero Member
  • *****
  • Posts: 3051
Re: TStrings.IndexOfName / TStringList
« Reply #2 on: July 04, 2015, 03:22:58 pm »
If you need fast lookup without the need of a sorted list, have a look at TDictionaryStringList and see if it helps.

http://wiki.lazarus.freepascal.org/LazUtils#TDictionaryStringList
« Last Edit: July 04, 2015, 03:24:49 pm by typo »

marcodor

  • New Member
  • *
  • Posts: 17
Re: TStrings.IndexOfName / TStringList
« Reply #3 on: July 04, 2015, 04:07:26 pm »
JC99, Why TStringList.IndexOf perform a binary search while IndexOfName not?

I asked this, because TStrings.Values[ internally uses IndexOfName.
Reading a large config section with TIniFile.ReadSectionValues( and after that sorting it will speedup later access of Strings.Values['option']

I really see no problems implementing indexed search for IndexOfName in TStringList especially most code is already done for IndexOf.
Also I suppose IndexOfName is used in many parts/projects (maybe Lazarus itself), so it should improve things.

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: TStrings.IndexOfName / TStringList
« Reply #4 on: July 04, 2015, 04:34:13 pm »
Maybe someone could help me out, but if i remember correctly TStringList does NOT do a binary-search what it does, it goes through a hash-list which is nearly as effective (maybe even more) as a binary search. But this can only be done with a full string. So you need to use a class that does (also) a hashing of the KeyNames.
And as typo suggested TDictionaryStringList might help you in this.
OS: Win XP x64, Win 7, Win 7 x64, Win 10, Win 10 x64, Suse Linux 13.2
Laz: 1.4 - 1.8.4, 2.0
https://github.com/joecare99/public
'~|    /''
,_|oe \_,are
If you want to do something for the environment: Twitter: #reduceCO2 or
https://www.betterplace.me/klimawandel-stoppen-co-ueber-preis-reduzieren

marcodor

  • New Member
  • *
  • Posts: 17
Re: TStrings.IndexOfName / TStringList
« Reply #5 on: July 04, 2015, 04:43:49 pm »
Jc99, I just analyzed sources, stringl.inc.

In sourcecode comments there is // Use binary search
List is divided in 2 segments, looks for bounds where searched string may be and "good" segment is divided again until find matched item.

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: TStrings.IndexOfName / TStringList
« Reply #6 on: July 05, 2015, 02:07:19 am »
Jc99, I just analyzed sources, stringl.inc.

In sourcecode comments there is // Use binary search
List is divided in 2 segments, looks for bounds where searched string may be and "good" segment is divided again until find matched item.
The normal stringlist does this only If the List is sorted.
With sorted=false it Calls the inherited function.
Which does a linear search.
But You are right, If the List is sorted a binary search for the keyname is theoretically posible.
IndexOfName is passed to Tstrings which performs a linear search.
OS: Win XP x64, Win 7, Win 7 x64, Win 10, Win 10 x64, Suse Linux 13.2
Laz: 1.4 - 1.8.4, 2.0
https://github.com/joecare99/public
'~|    /''
,_|oe \_,are
If you want to do something for the environment: Twitter: #reduceCO2 or
https://www.betterplace.me/klimawandel-stoppen-co-ueber-preis-reduzieren

marcodor

  • New Member
  • *
  • Posts: 17
Re: TStrings.IndexOfName / TStringList
« Reply #7 on: July 05, 2015, 10:50:36 am »
I'm ready to contribute with a patch if core-devs agree.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: TStrings.IndexOfName / TStringList
« Reply #8 on: July 05, 2015, 03:17:26 pm »
I'm ready to contribute with a patch if core-devs agree.

I personally think it is a good idea but I am not a FPC developer. You can open a ticket in bug tracker with a patch and see what happens. Search the existing tickets first.

In any case this is NOT an important feature because TStringList is a wrong class for fast Key-Value lookups.
THashedStringList (unit IniFiles) uses the same API but is fast.
Then there is TStringToStringTree (LazUtils, unit AvgLvlTree). It is fast balanced tree and more memory efficient than a hash map. And there are more hash maps...

There was wrong information about TDictionaryStringList given by "typo" which I find strange because he is the co-author of that component.
It does not support fast Key-Value lookups, although I admit the class name is misleading. I realized it only afterwards. If somebody comes up with a better name, we can still change it.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11446
  • FPC developer.
Re: TStrings.IndexOfName / TStringList
« Reply #9 on: July 05, 2015, 03:36:15 pm »
I'm ready to contribute with a patch if core-devs agree.

Maybe it is smartest to first check how Delphi handles this. TStringlist is a complex beast because of compatibility.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: TStrings.IndexOfName / TStringList
« Reply #10 on: July 05, 2015, 04:22:21 pm »
I actually had to modify TStrings for key-lookup purposes.  To make it act as associated array.

However, I never implemented it to be dependent on the list being sorted by names. As well as it would work for "named" values as well as non-named values. So both "indexof" and "IndexofName" work the in same manner.

Yes, two hash lists have to be maintained accurately in order for this to work, that does some impact on memory consumption.

JuhaManninen

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 4467
  • I like bugs.
Re: TStrings.IndexOfName / TStringList
« Reply #11 on: July 05, 2015, 04:33:19 pm »
I actually had to modify TStrings for key-lookup purposes.  To make it act as associated array.

THashedStringList in IniFiles already does that.
Mostly Lazarus trunk and FPC 3.2 on Manjaro Linux 64-bit.

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2770
    • havefunsoft.com
Re: TStrings.IndexOfName / TStringList
« Reply #12 on: July 05, 2015, 05:10:01 pm »
indeed. What an improper place for THashedStringList

[Edit]
Actually, it's probably the class name that's not quite accurate. THashedStringList is based on the TFPHashList. FPHashList has a limitation of using shortstring as look-up key. Using only 256 ansi characters string is totally fine for .ini files (unless .ini file is machine-generated for whatever crazy tasks), but on the wide scope, one might need to lookup for values that are more than 256 characters. Especially with UTF8 used for LCL.
« Last Edit: July 05, 2015, 05:43:51 pm by skalogryz »

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: TStrings.IndexOfName / TStringList
« Reply #13 on: July 05, 2015, 07:00:18 pm »
indeed. What an improper place for THashedStringList
+1
At the moment I work with ValueListEditor. It uses a TStringlist-descendent named TValueListStrings which uses the TString.IndexOfName-method to derive the keynames and values.
even GetItemProp uses the TString.IndexOfName. Making the TValueListStrings a  THashedStringList -descendent would be much better.
But then the unit ValEdit would be a descendent of inifiles what is not a good thing and hardly understandable.
I'd vote for an independent unit e.G: HashStrLst so that inifiles, valedit and any other component that needed this could descend from that.
 
[Edit]
Actually, it's probably the class name that's not quite accurate. THashedStringList is based on the TFPHashList. FPHashList has a limitation of using shortstring as look-up key. Using only 256 ansi characters string is totally fine for .ini files (unless .ini file is machine-generated for whatever crazy tasks), but on the wide scope, one might need to lookup for values that are more than 256 characters. Especially with UTF8 used for LCL.
Using only the first 256 character to generate the hash would only be a problem if you have lot of keynames that have the same start and only differ after the 256 character.
The only case i could think of that is if you want to hash the place of files in a full path, to give them some extra property. But in that case other components are much more appropriate.
In the normal-ini-file-case i see  no need to change something.
@marcov:
I just looked in the Delphi-RTL, TStrings and TStringlist act quite like their Lazarus-siblings. TStrings does a linear search on both indexof and IndexOfName. TStringlist only does a binary-search in indexof via the find-method when the list is sorted.
So it's delphi-compatibility vs. better code. ( I go for the better code )
 :-[ So sorry about the wrong info.
@marcodor:
I think you try something like:
Code: [Select]
// classesh.inc (Line 757 ff)
    function IndexOf(const S: string): Integer; override;
    function IndexOfName(const S: string): Integer; override; // <----
    procedure Insert(Index: Integer; const S: string); override;
 
// stringl.inc (line 1336 ff new)
Function TStringList.IndexOfName(const S: string): Integer;

begin
  If Not Sorted then
    Result:=Inherited indexOfName(S)
  else
    // faster using binary search...
    begin
       Find (S,Result); // This should work since Find outputs the L-Value.
       if (Result >= Count) or (Names[Result] <> S) then
           Result:= -1;
    end;
end;
« Last Edit: July 05, 2015, 09:25:17 pm by jc99 »
OS: Win XP x64, Win 7, Win 7 x64, Win 10, Win 10 x64, Suse Linux 13.2
Laz: 1.4 - 1.8.4, 2.0
https://github.com/joecare99/public
'~|    /''
,_|oe \_,are
If you want to do something for the environment: Twitter: #reduceCO2 or
https://www.betterplace.me/klimawandel-stoppen-co-ueber-preis-reduzieren

jc99

  • Hero Member
  • *****
  • Posts: 553
    • My private Site
Re: TStrings.IndexOfName / TStringList
« Reply #14 on: July 05, 2015, 08:35:31 pm »
BTW, TStringList.find always does a binary search even in an unsorted list. Is this really supposed to be that way ?
[Edit]
Could someone verify this ?
Code: [Select]
procedure TryFind(const S: string; const sl: TStringList);
var
  no: integer;
begin
  Write('Test "' + S);
  if sl.find(S, no) then
    writeln('" found: ' + IntToStr(no))
  else
  if no < sl.Count then
    writeln('" not found: ' + IntToStr(no) + ' --> ' + sl.Names[no])
  else
    writeln('" not found: ' + IntToStr(no) + ' --! XXX ');
end;

procedure Execute;

var
  sl: TStringList;

begin
  sl := TStringList.Create;
  try
    sl.add('1=Test1');
    sl.add('3=Test3');
    sl.add('2=Test2');
    TryFind('2=Test2', sl); // !!! string is not found !!!
    sl.Sorted := True;
    TryFind('2=Test2', sl); // Now it's found
    TryFind('2', sl);
    TryFind('0', sl);
    TryFind('1', sl);
    TryFind('2', sl);
    TryFind('3', sl);
    TryFind('4', sl);  // Result in sl.count
  finally
    FreeAndNil(sl);
  end;
  readln;
end;
« Last Edit: July 05, 2015, 09:36:04 pm by jc99 »
OS: Win XP x64, Win 7, Win 7 x64, Win 10, Win 10 x64, Suse Linux 13.2
Laz: 1.4 - 1.8.4, 2.0
https://github.com/joecare99/public
'~|    /''
,_|oe \_,are
If you want to do something for the environment: Twitter: #reduceCO2 or
https://www.betterplace.me/klimawandel-stoppen-co-ueber-preis-reduzieren

 

TinyPortal © 2005-2018