Lazarus

Free Pascal => General => Topic started by: EganSolo on October 03, 2021, 10:24:32 am

Title: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: EganSolo on October 03, 2021, 10:24:32 am
So I need an object list that can stay sorted.

Start Paren:
In case you were like me at some point and you didn't know about TFPObjectList and TFPStringList, these are cousins to TObjectList and TStringList but without event-related overhead: they don't notify when an action is taken.
End Paren.

I need an object list that can stay sorted. I could be lazy and use TFPStringList and add a field to my objects to act as a unique string, or I could write a descendant to TFPObjectList that has a sorted property and in case sorted is true, IndexOf would rely on a Compare function to do a binary search over the list.

The reason why I'm hesitant to do that is that when I googled "free pascal Lazarus TFPSortedObjectList TSortedFPObjectList TSortedObjectList" Mr. Google came back empty-handed. Am I the _only_ one who needs a sorted object list?

Suddenly I feel like the guy who looks up and finds out that he's alone in a saloon....

So is it then the case that everyone else is just using TStringList?

What am I missing?

Egan
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: MarkMLl on October 03, 2021, 10:48:31 am
So why are you assuming that somebody who'd done the work would use the names you've apparently pulled out of a hat, and that all of those names would appear on the same page (which is what you've asked for)?

Your curiosity is entirely valid, but you're assuming that Google is an oracle that can divine your intention: it's not, it's there to sell advertising services, and there's more than enough fuzziness in it already.

MarkMLl
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Thaddy on October 03, 2021, 12:40:40 pm
You can maybe use a TFPGMap or TFpgMapObject,
Example:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}{$H+}
  2. {$mode objfpc}{$H+}{$warn 3123 off}{$warn 6058 off}
  3. Uses fgl;
  4. type
  5.   TMySortedObjectList = specialize TFpgMapObject<string,TObject>;
  6.  
  7. var
  8.   MyList:TMySortedObjectList;  
  9. begin
  10.   MyList := TMySortedObjectList.Create(true);
  11.   MyList.Sorted:=true;
  12.   try
  13.     MyList.AddOrSetData('test',nil); // using this prevents duplicate keys.
  14.     Writeln(MyList.Keys[0]); // writes 'test'
  15.     MyList.AddOrSetData('best',nil);
  16.     Writeln(MyList.Keys[0]); // since it is sorted, now prints 'best'
  17.   finally
  18.     MyList.Free;
  19.   end;
  20. end.
I guess this already does what you want.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 03, 2021, 01:45:06 pm
@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Thaddy on October 03, 2021, 01:46:04 pm
Slight variation:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}{$H+}{$warn 3123 off}{$warn 6058 off}
  2. {$macro on}
  3. // do not allow duplicate keys macro}
  4. {$define add:=AddOrSetData}
  5. Uses fgl;
  6. type
  7.   TMySortedObjectList = specialize TFpgMapObject<string,TObject>;
  8.  
  9. var
  10.   MyList:TMySortedObjectList;  
  11. begin
  12.   MyList := TMySortedObjectList.Create(true);
  13.   MyList.Sorted:=true;
  14.   try
  15.     MyList.Add('test',TObject.create); // using this prevents duplicate keys.
  16.     Writeln(MyList.Keys[0]); // writes 'test'
  17.     MyList.Add('best',TObject.create);
  18.     Writeln(MyList.Keys[0]); // since it is sorted, now prints 'best'
  19.   finally
  20.     MyList.Free;
  21.   end;
  22. end.

If you want that...
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Thaddy on October 03, 2021, 01:51:16 pm
@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).
That is true for worst case. Best case is O(n*log2(n) ( given a quicksort in both cases).
Average case is O(n log n);

Theory is well explained here: https://iq.opengenus.org/time-and-space-complexity-of-quick-sort

Anyway you can expect a penalty as opposed unsorted.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 03, 2021, 02:10:18 pm
If I need a sorted list I usually take TSet from the GSet unit. It makes use of an Red-Black Tree structure on an Array for storing the data. It has some advantages over most other implementations, first it can give you pointer access to the array (avoiding copies, very neat for large records) and it takes the sorting function as generic parameter, making it inlinable (which the function pointer based approaches of structures like TStringList can't). Also it is very lightweight, meaning it doesn't have so much additional functionality like events and customization options, which of course also take their time to be checked and implemented
But it does not contain functionality to automatically free the objects like TObjectList has.

Also, what exactly do you need it for, maybe some other structure (like a min-heap) can serve you just as well.

@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).
But thanks to the large caches modern CPUs today have (mine has 40mb), it is usually (that means when the list is rather small, like only a few pages of size) still much faster than linked non continous datastructures.
One could create more complex datastructures that are linked and still make use of cache locality (e.g. LinkedLists in Java usually are implemented as a linked list of arrays whose size is optimized for the CPU to maximize cache locality). But as far as I know there aren't many such implementations for FreePascal around. So an Array based list is probably the best you get.
Asymptotic complexity is only half the story, often an asymptotically bad algorithm can in real world scenarios, thanks to low static/implementation based overhead, outperform a asymptotically better algorithm.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 03, 2021, 03:00:48 pm
Just wondering at what point the sorted list turned into a sorted set? And the reasoning about processor caches seems to require at least some confirmation.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 03, 2021, 05:21:21 pm
Just wondering at what point the sorted list turned into a sorted set. And the reasoning about processor caches seems to require at least some confirmation.
TSet is part of the FCL-STL package and is basically a reimplementation of the C++ set datatype. It's just the name C++ gave the sorted datastructure. It can not hold duplicate values, for this the multiset type exists. It has nothing to do with the (bit) sets like in pascal.

About the caching, there is this great talk by Bjarne Stroustrup that discusses this: Link (https://www.youtube.com/watch?v=YQs6IC-vgmo)
Basically as the linear search time through the list dominates the runtime, arrays which make use of caching are much faster at that and therefore overall faster

And if we are talking about sorted lists, arrays are even better, because each binary search iteration can be done in constant time (as random access on arrays is in constant time), while all the distances need to be traversed on linked lists (and if the linked list isn't double linked, this it always needs to search from the root, making it O(n) rather than O(logn))

Linked lists are usually only found in places where large continous allocations are not possible like in the kernel. When you still find linked lists in the wild they are usually highly optimized (e.g. by having them allocate using a stack allocator with swap delete in the background, or linking small chunks of multiple objects together, e.g. in Java)

Also arrays have another massive advantage, you can preallocate the memory or perform geometric growth, which can result in massive speedups if the heap is highly fragmented, and many small allocations lead to long search times for the memory allocation
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 03, 2021, 05:49:00 pm
TSet is part of the FCL-STL package and is basically a reimplementation of the C++ set datatype. It's just the name C++ gave the sorted datastructure. It can not hold duplicate values, for this the multiset type exists. It has nothing to do with the (bit) sets like in pascal.
...

So what do you find it looks like a sorted list?

...
About the caching, there is this great talk by Bjarne Stroustrup that discusses this: Link
Basically as the linear search time through the list dominates the runtime, arrays which make use of caching are much faster at that and therefore overall faster

And if we are talking about sorted lists, arrays are even better, because each binary search iteration can be done in constant time (as random access on arrays is in constant time), while all the distances need to be traversed on linked lists (and if the linked list isn't double linked, this it always needs to search from the root, making it O(n) rather than O(logn))
...

Finding items is great, but I think I was talking about adding, not finding items.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 03, 2021, 06:02:56 pm
So what do you find it looks like a sorted list?
It is a sorted list in everything but name.

Finding items is great, but I think I was talking about adding, not finding items.
But to add you need to first do a search. Let's say I have a linked list and want to add an item at position n, then I need to traverse from the root to the (n-1)th element to insert the newly created element right after it. Adding in linked lists is only faster, if you already have an element and want to add an element after (or before in case of a double linked list) that. If you just want to add an element to an arbitrary position, you first need to find that position.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 03, 2021, 06:26:05 pm
...
It is a sorted list in everything but name.
...

Do you think list items should have a "name" attribute?

...
But to add you need to first do a search. Let's say I have a linked list and want to add an item at position n, then I need to traverse from the root to the (n-1)th element to insert the newly created element right after it. Adding in linked lists is only faster, if you already have an element and want to add an element after (or before in case of a double linked list) that. If you just want to add an element to an arbitrary position, you first need to find that position.

Well, and what is it in the end? Without a doubt, searching in a sorted array is O(LogN), but I was talking about adding and this is O(N), and which will win?
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 03, 2021, 06:49:50 pm
Well, and what is it in the end? Without a doubt, searching in a sorted array is O(LogN), but I was talking about adding and this is O(N), and which will win?
Asymptotic notation does not tell you how fast an operation will be, it tells you about the runtime growth with increasing input size. O(N) will for some input size be slower than O(logN) but if that input size threshold is 2^800 bytes, we are a long way before there will be any PC that has even enough memory to have such an input.

Take searching an index in a sorted list. Using binary search you need to do logarithmic many lookups. A lookup on an array is a single operation, while a lookup on a linked list is a traversal from the root to the element. So you have in a linked list a lookup that is in O(logN) the same asymptotic runtime you have in an array, but the static overhead of each access is much larger for the linked list than for the array.
When adding an object to the list you need to potentially move all the elements of the list backwards, which is in O(N).
So the runtime is c1*logN + c2 for adding in the linked list and c3*logN + c4*N for the array.
While for every c1, c2, c3 and c4 there will be an N such that c1*logN + c2 < c3*logN + c4*N, this does not mean that this holds for all N.

The question is not how efficient an algorithm is asymptotically but how efficient it is in a real world scenario. If you only have lists that do not exceed 100 elements, it doesn't matter how well a datastructure performs for arbitrary N, it only matters how efficient it is on N <= 100
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 03, 2021, 08:02:39 pm
...
The question is not how efficient an algorithm is asymptotically but how efficient it is in a real world scenario. If you only have lists that do not exceed 100 elements, it doesn't matter how well a datastructure performs for arbitrary N, it only matters how efficient it is on N <= 100

Whose question is this? Did someone ask it?
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 03, 2021, 08:52:41 pm
If the question is what datastructure to use, the question what is the most efficient in a real world scenario is pretty much the most important question. What is most efficient asymptotically is only of relevance with respect of answering the former question
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: MarkMLl on October 03, 2021, 09:01:01 pm
In fairness, and with respect to everybody, the question is what components/classes are already available lest OP ends up duplicating existing work.

Focussing on the Real World: earlier Warfley said "But to add you need to first do a search". I'd suggest that that is not necessarily a foregone conclusion: the addition could go into a journal reconciled by a background thread, and a search would only need to be done at retrieval (with different performance depending on whether the background thread had had a chance to do its thing).

MarkMLl
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: EganSolo on October 04, 2021, 07:00:48 am
Wow!
  You guys are amazing! I love this forum!
  My starting point was motivated by a nagging feeling that I shouldn't be using TFPStringList JUST because I want objects to remain sorted. It felt like a kludge and I wanted to know if there were alternatives, so to summarize:
Thank you, everyone, for your participation. I found it extremely helpful and instructive.

Egan.

Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 04, 2021, 08:00:22 am
Yes Egan, TSet sounds interesting. What a pity is not documented anywhere in FPC space.

A red-black tree is for many situations, very appropriate.

I did find this http://history.ioinformatics.org/conf/c6_boza.pdf - looks like a 2012 conference talk. Enough to spark a bit of interest.

Davo

Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: PascalDragon on October 04, 2021, 08:57:49 am
Yes Egan, TSet sounds interesting. What a pity is not documented anywhere in FPC space.

They were a third party contribution (by the author of the presentation you linked). Feel free to provide patches to document them.

I did find this http://history.ioinformatics.org/conf/c6_boza.pdf - looks like a 2012 conference talk. Enough to spark a bit of interest.

There's also this (https://ioinformatics.org/journal/INFOL101.pdf) paper by the same author.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 04, 2021, 12:42:30 pm
....
There's also this (https://ioinformatics.org/journal/INFOL101.pdf) paper by the same author.

Ah, thats heaps more informative. If I can make sense of it, and find some time, I will write something, initially as a wiki page, and then if people like yourself consider it not comple rubbish, as a doc patch.

Thanks

Davo
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 08, 2021, 08:44:39 am
Ah, thats heaps more informative. If I can make sense of it, and find some time, I will write something, initially as a wiki page, and then if people like yourself consider it not comple rubbish, as a doc patch.

Well, I did have a quick look at it and it seems TSet is far from what I expected. The following code compiles and works but its apparently not possible for me replace the longint that TSet likes as its data with, for example a pointer to a record. So, is TSet really just capapable of managing longints (or integer, shortints) ?  Thats all ?   

I expected to be able to take pointers to a record, have its sort method redefined, much like can be done with AvgLvlTree.

Would be very good if someone could tell me I am wrong.... 

( I guess I will still write it up as a wiki page with the very basic info I have collected but I don't see it being very useful.)

Code: Pascal  [Select][+][-]
  1. program TSetDemo;
  2.  
  3. {$mode objfpc}{$H+}
  4. uses
  5.     SysUtils, gvector, gset, gutil;        
  6.    
  7. type
  8.       PNote=^TNote;
  9.       TNote = record
  10.             Title : string;
  11.             ID    : string;
  12.       end;  
  13.      
  14. type
  15.     iLess = specialize TLess<longint>;
  16.     iVector = specialize TVector<longint>;
  17.     iSet = specialize TSet<longint, iLess>;
  18.    
  19.     // iNoteSet = specialize TSet<ShortInt, iLess>;     // that compiles    
  20.     // iNoteSet = specialize TSet<PNote, iLess>;        // that does not compile
  21.     // iNoteSet = specialize TSet<pointer, iLess>;      // that does not compile
  22.     // iNoteSet = specialize TSet<extended, iLess>;     // that does not compile
  23.    
  24. procedure TestTSet();
  25. var S : iSet;
  26.     it : iSet.TIterator;
  27. begin
  28.     S := iSet.Create();
  29.     S.Insert(1760);
  30.     S.Insert(69);
  31.     S.Insert(1024);  
  32.     it := S.Min();
  33.     repeat
  34.         writeln(it.GetData);
  35.     until not it.Next();
  36.     it.Free;
  37.     S.free;
  38. end;            
  39.  
  40. begin
  41.     TestTSet();
  42. end.

Ref -
https://ioinformatics.org/journal/INFOL101.pdf
http://oldwww.dcs.fmph.uniba.sk/bakalarky/obhajene/getfile.php/boza11306397896881.pdf?id=159&fid=315&type=application%2Fpdf

Davo

Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 08, 2021, 09:58:44 am
...
  So, is TSet really just capapable of managing longints (or integer, shortints) ?  Thats all ?   
...

You can consider this version:
Code: Pascal  [Select][+][-]
  1. program TSetDemo;
  2. {$mode objfpc}{$H+}
  3. {$modeswitch advancedrecords}
  4. uses
  5.   SysUtils, gvector, gset, gutil;
  6.  
  7. type
  8.   PNote = ^TNote;
  9.   TNote = record
  10.     Title : string;
  11.     ID    : string;
  12.   end;
  13.  
  14.   TLess = record
  15.     class function c(L, R: PNote): Boolean; static;
  16.   end;
  17.  
  18. class function TLess.c(L, R: PNote): Boolean;
  19. begin
  20.   Result := L^.ID < R^.ID;
  21. end;
  22.  
  23. type
  24.   TNoteSet = specialize TSet<PNote, TLess>;
  25.  
  26. procedure TestTSet;
  27. var
  28.   S: TNoteSet;
  29.   it: TNoteSet.TIterator;
  30.   p: PNote;
  31. const
  32.   Notes: array of TNote = (
  33.     (Title: 'Title1'; ID: '111'),
  34.     (Title: 'Title2'; ID: '222'),
  35.     (Title: 'Title3'; ID: '333'),
  36.     (Title: 'Title4'; ID: '444'));
  37. begin
  38.     S := TNoteSet.Create;
  39.     S.Insert(@Notes[3]);
  40.     S.Insert(@Notes[2]);
  41.     S.Insert(@Notes[1]);
  42.     S.Insert(@Notes[0]);
  43.     it := S.Min;
  44.     repeat
  45.       p := it.GetData;
  46.       writeln('Title: ', p^.Title, ', ID: ', p^.ID);
  47.     until not it.Next;
  48.     it.Free;
  49.     S.free;
  50. end;
  51.  
  52. begin
  53.   TestTSet;
  54. end.
  55.  
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 08, 2021, 11:28:19 am
Yes AVK, you are right, indeed, I should consider !

I did not realise that the TLess is the compare function !  That all makes a lot of sense. Yep, it works, so, I will see if I can get a comparison between TSet and the AVL model.

Thanks !

Davo
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 08, 2021, 12:02:57 pm
IIRC, TSet is built on top of the LLRbTree (https://en.wikipedia.org/wiki/Left-leaning_red–black_tree) implementation, which is always slower than AvlTree. And yes, TSet is a Set, not a SortedList.
Also, it is a rather old library and in my opinion too STL-centric.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 08, 2021, 12:25:30 pm
Hmm, that interesting.  I was attracted to the Red-Black Tree's reputation of being faster when being frequently updated. But I guess that is a comparison between two similarly developed models.

I get the impression that the FPC AVL is pretty finely tuned.....

But this started with me complaining that TSet was undocumented. So, I'll at least have a play and create a wiki page !

Thanks AVK

Davo
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Thaddy on October 08, 2021, 12:29:13 pm
IIRC, TSet is built on top of the LLRbTree (https://en.wikipedia.org/wiki/Left-leaning_red–black_tree) implementation, which is always slower than AvlTree. And yes, TSet is a Set, not a SortedList.
Also, it is a rather old library and in my opinion too STL-centric.
With the latter I agree, with the former, no. A set (computer theory, not Pascal sets perse, however Pascal adheres to those properties) is characterized by in-order. ( and ultimately with no duplicates)
See my example that I gave previously in this discussion. In effect that is a "set" and demonstrates its properties well, I guess.

https://en.wikipedia.org/wiki/Set_(abstract_data_type)

Anyway it is one of the easiest to understand.
I also wrote a wiki entry for huge sets that conforms too.

In essence a sorted list with no duplcates allowed is a "set".
Some literature claim unsorted. That has been proven wrong - time space -. Sets are by theory in effect and in implementation - low to high.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 08, 2021, 01:15:31 pm
...
In essence a sorted list with no duplcates allowed is a "set".
...

It seems yes.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 08, 2021, 02:03:26 pm
You can consider this version:
Code: Pascal  [Select][+][-]
  1. program TSetDemo;
  2. {$mode objfpc}{$H+}
  3. {$modeswitch advancedrecords}
  4. uses
  5.   SysUtils, gvector, gset, gutil;
  6.  
  7. type
  8.   PNote = ^TNote;
  9.   TNote = record
  10.     Title : string;
  11.     ID    : string;
  12.   end;
  13.  
  14.   TLess = record
  15.     class function c(L, R: PNote): Boolean; static;
  16.   end;
  17.  
  18. class function TLess.c(L, R: PNote): Boolean;
  19. begin
  20.   Result := L^.ID < R^.ID;
  21. end;
  22.  
  23. type
  24.   TNoteSet = specialize TSet<PNote, TLess>;
  25.  
  26. procedure TestTSet;
  27. var
  28.   S: TNoteSet;
  29.   it: TNoteSet.TIterator;
  30.   p: PNote;
  31. const
  32.   Notes: array of TNote = (
  33.     (Title: 'Title1'; ID: '111'),
  34.     (Title: 'Title2'; ID: '222'),
  35.     (Title: 'Title3'; ID: '333'),
  36.     (Title: 'Title4'; ID: '444'));
  37. begin
  38.     S := TNoteSet.Create;
  39.     S.Insert(@Notes[3]);
  40.     S.Insert(@Notes[2]);
  41.     S.Insert(@Notes[1]);
  42.     S.Insert(@Notes[0]);
  43.     it := S.Min;
  44.     repeat
  45.       p := it.GetData;
  46.       writeln('Title: ', p^.Title, ', ID: ', p^.ID);
  47.     until not it.Next;
  48.     it.Free;
  49.     S.free;
  50. end;
  51.  
  52. begin
  53.   TestTSet;
  54. end.
  55.  

If iterating through it, one can also use the enumerator:
Code: Pascal  [Select][+][-]
  1. [...]
  2.     S := TNoteSet.Create;
  3.     S.Insert(@Notes[3]);
  4.     S.Insert(@Notes[2]);
  5.     S.Insert(@Notes[1]);
  6.     S.Insert(@Notes[0]);
  7.     for p in S do
  8.       writeln('Title: ', p^.Title, ', ID: ', p^.ID);
  9.     S.free;
  10. [...]
  11.  
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 08, 2021, 02:36:40 pm
Are you sure?
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 08, 2021, 02:43:00 pm
...
If iterating through it, one can also use the enumerator:


Not for what i found -

Code: Pascal  [Select][+][-]
  1.       NS := TNoteSet.Create;
  2.       NS.Insert(AddItem('1111', 'Title1'));
  3.       NS.Insert(AddItem('2222', 'Title2'));
  4.       NS.Insert(AddItem('3333', 'Title3'));
  5.       NS.Insert(AddItem('4444', 'Title3'));
  6.       for p in NS do
  7.             writeln('Title: ', p^.Title, ', ID: ', p^.ID);
  8.  

generates -

unit1.pas(95,16) Error: Cannot find an enumerator for the type "TSet$2$crc2218E6FD"

And the iterator is sure needed at dispose time.

Code: Pascal  [Select][+][-]
  1.       it := NS.Min;
  2.       repeat
  3.         dispose(it.GetData);
  4.       until not it.Next;
  5.  

Davo
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 08, 2021, 02:55:06 pm
My bad, the iterator is the enumerator so it must be:
Code: Pascal  [Select][+][-]
  1.     for p in s.Min do
  2.     begin
  3.       WriteLn('Title: ', p^.Title, ', ID: ', p^.ID);
  4.       Dispose(p);
  5.     end;
But this actually has a bug, it skips the first element (because the enumerator next gets called before the first value is read). So yeah, while this theoretically works, better to use a repeat until loop
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 09, 2021, 01:17:06 am
Thats curious Warfley, I am reluctant to alter something while it has a for loop working around it. So, prefer a repeat or while. This makes more sense (to me), but crashes (on one occasion for me did not crash but leaked memory). Don't know why ...

Code: Pascal  [Select][+][-]
  1. it := S.Min;          
  2. repeat
  3.         dispose(S.Min.getdata);
  4. until not it.Next;
  5. it.free;
  6.  

Following works fine, no crash, no memory leak
Code: Pascal  [Select][+][-]
  1. it := S.Min;
  2. repeat
  3.         p := it.GetData;
  4.         dispose(p);
  5. until not it.Next;;
  6. it.Free;

Likewise, as you would expect, the following works fine -

Code: Pascal  [Select][+][-]
  1. it := S.Min;
  2. repeat
  3.         dispose(it.GetData);
  4. until not it.Next;;
  5. it.Free;

But I also found the following leaks memory, just assigning Min to a pointer, not using it in any way. Apparently setting p to point to first element allocates some memory, that does not make sense at all ! Unless S.Min.GetData makes a copy of the data ??

Code: Pascal  [Select][+][-]
  1.      
  2. it := S.Min;
  3. repeat
  4.         p := S.Min.GetData;        // this causes a leak ??
  5.         dispose(it.GetData);      
  6. until not it.Next;;
  7. it.Free;
  8. NS.free;    
 

Starting to look like very fragile code ...

Davo
         
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 09, 2021, 08:31:38 am
...
But this actually has a bug, it skips the first element (because the enumerator next gets called before the first value is read). So yeah, while this theoretically works, better to use a repeat until loop

I already said that FCL-STL is a pretty old library. IIRC, some time ago, TSetIterator did not contain the MoveNext, GetEnumerator and Current methods. Apparently they were added later and in such a clumsy way.
It would be possible to fix it like this:
Code: Pascal  [Select][+][-]
  1. ...
  2. type
  3.  
  4.   { TSetIterator }
  5.  
  6.   generic TSetIterator<T, TNode>=class
  7.   public
  8.   type
  9.     PNode=^TNode;
  10.   var
  11.     FNode:PNode;
  12.   private
  13.     FInLoop: Boolean;
  14.   public
  15.     function GetData:T; Inline;
  16.     function Next:boolean;
  17.     function MoveNext:boolean; Inline;
  18.     function GetEnumerator : TSetIterator; Inline;
  19.     function Prev:boolean;
  20.     property Data:T read GetData;
  21.     property Current:T read GetData;
  22.   end;
  23. ...
  24. implementation
  25. ...
  26. function TSetIterator.GetData:T;
  27. begin
  28.   GetData:= FNode^.Data;
  29. end;
  30.  
  31. function TSetIterator.Next:boolean;
  32. var temp:PNode;
  33. begin
  34.   if(FNode=nil) then exit(false);
  35.   if(FNode^.Right<>nil) then begin
  36.     temp:=FNode^.Right;
  37.     while(temp^.Left<>nil) do temp:=temp^.Left;
  38.   end
  39.   else begin
  40.     temp:=FNode;
  41.     while(true) do begin
  42.       if(temp^.Parent=nil) then begin temp:=temp^.Parent; break; end;
  43.       if(temp^.Parent^.Left=temp) then begin temp:=temp^.Parent; break; end;
  44.       temp:=temp^.Parent;
  45.     end;
  46.   end;
  47.   if (temp = nil) then exit(false);
  48.   FNode:=temp;
  49.   Result:=true;
  50. end;
  51.  
  52. function TSetIterator.MoveNext: boolean;
  53. begin
  54.   if(FNode=nil) then exit(false);
  55.   if FInLoop then
  56.     exit(Next);
  57.   FInLoop := True;
  58.   Result:=true;
  59. end;
  60.  
  61. function TSetIterator.GetEnumerator: TSetIterator;
  62. begin
  63.   result:=self;
  64. end;
  65.  
  66. function TSetIterator.Prev:boolean;
  67. var temp:PNode;
  68. begin
  69.   if(FNode=nil) then exit(false);
  70.   if(FNode^.Left<>nil) then begin
  71.     temp:=FNode^.Left;
  72.     while(temp^.Right<>nil) do temp:=temp^.Right;
  73.   end
  74.   else begin
  75.     temp:=FNode;
  76.     while(true) do begin
  77.       if(temp^.Parent=nil) then begin temp:=temp^.Parent; break; end;
  78.       if(temp^.Parent^.Right=temp) then begin temp:=temp^.Parent; break; end;
  79.       temp:=temp^.Parent;
  80.     end;
  81.   end;
  82.   if (temp = nil) then exit(false);
  83.   FNode:=temp;
  84.   Prev:=true;
  85. end;
  86.  

but what will the TSet.Max enumerator do? In addition, all methods that return an iterator return nil on an empty TSet.

@dbannon, before freeing memory allocated for an item, it should be removed from the container.
It seems that this code does not cause memory leaks:
Code: Pascal  [Select][+][-]
  1. ...
  2. function NewNote(const aTitle, aID: string): PNote;
  3. begin
  4.   Result := New(PNote);
  5.   Result^.Title := aTitle;
  6.   Result^.ID := aID;
  7. end;
  8.  
  9. procedure TestTSet;
  10. var
  11.   S: TNoteSet;
  12.   it: TNoteSet.TIterator;
  13.   p: PNote;
  14.   Done: Boolean;
  15. begin
  16.   S := TNoteSet.Create;
  17.   S.Insert(NewNote('Title4', '444'));
  18.   S.Insert(NewNote('Title3', '333'));
  19.   S.Insert(NewNote('Title2', '222'));
  20.   S.Insert(NewNote('Title1', '111'));
  21.  
  22.   it := S.Min;
  23.   repeat
  24.     p := it.GetData;
  25.     Done := not it.Next;
  26.     S.Delete(p);
  27.     Dispose(p);
  28.   until Done;
  29.   it.Free;
  30.   S.Free;
  31. end;  
  32. ...
  33.  

And if you store records in TSet, and not pointers to them, then this problem will not arise at all.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 09, 2021, 01:20:59 pm
> all methods that return an iterator return nil on an empty TSet.

Yes, TSet does do that.


>@dbannon, before freeing memory allocated for an item, it should be removed from the container. It seems that this code does not cause memory leaks: ....

Thank AVK. I may have confused you with my previous post.  The snippit I posted that leaks does so because I set a pointer to point to the record using S.Min.GetData, not because of an incomplete dispose.  It appears that calling S.Min.GetData always creates a new iterator using the create statement in the Min function.  Min itself is not a creator but it calls create and returns the result.

So, easy answer, every call to S.Min requires a Free and just don't do things like -
Code: Pascal  [Select][+][-]
  1.     dispose(S.Min.GetData);
  2. //or
  3.     p := S.Min.GetData;
  4.  
In both the above cases, you loose track of the iterator so cannot free it.

So, the right way to free/dispose, yes, I see what your suggestion is trying to do, makes sense. But my version works fine, no leaks and no crashes and a shorter code and ever so slightly quicker. To avoid confusion, here it is again -

Code: Pascal  [Select][+][-]
  1.     it := S.Min;
  2.     repeat
  3.             p := it.GetData;
  4.             dispose(p);
  5.     until not it.Next;;
  6.     it.Free;
  7.     S.Free;

Create (just the one) iterator, call its GetData function to get a pointer to a record, dispose the record and repeat until the iterator's Next function is false. Then free the iterator and then the TSet itself.  I am not removing the pointer from the TSet, just disposing of what ever its pointing to.  And heattrc tells me that I have no leaks.

I have been putting quite a lot of data through the TSet in the last hour or so, it does seem quick to me. But I have not yet tried the AVL version to compare.

QUESTION : If I write up this wiki page, do I (ie 'we') say we don't recommend people use this unit because it has a few unexplained behaviors ?

EDIT : I also find it a bit strange that you need to free and recreate that iterator every time you use it for something different. There is no resetting it back to Min because, as mentioned, that "creates" a new one.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 09, 2021, 02:35:07 pm
...
Create (just the one) iterator, call its GetData function to get a pointer to a record, dispose the record and repeat until the iterator's Next function is false. Then free the iterator and then the TSet itself.  I am not removing the pointer from the TSet, just disposing of what ever its pointing to.  And heattrc tells me that I have no leaks.
...

The iterator does not access the data when searching for the next node, so if you immediately call S.Free(), everything should go fine.

As for use/not use, it looks clunky and a little alien, OTOH there is not much choice, TSortedSet from Generics.Collections does not seem much richer in functionality. Well, there are also third-party implementations.
BTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?

...
EDIT : I also find it a bit strange that you need to free and recreate that iterator every time you use it for something different. There is no resetting it back to Min because, as mentioned, that "creates" a new one.

Maybe try something like this:
Code: Pascal  [Select][+][-]
  1. procedure TestTSet;
  2. var
  3.   S: TNoteSet;
  4.   it: TNoteSet.TIterator;
  5.   p: PNote;
  6.   pMin: TNoteSet.PNode;
  7. begin
  8.   S := TNoteSet.Create;
  9.   S.Insert(NewNote('Title4', '444'));
  10.   S.Insert(NewNote('Title3', '333'));
  11.   S.Insert(NewNote('Title2', '222'));
  12.   S.Insert(NewNote('Title1', '111'));
  13.  
  14.   it := S.Min;
  15.   pMin := it.FNode;
  16.   repeat
  17.     p := it.GetData;
  18.     WriteLn('Title: ', p^.Title, ', ID: ', p^.ID);
  19.   until not it.Next;
  20.  
  21.   it.FNode := pMin;
  22.   repeat
  23.     p := it.GetData;
  24.     Dispose(p);
  25.   until not it.Next;
  26.  
  27.   it.Free;
  28.   S.Free;
  29. end;  
  30.  
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 10, 2021, 01:00:51 am
As for use/not use, it looks clunky and a little alien, OTOH there is not much choice,
I am driven by two factors here, firstly I am considering (just considering) using something like this in my app. But right now, AVL seems a safer choice. And, secondly, maybe more importantly, I really want to improve the documentation.  So, however it works out I will write up a wiki page for TSet. I guess if I list the inconsistencies we have identified, a potential user has fair warning.

Quote
Well, there are also third-party implementations.
For my own purposes, I am reluctant to bring in third party solutions. My app get built remotely on the Debian and Ubuntu build systems, any extras adds a significent layer of complexity. And so many third party things have insufficiently clear licenses. Debian has very strict license requirements.

Quote
BTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Quote
... need to free and recreate that iterator every time you use it for something different.

Maybe try something like this:
.....
Yes, that works (I had been trying to save a second iterator, no!). While your model works, if you change the first item, either delete it or add a new one that is even earlier, it goes wrong. As, I am sure, you would expect.  I suspect, unless you are treating the data set as read only, the only safe thing to do is free and recreate.

I reckon all these "funny" things could be reasonably easily fixed but whether its worth the effort remains to be seen. Right now, I would be very reluctant to recommend TSet to someone. (someone got their pHD on the basis of this unit, if I was asked to review I would have sent it back saying "you forgot to attach the docs" ...)

So far, been an interesting exercise, very grateful for your input !

Davo
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 10, 2021, 07:08:12 pm
...
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !
...

I believe this is a copy-paste artifact, and needs to be fixed.
Just take a quote from the manual: "TSet implements container for storing ordered set of unique elements".
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: devEric69 on October 14, 2021, 09:53:45 am
Quote
BTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Hello @dbannon, hello @all others,

No order is guaranteed??!
And @avk said: "I believe this (data being sorted) is a copy-paste artifact, and needs to be fixed."

I've followed this discussion "diagonally", followed at a distance. At the end, I don't know what to conclude from this. Is TSet really finalized as an efficient container, to be used in an application for inserting and deleting large amounts of data (this by automatically re-balancing the container tree with the red-black algo., made to do this quickly, taking into account the "insertion key" of each data)?

Regards.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: Warfley on October 14, 2021, 04:48:08 pm
Quote
BTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Hello @dbannon, hello @all others,

No order is guaranteed??!

This is just a mistake in the wiki. In the paper by the author that was linked earlier:
There's also this (https://ioinformatics.org/journal/INFOL101.pdf) paper by the same author.
It clearly states
Quote
TSet is an ordered container of unique elements.

I've followed this discussion "diagonally", followed at a distance. At the end, I don't know what to conclude from this. Is TSet really finalized as an efficient container, to be used in an application for inserting and deleting large amounts of data (this by automatically re-balancing the container tree with the red-black algo., made to do this quickly, taking into account the "insertion key" of each data)?

Regards.
Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: devEric69 on October 14, 2021, 04:58:39 pm
Quote
BTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Hello @dbannon, hello @all others,

No order is guaranteed??!

This is just a mistake in the wiki.
Thanks for the clarification.


It clearly states:
TSet is an ordered container of unique elements.
Reassuring: put a sorting alogorithm in a container, and don't use it...


Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts
Yes, but nothing new: C\C++ is 5 to 15% faster than Object Pascal, overall...
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: avk on October 14, 2021, 05:38:50 pm
...
Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts
Yes, but nothing new: C\C++ is 10 to 15% faster than Object Pascal, overall...

The problem is possible and not only this. TSet seems to implement a slightly different version of the red-black tree I mentioned somewhere above. I have a conventional red-black tree implementation, a simple comparison benchmark with 1000000 integers shows the following results:
Code: Text  [Select][+][-]
  1. ---FCL-STL TSet---
  2.  insertion 951
  3.  success lookup 577
  4.  failed lookup 671
  5.  deletion 1248
  6.  
  7. ---TGLiteRbTree---
  8.  insertion 650
  9.  success lookup 570
  10.  failed lookup 650
  11.  deletion 750
  12.  

Search times are almost the same(which is not surprising, since all BSTs do it the same way), but insertions and deletions are noticeably different.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: devEric69 on October 14, 2021, 06:58:58 pm
Wow: nice comparative algorithm's tests bench.
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 15, 2021, 07:03:55 am
My performance figures for both a large data input and a find indicate TSet is about the same as the AVL Tree. Maybe +/20%.  I intend to write up a wiki page on the subject, more to document what we, here in this thread, have found than to promote its use.

It is just a little different in the way you use it compared to what regular users of FPC/Lazarus expect, that alone might be about the only real difference between TSet and AVLTree.  I suggest bug reports mentioning AVLTree might get a lot more attention that TSet.

In my case, I am storing a pointer to a rather simple record, what I would consider probably a pretty normal use. I'll probably use the AVLTree for my project if I decide I do need to implement what I have in mind.

Davo
 
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: dbannon on October 19, 2021, 07:18:37 am

https://wiki.freepascal.org/GSet

Comments and constrictive criticism is welcome !

Davo
Title: Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
Post by: devEric69 on October 19, 2021, 09:26:31 am
I understood everything: so, this is proof that it's really very dadactic :D .

The example using a simple record (New, Dispose) rather than an "overkill" TClass (Create, Free) and using an iterator, is meticulous and complete.
TinyPortal © 2005-2018