Recent

Author Topic: Philosophically speaking: TFPSortedObjectList or TFPStringList?  (Read 12543 times)

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #30 on: October 08, 2021, 02:55:06 pm »
My bad, the iterator is the enumerator so it must be:
Code: Pascal  [Select][+][-]
  1.     for p in s.Min do
  2.     begin
  3.       WriteLn('Title: ', p^.Title, ', ID: ', p^.ID);
  4.       Dispose(p);
  5.     end;
But this actually has a bug, it skips the first element (because the enumerator next gets called before the first value is read). So yeah, while this theoretically works, better to use a repeat until loop

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #31 on: October 09, 2021, 01:17:06 am »
Thats curious Warfley, I am reluctant to alter something while it has a for loop working around it. So, prefer a repeat or while. This makes more sense (to me), but crashes (on one occasion for me did not crash but leaked memory). Don't know why ...

Code: Pascal  [Select][+][-]
  1. it := S.Min;          
  2. repeat
  3.         dispose(S.Min.getdata);
  4. until not it.Next;
  5. it.free;
  6.  

Following works fine, no crash, no memory leak
Code: Pascal  [Select][+][-]
  1. it := S.Min;
  2. repeat
  3.         p := it.GetData;
  4.         dispose(p);
  5. until not it.Next;;
  6. it.Free;

Likewise, as you would expect, the following works fine -

Code: Pascal  [Select][+][-]
  1. it := S.Min;
  2. repeat
  3.         dispose(it.GetData);
  4. until not it.Next;;
  5. it.Free;

But I also found the following leaks memory, just assigning Min to a pointer, not using it in any way. Apparently setting p to point to first element allocates some memory, that does not make sense at all ! Unless S.Min.GetData makes a copy of the data ??

Code: Pascal  [Select][+][-]
  1.      
  2. it := S.Min;
  3. repeat
  4.         p := S.Min.GetData;        // this causes a leak ??
  5.         dispose(it.GetData);      
  6. until not it.Next;;
  7. it.Free;
  8. NS.free;    
 

Starting to look like very fragile code ...

Davo
         
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #32 on: October 09, 2021, 08:31:38 am »
...
But this actually has a bug, it skips the first element (because the enumerator next gets called before the first value is read). So yeah, while this theoretically works, better to use a repeat until loop

I already said that FCL-STL is a pretty old library. IIRC, some time ago, TSetIterator did not contain the MoveNext, GetEnumerator and Current methods. Apparently they were added later and in such a clumsy way.
It would be possible to fix it like this:
Code: Pascal  [Select][+][-]
  1. ...
  2. type
  3.  
  4.   { TSetIterator }
  5.  
  6.   generic TSetIterator<T, TNode>=class
  7.   public
  8.   type
  9.     PNode=^TNode;
  10.   var
  11.     FNode:PNode;
  12.   private
  13.     FInLoop: Boolean;
  14.   public
  15.     function GetData:T; Inline;
  16.     function Next:boolean;
  17.     function MoveNext:boolean; Inline;
  18.     function GetEnumerator : TSetIterator; Inline;
  19.     function Prev:boolean;
  20.     property Data:T read GetData;
  21.     property Current:T read GetData;
  22.   end;
  23. ...
  24. implementation
  25. ...
  26. function TSetIterator.GetData:T;
  27. begin
  28.   GetData:= FNode^.Data;
  29. end;
  30.  
  31. function TSetIterator.Next:boolean;
  32. var temp:PNode;
  33. begin
  34.   if(FNode=nil) then exit(false);
  35.   if(FNode^.Right<>nil) then begin
  36.     temp:=FNode^.Right;
  37.     while(temp^.Left<>nil) do temp:=temp^.Left;
  38.   end
  39.   else begin
  40.     temp:=FNode;
  41.     while(true) do begin
  42.       if(temp^.Parent=nil) then begin temp:=temp^.Parent; break; end;
  43.       if(temp^.Parent^.Left=temp) then begin temp:=temp^.Parent; break; end;
  44.       temp:=temp^.Parent;
  45.     end;
  46.   end;
  47.   if (temp = nil) then exit(false);
  48.   FNode:=temp;
  49.   Result:=true;
  50. end;
  51.  
  52. function TSetIterator.MoveNext: boolean;
  53. begin
  54.   if(FNode=nil) then exit(false);
  55.   if FInLoop then
  56.     exit(Next);
  57.   FInLoop := True;
  58.   Result:=true;
  59. end;
  60.  
  61. function TSetIterator.GetEnumerator: TSetIterator;
  62. begin
  63.   result:=self;
  64. end;
  65.  
  66. function TSetIterator.Prev:boolean;
  67. var temp:PNode;
  68. begin
  69.   if(FNode=nil) then exit(false);
  70.   if(FNode^.Left<>nil) then begin
  71.     temp:=FNode^.Left;
  72.     while(temp^.Right<>nil) do temp:=temp^.Right;
  73.   end
  74.   else begin
  75.     temp:=FNode;
  76.     while(true) do begin
  77.       if(temp^.Parent=nil) then begin temp:=temp^.Parent; break; end;
  78.       if(temp^.Parent^.Right=temp) then begin temp:=temp^.Parent; break; end;
  79.       temp:=temp^.Parent;
  80.     end;
  81.   end;
  82.   if (temp = nil) then exit(false);
  83.   FNode:=temp;
  84.   Prev:=true;
  85. end;
  86.  

but what will the TSet.Max enumerator do? In addition, all methods that return an iterator return nil on an empty TSet.

@dbannon, before freeing memory allocated for an item, it should be removed from the container.
It seems that this code does not cause memory leaks:
Code: Pascal  [Select][+][-]
  1. ...
  2. function NewNote(const aTitle, aID: string): PNote;
  3. begin
  4.   Result := New(PNote);
  5.   Result^.Title := aTitle;
  6.   Result^.ID := aID;
  7. end;
  8.  
  9. procedure TestTSet;
  10. var
  11.   S: TNoteSet;
  12.   it: TNoteSet.TIterator;
  13.   p: PNote;
  14.   Done: Boolean;
  15. begin
  16.   S := TNoteSet.Create;
  17.   S.Insert(NewNote('Title4', '444'));
  18.   S.Insert(NewNote('Title3', '333'));
  19.   S.Insert(NewNote('Title2', '222'));
  20.   S.Insert(NewNote('Title1', '111'));
  21.  
  22.   it := S.Min;
  23.   repeat
  24.     p := it.GetData;
  25.     Done := not it.Next;
  26.     S.Delete(p);
  27.     Dispose(p);
  28.   until Done;
  29.   it.Free;
  30.   S.Free;
  31. end;  
  32. ...
  33.  

And if you store records in TSet, and not pointers to them, then this problem will not arise at all.

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #33 on: October 09, 2021, 01:20:59 pm »
> all methods that return an iterator return nil on an empty TSet.

Yes, TSet does do that.


>@dbannon, before freeing memory allocated for an item, it should be removed from the container. It seems that this code does not cause memory leaks: ....

Thank AVK. I may have confused you with my previous post.  The snippit I posted that leaks does so because I set a pointer to point to the record using S.Min.GetData, not because of an incomplete dispose.  It appears that calling S.Min.GetData always creates a new iterator using the create statement in the Min function.  Min itself is not a creator but it calls create and returns the result.

So, easy answer, every call to S.Min requires a Free and just don't do things like -
Code: Pascal  [Select][+][-]
  1.     dispose(S.Min.GetData);
  2. //or
  3.     p := S.Min.GetData;
  4.  
In both the above cases, you loose track of the iterator so cannot free it.

So, the right way to free/dispose, yes, I see what your suggestion is trying to do, makes sense. But my version works fine, no leaks and no crashes and a shorter code and ever so slightly quicker. To avoid confusion, here it is again -

Code: Pascal  [Select][+][-]
  1.     it := S.Min;
  2.     repeat
  3.             p := it.GetData;
  4.             dispose(p);
  5.     until not it.Next;;
  6.     it.Free;
  7.     S.Free;

Create (just the one) iterator, call its GetData function to get a pointer to a record, dispose the record and repeat until the iterator's Next function is false. Then free the iterator and then the TSet itself.  I am not removing the pointer from the TSet, just disposing of what ever its pointing to.  And heattrc tells me that I have no leaks.

I have been putting quite a lot of data through the TSet in the last hour or so, it does seem quick to me. But I have not yet tried the AVL version to compare.

QUESTION : If I write up this wiki page, do I (ie 'we') say we don't recommend people use this unit because it has a few unexplained behaviors ?

EDIT : I also find it a bit strange that you need to free and recreate that iterator every time you use it for something different. There is no resetting it back to Min because, as mentioned, that "creates" a new one.
« Last Edit: October 09, 2021, 01:32:14 pm by dbannon »
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #34 on: October 09, 2021, 02:35:07 pm »
...
Create (just the one) iterator, call its GetData function to get a pointer to a record, dispose the record and repeat until the iterator's Next function is false. Then free the iterator and then the TSet itself.  I am not removing the pointer from the TSet, just disposing of what ever its pointing to.  And heattrc tells me that I have no leaks.
...

The iterator does not access the data when searching for the next node, so if you immediately call S.Free(), everything should go fine.

As for use/not use, it looks clunky and a little alien, OTOH there is not much choice, TSortedSet from Generics.Collections does not seem much richer in functionality. Well, there are also third-party implementations.
BTW, have you seen this wiki page?

...
EDIT : I also find it a bit strange that you need to free and recreate that iterator every time you use it for something different. There is no resetting it back to Min because, as mentioned, that "creates" a new one.

Maybe try something like this:
Code: Pascal  [Select][+][-]
  1. procedure TestTSet;
  2. var
  3.   S: TNoteSet;
  4.   it: TNoteSet.TIterator;
  5.   p: PNote;
  6.   pMin: TNoteSet.PNode;
  7. begin
  8.   S := TNoteSet.Create;
  9.   S.Insert(NewNote('Title4', '444'));
  10.   S.Insert(NewNote('Title3', '333'));
  11.   S.Insert(NewNote('Title2', '222'));
  12.   S.Insert(NewNote('Title1', '111'));
  13.  
  14.   it := S.Min;
  15.   pMin := it.FNode;
  16.   repeat
  17.     p := it.GetData;
  18.     WriteLn('Title: ', p^.Title, ', ID: ', p^.ID);
  19.   until not it.Next;
  20.  
  21.   it.FNode := pMin;
  22.   repeat
  23.     p := it.GetData;
  24.     Dispose(p);
  25.   until not it.Next;
  26.  
  27.   it.Free;
  28.   S.Free;
  29. end;  
  30.  
« Last Edit: October 09, 2021, 03:19:50 pm by avk »

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #35 on: October 10, 2021, 01:00:51 am »
As for use/not use, it looks clunky and a little alien, OTOH there is not much choice,
I am driven by two factors here, firstly I am considering (just considering) using something like this in my app. But right now, AVL seems a safer choice. And, secondly, maybe more importantly, I really want to improve the documentation.  So, however it works out I will write up a wiki page for TSet. I guess if I list the inconsistencies we have identified, a potential user has fair warning.

Quote
Well, there are also third-party implementations.
For my own purposes, I am reluctant to bring in third party solutions. My app get built remotely on the Debian and Ubuntu build systems, any extras adds a significent layer of complexity. And so many third party things have insufficiently clear licenses. Debian has very strict license requirements.

Quote
BTW, have you seen this wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Quote
... need to free and recreate that iterator every time you use it for something different.

Maybe try something like this:
.....
Yes, that works (I had been trying to save a second iterator, no!). While your model works, if you change the first item, either delete it or add a new one that is even earlier, it goes wrong. As, I am sure, you would expect.  I suspect, unless you are treating the data set as read only, the only safe thing to do is free and recreate.

I reckon all these "funny" things could be reasonably easily fixed but whether its worth the effort remains to be seen. Right now, I would be very reluctant to recommend TSet to someone. (someone got their pHD on the basis of this unit, if I was asked to review I would have sent it back saying "you forgot to attach the docs" ...)

So far, been an interesting exercise, very grateful for your input !

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #36 on: October 10, 2021, 07:08:12 pm »
...
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !
...

I believe this is a copy-paste artifact, and needs to be fixed.
Just take a quote from the manual: "TSet implements container for storing ordered set of unique elements".

devEric69

  • Hero Member
  • *****
  • Posts: 648
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #37 on: October 14, 2021, 09:53:45 am »
Quote
BTW, have you seen this wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Hello @dbannon, hello @all others,

No order is guaranteed??!
And @avk said: "I believe this (data being sorted) is a copy-paste artifact, and needs to be fixed."

I've followed this discussion "diagonally", followed at a distance. At the end, I don't know what to conclude from this. Is TSet really finalized as an efficient container, to be used in an application for inserting and deleting large amounts of data (this by automatically re-balancing the container tree with the red-black algo., made to do this quickly, taking into account the "insertion key" of each data)?

Regards.
« Last Edit: October 14, 2021, 10:00:48 am by devEric69 »
use: Linux 64 bits (Ubuntu 20.04 LTS).
Lazarus version: 2.0.4 (svn revision: 62502M) compiled with fpc 3.0.4 - fpDebug \ Dwarf3.

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #38 on: October 14, 2021, 04:48:08 pm »
Quote
BTW, have you seen this wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Hello @dbannon, hello @all others,

No order is guaranteed??!

This is just a mistake in the wiki. In the paper by the author that was linked earlier:
There's also this paper by the same author.
It clearly states
Quote
TSet is an ordered container of unique elements.

I've followed this discussion "diagonally", followed at a distance. At the end, I don't know what to conclude from this. Is TSet really finalized as an efficient container, to be used in an application for inserting and deleting large amounts of data (this by automatically re-balancing the container tree with the red-black algo., made to do this quickly, taking into account the "insertion key" of each data)?

Regards.
Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts

devEric69

  • Hero Member
  • *****
  • Posts: 648
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #39 on: October 14, 2021, 04:58:39 pm »
Quote
BTW, have you seen this wiki page?
"TSet: Implements red-black tree backed set, no order is guaranteed"  ?   I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted.  And I sure hope to get a touch more documentation in place than that !

Hello @dbannon, hello @all others,

No order is guaranteed??!

This is just a mistake in the wiki.
Thanks for the clarification.


It clearly states:
TSet is an ordered container of unique elements.
Reassuring: put a sorting alogorithm in a container, and don't use it...


Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts
Yes, but nothing new: C\C++ is 5 to 15% faster than Object Pascal, overall...
« Last Edit: October 15, 2021, 09:19:34 am by devEric69 »
use: Linux 64 bits (Ubuntu 20.04 LTS).
Lazarus version: 2.0.4 (svn revision: 62502M) compiled with fpc 3.0.4 - fpDebug \ Dwarf3.

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #40 on: October 14, 2021, 05:38:50 pm »
...
Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts
Yes, but nothing new: C\C++ is 10 to 15% faster than Object Pascal, overall...

The problem is possible and not only this. TSet seems to implement a slightly different version of the red-black tree I mentioned somewhere above. I have a conventional red-black tree implementation, a simple comparison benchmark with 1000000 integers shows the following results:
Code: Text  [Select][+][-]
  1. ---FCL-STL TSet---
  2.  insertion 951
  3.  success lookup 577
  4.  failed lookup 671
  5.  deletion 1248
  6.  
  7. ---TGLiteRbTree---
  8.  insertion 650
  9.  success lookup 570
  10.  failed lookup 650
  11.  deletion 750
  12.  

Search times are almost the same(which is not surprising, since all BSTs do it the same way), but insertions and deletions are noticeably different.

devEric69

  • Hero Member
  • *****
  • Posts: 648
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #41 on: October 14, 2021, 06:58:58 pm »
Wow: nice comparative algorithm's tests bench.
« Last Edit: October 14, 2021, 07:02:07 pm by devEric69 »
use: Linux 64 bits (Ubuntu 20.04 LTS).
Lazarus version: 2.0.4 (svn revision: 62502M) compiled with fpc 3.0.4 - fpDebug \ Dwarf3.

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #42 on: October 15, 2021, 07:03:55 am »
My performance figures for both a large data input and a find indicate TSet is about the same as the AVL Tree. Maybe +/20%.  I intend to write up a wiki page on the subject, more to document what we, here in this thread, have found than to promote its use.

It is just a little different in the way you use it compared to what regular users of FPC/Lazarus expect, that alone might be about the only real difference between TSet and AVLTree.  I suggest bug reports mentioning AVLTree might get a lot more attention that TSet.

In my case, I am storing a pointer to a rather simple record, what I would consider probably a pretty normal use. I'll probably use the AVLTree for my project if I decide I do need to implement what I have in mind.

Davo
 
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

dbannon

  • Hero Member
  • *****
  • Posts: 2786
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #43 on: October 19, 2021, 07:18:37 am »

https://wiki.freepascal.org/GSet

Comments and constrictive criticism is welcome !

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon

devEric69

  • Hero Member
  • *****
  • Posts: 648
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #44 on: October 19, 2021, 09:26:31 am »
I understood everything: so, this is proof that it's really very dadactic :D .

The example using a simple record (New, Dispose) rather than an "overkill" TClass (Create, Free) and using an iterator, is meticulous and complete.
use: Linux 64 bits (Ubuntu 20.04 LTS).
Lazarus version: 2.0.4 (svn revision: 62502M) compiled with fpc 3.0.4 - fpDebug \ Dwarf3.

 

TinyPortal © 2005-2018