@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).That is true for worst case. Best case is O(n*log2(n) ( given a quicksort in both cases).
@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).But thanks to the large caches modern CPUs today have (mine has 40mb), it is usually (that means when the list is rather small, like only a few pages of size) still much faster than linked non continous datastructures.
Just wondering at what point the sorted list turned into a sorted set. And the reasoning about processor caches seems to require at least some confirmation.TSet is part of the FCL-STL package and is basically a reimplementation of the C++ set datatype. It's just the name C++ gave the sorted datastructure. It can not hold duplicate values, for this the multiset type exists. It has nothing to do with the (bit) sets like in pascal.
TSet is part of the FCL-STL package and is basically a reimplementation of the C++ set datatype. It's just the name C++ gave the sorted datastructure. It can not hold duplicate values, for this the multiset type exists. It has nothing to do with the (bit) sets like in pascal.
...
...
About the caching, there is this great talk by Bjarne Stroustrup that discusses this: Link
Basically as the linear search time through the list dominates the runtime, arrays which make use of caching are much faster at that and therefore overall faster
And if we are talking about sorted lists, arrays are even better, because each binary search iteration can be done in constant time (as random access on arrays is in constant time), while all the distances need to be traversed on linked lists (and if the linked list isn't double linked, this it always needs to search from the root, making it O(n) rather than O(logn))
...
So what do you find it looks like a sorted list?It is a sorted list in everything but name.
Finding items is great, but I think I was talking about adding, not finding items.But to add you need to first do a search. Let's say I have a linked list and want to add an item at position n, then I need to traverse from the root to the (n-1)th element to insert the newly created element right after it. Adding in linked lists is only faster, if you already have an element and want to add an element after (or before in case of a double linked list) that. If you just want to add an element to an arbitrary position, you first need to find that position.
...
It is a sorted list in everything but name.
...
...
But to add you need to first do a search. Let's say I have a linked list and want to add an item at position n, then I need to traverse from the root to the (n-1)th element to insert the newly created element right after it. Adding in linked lists is only faster, if you already have an element and want to add an element after (or before in case of a double linked list) that. If you just want to add an element to an arbitrary position, you first need to find that position.
Well, and what is it in the end? Without a doubt, searching in a sorted array is O(LogN), but I was talking about adding and this is O(N), and which will win?Asymptotic notation does not tell you how fast an operation will be, it tells you about the runtime growth with increasing input size. O(N) will for some input size be slower than O(logN) but if that input size threshold is 2^800 bytes, we are a long way before there will be any PC that has even enough memory to have such an input.
...
The question is not how efficient an algorithm is asymptotically but how efficient it is in a real world scenario. If you only have lists that do not exceed 100 elements, it doesn't matter how well a datastructure performs for arbitrary N, it only matters how efficient it is on N <= 100
Yes Egan, TSet sounds interesting. What a pity is not documented anywhere in FPC space.
I did find this http://history.ioinformatics.org/conf/c6_boza.pdf - looks like a 2012 conference talk. Enough to spark a bit of interest.
....
There's also this (https://ioinformatics.org/journal/INFOL101.pdf) paper by the same author.
Ah, thats heaps more informative. If I can make sense of it, and find some time, I will write something, initially as a wiki page, and then if people like yourself consider it not comple rubbish, as a doc patch.
...
So, is TSet really just capapable of managing longints (or integer, shortints) ? Thats all ?
...
IIRC, TSet is built on top of the LLRbTree (https://en.wikipedia.org/wiki/Left-leaning_red–black_tree) implementation, which is always slower than AvlTree. And yes, TSet is a Set, not a SortedList.With the latter I agree, with the former, no. A set (computer theory, not Pascal sets perse, however Pascal adheres to those properties) is characterized by in-order. ( and ultimately with no duplicates)
Also, it is a rather old library and in my opinion too STL-centric.
...
In essence a sorted list with no duplcates allowed is a "set".
...
You can consider this version:
program TSetDemo; {$mode objfpc}{$H+} {$modeswitch advancedrecords} uses SysUtils, gvector, gset, gutil; type PNote = ^TNote; TNote = record Title : string; ID : string; end; TLess = record class function c(L, R: PNote): Boolean; static; end; class function TLess.c(L, R: PNote): Boolean; begin Result := L^.ID < R^.ID; end; type TNoteSet = specialize TSet<PNote, TLess>; procedure TestTSet; var S: TNoteSet; it: TNoteSet.TIterator; p: PNote; const Notes: array of TNote = ( (Title: 'Title1'; ID: '111'), (Title: 'Title2'; ID: '222'), (Title: 'Title3'; ID: '333'), (Title: 'Title4'; ID: '444')); begin S := TNoteSet.Create; S.Insert(@Notes[3]); S.Insert(@Notes[2]); S.Insert(@Notes[1]); S.Insert(@Notes[0]); it := S.Min; repeat p := it.GetData; writeln('Title: ', p^.Title, ', ID: ', p^.ID); until not it.Next; it.Free; S.free; end; begin TestTSet; end.
...
If iterating through it, one can also use the enumerator:
...
But this actually has a bug, it skips the first element (because the enumerator next gets called before the first value is read). So yeah, while this theoretically works, better to use a repeat until loop
...
Create (just the one) iterator, call its GetData function to get a pointer to a record, dispose the record and repeat until the iterator's Next function is false. Then free the iterator and then the TSet itself. I am not removing the pointer from the TSet, just disposing of what ever its pointing to. And heattrc tells me that I have no leaks.
...
...
EDIT : I also find it a bit strange that you need to free and recreate that iterator every time you use it for something different. There is no resetting it back to Min because, as mentioned, that "creates" a new one.
As for use/not use, it looks clunky and a little alien, OTOH there is not much choice,I am driven by two factors here, firstly I am considering (just considering) using something like this in my app. But right now, AVL seems a safer choice. And, secondly, maybe more importantly, I really want to improve the documentation. So, however it works out I will write up a wiki page for TSet. I guess if I list the inconsistencies we have identified, a potential user has fair warning.
Well, there are also third-party implementations.For my own purposes, I am reluctant to bring in third party solutions. My app get built remotely on the Debian and Ubuntu build systems, any extras adds a significent layer of complexity. And so many third party things have insufficiently clear licenses. Debian has very strict license requirements.
BTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?"TSet: Implements red-black tree backed set, no order is guaranteed" ? I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted. And I sure hope to get a touch more documentation in place than that !
Yes, that works (I had been trying to save a second iterator, no!). While your model works, if you change the first item, either delete it or add a new one that is even earlier, it goes wrong. As, I am sure, you would expect. I suspect, unless you are treating the data set as read only, the only safe thing to do is free and recreate.... need to free and recreate that iterator every time you use it for something different.
Maybe try something like this:
.....
...
"TSet: Implements red-black tree backed set, no order is guaranteed" ? I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted. And I sure hope to get a touch more documentation in place than that !
...
QuoteBTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?"TSet: Implements red-black tree backed set, no order is guaranteed" ? I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted. And I sure hope to get a touch more documentation in place than that !
QuoteBTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?"TSet: Implements red-black tree backed set, no order is guaranteed" ? I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted. And I sure hope to get a touch more documentation in place than that !
Hello @dbannon, hello @all others,
No order is guaranteed??!
There's also this (https://ioinformatics.org/journal/INFOL101.pdf) paper by the same author.It clearly states
TSet is an ordered container of unique elements.
I've followed this discussion "diagonally", followed at a distance. At the end, I don't know what to conclude from this. Is TSet really finalized as an efficient container, to be used in an application for inserting and deleting large amounts of data (this by automatically re-balancing the container tree with the red-black algo., made to do this quickly, taking into account the "insertion key" of each data)?Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterparts
Regards.
QuoteBTW, have you seen this (https://wiki.freepascal.org/Data_Structures,_Containers,_Collections) wiki page?"TSet: Implements red-black tree backed set, no order is guaranteed" ? I find the "no order is guaranteed" an interesting statement. I am throwing in over two thousand random records, they come out sorted. And I sure hope to get a touch more documentation in place than that !
Hello @dbannon, hello @all others,
No order is guaranteed??!
This is just a mistake in the wiki.Thanks for the clarification.
It clearly states:Reassuring: put a sorting alogorithm in a container, and don't use it...
TSet is an ordered container of unique elements.
Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterpartsYes, but nothing new: C\C++ is 5 to 15% faster than Object Pascal, overall...
...Yes it is generally designed for performance but as the paper shows, it is slower than the C++ counterpartsYes, but nothing new: C\C++ is 10 to 15% faster than Object Pascal, overall...