Recent

Author Topic: Philosophically speaking: TFPSortedObjectList or TFPStringList?  (Read 11100 times)

EganSolo

  • Sr. Member
  • ****
  • Posts: 257
Philosophically speaking: TFPSortedObjectList or TFPStringList?
« on: October 03, 2021, 10:24:32 am »
So I need an object list that can stay sorted.

Start Paren:
In case you were like me at some point and you didn't know about TFPObjectList and TFPStringList, these are cousins to TObjectList and TStringList but without event-related overhead: they don't notify when an action is taken.
End Paren.

I need an object list that can stay sorted. I could be lazy and use TFPStringList and add a field to my objects to act as a unique string, or I could write a descendant to TFPObjectList that has a sorted property and in case sorted is true, IndexOf would rely on a Compare function to do a binary search over the list.

The reason why I'm hesitant to do that is that when I googled "free pascal Lazarus TFPSortedObjectList TSortedFPObjectList TSortedObjectList" Mr. Google came back empty-handed. Am I the _only_ one who needs a sorted object list?

Suddenly I feel like the guy who looks up and finds out that he's alone in a saloon....

So is it then the case that everyone else is just using TStringList?

What am I missing?

Egan

MarkMLl

  • Hero Member
  • *****
  • Posts: 4502
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #1 on: October 03, 2021, 10:48:31 am »
So why are you assuming that somebody who'd done the work would use the names you've apparently pulled out of a hat, and that all of those names would appear on the same page (which is what you've asked for)?

Your curiosity is entirely valid, but you're assuming that Google is an oracle that can divine your intention: it's not, it's there to sell advertising services, and there's more than enough fuzziness in it already.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Thaddy

  • Hero Member
  • *****
  • Posts: 11633
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #2 on: October 03, 2021, 12:40:40 pm »
You can maybe use a TFPGMap or TFpgMapObject,
Example:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}{$H+}
  2. {$mode objfpc}{$H+}{$warn 3123 off}{$warn 6058 off}
  3. Uses fgl;
  4. type
  5.   TMySortedObjectList = specialize TFpgMapObject<string,TObject>;
  6.  
  7. var
  8.   MyList:TMySortedObjectList;  
  9. begin
  10.   MyList := TMySortedObjectList.Create(true);
  11.   MyList.Sorted:=true;
  12.   try
  13.     MyList.AddOrSetData('test',nil); // using this prevents duplicate keys.
  14.     Writeln(MyList.Keys[0]); // writes 'test'
  15.     MyList.AddOrSetData('best',nil);
  16.     Writeln(MyList.Keys[0]); // since it is sorted, now prints 'best'
  17.   finally
  18.     MyList.Free;
  19.   end;
  20. end.
I guess this already does what you want.
« Last Edit: October 03, 2021, 01:32:23 pm by Thaddy »
Black themes should be banned.

avk

  • Hero Member
  • *****
  • Posts: 602
    • my self-education project
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #3 on: October 03, 2021, 01:45:06 pm »
@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).

Thaddy

  • Hero Member
  • *****
  • Posts: 11633
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #4 on: October 03, 2021, 01:46:04 pm »
Slight variation:
Code: Pascal  [Select][+][-]
  1. {$mode objfpc}{$H+}{$warn 3123 off}{$warn 6058 off}
  2. {$macro on}
  3. // do not allow duplicate keys macro}
  4. {$define add:=AddOrSetData}
  5. Uses fgl;
  6. type
  7.   TMySortedObjectList = specialize TFpgMapObject<string,TObject>;
  8.  
  9. var
  10.   MyList:TMySortedObjectList;  
  11. begin
  12.   MyList := TMySortedObjectList.Create(true);
  13.   MyList.Sorted:=true;
  14.   try
  15.     MyList.Add('test',TObject.create); // using this prevents duplicate keys.
  16.     Writeln(MyList.Keys[0]); // writes 'test'
  17.     MyList.Add('best',TObject.create);
  18.     Writeln(MyList.Keys[0]); // since it is sorted, now prints 'best'
  19.   finally
  20.     MyList.Free;
  21.   end;
  22. end.

If you want that...
Black themes should be banned.

Thaddy

  • Hero Member
  • *****
  • Posts: 11633
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #5 on: October 03, 2021, 01:51:16 pm »
@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).
That is true for worst case. Best case is O(n*log2(n) ( given a quicksort in both cases).
Average case is O(n log n);

Theory is well explained here: https://iq.opengenus.org/time-and-space-complexity-of-quick-sort

Anyway you can expect a penalty as opposed unsorted.
« Last Edit: October 03, 2021, 01:58:47 pm by Thaddy »
Black themes should be banned.

Warfley

  • Hero Member
  • *****
  • Posts: 859
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #6 on: October 03, 2021, 02:10:18 pm »
If I need a sorted list I usually take TSet from the GSet unit. It makes use of an Red-Black Tree structure on an Array for storing the data. It has some advantages over most other implementations, first it can give you pointer access to the array (avoiding copies, very neat for large records) and it takes the sorting function as generic parameter, making it inlinable (which the function pointer based approaches of structures like TStringList can't). Also it is very lightweight, meaning it doesn't have so much additional functionality like events and customization options, which of course also take their time to be checked and implemented
But it does not contain functionality to automatically free the objects like TObjectList has.

Also, what exactly do you need it for, maybe some other structure (like a min-heap) can serve you just as well.

@EganSolo, just FYI, adding items to an array based sorted list usually has a time complexity of O(N^2).
But thanks to the large caches modern CPUs today have (mine has 40mb), it is usually (that means when the list is rather small, like only a few pages of size) still much faster than linked non continous datastructures.
One could create more complex datastructures that are linked and still make use of cache locality (e.g. LinkedLists in Java usually are implemented as a linked list of arrays whose size is optimized for the CPU to maximize cache locality). But as far as I know there aren't many such implementations for FreePascal around. So an Array based list is probably the best you get.
Asymptotic complexity is only half the story, often an asymptotically bad algorithm can in real world scenarios, thanks to low static/implementation based overhead, outperform a asymptotically better algorithm.
« Last Edit: October 03, 2021, 02:13:34 pm by Warfley »

avk

  • Hero Member
  • *****
  • Posts: 602
    • my self-education project
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #7 on: October 03, 2021, 03:00:48 pm »
Just wondering at what point the sorted list turned into a sorted set? And the reasoning about processor caches seems to require at least some confirmation.
« Last Edit: October 03, 2021, 04:11:45 pm by avk »

Warfley

  • Hero Member
  • *****
  • Posts: 859
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #8 on: October 03, 2021, 05:21:21 pm »
Just wondering at what point the sorted list turned into a sorted set. And the reasoning about processor caches seems to require at least some confirmation.
TSet is part of the FCL-STL package and is basically a reimplementation of the C++ set datatype. It's just the name C++ gave the sorted datastructure. It can not hold duplicate values, for this the multiset type exists. It has nothing to do with the (bit) sets like in pascal.

About the caching, there is this great talk by Bjarne Stroustrup that discusses this: Link
Basically as the linear search time through the list dominates the runtime, arrays which make use of caching are much faster at that and therefore overall faster

And if we are talking about sorted lists, arrays are even better, because each binary search iteration can be done in constant time (as random access on arrays is in constant time), while all the distances need to be traversed on linked lists (and if the linked list isn't double linked, this it always needs to search from the root, making it O(n) rather than O(logn))

Linked lists are usually only found in places where large continous allocations are not possible like in the kernel. When you still find linked lists in the wild they are usually highly optimized (e.g. by having them allocate using a stack allocator with swap delete in the background, or linking small chunks of multiple objects together, e.g. in Java)

Also arrays have another massive advantage, you can preallocate the memory or perform geometric growth, which can result in massive speedups if the heap is highly fragmented, and many small allocations lead to long search times for the memory allocation
« Last Edit: October 03, 2021, 05:27:27 pm by Warfley »

avk

  • Hero Member
  • *****
  • Posts: 602
    • my self-education project
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #9 on: October 03, 2021, 05:49:00 pm »
TSet is part of the FCL-STL package and is basically a reimplementation of the C++ set datatype. It's just the name C++ gave the sorted datastructure. It can not hold duplicate values, for this the multiset type exists. It has nothing to do with the (bit) sets like in pascal.
...

So what do you find it looks like a sorted list?

...
About the caching, there is this great talk by Bjarne Stroustrup that discusses this: Link
Basically as the linear search time through the list dominates the runtime, arrays which make use of caching are much faster at that and therefore overall faster

And if we are talking about sorted lists, arrays are even better, because each binary search iteration can be done in constant time (as random access on arrays is in constant time), while all the distances need to be traversed on linked lists (and if the linked list isn't double linked, this it always needs to search from the root, making it O(n) rather than O(logn))
...

Finding items is great, but I think I was talking about adding, not finding items.

Warfley

  • Hero Member
  • *****
  • Posts: 859
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #10 on: October 03, 2021, 06:02:56 pm »
So what do you find it looks like a sorted list?
It is a sorted list in everything but name.

Finding items is great, but I think I was talking about adding, not finding items.
But to add you need to first do a search. Let's say I have a linked list and want to add an item at position n, then I need to traverse from the root to the (n-1)th element to insert the newly created element right after it. Adding in linked lists is only faster, if you already have an element and want to add an element after (or before in case of a double linked list) that. If you just want to add an element to an arbitrary position, you first need to find that position.

avk

  • Hero Member
  • *****
  • Posts: 602
    • my self-education project
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #11 on: October 03, 2021, 06:26:05 pm »
...
It is a sorted list in everything but name.
...

Do you think list items should have a "name" attribute?

...
But to add you need to first do a search. Let's say I have a linked list and want to add an item at position n, then I need to traverse from the root to the (n-1)th element to insert the newly created element right after it. Adding in linked lists is only faster, if you already have an element and want to add an element after (or before in case of a double linked list) that. If you just want to add an element to an arbitrary position, you first need to find that position.

Well, and what is it in the end? Without a doubt, searching in a sorted array is O(LogN), but I was talking about adding and this is O(N), and which will win?

Warfley

  • Hero Member
  • *****
  • Posts: 859
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #12 on: October 03, 2021, 06:49:50 pm »
Well, and what is it in the end? Without a doubt, searching in a sorted array is O(LogN), but I was talking about adding and this is O(N), and which will win?
Asymptotic notation does not tell you how fast an operation will be, it tells you about the runtime growth with increasing input size. O(N) will for some input size be slower than O(logN) but if that input size threshold is 2^800 bytes, we are a long way before there will be any PC that has even enough memory to have such an input.

Take searching an index in a sorted list. Using binary search you need to do logarithmic many lookups. A lookup on an array is a single operation, while a lookup on a linked list is a traversal from the root to the element. So you have in a linked list a lookup that is in O(logN) the same asymptotic runtime you have in an array, but the static overhead of each access is much larger for the linked list than for the array.
When adding an object to the list you need to potentially move all the elements of the list backwards, which is in O(N).
So the runtime is c1*logN + c2 for adding in the linked list and c3*logN + c4*N for the array.
While for every c1, c2, c3 and c4 there will be an N such that c1*logN + c2 < c3*logN + c4*N, this does not mean that this holds for all N.

The question is not how efficient an algorithm is asymptotically but how efficient it is in a real world scenario. If you only have lists that do not exceed 100 elements, it doesn't matter how well a datastructure performs for arbitrary N, it only matters how efficient it is on N <= 100
« Last Edit: October 03, 2021, 06:52:49 pm by Warfley »

avk

  • Hero Member
  • *****
  • Posts: 602
    • my self-education project
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #13 on: October 03, 2021, 08:02:39 pm »
...
The question is not how efficient an algorithm is asymptotically but how efficient it is in a real world scenario. If you only have lists that do not exceed 100 elements, it doesn't matter how well a datastructure performs for arbitrary N, it only matters how efficient it is on N <= 100

Whose question is this? Did someone ask it?

Warfley

  • Hero Member
  • *****
  • Posts: 859
Re: Philosophically speaking: TFPSortedObjectList or TFPStringList?
« Reply #14 on: October 03, 2021, 08:52:41 pm »
If the question is what datastructure to use, the question what is the most efficient in a real world scenario is pretty much the most important question. What is most efficient asymptotically is only of relevance with respect of answering the former question

 

TinyPortal © 2005-2018