Recent

Author Topic: Removing duplicates from an array  (Read 4781 times)

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #15 on: May 30, 2022, 05:59:15 pm »
...
Anyway, it is too slow to sort the elements in a large array.
I tested with quicksort, the sort time taken is about the total task time taken in my previous method, then you have to remove the duplicates, easy enough in a sorted array.
I didn't bother making a remove procedure, it wasn't worth it.

It seems your conclusion is too hasty.
I tried running TGOrdinalArrayHelper.SelectDistinct (denoted Sorting) in the benchmark from my post #8 and got the following timings:
Code: Text  [Select][+][-]
  1. Array length: 10000
  2.   Subrange: 1..1024
  3.     Sorting: 47
  4.   Subrange: 1..65536
  5.     Sorting: 78
  6.   Subrange: 1..4194304
  7.     Sorting: 94
  8.   Subrange: 1..268435456
  9.     Sorting: 109
  10.   Subrange: 1..2147483647
  11.     Sorting: 109
  12.  
  13. Array length: 50000
  14.   Subrange: 1..1024
  15.     Sorting: 47
  16.   Subrange: 1..65536
  17.     Sorting: 109
  18.   Subrange: 1..4194304
  19.     Sorting: 93
  20.   Subrange: 1..268435456
  21.     Sorting: 109
  22.   Subrange: 1..2147483647
  23.     Sorting: 109
  24.  
  25. Array length: 200000
  26.   Subrange: 1..1024
  27.     Sorting: 47
  28.   Subrange: 1..65536
  29.     Sorting: 156
  30.   Subrange: 1..4194304
  31.     Sorting: 110
  32.   Subrange: 1..268435456
  33.     Sorting: 125
  34.   Subrange: 1..2147483647
  35.     Sorting: 125
  36.  
  37. Array length: 1000000
  38.   Subrange: 1..1024
  39.     Sorting: 46
  40.   Subrange: 1..65536
  41.     Sorting: 93
  42.   Subrange: 1..4194304
  43.     Sorting: 140
  44.   Subrange: 1..268435456
  45.     Sorting: 140
  46.   Subrange: 1..2147483647
  47.     Sorting: 140
  48.  
  49. Array length: 5000000
  50.   Subrange: 1..1024
  51.     Sorting: 46
  52.   Subrange: 1..65536
  53.     Sorting: 78
  54.   Subrange: 1..4194304
  55.     Sorting: 171
  56.   Subrange: 1..268435456
  57.     Sorting: 140
  58.   Subrange: 1..2147483647
  59.     Sorting: 140
  60.  

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Removing duplicates from an array
« Reply #16 on: May 31, 2022, 12:09:59 am »
@avk,

There is one thing I don't understand in the results you posted, it is: the (I presume) time taken for 5 million elements is the same as the time taken for 1 million elements.  I understand sorting time isn't linear but, it should take longer to sort 5 million elements than 1 million.

Am I misinterpreting the results you posted ?
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #17 on: May 31, 2022, 06:50:36 am »
If you look inside the benchmark code, you can see that the same number of elements (10M) is processed each time for all array lengths.
BTW in this case (TGOrdinalArrayHelper) the sorting time should be exactly linear.

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Removing duplicates from an array
« Reply #18 on: May 31, 2022, 08:45:36 am »
If you want to keep the order as it is, it's probably still simpler to first create an index, then sort and remove the duplicates and last to use the index to put it back in the original order.

BobDog

  • Sr. Member
  • ****
  • Posts: 394
Re: Removing duplicates from an array
« Reply #19 on: May 31, 2022, 01:20:57 pm »

Here is my contribution remove duplicates by sorting.
Code: Pascal  [Select][+][-]
  1.  
  2. uses
  3. sysutils;
  4.  
  5.  //{$rangeChecks on}
  6.  
  7.  
  8. generic Procedure QuickSort<T>(Var arr:array of T;Start:Int32;Finish:Int32);
  9. Var
  10.   i,j: Int64;
  11.   x,temp:T;
  12. Begin
  13.   i := Start ;
  14.   j := finish;
  15.   x := arr[(Start+finish) Div 2];
  16.   While I <= J Do
  17.     Begin
  18.       While arr[I] < x Do
  19.         i:=i+ 1;
  20.       While arr[J] > x Do
  21.         j:=j- 1;
  22.       If I<=J Then
  23.         Begin
  24.           temp := arr[j];
  25.           arr[j] := arr[i];
  26.           arr[i] := temp;
  27.           I:= i+1;
  28.           J:= j- 1;
  29.         End;
  30.     End;
  31.   If J > Start Then specialize QuickSort<T>(arr,Start,J);
  32.   If I < Finish Then specialize QuickSort<T>(arr,I,Finish);
  33. End;
  34.  
  35. generic function RemoveDuplicates<T,Q>(arr:T):T;
  36. var i,count:int32;
  37. var ret:T;
  38. begin
  39. specialize quicksort<Q>(arr,0,high(arr));
  40. count:=0;
  41. setlength (ret,length(arr));
  42. ret[0]:=arr[0];
  43. for i:=0 to high(arr)-1 do
  44.   begin
  45.    if arr[i]<>arr[i+1] then
  46.    begin
  47.     count:=count+1;
  48.     ret[count]:=arr[i+1];
  49.    end;
  50. end;
  51. setlength(ret,count+1);
  52. exit(ret);
  53. end;
  54.  
  55. function range(mymin:int32;mymax:int32):int32;
  56.    begin
  57.    range:=trunc(int((Random*(Mymax-mymin+1)))+MyMin);
  58.   end;
  59.  
  60. type aoi=array of int32;
  61. var
  62. ans,i:aoi;
  63. j:int32;
  64. t:int64;
  65.  
  66.  
  67. begin
  68. ans:=nil;
  69. i:=nil;
  70. setlength(i,5000000);
  71. for j:=0 to high(i) do i[j]:=range(1,1024);
  72. writeln('Please wait');
  73. t:=gettickcount64;
  74. ans:=specialize RemoveDuplicates<aoi,int32>(i);
  75. writeln(' Time taken = ',gettickcount64-t);
  76. for j:=low(ans) to high(ans) do write(ans[j],' ');
  77. writeln;
  78. writeln('Press return to end . . .');
  79. readln;
  80. end.
  81.  

It is easy to implement this way, but for indexing and returning to proper positions I'll have to think about.

SymbolicFrank

  • Hero Member
  • *****
  • Posts: 1313
Re: Removing duplicates from an array
« Reply #20 on: May 31, 2022, 01:36:20 pm »
Alternatively, you can sort the index. It amounts to the same thing.

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #21 on: June 05, 2022, 12:15:24 pm »

Here is my contribution remove duplicates by sorting.
  ...
It is easy to implement this way, but for indexing and returning to proper positions I'll have to think about.

Maybe the simplest way:
  • store the values ​​from the original array, along with their indices, into an auxiliary array;
  • sort auxiliary array entries by values(sorting must be stable);
  • remove entries in the auxiliary array that contain duplicate values;
  • sort the remaining records by indices;
  • store the values ​​from the auxiliary array in the resulting order into the original array and adjust its length;

It is clear that in order for the performance of this method to compete with the  hashset, both sorts must be very fast.

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Removing duplicates from an array
« Reply #22 on: June 05, 2022, 02:52:08 pm »
you realize of course one could simply use a HASH table which allows for faster interactions.

Just sort the localized hashes.
The only true wisdom is knowing you know nothing

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: Removing duplicates from an array
« Reply #23 on: June 05, 2022, 03:57:22 pm »
you realize of course one could simply use a HASH table which allows for faster interactions.

Just sort the localized hashes.
Sorting hashes is utter nonsense. Ignore that.
Why?
Because hashes have no natural order, just a unique value, which means the sort has no meaning at all.
There is no relationship between the two, except that hashes are already sorted by nature..
Unique should be taken with a pinch of salt: accidents will happen. Elvis Costello is right.
Sorting on hash and then using it is O(n) so futile compared to real sorts.

https://www.youtube.com/watch?v=Ae_bdGihxy8
And:
https://en.wikipedia.org/wiki/Hashed_array_tree
« Last Edit: June 05, 2022, 04:11:06 pm by Thaddy »
Specialize a type, not a var.

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #24 on: June 05, 2022, 05:00:03 pm »
you realize of course one could simply use a HASH table which allows for faster interactions.
...

Are you one of those who only reads the last post in a thread?

...
Just sort the localized hashes.

Funny approach, how would you implement it?

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Removing duplicates from an array
« Reply #25 on: June 05, 2022, 07:37:37 pm »
Some of you are funny, have little common sense and yet you want to bite the hand that feeds you.

 Bad planning starts an app off in the wrong direction.

 Have fun sobbing in your sorrows while others like myself have no issues generating blazing fast execution code.

 what a sad state of affairs.


The only true wisdom is knowing you know nothing

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #26 on: June 05, 2022, 07:44:25 pm »
Amen.

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: Removing duplicates from an array
« Reply #27 on: June 05, 2022, 08:32:50 pm »
what a sad state of affairs.
Because you can't read? You got at least two proper answers.
And Amen.
Specialize a type, not a var.

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Removing duplicates from an array
« Reply #28 on: June 06, 2022, 01:41:51 pm »
 Of all people @Thaddy, you should be last one stepping up to the plate.

 King of bloat and when some bloat does not work as desired add some more bloat on top of that bloat.


 The vicious cycle continues.

 Good job @Thaddy. Hope you get some more blinded sided followers to add to your crusade.

 There are some that are simply unemployable, meanwhile I need to get back to work and make some real headway.

   Happy Bloating.
The only true wisdom is knowing you know nothing

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #29 on: June 06, 2022, 02:35:27 pm »
It's a shame that the topic ceases to be constructive due to the efforts of participants who have contributed nothing but empty ranting to it.

 

TinyPortal © 2005-2018