Recent

Author Topic: Removing duplicates from an array  (Read 4784 times)

BillMcEnaney

  • New Member
  • *
  • Posts: 14
Removing duplicates from an array
« on: May 19, 2022, 07:07:24 pm »
Hi, everyone,

Here's my Pascal function to remove duplicates from a list stored in an array. Please forgive me for any FPC-related mistakes because I'm used to ANSI Standard Pascal.

Here are the type and constant declarations for the program.

const
   Capacity = 80;

type
   Index_Range = 0..Capacity;
   Element_Tupe = integer;
   List = array[Index_Range] of Element_Type;

Code: [Select]
function Remove_Duplicates(Table: List; Table_Length: Index_Range): List;
   var
      Result: List;
      Here, There: Index_Range;
      Duplicates: set of Element_Type;
   begin
      There := 1;
      for Here ::= 1 to Table_Length do
         if Table[Here] not in Duplicates
         then
            begin
               include(Duplicates, Table[Here]);
               Result[There] := Table[Here];
               There := succ(There)
            end;
         if Duplicates = []
            then Result := Table;
        Remove_Duplicates := Result
   end;

The compiler made my laptop complain about the if-statement until I changed Element_Type to 0..5. So I wonder whether the set of integers between 1 and MaxInt is too big to represent with a Pascal set. Am I right about that?

I could overwrite duplicates in the original list. But I use the set and the result list to speed up the function.

howardpc

  • Hero Member
  • *****
  • Posts: 4144
Re: Removing duplicates from an array
« Reply #1 on: May 19, 2022, 11:38:45 pm »
So I wonder whether the set of integers between 1 and MaxInt is too big to represent with a Pascal set. Am I right about that?
You wonder correctly. FPC sets are limited to Byte range ordinals (0..255).
There are generics libraries that offer sets with greater ranges and more functionalities (e.g. a set Count property) than the built-in FPC set implementation.
For instance see here .
« Last Edit: May 19, 2022, 11:43:07 pm by howardpc »

BobDog

  • Sr. Member
  • ****
  • Posts: 394
Re: Removing duplicates from an array
« Reply #2 on: May 20, 2022, 07:17:14 pm »

Is this it in fp maybe?
Code: Pascal  [Select][+][-]
  1.  
  2. const
  3.    Capacity = 255;
  4.  
  5. type
  6.    Index_Range = 0..Capacity;
  7.    Element_Type = integer;
  8.    List = array[Index_Range] of Element_Type;
  9.    aod= array of Element_Type;
  10.    
  11.    function Remove_Duplicates(Table: List; Table_Length: Index_Range):aod;
  12.    var
  13.       Result:array of Element_type;//
  14.      
  15.       Here, There: Index_Range;
  16.       Duplicates:set of index_range;
  17.    begin
  18.     setlength(result,capacity);
  19.     Duplicates:=[0];
  20.       There := 1;
  21.       for Here := 1 to Table_Length do
  22.          if (Table[Here] in Duplicates) =false then
  23.          
  24.             begin
  25.                include(Duplicates, Table[Here]);
  26.                Result[There] := Table[Here];
  27.                There := succ(There)
  28.             end;
  29.          if Duplicates = []
  30.             then Result := Table;
  31.             setlength(result,There);
  32.         Remove_Duplicates := Result
  33.    end;
  34.    
  35.    
  36.   function range(mymin:int32;mymax:int32):int32;
  37.    begin
  38.    range:=trunc(int((Random*(Mymax-mymin+1)))+MyMin);
  39.   end;
  40.  
  41.    
  42.    
  43.    var
  44.    i:integer;
  45.    table:List;
  46.    answer:aod;
  47.    begin
  48.    for i:=0 to high(table) do table[i]:=range(100,110);
  49.    writeln('Before:');
  50.    for i:=0 to high(table) do write(table[i],' ');
  51.    answer:=Remove_Duplicates(table,high(table));
  52.    writeln;
  53.    writeln('After:');
  54.    for i:=1 to high(answer) do write(answer[i],' ');
  55.    writeln;
  56.    writeln('Press return to end');
  57.    readln;
  58.    end.
  59.  
  60.  

And my attempt, keeping the order and using dynamic arrays and template (generics), but not sets.

Code: Pascal  [Select][+][-]
  1.  
  2. generic function CleanUp<T>(a:T):T;
  3. var
  4.  n1,n2,flag:int32;
  5.  count:int32=1;
  6.  ret:T;
  7. begin
  8.   setlength(ret,high(a));
  9.   ret[0]:=a[0];
  10.   For n1 :=1 To high(a) do
  11.   begin
  12.    flag:=0;
  13.    For n2 :=0 to  n1-1 do
  14.     begin
  15.     If (a[n1]=a[n2]) Then
  16.      begin
  17.       flag:=1;
  18.       break;
  19.       end;
  20.      end;
  21.  If (flag=0) Then
  22.     begin
  23.      ret[count]:=a[n1];
  24.      count:=count+1;
  25.     end;
  26.  end;
  27.  setlength(ret,count);
  28.  exit(ret);
  29. end;
  30.  
  31.  function range(mymin:int32;mymax:int32):int32;
  32.    begin
  33.    range:=trunc(int((Random*(Mymax-mymin+1)))+MyMin);
  34.   end;
  35.  
  36.  
  37. type aoi=array of int32;
  38. type aos=array of char;
  39. var ans:aoi=nil;
  40. var ret:aos=nil;
  41. var j:int32;
  42. var i:aoi;
  43. var s:aos;
  44.  
  45. begin
  46. setlength(i,255);
  47. for j:=0 to high(i) do i[j]:=range(100,110);
  48. writeln('Before:');
  49. for j:=low(i) to high(i) do write(i[j],' ');
  50. writeln;
  51. writeln('After:');
  52. ans:=specialize CleanUp<aoi>(i);
  53. for j:=low(ans) to high(ans) do write(ans[j],' ');
  54. writeln;
  55. writeln;
  56. writeln;
  57. setlength(s,50);
  58. for j:=0 to high(s) do if range(1,10)>5 then s[j]:=chr(range(65,90)) else s[j]:=chr(range(97,122));//s[j]:=range
  59. writeln('Before:');
  60. for j:=low(s) to high(s) do write(s[j],' ');
  61. writeln;
  62. writeln('After:');;
  63. ret:=specialize CleanUp<aos>(s);
  64. for j:=low(ret) to high(ret) do write(ret[j],' ');
  65. writeln;
  66. writeln('Press return to end');
  67. readln;
  68.  
  69. end.
  70.  
  71.  
  72.  

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Removing duplicates from an array
« Reply #3 on: May 20, 2022, 07:57:27 pm »
Just a suggestion, while sets in FPC are limited to 256 elements, bitpacked arrays of boolean are not.

That would allow creating a bitmap of duplicates to be used to remove them in a subsequent pass.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #4 on: May 23, 2022, 12:26:19 pm »
The bitmap for the above range (1..MaxInt) must be at least 228 bytes in size and must be initialized.
It doesn't look very practical, does it?

Warfley

  • Hero Member
  • *****
  • Posts: 1499
Re: Removing duplicates from an array
« Reply #5 on: May 23, 2022, 12:50:38 pm »
Using (bit)sets is not a real alternative for large value ranges. The alternative by BobDog is inplace, but has complexity of O(n²) and is therefore quite slow.
Therefore I would suggest as an alternative, to use a dynamic set datastructure, either a hashset, which has an average lookup complexity of O(1) or a sorted set (e.g. RBT) which has a lookup time of O(logn), but the advantage of being easiely pre allocatable.

E.g.
Code: Pascal  [Select][+][-]
  1. uses
  2.   Generics.Collections;
  3.  
  4. type
  5.   TIntArray = array of Integer;
  6.   TIntSet = specialize THashSet<Integer>;
  7.  
  8. function RemoveDublicates(Data: TIntArray): TIntArray;
  9. var
  10.   Visited: TIntSet;
  11.   WriteHead: SizeInt;
  12.   Elem: Integer;
  13. begin
  14.   SetLength(Result, Length(Data));
  15.   WriteHead := 0;
  16.   Visited := TIntSet.Create;
  17.   try
  18.     for Elem in Data do
  19.       if not Visited.Contains(Elem) then
  20.       begin
  21.         Result[WriteHead] := Elem;
  22.         Inc(WriteHead);
  23.         Visited.Add(Elem);
  24.       end;
  25.   finally
  26.     Visited.Free;
  27.   end;
  28.   SetLength(Result, WriteHead);
  29. end;
  30.  
  31. var
  32.   Data: Array of Integer = (5,2,1,3,5,2,3,4,1,2);
  33.   Elem: Integer;
  34. begin
  35.   Data := RemoveDublicates(Data);
  36.   for Elem in Data do
  37.     WriteLn(Elem);
  38.   ReadLn;
  39. end.
With a hash set the average runtime complexity of this function should be O(n), and therefore quite efficient, while not blowing up the memory consumption like a bitset would
« Last Edit: May 23, 2022, 12:52:37 pm by Warfley »

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Removing duplicates from an array
« Reply #6 on: May 23, 2022, 01:18:17 pm »
The bitmap for the above range (1..MaxInt) must be at least 228 bytes in size and must be initialized.
It doesn't look very practical, does it?
You got that right, I didn't say it was a small set  :D  but even a 32 bit process can afford the (approx) 260MB it will take.

I think the really critical part is the number of elements in the array.  For a large number of elements, 2 or 3 hundred thousand+, it might be reasonable to temporarily have a bitmap that large.  Honestly, I don't know where the cut-off is but, for smaller counts, I have no doubt there are smaller and faster ways.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

BobDog

  • Sr. Member
  • ****
  • Posts: 394
Re: Removing duplicates from an array
« Reply #7 on: May 23, 2022, 05:36:22 pm »
Using (bit)sets is not a real alternative for large value ranges. The alternative by BobDog is inplace, but has complexity of O(n²) and is therefore quite slow.
Therefore I would suggest as an alternative, to use a dynamic set datastructure, either a hashset, which has an average lookup complexity of O(1) or a sorted set (e.g. RBT) which has a lookup time of O(logn), but the advantage of being easiely pre allocatable.

E.g.
Code: Pascal  [Select][+][-]
  1. uses
  2.   Generics.Collections;
  3.  
  4. type
  5.   TIntArray = array of Integer;
  6.   TIntSet = specialize THashSet<Integer>;
  7.  
  8. function RemoveDublicates(Data: TIntArray): TIntArray;
  9. var
  10.   Visited: TIntSet;
  11.   WriteHead: SizeInt;
  12.   Elem: Integer;
  13. begin
  14.   SetLength(Result, Length(Data));
  15.   WriteHead := 0;
  16.   Visited := TIntSet.Create;
  17.   try
  18.     for Elem in Data do
  19.       if not Visited.Contains(Elem) then
  20.       begin
  21.         Result[WriteHead] := Elem;
  22.         Inc(WriteHead);
  23.         Visited.Add(Elem);
  24.       end;
  25.   finally
  26.     Visited.Free;
  27.   end;
  28.   SetLength(Result, WriteHead);
  29. end;
  30.  
  31. var
  32.   Data: Array of Integer = (5,2,1,3,5,2,3,4,1,2);
  33.   Elem: Integer;
  34. begin
  35.   Data := RemoveDublicates(Data);
  36.   for Elem in Data do
  37.     WriteLn(Elem);
  38.   ReadLn;
  39. end.
With a hash set the average runtime complexity of this function should be O(n), and therefore quite efficient, while not blowing up the memory consumption like a bitset would

Here are the results for 10 million elements.

BobDog
1250 milliseconds
1055 1059 1072 1085 1060 1086 1042 1062 1065 1038 1044 1030 1090 1005 1097 1027 1048 1079 1082 1053 1057 1039 1093 1084 1007 1034 1008 1002 1037 1096 1078 1014 1087 1098 1047 1080 1046 1052 1068 1011 1064 1058 1054 1095 1076 1010 1041 1026 1018 1074 1021 1013 1001 1032 1015 1061 1022 1091 1045 1036 1070 1006 1067 1017 1075 1031 1003 1099 1016 1100 1025 1024 1066 1019 1009 1088 1051 1028 1012 1029 1069 1092 1000 1040 1083 1071 1043 1077 1089 1073 1050 1023 1035 1081 1049 1020 1094 1063 1004 1056 1033



Warfley
219 milliseconds
1055 1059 1072 1085 1060 1086 1042 1062 1065 1038 1044 1030 1090 1005 1097 1027 1048 1079 1082 1053 1057 1039 1093 1084 1007 1034 1008 1002 1037 1096 1078 1014 1087 1098 1047 1080 1046 1052 1068 1011 1064 1058 1054 1095 1076 1010 1041 1026 1018 1074 1021 1013 1001 1032 1015 1061 1022 1091 1045 1036 1070 1006 1067 1017 1075 1031 1003 1099 1016 1100 1025 1024 1066 1019 1009 1088 1051 1028 1012 1029 1069 1092 1000 1040 1083 1071 1043 1077 1089 1073 1050 1023 1035 1081 1049 1020 1094 1063 1004 1056 1033

array of length 10000000 with range(1000,1100) each element





avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #8 on: May 24, 2022, 03:46:10 pm »
...
I think the really critical part is the number of elements in the array.  For a large number of elements, 2 or 3 hundred thousand+, it might be reasonable to temporarily have a bitmap that large.  Honestly, I don't know where the cut-off is but, for smaller counts, I have no doubt there are smaller and faster ways.

It is possible to try to roughly estimate the effect of range size and array length on performance.
I am used LGenerics on a Win64 machine:
Code: Pascal  [Select][+][-]
  1. program proj;
  2.  
  3. {$mode objfpc}{$h+}
  4. {$modeswitch advancedrecords}
  5. {$maxstacksize $40000000}
  6.  
  7. uses
  8.   SysUtils, Math, lgUtils, lgHashSet, lgHash, lgTreeSet;
  9.  
  10. var
  11.   ArrayLen: Integer = 10000;
  12.   RepCount: Integer = 1000;
  13.  
  14. const
  15.   TryCount = 5;
  16.  
  17. type
  18.   THelper = record
  19.     class function HashCode(aValue: DWord): SizeInt; static; inline;
  20.     class function Equal(L, R: DWord): Boolean; static; inline;
  21.   end;
  22.  
  23.   generic TRangeTest<const UpBound: Integer> = record
  24.   const
  25.     UPPER = UpBound or Ord(UpBound = 0);
  26.  
  27.   type
  28.     TElType           = 1..UPPER;
  29.     TElArray          = array of TElType;
  30.     TRemoveDuplicates = function(const a: array of TElType): TElArray;
  31.  
  32.     TTestFun = record
  33.       Fun: TRemoveDuplicates;
  34.       DisplayText: string;
  35.     end;
  36.  
  37.     TBitSet      = specialize TGSet<TElType>;
  38.     THashSetType = specialize TGLiteChainHashSet<TElType, THelper>;
  39.     THashSet     = THashSetType.TSet;
  40.     TTreeSet     = specialize TGLiteTreeSet<TElType, DWord>;
  41.  
  42.     class function  BitSet(const a: array of TElType): TElArray; static;
  43.     class function  HashSet(const a: array of TElType): TElArray; static;
  44.     class function  TreeSet(const a: array of TElType): TElArray; static;
  45.     class procedure Execute; static;
  46.   end;
  47.  
  48. class function THelper.HashCode(aValue: DWord): SizeInt;
  49. begin
  50.   Result := JdkHash(aValue);
  51. end;
  52.  
  53. class function THelper.Equal(L, R: DWord): Boolean;
  54. begin
  55.   Result := L = R;
  56. end;
  57.  
  58. class function TRangeTest.BitSet(const a: array of TElType): TElArray;
  59. var
  60.   s: TBitSet;
  61.   I, J: SizeInt;
  62. begin
  63.   SetLength(Result, Length(a));
  64.   J := 0;
  65.   for I := 0 to High(a) do
  66.     begin
  67.       if a[I] in s then continue;
  68.       s.Include(a[I]);
  69.       Result[J] := a[I];
  70.       Inc(J);
  71.     end;
  72.   SetLength(Result, J);
  73. end;
  74.  
  75. {$warn 5089 off}
  76. class function TRangeTest.HashSet(const a: array of TElType): TElArray;
  77. var
  78.   s: THashSet;
  79.   I, J: SizeInt;
  80. begin
  81.   s.EnsureCapacity(Min(Ord(High(TElType)) - Ord(Low(TElType)) + 1, Length(a)));
  82.   SetLength(Result, Length(a));
  83.   J := 0;
  84.   for I := 0 to High(a) do
  85.     if s.Add(a[I]) then
  86.       begin
  87.         Result[J] := a[I];
  88.         Inc(J);
  89.       end;
  90.   SetLength(Result, J);
  91. end;
  92.  
  93. class function TRangeTest.TreeSet(const a: array of TElType): TElArray;
  94. var
  95.   s: TTreeSet;
  96.   I, J: SizeInt;
  97. begin
  98.   s.EnsureCapacity(Min(Ord(High(TElType)) - Ord(Low(TElType)) + 1, Length(a)));
  99.   SetLength(Result, Length(a));
  100.   J := 0;
  101.   for I := 0 to High(a) do
  102.     if s.Add(a[I]) then
  103.       begin
  104.         Result[J] := a[I];
  105.         Inc(J);
  106.       end;
  107.   SetLength(Result, J);
  108. end;
  109.  
  110. class procedure TRangeTest.Execute;
  111. var
  112.   a, u: TElArray;
  113.   CurrFun: TTestFun;
  114.   I, J: SizeInt;
  115.   BestScore, Score: QWord;
  116. const
  117.   TestFuns: array of TTestFun = (
  118.     (Fun: @BitSet;  DisplayText: '    BitSet:  '),
  119.     (Fun: @HashSet; DisplayText: '    HashSet: '),
  120.     (Fun: @TreeSet; DisplayText: '    TreeSet: ')
  121.   );
  122. begin
  123.   WriteLn('  Subrange: 1..', UPPER);
  124.   SetLength(a, ArrayLen);
  125.   for I := 0 to High(a) do
  126.     a[I] := Random(UPPER) + 1;
  127.  
  128.   for CurrFun in TestFuns do
  129.     begin
  130.       BestScore := High(QWord);
  131.       for I := 1 to TryCount do
  132.         begin
  133.           Score := GetTickCount64;
  134.           for J := 1 to RepCount do
  135.             u := CurrFun.Fun(a);
  136.           Score := GetTickCount64 - Score;
  137.           if Score < BestScore then
  138.             BestScore := Score;
  139.         end;
  140.       WriteLn(CurrFun.DisplayText, BestScore);
  141.     end;
  142. end;
  143.  
  144. procedure RunTest;
  145. begin
  146.   WriteLn('Array length: ', ArrayLen);
  147.   specialize TRangeTest<1024>.Execute;
  148.   specialize TRangeTest<65536>.Execute;
  149.   specialize TRangeTest<4194304>.Execute;
  150.   specialize TRangeTest<268435456>.Execute;
  151.   specialize TRangeTest<2147483647>.Execute;
  152.   WriteLn;
  153. end;
  154.  
  155. begin
  156.   RunTest;
  157.   ArrayLen := 50000; RepCount := 200;
  158.   RunTest;
  159.   ArrayLen := 200000; RepCount := 50;
  160.   RunTest;
  161.   ArrayLen := 1000000; RepCount := 10;
  162.   RunTest;
  163.   ArrayLen := 5000000; RepCount := 2;
  164.   RunTest;
  165.   ReadLn;
  166. end.
  167.  

And I got something like this:
Code: Text  [Select][+][-]
  1. Array length: 10000
  2.   Subrange: 1..1024
  3.     BitSet:  0
  4.     HashSet: 78
  5.     TreeSet: 655
  6.   Subrange: 1..65536
  7.     BitSet:  15
  8.     HashSet: 171
  9.     TreeSet: 1342
  10.   Subrange: 1..4194304
  11.     BitSet:  124
  12.     HashSet: 156
  13.     TreeSet: 1388
  14.   Subrange: 1..268435456
  15.     BitSet:  10484
  16.     HashSet: 156
  17.     TreeSet: 1372
  18.   Subrange: 1..2147483647
  19.     BitSet:  80138
  20.     HashSet: 171
  21.     TreeSet: 1373
  22.  
  23. Array length: 50000
  24.   Subrange: 1..1024
  25.     BitSet:  0
  26.     HashSet: 78
  27.     TreeSet: 608
  28.   Subrange: 1..65536
  29.     BitSet:  46
  30.     HashSet: 343
  31.     TreeSet: 1684
  32.   Subrange: 1..4194304
  33.     BitSet:  47
  34.     HashSet: 374
  35.     TreeSet: 1919
  36.   Subrange: 1..268435456
  37.     BitSet:  2231
  38.     HashSet: 374
  39.     TreeSet: 1918
  40.   Subrange: 1..2147483647
  41.     BitSet:  16177
  42.     HashSet: 374
  43.     TreeSet: 1918
  44.  
  45. Array length: 200000
  46.   Subrange: 1..1024
  47.     BitSet:  0
  48.     HashSet: 78
  49.     TreeSet: 608
  50.   Subrange: 1..65536
  51.     BitSet:  31
  52.     HashSet: 203
  53.     TreeSet: 1638
  54.   Subrange: 1..4194304
  55.     BitSet:  31
  56.     HashSet: 592
  57.     TreeSet: 2777
  58.   Subrange: 1..268435456
  59.     BitSet:  718
  60.     HashSet: 592
  61.     TreeSet: 2823
  62.   Subrange: 1..2147483647
  63.     BitSet:  4227
  64.     HashSet: 592
  65.     TreeSet: 2808
  66.  
  67. Array length: 1000000
  68.   Subrange: 1..1024
  69.     BitSet:  0
  70.     HashSet: 78
  71.     TreeSet: 608
  72.   Subrange: 1..65536
  73.     BitSet:  15
  74.     HashSet: 141
  75.     TreeSet: 1622
  76.   Subrange: 1..4194304
  77.     BitSet:  47
  78.     HashSet: 795
  79.     TreeSet: 4836
  80.   Subrange: 1..268435456
  81.     BitSet:  312
  82.     HashSet: 826
  83.     TreeSet: 5070
  84.   Subrange: 1..2147483647
  85.     BitSet:  1046
  86.     HashSet: 826
  87.     TreeSet: 5070
  88.  
  89. Array length: 5000000
  90.   Subrange: 1..1024
  91.     BitSet:  0
  92.     HashSet: 78
  93.     TreeSet: 593
  94.   Subrange: 1..65536
  95.     BitSet:  15
  96.     HashSet: 140
  97.     TreeSet: 1591
  98.   Subrange: 1..4194304
  99.     BitSet:  62
  100.     HashSet: 717
  101.     TreeSet: 6489
  102.   Subrange: 1..268435456
  103.     BitSet:  234
  104.     HashSet: 889
  105.     TreeSet: 7628
  106.   Subrange: 1..2147483647
  107.     BitSet:  405
  108.     HashSet: 889
  109.     TreeSet: 7613  
  110.  

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Removing duplicates from an array
« Reply #9 on: May 24, 2022, 05:08:22 pm »
It is possible to try to roughly estimate the effect of range size and array length on performance.
Thank you @avk.

From the results you posted I'd estimate the threshold to be in the vicinity of 2 million elements.  That's quite a bit more than I than initially thought.

That really shows the value of testing off-the-cuff estimates :)

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 14199
  • Probably until I exterminate Putin.
Re: Removing duplicates from an array
« Reply #10 on: May 29, 2022, 08:14:14 am »
Note OP's capacity is 81, not 80...
Specialize a type, not a var.

circular

  • Hero Member
  • *****
  • Posts: 4195
    • Personal webpage
Re: Removing duplicates from an array
« Reply #11 on: May 29, 2022, 08:59:14 am »
Another approach is to sort the array. Then it is easy to remove the duplicates with a loop, by having a second index that progress only when a new value is found and where it is copied to. This can be done inplace. In the end set the length of the array of the number of distinct items found.
Conscience is the debugger of the mind

Thaddy

  • Hero Member
  • *****
  • Posts: 14199
  • Probably until I exterminate Putin.
Re: Removing duplicates from an array
« Reply #12 on: May 29, 2022, 01:31:20 pm »
Another approach is using generics.collections and TArray<T>. And indeed it needs to sort, which it can.
That should be a two liner with the correct TDuplicates for TList<T>
Not tested (yet) but it should work.

The magic is in combining the array and the list.
« Last Edit: May 29, 2022, 01:42:26 pm by Thaddy »
Specialize a type, not a var.

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Removing duplicates from an array
« Reply #13 on: May 29, 2022, 02:45:53 pm »
Another approach is to sort the array. Then it is easy to remove the duplicates with a loop, by having a second index that progress only when a new value is found and where it is copied to. This can be done inplace. In the end set the length of the array of the number of distinct items found.

Of course, in LGenerics the T(some)ArrayHelper.SelectDistinct() method does exactly that, but this method does not preserve the order of the elements in the array.

BobDog

  • Sr. Member
  • ****
  • Posts: 394
Re: Removing duplicates from an array
« Reply #14 on: May 30, 2022, 03:13:33 pm »
Another approach is using generics.collections and TArray<T>. And indeed it needs to sort, which it can.
That should be a two liner with the correct TDuplicates for TList<T>
Not tested (yet) but it should work.

The magic is in combining the array and the list.
Anyway, it is too slow to sort the elements in a large array.
I tested with quicksort, the sort time taken is about the total task time taken in my previous method, then you have to remove the duplicates, easy enough in a sorted array.
I didn't bother making a remove procedure, it wasn't worth it.



 

TinyPortal © 2005-2018