Recent

Author Topic: Sorting and Counting  (Read 36224 times)

440bx

  • Hero Member
  • *****
  • Posts: 3921
Re: Sorting and Counting
« Reply #60 on: July 19, 2019, 09:11:34 pm »
@julkas, done
@440bx:
So, do you offer to compete with the Windows file mapping and ntdll? Okay, why not. :)
To run your SortCount (from SortCount2) inside a Howard's benchmark, I put the SortCount2 code in a separate unit:
I appreciate that, thank you. :)

I have to say, I am impressed with the performance of TGOrdinalArrayHelper. 

Code: Pascal  [Select][+][-]
  1. RandomRange = 5
  2.   Avk2's time: 0.7330 #unique: 490842 #total: 2000000
  3.  440bx's time: 0.7490 #unique: 490842 #total: 2000000
  4.  
  5. RandomRange = 6
  6.   Avk2's time: 0.7800 #unique: 578734 #total: 2000000
  7.  440bx's time: 0.7180 #unique: 578734 #total: 2000000
  8.  
  9. RandomRange = 7
  10.   Avk2's time: 0.7950 #unique: 659876 #total: 2000000
  11.  440bx's time: 0.7340 #unique: 659876 #total: 2000000
  12.  
  13. RandomRange = 8
  14.   Avk2's time: 0.8270 #unique: 733819 #total: 2000000
  15.  440bx's time: 0.7490 #unique: 733819 #total: 2000000
  16.  
  17. RandomRange = 9
  18.   Avk2's time: 0.8420 #unique: 802323 #total: 2000000
  19.  440bx's time: 0.7650 #unique: 802323 #total: 2000000
  20.  
  21. RandomRange = 10
  22.   Avk2's time: 0.8730 #unique: 864809 #total: 2000000
  23.  440bx's time: 0.7650 #unique: 864809 #total: 2000000
  24.  
I expected results like the above because using the API one doesn't have to incur in string to binary conversion back and forth (and that is only in this particular case.)  On the other hand, I can't quite figure out why when the test is repeated, your implementation performs better.

Code: Pascal  [Select][+][-]
  1. repeatMillionsCount = 10
  2.   Avk2's time: 3.0730 #unique: 799999 #total: 10000000
  3.  440bx's time: 3.1670 #unique: 799999 #total: 10000000
  4.  
  5. repeatMillionsCount = 12
  6.   Avk2's time: 3.6350 #unique: 799999 #total: 12000000
  7.  440bx's time: 3.7600 #unique: 799999 #total: 12000000
  8.  
  9. repeatMillionsCount = 14
  10.   Avk2's time: 4.1970 #unique: 800000 #total: 14000000
  11.  440bx's time: 4.3360 #unique: 800000 #total: 14000000
  12.  
  13. repeatMillionsCount = 16
  14.   Avk2's time: 4.7580 #unique: 800000 #total: 16000000
  15.  440bx's time: 4.9300 #unique: 800000 #total: 16000000
  16.  
  17. repeatMillionsCount = 18
  18.   Avk2's time: 5.3040 #unique: 800000 #total: 18000000
  19.  440bx's time: 5.5220 #unique: 800000 #total: 18000000
  20.  
  21. repeatMillionsCount = 20
  22.   Avk2's time: 5.9130 #unique: 800000 #total: 20000000
  23.  440bx's time: 6.0840 #unique: 800000 #total: 20000000
  24.  
The differences are not large but, they are consistent.  When executed many times over, your implementation performs better.  One possible reason that comes to mind is, qsort has to call the compare function "many times" and, I'm guessing, TGOrdinalArrayHelper doesn't have the overhead of calling a compare function (I haven't looked at how it is implemented, therefore I really don't know if that may be the reason.)  I thought that not having to do string to integer conversions would compensate for the overhead of passing parameters to the sort function, apparently that is not enough.

Either way, TGOrdinalArrayHelper does a very nice job. 

Are those results for 32bit programs or 64bits programs ?  I'm thinking that in 64bit, the API implementation may be able to match TGOrdinalArrayHelper because it doesn't have to push the pointer arguments onto the stack. 


ETA:

A typical comparison looks like this (I'm on Linux here, so have sadly omitted 440bx)
One thing is for sure, I cannot compete on portability <chuckle>
« Last Edit: July 19, 2019, 09:17:17 pm by 440bx »
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: Sorting and Counting
« Reply #61 on: July 19, 2019, 09:46:04 pm »
I'm guessing, TGOrdinalArrayHelper doesn't have the overhead of calling a compare function (I haven't looked at how it is implemented, therefore I really don't know if that may be the reason.)

As a (happy!) user of LGenerics, I can tell you: it doesn't. As it's TGOrdinalArrayHelper, it uses a built-in sort specifically designed for, well, ordinals (or at least, anything that can be directly compared to them without need of an explicit comparison function, which could possibly also be achieved via operator overloading in the case of more complex types.)

For anything that doesn't fit that description it just won't compile if specialized with them (failing explicitly on the lines that attempt to do the comparisons, obviously.)

LGenerics does also though use a classic (well, classic as much as concepts mostly only relevant since generics were introduced to the language can be considered classic) trick for avoiding the fact that FPC cannot inline function pointers (and thus could not inline "comparer callbacks" if written with an API that used them directly.)

Instead, it takes structured types containing static "class functions" with specific "required names" for different tasks as generic parameters, which can be inlined (if marked inline) they're not being accessed through pointers.

Here's a simplistic example of what I'm talking about (note: the following code is not actually something from LGenerics)

Code: Pascal  [Select][+][-]
  1. program Example;
  2.  
  3. {$mode Delphi}
  4.  
  5. type
  6.   TCallbackFilterer<T> = record
  7.   public type
  8.     TFilterTest = function(constref Current: T): Boolean;
  9.   public
  10.     // ↓↓↓ Test is a function pointer and can never be inlined
  11.     class function Filter(constref Values: array of T; const Test: TFilterTest): TArray<T>; static;
  12.   end;
  13.  
  14.   class function TCallbackFilterer<T>.Filter(constref Values: array of T; const Test: TFilterTest): TArray<T>;
  15.   var I, J: PtrUInt;
  16.   begin
  17.     J := 0;
  18.     SetLength(Result, Length(Values));
  19.     for I := 0 to High(Values) do
  20.       if Test(Values[I]) then begin
  21.         Result[J] := Values[I];
  22.         Inc(J);
  23.       end;
  24.     SetLength(Result, J);
  25.   end;
  26.  
  27. type
  28.   TFunctorFilterer<T, TFilterImpl> = record
  29.   (*
  30.     TFilterImpl being something like:
  31.  
  32.     type TNotGenericTester = record
  33.     public
  34.       class function Test(constref Current: SomeType): Boolean; static; inline;
  35.     end;
  36.  
  37.     or like:
  38.  
  39.     type TGenericTester<T> = record
  40.     public
  41.       class function Test(constref Current: T): Boolean; static; inline;
  42.     end;
  43.   *)
  44.   public
  45.     class function Filter(constref Values: array of T): TArray<T>; static;
  46.   end;
  47.  
  48.   class function TFunctorFilterer<T, TFilterImpl>.Filter(constref Values: array of T): TArray<T>;
  49.   var I, J: PtrUInt;
  50.   begin
  51.     J := 0;
  52.     SetLength(Result, Length(Values));
  53.     for I := 0 to High(Values) do
  54.       // ↓↓↓ This won't compile if TFilterImpl, whatever it may be, has no "Test" function.
  55.       // ↓↓↓ Again though, if it does, this part (usually) will be inlined if it is marked as such.
  56.       if TFilterImpl.Test(Values[I]) then begin
  57.         Result[J] := Values[I];
  58.         Inc(J);
  59.       end;
  60.     SetLength(Result, J);
  61.   end;
  62.  
  63. begin
  64. end.
« Last Edit: July 19, 2019, 09:48:34 pm by Akira1364 »

440bx

  • Hero Member
  • *****
  • Posts: 3921
Re: Sorting and Counting
« Reply #62 on: July 19, 2019, 10:03:17 pm »
As a (happy!) user of LGenerics, I can tell you: it doesn't. As it's TGOrdinalArrayHelper, it uses a built-in sort specifically designed for, well, ordinals (or at least, anything that can be directly compared to them without need of an explicit comparison function, which could possibly also be achieved via operator overloading in the case of more complex types.)
Thank you for clearing that.  As you pointed out, the presence of Ordinal in the helper name is a good hint that the helper is designed and optimized for ordinal types.  I suspected it but, without looking at the actual code, I didn't want to draw any conclusions based on the just the name.

That would explain why, as the number of elements to sort grows, its advantage over anything that passes parameters on the stack grows as well. 

For anything that doesn't fit that description it just won't compile if specialized with them (failing explicitly on the lines that attempt to do the comparisons, obviously.)
That makes perfect sense.

Thank you.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

VTwin

  • Hero Member
  • *****
  • Posts: 1215
  • Former Turbo Pascal 3 user
Re: Sorting and Counting
« Reply #63 on: July 20, 2019, 12:53:32 am »
Just to see what might be done without using generics, I tried replacing the sort in Akv2 with a HeapSort and then a QuickSort (in howardpc's OccurenceCounter). The HeapSort version was about twice as slow as howardpc's, while the QuickSort version edged out that code. I did not directly compare it to Akv2, but that code smokes.

Very interesting!
« Last Edit: July 20, 2019, 01:54:26 am by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

Free Pascal Compiler 3.2.2
macOS 12.1: Lazarus 2.2.6 (64 bit Cocoa M1)
Ubuntu 18.04.3: Lazarus 2.2.6 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.2.6 (64 bit on VBox)

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: Sorting and Counting
« Reply #64 on: July 20, 2019, 02:51:29 am »
Just to see what might be done without using generics, I tried replacing the sort in Akv2 with a HeapSort and then a QuickSort (in howardpc's OccurenceCounter). The HeapSort version was about twice as slow as howardpc's, while the QuickSort version edged out that code. I did not directly compare it to Akv2, but that code smokes.
Very interesting!

TGOrdinalArrayHelper itself does also have a QuickSort, BTW. Also IntroSort and "DualPivotQuickSort."

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Sorting and Counting
« Reply #65 on: July 20, 2019, 06:17:24 am »
@julkas, sorry, it was my mistake,
correct code and output:
Code: Pascal  [Select][+][-]
  1. program OccurrenceCounter;
  2.  
  3. {$mode delphi}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, garrayutils, gvector, gmap,
  12.   WinSortCount2;
  13.  
  14. type
  15.   TIntPair = TPair<LongInt, LongInt>;
  16.   TProcedureArray = array of procedure;
  17.  
  18.   function ComparePairs(constref L, R: TIntPair): LongInt;
  19.   begin
  20.     if L.Key < R.Key then
  21.       Result := -1
  22.     else if L.Key = R.Key then
  23.       Result := 0
  24.     else
  25.       Result := 1;
  26.   end;
  27.  
  28. var
  29.   Total, Unique, repeatCount, randomrange: Integer;
  30.   Start: TDateTime;
  31.   inFilename: String = 'data.txt';
  32.   outFilename: String = 'sorted.txt';
  33.   routineName: String;
  34.   procedures: TProcedureArray;
  35.   proc: procedure;
  36.  
  37.   procedure GenerateData(randomRange: Integer=8; repeatMillionsCount: Integer=2);
  38.   var
  39.     InFile: Text;
  40.     I: LongInt;
  41.   begin
  42.     Assign(InFile, inFilename);
  43.     Rewrite(InFile);
  44.     for I := 1 to repeatMillionsCount * 1000000 do
  45.       WriteLn(InFile, 1500000000 + Random(randomRange * 100000));
  46.     Close(InFile);
  47.   end;
  48.  
  49.   procedure SortCountAkira;
  50.   var
  51.     I: LongInt;
  52.     InOut: Text;
  53.     Map: TDictionary<LongInt, LongInt>;
  54.     Pair: TIntPair;
  55.     Pairs: TArray<TIntPair>;
  56.   begin
  57.     routineName := {$I %currentroutine%};
  58.     Map := TDictionary<LongInt, LongInt>.Create();
  59.     Map.Capacity := 10000000;
  60.     Assign(InOut, inFilename);
  61.     Reset(InOut);
  62.     while not EOF(InOut) do begin
  63.       ReadLn(InOut, I);
  64.       Inc(Total);
  65.       if not Map.ContainsKey(I) then
  66.         begin
  67.           Map.Add(I, 1);
  68.           Inc(Unique);
  69.         end
  70.       else
  71.         Map[I] := Map[I] + 1;
  72.     end;
  73.     Close(InOut);
  74.     Pairs := Map.ToArray();
  75.     Map.Free();
  76.     TArrayHelper<TIntPair>.Sort(
  77.       Pairs,
  78.       TComparer<TIntPair>.Construct(ComparePairs)
  79.     );
  80.     Assign(InOut, outFilename);
  81.     Rewrite(InOut);
  82.     for Pair in Pairs do with Pair do
  83.       WriteLn(InOut, Key, ' - ', Value);
  84.     Close(InOut);
  85.   end;
  86.  
  87.   procedure SortCountHoward;
  88.   var
  89.     arr: array of Integer;
  90.     textf: TextFile;
  91.     min: Integer = High(Integer);
  92.     max: Integer = -1;
  93.     i: Integer;
  94.   begin
  95.     routineName := {$I %currentroutine%};
  96.     AssignFile(textf, inFilename);
  97.     Reset(textf);
  98.     while not EOF(textf) do
  99.       begin
  100.         ReadLn(textf, i);
  101.         Inc(Total);
  102.         if i < min then
  103.           min := i;
  104.         if i > max then
  105.           max := i;
  106.       end;
  107.     SetLength(arr, max-min+1);
  108.  
  109.     Reset(textf);
  110.     while not EOF(textf) do
  111.       begin
  112.         ReadLn(textf, i);
  113.         Dec(i, min);
  114.         Inc(arr[i]);
  115.       end;
  116.     CloseFile(textf);
  117.  
  118.     AssignFile(textf, outFilename);
  119.     Rewrite(textf);
  120.     for i := Low(arr) to High(arr) do
  121.       case (arr[i] > 0) of
  122.         True:
  123.           begin
  124.             WriteLn(textf, i+min, ' - ', arr[i]);
  125.             Inc(Unique);
  126.           end;
  127.       end;
  128.     CloseFile(textf);
  129.     SetLength(arr, 0);
  130.   end;
  131.  
  132.   procedure SortCountAvk1;
  133.   type
  134.     TCounter  = TGHashMultiSetLP<Integer>;
  135.     TCountRef = TGAutoRef<TCounter>;
  136.     TEntry    = TCounter.TEntry;
  137.  
  138.     function EntryCmp(constref L, R: TEntry): SizeInt;
  139.     begin
  140.       if L.Key > R.Key then
  141.         Result := 1
  142.       else
  143.         if L.Key < R.Key then
  144.           Result := -1
  145.         else
  146.           Result := 0;
  147.     end;
  148.  
  149.   var
  150.     CountRef: TCountRef;
  151.     InOut: Text;
  152.     Counter: TCounter;
  153.     e: TEntry;
  154.     I: Integer;
  155.   begin
  156.     routineName := {$I %currentroutine%};
  157.     Counter := CountRef;
  158.     Counter.LoadFactor := 0.7;
  159.     Assign(InOut, inFilename);
  160.     Reset(InOut);
  161.     while not EOF(InOut) do
  162.       begin
  163.         ReadLn(InOut, I);
  164.         Counter.Add(I);
  165.       end;
  166.     Close(InOut);
  167.     Total := Counter.Count;
  168.     Unique := Counter.EntryCount;
  169.     if Counter.NonEmpty then
  170.       begin
  171.         Assign(InOut, outFilename);
  172.         Rewrite(InOut);
  173.         for e in Counter.Entries.Sorted(EntryCmp) do
  174.           with e do
  175.             WriteLn(InOut, Key, ' - ', Count);
  176.         Close(InOut);
  177.       end;
  178.   end;
  179.  
  180.   procedure SortCountAvk2;
  181.   var
  182.     List: array of Integer;
  183.     InOut: Text;
  184.     I, J, Count, DupCount: Integer;
  185.   begin
  186.     routineName := {$I %currentroutine%};
  187.     Assign(InOut, inFilename);
  188.     Reset(InOut);
  189.     SetLength(List, 4096);
  190.     I := 0;
  191.     while not EOF(InOut) do
  192.       begin
  193.         ReadLn(InOut, J);
  194.         Inc(Total);
  195.         if Length(List) = I then
  196.           SetLength(List, I * 2);
  197.         List[I] := J;
  198.         Inc(I);
  199.       end;
  200.     Close(InOut);
  201.     SetLength(List, I);
  202.     if List = nil then
  203.       exit;
  204.     TGOrdinalArrayHelper<Integer>.Sort(List);
  205.     Count := I;
  206.     DupCount := 0;
  207.     I := 0;
  208.     Assign(InOut, outFilename);
  209.     Rewrite(InOut);
  210.     repeat
  211.       J := List[I];
  212.       while (I < Count) and (List[I] = J) do
  213.         begin
  214.           Inc(DupCount);
  215.           Inc(I);
  216.         end;
  217.       WriteLn(InOut, J, ' - ', DupCount);
  218.       Inc(Unique);
  219.       DupCount := 0;
  220.     until I = Count;
  221.     Close(InOut);
  222.   end;
  223.  
  224.   procedure SortCountJulkas1;
  225.   type
  226.     TIntLess = TLess<LongInt>;
  227.     TDict = TMap<LongInt, LongInt, TIntLess>;
  228.   var
  229.     sc: TDict;
  230.     scit: TDict.TIterator;
  231.     InOut: Text;
  232.     key, cnt: LongInt;
  233.   begin
  234.     routineName := {$I %currentroutine%};
  235.     sc := TDict.Create;
  236.     Assign(InOut, inFilename);
  237.     Reset(InOut);
  238.     while not EOF(InOut) do
  239.       begin
  240.         ReadLn(InOut, key);
  241.         Inc(Total);
  242.         cnt := 0;
  243.         sc.TryGetValue(key, cnt);
  244.         sc[key] := cnt + 1;
  245.       end;
  246.     Close(InOut);
  247.     Unique := sc.Size;
  248.     if Unique > 0 then
  249.       begin
  250.         Assign(InOut, outFilename);
  251.         Rewrite(InOut);
  252.         scit := sc.Min;
  253.         repeat
  254.           WriteLn(InOut, scit.Key, ' - ', scit.Value);
  255.         until not scit.Next;
  256.         Close(InOut);
  257.         scit.Free;
  258.       end;
  259.     sc.Free;
  260.   end;
  261.  
  262.   procedure SortCountJulkas2;
  263.   type
  264.     TIntLess = TLess<LongInt>;
  265.     TIntVect = TVector<LongInt>;
  266.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  267.   var
  268.     sc: array[0..21474] of TIntVect;
  269.     i: LongInt;
  270.     pkey, key, cnt: LongInt;
  271.     offset: LongInt;
  272.     InOut: Text;
  273.   begin
  274.     routineName := {$I %currentroutine%};
  275.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  276.  
  277.     Assign(InOut, inFilename);
  278.     Reset(InOut);
  279.     while not EOF(InOut) do
  280.       begin
  281.         ReadLn(InOut, key);
  282.         Inc(Total);
  283.         sc[key div 100000].PushBack(key mod 100000);
  284.       end;
  285.     Close(InOut);
  286.  
  287.     Assign(InOut, outFilename);
  288.     Rewrite(InOut);
  289.  
  290.     for i := Low(sc) to High(sc) do if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  291.  
  292.     offset := -100000;
  293.     for i := Low(sc) to High(sc) do
  294.     begin
  295.       Inc(offset, 100000);
  296.       pkey := -1;
  297.       cnt := 0;
  298.       for key in sc[i] do
  299.       begin
  300.         if pkey <> key then
  301.         begin
  302.           if cnt <> 0 then
  303.           begin
  304.             WriteLn(InOut, offset + pkey, ' - ', cnt);
  305.             Inc(Unique);
  306.           end;
  307.           pkey := key;
  308.           cnt := 0;
  309.         end;
  310.         Inc(cnt);
  311.       end;
  312.       if cnt <> 0 then
  313.       begin
  314.         WriteLn(InOut, offset + pkey, ' - ', cnt);
  315.         Inc(Unique);
  316.       end;
  317.     end;
  318.  
  319.     Close(InOut);
  320.     for i := Low(sc) to High(sc) do
  321.       sc[i].Destroy;
  322.   end;
  323.  
  324.   procedure SortCount440bx;
  325.   begin
  326.     routineName := {$I %currentroutine%};
  327.     WinSortCount2.DataCount := 0;
  328.     WinSortCount2.Unique := 0;
  329.     WinSortCount2.InFileName := inFilename;
  330.     WinSortCount2.OutFileName := outFilename;
  331.     WinSortCount2.SortCount;
  332.     Total := WinSortCount2.DataCount;
  333.     Unique := WinSortCount2.Unique;
  334.   end;
  335.  
  336.   procedure Run(aProc: TProcedure);
  337.   begin
  338.     Total := 0;
  339.     Unique := 0;
  340.     Start := Now;
  341.     try
  342.       aProc();
  343.       WriteLn(Copy(routineName, 10, 20):7,'''s time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4,' #unique: ',Unique,' #total: ',Total);
  344.     except
  345.       on e: Exception do
  346.         WriteLn('crashes with message "', e.Message, '"');
  347.     end;
  348.   end;
  349.  
  350. begin
  351.   Randomize;
  352.  
  353.   procedures := TProcedureArray.Create(
  354.                    SortCountJulkas1, SortCountJulkas2, SortCountAkira, SortCountHoward, SortCountAvk1,
  355.                    SortCountAvk2, SortCount440bx);
  356.  
  357.   for randomrange := 5 to 10 do
  358.     begin
  359.       GenerateData(randomrange);
  360.       WriteLn(#10'RandomRange = ',randomrange);
  361.       for proc in procedures do
  362.         Run(proc);
  363.     end;
  364.  
  365.   for repeatCount := 5 to 10 do
  366.     begin
  367.       GenerateData(8, 2*repeatCount);
  368.       WriteLn(#10'repeatMillionsCount = ', 2*repeatCount);
  369.       for proc in procedures do
  370.         Run(proc);
  371.     end;
  372. end.
  373.  
Code: Text  [Select][+][-]
  1.  
  2. RandomRange = 5
  3. Julkas1's time: 2.7300 #unique: 490897 #total: 2000000
  4. Julkas2's time: 0.9200 #unique: 490897 #total: 2000000
  5.   Akira's time: 1.4530 #unique: 490897 #total: 2000000
  6.  Howard's time: 1.2790 #unique: 490897 #total: 2000000
  7.    Avk1's time: 0.9770 #unique: 490897 #total: 2000000
  8.    Avk2's time: 0.6910 #unique: 490897 #total: 2000000
  9.   440bx's time: 0.7130 #unique: 490897 #total: 2000000
  10.  
  11. RandomRange = 6
  12. Julkas1's time: 2.8220 #unique: 578415 #total: 2000000
  13. Julkas2's time: 0.9490 #unique: 578415 #total: 2000000
  14.   Akira's time: 1.4860 #unique: 578415 #total: 2000000
  15.  Howard's time: 1.2890 #unique: 578415 #total: 2000000
  16.    Avk1's time: 1.0180 #unique: 578415 #total: 2000000
  17.    Avk2's time: 0.7690 #unique: 578415 #total: 2000000
  18.   440bx's time: 0.7360 #unique: 578415 #total: 2000000
  19.  
  20. RandomRange = 7
  21. Julkas1's time: 3.0370 #unique: 659865 #total: 2000000
  22. Julkas2's time: 1.0280 #unique: 659865 #total: 2000000
  23.   Akira's time: 1.5800 #unique: 659865 #total: 2000000
  24.  Howard's time: 1.3480 #unique: 659865 #total: 2000000
  25.    Avk1's time: 1.0770 #unique: 659865 #total: 2000000
  26.    Avk2's time: 0.7860 #unique: 659865 #total: 2000000
  27.   440bx's time: 0.7690 #unique: 659865 #total: 2000000
  28.  
  29. RandomRange = 8
  30. Julkas1's time: 3.1540 #unique: 734083 #total: 2000000
  31. Julkas2's time: 1.0200 #unique: 734083 #total: 2000000
  32.   Akira's time: 1.6300 #unique: 734083 #total: 2000000
  33.  Howard's time: 1.3630 #unique: 734083 #total: 2000000
  34.    Avk1's time: 1.1290 #unique: 734083 #total: 2000000
  35.    Avk2's time: 0.8000 #unique: 734083 #total: 2000000
  36.   440bx's time: 0.7670 #unique: 734083 #total: 2000000
  37.  
  38. RandomRange = 9
  39. Julkas1's time: 3.0850 #unique: 802832 #total: 2000000
  40. Julkas2's time: 1.0320 #unique: 802832 #total: 2000000
  41.   Akira's time: 1.7190 #unique: 802832 #total: 2000000
  42.  Howard's time: 1.3930 #unique: 802832 #total: 2000000
  43.    Avk1's time: 1.1770 #unique: 802832 #total: 2000000
  44.    Avk2's time: 0.8290 #unique: 802832 #total: 2000000
  45.   440bx's time: 0.7690 #unique: 802832 #total: 2000000
  46.  
  47. RandomRange = 10
  48. Julkas1's time: 3.3060 #unique: 865097 #total: 2000000
  49. Julkas2's time: 1.0560 #unique: 865097 #total: 2000000
  50.   Akira's time: 1.7560 #unique: 865097 #total: 2000000
  51.  Howard's time: 1.4280 #unique: 865097 #total: 2000000
  52.    Avk1's time: 1.2060 #unique: 865097 #total: 2000000
  53.    Avk2's time: 0.8500 #unique: 865097 #total: 2000000
  54.   440bx's time: 0.7820 #unique: 865097 #total: 2000000
  55.  
  56. repeatMillionsCount = 10
  57. Julkas1's time: 13.9870 #unique: 799997 #total: 10000000
  58. Julkas2's time: 4.2000 #unique: 799997 #total: 10000000
  59.   Akira's time: 5.0900 #unique: 799997 #total: 10000000
  60.  Howard's time: 5.9450 #unique: 799997 #total: 10000000
  61.    Avk1's time: 4.0040 #unique: 799997 #total: 10000000
  62.    Avk2's time: 3.0150 #unique: 799997 #total: 10000000
  63.   440bx's time: 3.2640 #unique: 799997 #total: 10000000
  64.  
  65. repeatMillionsCount = 12
  66. Julkas1's time: 17.3100 #unique: 799999 #total: 12000000
  67. Julkas2's time: 5.4000 #unique: 799999 #total: 12000000
  68.   Akira's time: 6.4410 #unique: 799999 #total: 12000000
  69.  Howard's time: 7.3240 #unique: 799999 #total: 12000000
  70.    Avk1's time: 4.6600 #unique: 799999 #total: 12000000
  71.    Avk2's time: 3.5000 #unique: 799999 #total: 12000000
  72.   440bx's time: 3.8170 #unique: 799999 #total: 12000000
  73.  
  74. repeatMillionsCount = 14
  75. Julkas1's time: 19.2340 #unique: 800000 #total: 14000000
  76. Julkas2's time: 5.7660 #unique: 800000 #total: 14000000
  77.   Akira's time: 6.3470 #unique: 800000 #total: 14000000
  78.  Howard's time: 8.0130 #unique: 800000 #total: 14000000
  79.    Avk1's time: 5.3040 #unique: 800000 #total: 14000000
  80.    Avk2's time: 4.0090 #unique: 800000 #total: 14000000
  81.   440bx's time: 4.4350 #unique: 800000 #total: 14000000
  82.  
  83. repeatMillionsCount = 16
  84. Julkas1's time: 21.5130 #unique: 800000 #total: 16000000
  85. Julkas2's time: 6.5050 #unique: 800000 #total: 16000000
  86.   Akira's time: 7.1720 #unique: 800000 #total: 16000000
  87.  Howard's time: 9.0940 #unique: 800000 #total: 16000000
  88.    Avk1's time: 6.7390 #unique: 800000 #total: 16000000
  89.    Avk2's time: 4.7270 #unique: 800000 #total: 16000000
  90.   440bx's time: 5.0230 #unique: 800000 #total: 16000000
  91.  
  92. repeatMillionsCount = 18
  93. Julkas1's time: 27.4910 #unique: 800000 #total: 18000000
  94. Julkas2's time: 7.9570 #unique: 800000 #total: 18000000
  95.   Akira's time: 7.9090 #unique: 800000 #total: 18000000
  96.  Howard's time: 10.1090 #unique: 800000 #total: 18000000
  97.    Avk1's time: 6.7080 #unique: 800000 #total: 18000000
  98.    Avk2's time: 5.1170 #unique: 800000 #total: 18000000
  99.   440bx's time: 5.5690 #unique: 800000 #total: 18000000
  100.  
  101. repeatMillionsCount = 20
  102. Julkas1's time: 26.8320 #unique: 800000 #total: 20000000
  103. Julkas2's time: 8.0490 #unique: 800000 #total: 20000000
  104.   Akira's time: 8.6270 #unique: 800000 #total: 20000000
  105.  Howard's time: 11.1390 #unique: 800000 #total: 20000000
  106.    Avk1's time: 7.3470 #unique: 800000 #total: 20000000
  107.    Avk2's time: 5.6160 #unique: 800000 #total: 20000000
  108.   440bx's time: 6.2250 #unique: 800000 #total: 20000000
  109.  

@Akira1364, thank you for the detailed explanation LGenerics's internals.

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Sorting and Counting
« Reply #66 on: July 20, 2019, 06:23:13 am »
Fascinating thread.

We should send a copy to Knuth, to show that the flame is being kept alive :D
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

440bx

  • Hero Member
  • *****
  • Posts: 3921
Re: Sorting and Counting
« Reply #67 on: July 20, 2019, 07:36:17 am »
Fascinating thread.
yes, it is.

One thing I noticed is that the number of unique occurrences basically stays constant from 10 million to 20 million.  That's likely caused by the range not growing along with the number of unixtimes in the file.

It seems reasonable to presume that a larger number of elements are usually obtained over a longer period of time which would mean that the range (controlled by randomRange) should be larger as a result of that.

Increasing the range as the number of unixtimes grows would likely reflect what happens in reality more accurately.


ETA:

The attached version - SortCount3 (unit) - should perform better than SortCount2 (if the range remains constant instead of increasing.)



« Last Edit: July 20, 2019, 08:29:14 am by 440bx »
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

ASerge

  • Hero Member
  • *****
  • Posts: 2212
Re: Sorting and Counting
« Reply #68 on: July 20, 2019, 10:09:18 am »
Looks great, but not testable. Avk, can you upload the full project compilable on fpc 3.0.4? The function from 440bx needs to be wrapped in {$IFDEF WINDOWS}.

julkas

  • Guest
Re: Sorting and Counting
« Reply #69 on: July 20, 2019, 11:54:24 am »
Second edition improvement. CPU - Intel Dual E2200, RAM - 3 GB, SATA HDD. Please update your benchmarks (check Lazarus project mode - debug or release, my - release)  ;D
Code: Pascal  [Select][+][-]
  1. program sc2;
  2. {$mode delphi}
  3. uses gvector, gutil, garrayutils, SysUtils;
  4. const
  5.   keyNum = 10000000;
  6.   bsz = 1 shl 17;
  7. type
  8.   TIntLess = TLess<LongInt>;
  9.   TIntVect = TVector<LongInt>;
  10.   TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  11. var
  12.   sc: array[0..2147483647 shr 17] of TIntVect;
  13.   i: LongInt;
  14.   pkey, key, cnt, uniq: LongInt;
  15.   offset: LongInt;
  16.   start: QWord;
  17.   outFile: Text;
  18. begin
  19.   start := GetTickCount64();
  20.   for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  21.  
  22.   for i := 0 to keyNum do
  23.   begin
  24.     key := Random(2147483647);
  25.     //key := 1500000000 + Random(800000);
  26.     sc[key shr 17].PushBack(key and $1FFFF);
  27.   end;
  28.   WriteLn('Populated (ticks) - ', GetTickCount64() - start);
  29.   Assign(outFile, 'out.txt');
  30.   Rewrite(outFile);
  31.  
  32.   offset := -bsz;
  33.   uniq := 0;
  34.   for i := Low(sc) to High(sc) do
  35.   begin
  36.     Inc(offset, bsz);
  37.     pkey := -1;
  38.     cnt := 0;
  39.     if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  40.     for key in sc[i] do
  41.     begin
  42.       if pkey <> key then
  43.       begin
  44.         if cnt <> 0 then
  45.         begin
  46.           WriteLn(outFile, offset + pkey, ' - ', cnt);
  47.           Inc(uniq);
  48.         end;
  49.         pkey := key;
  50.         cnt := 0;
  51.       end;
  52.       Inc(cnt);
  53.     end;
  54.     if cnt <> 0 then
  55.     begin
  56.       WriteLn(outFile, offset + pkey, ' - ', cnt);
  57.       Inc(uniq);
  58.     end;
  59.   end;
  60.  
  61.   Close(outFile);
  62.   for i := Low(sc) to High(sc) do sc[i].Destroy;
  63.   WriteLn('Total (ticks) - ', GetTickCount64() - start);
  64.   WriteLn('Uniq keys - ', uniq, ', out of - ', keyNum);
  65.   ReadLn;
  66. end.
Output
Code: Text  [Select][+][-]
  1. Populated (ticks) - 3853
  2. Total (ticks) - 15397
  3. Uniq keys - 9976566, out of - 10000000
« Last Edit: July 20, 2019, 12:19:31 pm by julkas »

Thaddy

  • Hero Member
  • *****
  • Posts: 14164
  • Probably until I exterminate Putin.
Re: Sorting and Counting
« Reply #70 on: July 20, 2019, 03:05:54 pm »
Glad to see 440bx used my suggestion to use memory mapped files. This can also be done for unix-likes with fpmmap/fpmflush/fpmunmap. It should render a similar speed increase (as I explained)
Specialize a type, not a var.

VTwin

  • Hero Member
  • *****
  • Posts: 1215
  • Former Turbo Pascal 3 user
Re: Sorting and Counting
« Reply #71 on: July 20, 2019, 06:30:18 pm »
TGOrdinalArrayHelper itself does also have a QuickSort, BTW. Also IntroSort and "DualPivotQuickSort."

Impressive. I see the default is an Introsort. I tried my own home-rolled Quicksort, Heapsort, and DualPivotQuicksort, to see how they held up. The Quicksort does well, oddly the DualPivotQuicksort does worse.

It goes to show the value of a well crafted generics library, no tambourines required. I have some catch up reading to do on generics. Kudos to avk.

Code: Pascal  [Select][+][-]
  1. repeatMillionsCount = 20
  2.     Howard time: 7.1940 #unique: 800000 #total: 20000000
  3.     VTHeap time: 17.262 #unique: 800000 #total: 20000000
  4.    VTQuick time: 6.5210 #unique: 800000 #total: 20000000
  5.  VTDPQuick time: 8.6090 #unique: 800000 #total: 20000000
« Last Edit: July 20, 2019, 06:50:46 pm by VTwin »
“Talk is cheap. Show me the code.” -Linus Torvalds

Free Pascal Compiler 3.2.2
macOS 12.1: Lazarus 2.2.6 (64 bit Cocoa M1)
Ubuntu 18.04.3: Lazarus 2.2.6 (64 bit on VBox)
Windows 7 Pro SP1: Lazarus 2.2.6 (64 bit on VBox)

avk

  • Hero Member
  • *****
  • Posts: 752
Re: Sorting and Counting
« Reply #72 on: July 21, 2019, 07:03:41 am »
@VTwin, thanks.
It seems DualPivotQuicksort is quit good when the input array contains only a few repetitive values.

@ASerge, I'm not sure about rtl-generics, but LGenerics is definitely incompatible with FPC 3.0.4.

@julkas, I replaced the SortCountJulkas1 code with the one of your new version.

@ 440bx, I used your new WinSortCount3, but I turned off I/O checking.
Benchmark is compiled with a 32-bit compiler and runs on 64-bit Windows7.

code:
Code: Pascal  [Select][+][-]
  1. program OccurrenceCounter;
  2.  
  3. {$mode delphi}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, garrayutils, gvector, gmap,
  12.   WinSortCount3;
  13.  
  14. type
  15.   TIntPair = TPair<LongInt, LongInt>;
  16.   TProcedureArray = array of procedure;
  17.  
  18.   function ComparePairs(constref L, R: TIntPair): LongInt;
  19.   begin
  20.     if L.Key < R.Key then
  21.       Result := -1
  22.     else if L.Key = R.Key then
  23.       Result := 0
  24.     else
  25.       Result := 1;
  26.   end;
  27.  
  28. var
  29.   Total, Unique, repeatCount, randomrange: Integer;
  30.   Start: TDateTime;
  31.   inFilename: String = 'data.txt';
  32.   outFilename: String = 'sorted.txt';
  33.   routineName: String;
  34.   procedures: TProcedureArray;
  35.   proc: procedure;
  36.  
  37.   procedure GenerateData(randomRange: Integer=8; repeatMillionsCount: Integer=2);
  38.   var
  39.     InFile: Text;
  40.     I: LongInt;
  41.   begin
  42.     Assign(InFile, inFilename);
  43.     Rewrite(InFile);
  44.     for I := 1 to repeatMillionsCount * 1000000 do
  45.       WriteLn(InFile, 1500000000 + Random(randomRange * 100000));
  46.     Close(InFile);
  47.   end;
  48.  
  49.   procedure SortCountAkira;
  50.   var
  51.     I: LongInt;
  52.     InOut: Text;
  53.     Map: TDictionary<LongInt, LongInt>;
  54.     Pair: TIntPair;
  55.     Pairs: TArray<TIntPair>;
  56.   begin
  57.     routineName := {$I %currentroutine%};
  58.     Map := TDictionary<LongInt, LongInt>.Create();
  59.     //Map.Capacity := 10000000;
  60.     Assign(InOut, inFilename);
  61.     Reset(InOut);
  62.     while not EOF(InOut) do begin
  63.       {$I-}ReadLn(InOut, I);{$I+}
  64.       Inc(Total);
  65.       if not Map.ContainsKey(I) then
  66.         begin
  67.           Map.Add(I, 1);
  68.           Inc(Unique);
  69.         end
  70.       else
  71.         Map[I] := Map[I] + 1;
  72.     end;
  73.     Close(InOut);
  74.     Pairs := Map.ToArray();
  75.     Map.Free();
  76.     TArrayHelper<TIntPair>.Sort(
  77.       Pairs,
  78.       TComparer<TIntPair>.Construct(ComparePairs)
  79.     );
  80.     Assign(InOut, outFilename);
  81.     Rewrite(InOut);
  82.     for Pair in Pairs do with Pair do
  83.       {$I-}WriteLn(InOut, Key, ' - ', Value);{$I+}
  84.     Close(InOut);
  85.   end;
  86.  
  87.   procedure SortCountHoward;
  88.   var
  89.     arr: array of Integer;
  90.     textf: TextFile;
  91.     min: Integer = High(Integer);
  92.     max: Integer = -1;
  93.     i: Integer;
  94.   begin
  95.     routineName := {$I %currentroutine%};
  96.     AssignFile(textf, inFilename);
  97.     Reset(textf);
  98.     while not EOF(textf) do
  99.       begin
  100.         {$I-}ReadLn(textf, i);{$I+}
  101.         Inc(Total);
  102.         if i < min then
  103.           min := i;
  104.         if i > max then
  105.           max := i;
  106.       end;
  107.     SetLength(arr, max-min+1);
  108.  
  109.     Reset(textf);
  110.     while not EOF(textf) do
  111.       begin
  112.         ReadLn(textf, i);
  113.         Dec(i, min);
  114.         Inc(arr[i]);
  115.       end;
  116.     CloseFile(textf);
  117.  
  118.     AssignFile(textf, outFilename);
  119.     Rewrite(textf);
  120.     for i := Low(arr) to High(arr) do
  121.       case (arr[i] > 0) of
  122.         True:
  123.           begin
  124.             {$I-}WriteLn(textf, i+min, ' - ', arr[i]); {$I+}
  125.             Inc(Unique);
  126.           end;
  127.       end;
  128.     CloseFile(textf);
  129.     SetLength(arr, 0);
  130.   end;
  131.  
  132.   procedure SortCountAvk1;
  133.   type
  134.     TCounter  = TGHashMultiSetLP<Integer>;
  135.     TCountRef = TGAutoRef<TCounter>;
  136.     TEntry    = TCounter.TEntry;
  137.  
  138.     function EntryCmp(constref L, R: TEntry): SizeInt;
  139.     begin
  140.       if L.Key > R.Key then
  141.         Result := 1
  142.       else
  143.         if L.Key < R.Key then
  144.           Result := -1
  145.         else
  146.           Result := 0;
  147.     end;
  148.  
  149.   var
  150.     CountRef: TCountRef;
  151.     InOut: Text;
  152.     Counter: TCounter;
  153.     e: TEntry;
  154.     I: Integer;
  155.   begin
  156.     routineName := {$I %currentroutine%};
  157.     Counter := CountRef;
  158.     Counter.LoadFactor := 0.7;
  159.     Assign(InOut, inFilename);
  160.     Reset(InOut);
  161.     while not EOF(InOut) do
  162.       begin
  163.         {$I-}ReadLn(InOut, I);{$I+}
  164.         Counter.Add(I);
  165.       end;
  166.     Close(InOut);
  167.     Total := Counter.Count;
  168.     Unique := Counter.EntryCount;
  169.     if Counter.NonEmpty then
  170.       begin
  171.         Assign(InOut, outFilename);
  172.         Rewrite(InOut);
  173.         for e in Counter.Entries.Sorted(EntryCmp) do
  174.           with e do
  175.             {$I-}WriteLn(InOut, Key, ' - ', Count);{$I+}
  176.         Close(InOut);
  177.       end;
  178.   end;
  179.  
  180.   procedure SortCountAvk2;
  181.   var
  182.     List: array of Integer;
  183.     InOut: Text;
  184.     I, J, Count, DupCount: Integer;
  185.   begin
  186.     routineName := {$I %currentroutine%};
  187.     Assign(InOut, inFilename);
  188.     Reset(InOut);
  189.     SetLength(List, 4096);
  190.     I := 0;
  191.     while not EOF(InOut) do
  192.       begin
  193.         {$I-}ReadLn(InOut, J);{$I+}
  194.         Inc(Total);
  195.         if Length(List) = I then
  196.           SetLength(List, I * 2);
  197.         List[I] := J;
  198.         Inc(I);
  199.       end;
  200.     Close(InOut);
  201.     SetLength(List, I);
  202.     if List = nil then
  203.       exit;
  204.     TGOrdinalArrayHelper<Integer>.Sort(List);
  205.     Count := I;
  206.     DupCount := 0;
  207.     I := 0;
  208.     Assign(InOut, outFilename);
  209.     Rewrite(InOut);
  210.     repeat
  211.       J := List[I];
  212.       while (I < Count) and (List[I] = J) do
  213.         begin
  214.           Inc(DupCount);
  215.           Inc(I);
  216.         end;
  217.       {$I-}WriteLn(InOut, J, ' - ', DupCount);{$I+}
  218.       Inc(Unique);
  219.       DupCount := 0;
  220.     until I = Count;
  221.     Close(InOut);
  222.   end;
  223.  
  224.   procedure SortCountJulkas1;
  225.   type
  226.     TIntLess = TLess<LongInt>;
  227.     TIntVect = TVector<LongInt>;
  228.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  229.   const
  230.     bsz = 1 shl 17;
  231.   var
  232.     sc: array[0..2147483647 shr 17] of TIntVect;
  233.     i: LongInt;
  234.     pkey, key, cnt: LongInt;
  235.     offset: LongInt;
  236.     InOut: Text;
  237.   begin
  238.     routineName := {$I %currentroutine%};
  239.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  240.  
  241.     Assign(InOut, inFilename);
  242.     Reset(InOut);
  243.     while not EOF(InOut) do
  244.       begin
  245.         {$I-}ReadLn(InOut, key);{$I+}
  246.         Inc(Total);
  247.         sc[key shr 17].PushBack(key and $1FFFF);
  248.       end;
  249.     Close(InOut);
  250.  
  251.     Assign(InOut, outFilename);
  252.     Rewrite(InOut);
  253.  
  254.     offset := -bsz;
  255.     for i := Low(sc) to High(sc) do
  256.     begin
  257.       Inc(offset, bsz);
  258.       pkey := -1;
  259.       cnt := 0;
  260.       if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  261.       for key in sc[i] do
  262.       begin
  263.         if pkey <> key then
  264.         begin
  265.           if cnt <> 0 then
  266.           begin
  267.             {$I-}WriteLn(InOut, offset + pkey, ' - ', cnt);{$I+}
  268.             Inc(Unique);
  269.           end;
  270.           pkey := key;
  271.           cnt := 0;
  272.         end;
  273.         Inc(cnt);
  274.       end;
  275.       if cnt <> 0 then
  276.       begin
  277.         {$I-}WriteLn(InOut, offset + pkey, ' - ', cnt);{$I+}
  278.         Inc(Unique);
  279.       end;
  280.     end;
  281.  
  282.     Close(InOut);
  283.     for i := Low(sc) to High(sc) do sc[i].Destroy;
  284.   end;
  285.  
  286.   procedure SortCountJulkas2;
  287.   type
  288.     TIntLess = TLess<LongInt>;
  289.     TIntVect = TVector<LongInt>;
  290.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  291.   var
  292.     sc: array[0..21474] of TIntVect;
  293.     i: LongInt;
  294.     pkey, key, cnt: LongInt;
  295.     offset: LongInt;
  296.     InOut: Text;
  297.   begin
  298.     routineName := {$I %currentroutine%};
  299.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  300.  
  301.     Assign(InOut, inFilename);
  302.     Reset(InOut);
  303.     while not EOF(InOut) do
  304.       begin
  305.         {$I-}ReadLn(InOut, key);{$I+}
  306.         Inc(Total);
  307.         sc[key div 100000].PushBack(key mod 100000);
  308.       end;
  309.     Close(InOut);
  310.  
  311.     Assign(InOut, outFilename);
  312.     Rewrite(InOut);
  313.  
  314.     for i := Low(sc) to High(sc) do if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  315.  
  316.     offset := -100000;
  317.     for i := Low(sc) to High(sc) do
  318.     begin
  319.       Inc(offset, 100000);
  320.       pkey := -1;
  321.       cnt := 0;
  322.       for key in sc[i] do
  323.       begin
  324.         if pkey <> key then
  325.         begin
  326.           if cnt <> 0 then
  327.           begin
  328.             {$I-}WriteLn(InOut, offset + pkey, ' - ', cnt);{$I+}
  329.             Inc(Unique);
  330.           end;
  331.           pkey := key;
  332.           cnt := 0;
  333.         end;
  334.         Inc(cnt);
  335.       end;
  336.       if cnt <> 0 then
  337.       begin
  338.         {$I-}WriteLn(InOut, offset + pkey, ' - ', cnt);{$I+}
  339.         Inc(Unique);
  340.       end;
  341.     end;
  342.  
  343.     Close(InOut);
  344.     for i := Low(sc) to High(sc) do
  345.       sc[i].Destroy;
  346.   end;
  347.  
  348.   procedure SortCount440bx;
  349.   begin
  350.     routineName := {$I %currentroutine%};
  351.     WinSortCount3.DataCount := 0;
  352.     WinSortCount3.Unique := 0;
  353.     WinSortCount3.InFileName := inFilename;
  354.     WinSortCount3.OutFileName := outFilename;
  355.     WinSortCount3.SortCount;
  356.     Total := WinSortCount3.DataCount;
  357.     Unique := WinSortCount3.Unique;
  358.   end;
  359.  
  360.   procedure Run(aProc: TProcedure);
  361.   begin
  362.     Total := 0;
  363.     Unique := 0;
  364.     Start := Now;
  365.     try
  366.       aProc();
  367.       WriteLn(Copy(routineName, 10, 20):7,'''s time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4,' #unique: ',Unique,' #total: ',Total);
  368.     except
  369.       on e: Exception do
  370.         WriteLn('crashes with message "', e.Message, '"');
  371.     end;
  372.   end;
  373.  
  374. begin
  375.   Randomize;
  376.  
  377.   procedures := TProcedureArray.Create(
  378.                    SortCountJulkas1, SortCountJulkas2, SortCountAkira, SortCountHoward, SortCountAvk1,
  379.                    SortCountAvk2, SortCount440bx);
  380.  
  381.   for randomrange := 1 to 10 do
  382.     begin
  383.       GenerateData(randomrange);
  384.       WriteLn(#10'RandomRange = ',randomrange);
  385.       for proc in procedures do
  386.         Run(proc);
  387.     end;
  388.  
  389.   for repeatCount := 1 to 10 do
  390.     begin
  391.       GenerateData(8, 2*repeatCount);
  392.       WriteLn(#10'repeatMillionsCount = ', 2*repeatCount);
  393.       for proc in procedures do
  394.         Run(proc);
  395.     end;
  396. end.
  397.  

output:
Code: Text  [Select][+][-]
  1.  
  2. RandomRange = 1
  3. Julkas1's time: 0.7800 #unique: 100000 #total: 2000000
  4. Julkas2's time: 0.7650 #unique: 100000 #total: 2000000
  5.   Akira's time: 0.7170 #unique: 100000 #total: 2000000
  6.  Howard's time: 0.9830 #unique: 100000 #total: 2000000
  7.    Avk1's time: 0.6400 #unique: 100000 #total: 2000000
  8.    Avk2's time: 0.4990 #unique: 100000 #total: 2000000
  9.   440bx's time: 0.4990 #unique: 100000 #total: 2000000
  10.  
  11. RandomRange = 2
  12. Julkas1's time: 0.7800 #unique: 199994 #total: 2000000
  13. Julkas2's time: 0.8270 #unique: 199994 #total: 2000000
  14.   Akira's time: 0.8730 #unique: 199994 #total: 2000000
  15.  Howard's time: 1.0450 #unique: 199994 #total: 2000000
  16.    Avk1's time: 0.7650 #unique: 199994 #total: 2000000
  17.    Avk2's time: 0.5300 #unique: 199994 #total: 2000000
  18.   440bx's time: 0.5460 #unique: 199994 #total: 2000000
  19.  
  20. RandomRange = 3
  21. Julkas1's time: 0.8270 #unique: 299625 #total: 2000000
  22. Julkas2's time: 0.8210 #unique: 299625 #total: 2000000
  23.   Akira's time: 0.9750 #unique: 299625 #total: 2000000
  24.  Howard's time: 1.0570 #unique: 299625 #total: 2000000
  25.    Avk1's time: 0.7830 #unique: 299625 #total: 2000000
  26.    Avk2's time: 0.5650 #unique: 299625 #total: 2000000
  27.   440bx's time: 0.5920 #unique: 299625 #total: 2000000
  28.  
  29. RandomRange = 4
  30. Julkas1's time: 0.8330 #unique: 397259 #total: 2000000
  31. Julkas2's time: 0.8300 #unique: 397259 #total: 2000000
  32.   Akira's time: 1.0660 #unique: 397259 #total: 2000000
  33.  Howard's time: 1.0970 #unique: 397259 #total: 2000000
  34.    Avk1's time: 0.8480 #unique: 397259 #total: 2000000
  35.    Avk2's time: 0.5800 #unique: 397259 #total: 2000000
  36.   440bx's time: 0.6270 #unique: 397259 #total: 2000000
  37.  
  38. RandomRange = 5
  39. Julkas1's time: 0.8630 #unique: 490899 #total: 2000000
  40. Julkas2's time: 0.8600 #unique: 490899 #total: 2000000
  41.   Akira's time: 1.1640 #unique: 490899 #total: 2000000
  42.  Howard's time: 1.1590 #unique: 490899 #total: 2000000
  43.    Avk1's time: 0.9890 #unique: 490899 #total: 2000000
  44.    Avk2's time: 0.6720 #unique: 490899 #total: 2000000
  45.   440bx's time: 0.6890 #unique: 490899 #total: 2000000
  46.  
  47. RandomRange = 6
  48. Julkas1's time: 0.9110 #unique: 578653 #total: 2000000
  49. Julkas2's time: 0.9180 #unique: 578653 #total: 2000000
  50.   Akira's time: 1.3110 #unique: 578653 #total: 2000000
  51.  Howard's time: 1.2000 #unique: 578653 #total: 2000000
  52.    Avk1's time: 0.9590 #unique: 578653 #total: 2000000
  53.    Avk2's time: 0.6510 #unique: 578653 #total: 2000000
  54.   440bx's time: 0.6900 #unique: 578653 #total: 2000000
  55.  
  56. RandomRange = 7
  57. Julkas1's time: 0.9810 #unique: 659394 #total: 2000000
  58. Julkas2's time: 0.9070 #unique: 659394 #total: 2000000
  59.   Akira's time: 1.3500 #unique: 659394 #total: 2000000
  60.  Howard's time: 1.2490 #unique: 659394 #total: 2000000
  61.    Avk1's time: 0.9820 #unique: 659394 #total: 2000000
  62.    Avk2's time: 0.7210 #unique: 659394 #total: 2000000
  63.   440bx's time: 0.6840 #unique: 659394 #total: 2000000
  64.  
  65. RandomRange = 8
  66. Julkas1's time: 0.9530 #unique: 734160 #total: 2000000
  67. Julkas2's time: 0.9200 #unique: 734160 #total: 2000000
  68.   Akira's time: 1.4140 #unique: 734160 #total: 2000000
  69.  Howard's time: 1.3470 #unique: 734160 #total: 2000000
  70.    Avk1's time: 1.0500 #unique: 734160 #total: 2000000
  71.    Avk2's time: 0.7050 #unique: 734160 #total: 2000000
  72.   440bx's time: 0.6900 #unique: 734160 #total: 2000000
  73.  
  74. RandomRange = 9
  75. Julkas1's time: 0.9410 #unique: 802680 #total: 2000000
  76. Julkas2's time: 0.9270 #unique: 802680 #total: 2000000
  77.   Akira's time: 1.5210 #unique: 802680 #total: 2000000
  78.  Howard's time: 1.2700 #unique: 802680 #total: 2000000
  79.    Avk1's time: 1.0660 #unique: 802680 #total: 2000000
  80.    Avk2's time: 0.7350 #unique: 802680 #total: 2000000
  81.   440bx's time: 0.7080 #unique: 802680 #total: 2000000
  82.  
  83. RandomRange = 10
  84. Julkas1's time: 0.9380 #unique: 864917 #total: 2000000
  85. Julkas2's time: 0.9580 #unique: 864917 #total: 2000000
  86.   Akira's time: 1.5790 #unique: 864917 #total: 2000000
  87.  Howard's time: 1.3090 #unique: 864917 #total: 2000000
  88.    Avk1's time: 1.1260 #unique: 864917 #total: 2000000
  89.    Avk2's time: 0.7490 #unique: 864917 #total: 2000000
  90.   440bx's time: 0.7210 #unique: 864917 #total: 2000000
  91.  
  92. repeatMillionsCount = 2
  93. Julkas1's time: 0.9690 #unique: 734280 #total: 2000000
  94. Julkas2's time: 0.9640 #unique: 734280 #total: 2000000
  95.   Akira's time: 1.4350 #unique: 734280 #total: 2000000
  96.  Howard's time: 1.3070 #unique: 734280 #total: 2000000
  97.    Avk1's time: 1.0430 #unique: 734280 #total: 2000000
  98.    Avk2's time: 0.7050 #unique: 734280 #total: 2000000
  99.   440bx's time: 0.6900 #unique: 734280 #total: 2000000
  100.  
  101. repeatMillionsCount = 4
  102. Julkas1's time: 1.6610 #unique: 794661 #total: 4000000
  103. Julkas2's time: 1.6720 #unique: 794661 #total: 4000000
  104.   Akira's time: 2.2260 #unique: 794661 #total: 4000000
  105.  Howard's time: 2.4500 #unique: 794661 #total: 4000000
  106.    Avk1's time: 1.7420 #unique: 794661 #total: 4000000
  107.    Avk2's time: 1.1830 #unique: 794661 #total: 4000000
  108.   440bx's time: 1.2800 #unique: 794661 #total: 4000000
  109.  
  110. repeatMillionsCount = 6
  111. Julkas1's time: 2.3830 #unique: 799556 #total: 6000000
  112. Julkas2's time: 2.3840 #unique: 799556 #total: 6000000
  113.   Akira's time: 2.9000 #unique: 799556 #total: 6000000
  114.  Howard's time: 3.2960 #unique: 799556 #total: 6000000
  115.    Avk1's time: 2.3930 #unique: 799556 #total: 6000000
  116.    Avk2's time: 1.6810 #unique: 799556 #total: 6000000
  117.   440bx's time: 1.8210 #unique: 799556 #total: 6000000
  118.  
  119. repeatMillionsCount = 8
  120. Julkas1's time: 3.1210 #unique: 799967 #total: 8000000
  121. Julkas2's time: 3.1040 #unique: 799967 #total: 8000000
  122.   Akira's time: 3.6550 #unique: 799967 #total: 8000000
  123.  Howard's time: 4.3260 #unique: 799967 #total: 8000000
  124.    Avk1's time: 3.0250 #unique: 799967 #total: 8000000
  125.    Avk2's time: 2.1500 #unique: 799967 #total: 8000000
  126.   440bx's time: 2.3370 #unique: 799967 #total: 8000000
  127.  
  128. repeatMillionsCount = 10
  129. Julkas1's time: 3.8410 #unique: 799997 #total: 10000000
  130. Julkas2's time: 3.8310 #unique: 799997 #total: 10000000
  131.   Akira's time: 4.3570 #unique: 799997 #total: 10000000
  132.  Howard's time: 5.3620 #unique: 799997 #total: 10000000
  133.    Avk1's time: 3.6840 #unique: 799997 #total: 10000000
  134.    Avk2's time: 2.6500 #unique: 799997 #total: 10000000
  135.   440bx's time: 2.8860 #unique: 799997 #total: 10000000
  136.  
  137. repeatMillionsCount = 12
  138. Julkas1's time: 4.5800 #unique: 799999 #total: 12000000
  139. Julkas2's time: 4.5300 #unique: 799999 #total: 12000000
  140.   Akira's time: 5.1050 #unique: 799999 #total: 12000000
  141.  Howard's time: 6.3740 #unique: 799999 #total: 12000000
  142.    Avk1's time: 4.4770 #unique: 799999 #total: 12000000
  143.    Avk2's time: 3.2540 #unique: 799999 #total: 12000000
  144.   440bx's time: 3.4680 #unique: 799999 #total: 12000000
  145.  
  146. repeatMillionsCount = 14
  147. Julkas1's time: 5.3170 #unique: 800000 #total: 14000000
  148. Julkas2's time: 5.3510 #unique: 800000 #total: 14000000
  149.   Akira's time: 6.1320 #unique: 800000 #total: 14000000
  150.  Howard's time: 7.5900 #unique: 800000 #total: 14000000
  151.    Avk1's time: 5.2320 #unique: 800000 #total: 14000000
  152.    Avk2's time: 3.5410 #unique: 800000 #total: 14000000
  153.   440bx's time: 3.9310 #unique: 800000 #total: 14000000
  154.  
  155. repeatMillionsCount = 16
  156. Julkas1's time: 6.4290 #unique: 800000 #total: 16000000
  157. Julkas2's time: 6.0400 #unique: 800000 #total: 16000000
  158.   Akira's time: 6.5250 #unique: 800000 #total: 16000000
  159.  Howard's time: 8.4090 #unique: 800000 #total: 16000000
  160.    Avk1's time: 5.5970 #unique: 800000 #total: 16000000
  161.    Avk2's time: 3.9940 #unique: 800000 #total: 16000000
  162.   440bx's time: 4.4460 #unique: 800000 #total: 16000000
  163.  
  164. repeatMillionsCount = 18
  165. Julkas1's time: 6.7460 #unique: 800000 #total: 18000000
  166. Julkas2's time: 6.7230 #unique: 800000 #total: 18000000
  167.   Akira's time: 7.2390 #unique: 800000 #total: 18000000
  168.  Howard's time: 9.3290 #unique: 800000 #total: 18000000
  169.    Avk1's time: 6.4050 #unique: 800000 #total: 18000000
  170.    Avk2's time: 4.8370 #unique: 800000 #total: 18000000
  171.   440bx's time: 5.3630 #unique: 800000 #total: 18000000
  172.  
  173. repeatMillionsCount = 20
  174. Julkas1's time: 7.6870 #unique: 800000 #total: 20000000
  175. Julkas2's time: 7.5800 #unique: 800000 #total: 20000000
  176.   Akira's time: 8.2640 #unique: 800000 #total: 20000000
  177.  Howard's time: 10.4280 #unique: 800000 #total: 20000000
  178.    Avk1's time: 6.8850 #unique: 800000 #total: 20000000
  179.    Avk2's time: 4.9860 #unique: 800000 #total: 20000000
  180.   440bx's time: 5.5030 #unique: 800000 #total: 20000000
  181.  
« Last Edit: July 21, 2019, 08:20:25 am by avk »

jamie

  • Hero Member
  • *****
  • Posts: 6077
Re: Sorting and Counting
« Reply #73 on: July 21, 2019, 06:54:42 pm »
I would like to get in on this adventure myself  :D

However, you guys wouldn't like my version because it would involve using Assembler code.  >:D
The only true wisdom is knowing you know nothing

julkas

  • Guest
Re: Sorting and Counting
« Reply #74 on: July 21, 2019, 07:17:42 pm »
I would like to get in on this adventure myself  :D

However, you guys wouldn't like my version because it would involve using Assembler code.  >:D
Go on. You are welcome.  ;)

 

TinyPortal © 2005-2018