Recent

Author Topic: Sorting and Counting  (Read 8461 times)

440bx

  • Hero Member
  • *****
  • Posts: 1269
Re: Sorting and Counting
« Reply #90 on: July 24, 2019, 12:05:14 am »
They do look a bit strange until you carefully analyze the algorithm's implementation and the data it has to handle.
But for large n it's better.
Yes, provided that the range is reasonably close to n.  As the ratio of range/n increases, a radix sort suffers.

And the algorithm can be improved. Since memory is still allocated a lot and we know the data format, it is better to set the minimum and maximum as constants.
True, the concern is, once the algorithm uses knowledge about the data format it didn't determine itself, the algorithm implementation may lose generality.
« Last Edit: July 24, 2019, 12:12:56 am by 440bx »
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

avk

  • Full Member
  • ***
  • Posts: 163
    • my self-education project
Re: Sorting and Counting
« Reply #91 on: July 31, 2019, 04:18:25 am »
I replaced fcl-stl TVector with generics.collections TList in my algo. I don't know why TList gives very poor performance.
Just curious how much performance has degraded?

... I believe Avk2 uses an Introsort.
All LGArrayHelpers sorting algorithms are hybrid, in particular TGOrdinalArrayHelper.Sort tries to use
Counting Sort, if possible, otherwise uses Introsort. 

... But it is impossible to check, @avk did not provide the project.
Let me guess, the project of which you always mention this is LGenerics? If so, see attachment.

avk

  • Full Member
  • ***
  • Posts: 163
    • my self-education project
Re: Sorting and Counting
« Reply #92 on: July 31, 2019, 06:23:04 am »
Just in case, I decided to check the coincidence of the results of the existing solutions.
A curious fact, all coincide, except SortCount440bx.

...So my fcl-stl TVector solution is better than Akira's generics.collections TDictionary...
This is highly dependent on the input data.

440bx

  • Hero Member
  • *****
  • Posts: 1269
Re: Sorting and Counting
« Reply #93 on: July 31, 2019, 06:44:57 am »
A curious fact, all coincide, except SortCount440bx.
that's because of the compare function.  The number of instances of each number is correct but, the collating sequence is different than what is obtained in a numerical comparison.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

avk

  • Full Member
  • ***
  • Posts: 163
    • my self-education project
Re: Sorting and Counting
« Reply #94 on: July 31, 2019, 06:56:08 am »
@440bx, thank you, now I understand the reason.
« Last Edit: August 01, 2019, 06:58:36 am by avk »

hnb

  • Sr. Member
  • ****
  • Posts: 268
Re: Sorting and Counting
« Reply #95 on: July 31, 2019, 09:13:58 am »
Hello,

First : rtl-generics TDictionary seems like strange choice. rtl-generics has better collection for such purpose (sort + hash map). IMO should be used TSortedHashSet<T> with custom AComparer and AEqualityComparer, with using Capacity property.

Secondly: @Akira version for rtl-generics TDictionary is not optimized and has many redundant calls (also in second @avk test the Capacity is disabled which can decrease performance):


Code: Pascal  [Select]
  1.       if not Map.ContainsKey(I) then
  2.         begin
  3.           Map.Add(I, 1);
  4.           Inc(Unique);
  5.         end
  6.       else
  7.         Map[I] := Map[I] + 1;
  8.  

such code can be optimized, it can be reduced to something like this:

Code: Pascal  [Select]
  1. var
  2.   Counters: array of LongInt;
  3.   P: PLongInt;
  4.   Map: TDictionary<LongInt, PLongInt>;
  5.  
  6. ...
  7.  
  8.       if not Map.TryGetValue(I, P) then
  9.         begin
  10.           P := @Counters[Unique];
  11.           Map.Add(I, P);
  12.           Inc(Unique);
  13.         end;
  14.       Inc(P^);
  15.  

Above code will be much faster. The "reads/lookups" from dictionary is reduced to one, or to two during adding instead of 3 and 4 "reads/lookups".

Thirdly: for many reads from dictionary worth to test other dictionary from rtl-generics : TCuckooD2<TKey, TValue> / TFastHashMap<TKey, TValue>. Cuckoo should be faster for such case.
« Last Edit: August 01, 2019, 10:27:27 am by hnb »
Checkout NewPascal initiative and donate beer - ready to use tuned FPC compiler + Lazarus for mORMot project

best regards,
Maciej Izak

julkas

  • Sr. Member
  • ****
  • Posts: 432
  • KISS principle / Lazarus 2.0.0 / FPC 3.0.4
Re: Sorting and Counting
« Reply #96 on: July 31, 2019, 09:38:50 am »
I replaced fcl-stl TVector with generics.collections TList in my algo. I don't know why TList gives very poor performance.
Just curious how much performance has degraded?
Hello @avk, @hnb ! (See - https://forum.lazarus.freepascal.org/index.php/topic,46254.0.html)
TList Code
Code: Pascal  [Select]
  1. program sc2;
  2. {$mode delphi}
  3.  
  4. uses SysUtils, Classes,
  5.     //gvector, gutil, garrayutils;
  6.     Generics.Defaults, Generics.Collections;
  7. const
  8.   keyNum = 10000000;
  9.   blckSize = 100000;
  10. //type
  11.   //TIntLess = TLess<LongInt>;
  12.   //TIntVect = TVector<LongInt>;
  13.   //TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  14.  
  15. var
  16.   //sc: array[0..2147483647 div blckSize] of TIntVect;
  17.   sc: array[0..2147483647 div blckSize] of Generics.Collections.TList<LongInt>;
  18.   i: LongInt;
  19.   pkey, key, cnt, uniq: LongInt;
  20.   offset: LongInt;
  21.   start: QWord;
  22.   outFile: Text;
  23.  
  24. begin
  25.   start := GetTickCount64();
  26.   //for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  27.   for i := Low(sc) to High(sc) do sc[i] := TList<LongInt>.Create;
  28.  
  29.   for i := 0 to keyNum do
  30.   begin
  31.     key := Random(2147483647);
  32.     //sc[key div blckSize].PushBack(key mod blckSize);
  33.     sc[key div 100000].Add(key mod 100000);
  34.   end;
  35.   WriteLn('Populated (ticks) - ', GetTickCount64() - start);
  36.  
  37.   Assign(outFile, 'out.txt');
  38.   Rewrite(outFile);
  39.  
  40.   //for i := Low(sc) to High(sc) do if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  41.   for i := Low(sc) to High(sc) do if sc[i].Count > 1 then sc[i].Sort();
  42.   offset := -blckSize;
  43.   uniq := 0;
  44.   for i := Low(sc) to High(sc) do
  45.   begin
  46.     Inc(offset, blckSize);
  47.     pkey := -1;
  48.     cnt := 0;
  49.     for key in sc[i] do
  50.     begin
  51.       if pkey <> key then
  52.       begin
  53.         if cnt <> 0 then
  54.         begin
  55.           WriteLn(outFile, offset + pkey, ' - ', cnt);
  56.           Inc(uniq);
  57.         end;
  58.         pkey := key;
  59.         cnt := 0;
  60.       end;
  61.       Inc(cnt);
  62.     end;
  63.     if cnt <> 0 then
  64.     begin
  65.     WriteLn(outFile, offset + pkey, ' - ', cnt);
  66.     Inc(uniq);
  67.     end;
  68.   end;
  69.  
  70.   Close(outFile);
  71.   for i := Low(sc) to High(sc) do
  72.     sc[i].Destroy;
  73.  
  74.   WriteLn('Total (ticks) - ', GetTickCount64() - start);
  75.   WriteLn('Uniq keys - ', uniq, ', out of - ', keyNum);
  76.   ReadLn;
  77. end.
  78.  
TList output -
Code: Text  [Select]
  1. Populated (ticks) - 3282
  2. Total (ticks) - 7078
  3. Uniq keys - 9976566, out of - 10000000
TVector output -
Code: Text  [Select]
  1. Populated (ticks) - 1328
  2. Total (ticks) - 4718
  3. Uniq keys - 9976566, out of - 10000000
« Last Edit: July 31, 2019, 10:56:31 am by julkas »
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
  mov rax, a
  mov rdx, b
  mul rdx
  mov [clo], rax
  mov [chi], rdx
end;

avk

  • Full Member
  • ***
  • Posts: 163
    • my self-education project
Re: Sorting and Counting
« Reply #97 on: July 31, 2019, 11:27:14 am »
@hnb, you are welcome :)
@julkas, I believe that the slowdown is mainly due to notifications and slower sorting in Generics.Collections.TList. The last time I compared, TArrayHelper.Sort was 1.5-1.6 times slower than TOrderingArrayUtils.Sort.
« Last Edit: July 31, 2019, 11:34:34 am by avk »

BrunoK

  • Full Member
  • ***
  • Posts: 191
  • Retired programmer
Re: Sorting and Counting
« Reply #98 on: July 31, 2019, 05:13:28 pm »
Let's go with my entry, if it works and I have correctly solved the initial question.
All very traditional object pascal.
Code: Pascal  [Select]
  1.  
  2.   procedure SortCountBrunoK;  { Note : requires Classes }
  3.   const
  4.     cCR = $0D;
  5.     c0 = Ord('0');
  6.     function LoadStreamToList(aMemStream: TMemoryStream; aList: TFPList): integer;
  7.     var
  8.       { Parse lines }
  9.       lPByte, lPEndByte: PByte;
  10.  
  11.       { Values extraction }
  12.       lPByteTextStart: PByte = nil;
  13.       lValueStarted: boolean = False;
  14.       lDWORD: DWORD;
  15.  
  16.       lCntRec: integer;
  17.     begin
  18.       { Prepare aList }
  19.       lCntRec := aMemStream.Size;
  20.       if lCntRec <= 0 then // Stream empty ?
  21.         exit(0);
  22.       aList.Count := lCntRec div 10; // Setup approximative size
  23.       aList.Count := 0;
  24.  
  25.       lPByte := PByte(aMemStream.memory);
  26.       lPEndByte := lPByte + aMemStream.Size;
  27.  
  28.       while lPByte <= lPEndByte do begin
  29.         if (lPByte = lPEndByte) or (lPByte^ <= cCR) then begin
  30.           if lValueStarted then begin
  31.             lDWORD := 0;
  32.             while lPByteTextStart < lPByte do begin
  33.               lDWORD := lDWORD * 10 + lPByteTextStart^ - c0;
  34.               Inc(lPByteTextStart);
  35.             end;
  36.             aList.Add(Pointer(lDWORD));
  37.             lValueStarted := False;
  38.           end;
  39.         end
  40.         else if not lValueStarted then begin
  41.           lPByteTextStart := lPByte;
  42.           lValueStarted := True;
  43.         end;
  44.         Inc(lPByte);
  45.       end;
  46.       Result := aList.Count;
  47.     end;
  48.  
  49.     function BkCompare(Item1, Item2: Pointer): integer;
  50.     begin
  51.       Result := 1;
  52.       if Item1 < Item2 then
  53.         Result := -1
  54.       else if Item1 = Item2 then
  55.         Result := 0;
  56.     end;
  57.  
  58.   var
  59.     lFile: TextFile;
  60.     lMemStream: TMemoryStream;
  61.     lNbRecs: integer = 0;
  62.     lFPList: TFPList;
  63.     lIx: integer;
  64.     lLastValue: pointer;
  65.     lListCount: integer;
  66.     lLastValueCount: integer;
  67.   begin
  68.     routineName := 'SortCountBrunoK'; // {$I %currentroutine%};
  69.     lMemStream := TMemoryStream.Create;
  70.     lMemStream.LoadFromFile(inFileName);
  71.     lFPList := TFPList.Create;
  72.     lNbRecs := LoadStreamToList(lMemStream, lFPList);
  73.     lMemStream.Free; // Not needed anymore
  74.     if lNbRecs > 0 then begin
  75.       AssignFile(lFile, outFilename);
  76.       Rewrite(lFile);
  77.       lFPList.Sort(@BkCompare);
  78.       lIx := 0;
  79.       lLastValue := lFPList[lIx];
  80.       lLastValueCount := 1;
  81.       lListCount := lFPList.Count;
  82.       repeat
  83.         Inc(lIx);
  84.         if (lIx >= lListCount) or (lFPList[lIx] <> lLastValue) then begin
  85.           Inc(unique);
  86.           WriteLn(lFile, UINTPTR(lLastValue), ' - ', lLastValueCount);
  87.           if (lIx >= lListCount) then
  88.             Break;
  89.           lLastValue := lFPList[lIx];
  90.           lLastValueCount := 1;
  91.         end
  92.         else
  93.           Inc(lLastValueCount);
  94.       until False;
  95.       CloseFile(lFile);
  96.       Total := lIx;
  97.     end;
  98.     lFPList.Free;
  99.   end;
  100.  
If correct it runs 20% faster on 64 bit relative to 32 bit.
Lazarus trunk r. 62137/27.10.2019 (+/- patches regarding TScrollBar, IntitalSetupDialog, Options.Environment options, SearchResults).  Lazarus 3.0.6 raw from svn.
FPC 3.0.4 32 bits. (+heaptrc with leaked ClassName+Revisited TList) , Windows 10 Pro x64 (v. 1903 / 18362.418)

avk

  • Full Member
  • ***
  • Posts: 163
    • my self-education project
Re: Sorting and Counting
« Reply #99 on: August 01, 2019, 06:49:03 am »
I have added your code to the benchmark:
Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode delphi}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   Classes, SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, garrayutils, gvector, gmap,
  12.   WinSortCount3;
  13.  
  14. type
  15.   TIntPair = TPair<LongInt, LongInt>;
  16.   TProcedureArray = array of procedure;
  17.  
  18.   function ComparePairs(constref L, R: TIntPair): LongInt;
  19.   begin
  20.     if L.Key < R.Key then
  21.       Result := -1
  22.     else if L.Key = R.Key then
  23.       Result := 0
  24.     else
  25.       Result := 1;
  26.   end;
  27.  
  28. var
  29.   Total, Unique, repeatCount, randomrange: Integer;
  30.   Start: TDateTime;
  31.   inFilename: String = 'data.txt';
  32.   outFilename: String = 'sorted.txt';
  33.   routineName: String;
  34.   procedures: TProcedureArray;
  35.   proc: procedure;
  36.  
  37.   procedure GenerateData(randomRange: Integer=8; repeatMillionsCount: Integer=2);
  38.   var
  39.     InFile: Text;
  40.     I: LongInt;
  41.   begin
  42.     Assign(InFile, inFilename);
  43.     Rewrite(InFile);
  44.     for I := 1 to repeatMillionsCount * 1000000 do
  45.       WriteLn(InFile, 1500000000 + Random(randomRange * 100000));
  46.     Close(InFile);
  47.   end;
  48.  
  49.   procedure SortCountAkira;
  50.   var
  51.     I: LongInt;
  52.     InOut: Text;
  53.     Map: TDictionary<LongInt, LongInt>;
  54.     Pair: TIntPair;
  55.     Pairs: TArray<TIntPair>;
  56.   begin
  57.     routineName := {$I %currentroutine%};
  58.     Map := TDictionary<LongInt, LongInt>.Create();
  59.     //Map.Capacity := 10000000;
  60.     Assign(InOut, inFilename);
  61.     Reset(InOut);
  62.     while not EOF(InOut) do begin
  63.       ReadLn(InOut, I);
  64.       Inc(Total);
  65.       if not Map.ContainsKey(I) then
  66.         begin
  67.           Map.Add(I, 1);
  68.           Inc(Unique);
  69.         end
  70.       else
  71.         Map[I] := Map[I] + 1;
  72.     end;
  73.     Close(InOut);
  74.     Pairs := Map.ToArray();
  75.     Map.Free();
  76.     TArrayHelper<TIntPair>.Sort(
  77.       Pairs,
  78.       TComparer<TIntPair>.Construct(ComparePairs)
  79.     );
  80.     Assign(InOut, outFilename);
  81.     Rewrite(InOut);
  82.     for Pair in Pairs do with Pair do
  83.       WriteLn(InOut, Key, ' - ', Value);
  84.     Close(InOut);
  85.   end;
  86.  
  87.   procedure SortCountHoward;
  88.   var
  89.     arr: array of Integer;
  90.     textf: TextFile;
  91.     min: Integer = High(Integer);
  92.     max: Integer = -1;
  93.     i: Integer;
  94.   begin
  95.     routineName := {$I %currentroutine%};
  96.     AssignFile(textf, inFilename);
  97.     Reset(textf);
  98.     while not EOF(textf) do
  99.       begin
  100.         ReadLn(textf, i);
  101.         Inc(Total);
  102.         if i < min then
  103.           min := i;
  104.         if i > max then
  105.           max := i;
  106.       end;
  107.     SetLength(arr, max-min+1);
  108.  
  109.     Reset(textf);
  110.     while not EOF(textf) do
  111.       begin
  112.         ReadLn(textf, i);
  113.         Dec(i, min);
  114.         Inc(arr[i]);
  115.       end;
  116.     CloseFile(textf);
  117.  
  118.     AssignFile(textf, outFilename);
  119.     Rewrite(textf);
  120.     for i := Low(arr) to High(arr) do
  121.       case (arr[i] > 0) of
  122.         True:
  123.           begin
  124.             WriteLn(textf, i+min, ' - ', arr[i]);
  125.             Inc(Unique);
  126.           end;
  127.       end;
  128.     CloseFile(textf);
  129.     SetLength(arr, 0);
  130.   end;
  131.  
  132.   procedure SortCountAvk1;
  133.   type
  134.     TCounter  = TGHashMultiSetLP<Integer>;
  135.     TCountRef = TGAutoRef<TCounter>;
  136.     TEntry    = TCounter.TEntry;
  137.  
  138.     function EntryCmp(constref L, R: TEntry): SizeInt;
  139.     begin
  140.       if L.Key > R.Key then
  141.         Result := 1
  142.       else
  143.         if L.Key < R.Key then
  144.           Result := -1
  145.         else
  146.           Result := 0;
  147.     end;
  148.  
  149.   var
  150.     CountRef: TCountRef;
  151.     InOut: Text;
  152.     Counter: TCounter;
  153.     e: TEntry;
  154.     I: Integer;
  155.   begin
  156.     routineName := {$I %currentroutine%};
  157.     Counter := CountRef;
  158.     //Counter.LoadFactor := 0.7;
  159.     Assign(InOut, inFilename);
  160.     Reset(InOut);
  161.     while not EOF(InOut) do
  162.       begin
  163.         ReadLn(InOut, I);
  164.         Counter.Add(I);
  165.       end;
  166.     Close(InOut);
  167.     Total := Counter.Count;
  168.     Unique := Counter.EntryCount;
  169.     if Counter.NonEmpty then
  170.       begin
  171.         Assign(InOut, outFilename);
  172.         Rewrite(InOut);
  173.         for e in Counter.Entries.Sorted(EntryCmp) do
  174.           with e do
  175.             WriteLn(InOut, Key, ' - ', Count);
  176.         Close(InOut);
  177.       end;
  178.   end;
  179.  
  180.   procedure SortCountAvk2;
  181.   var
  182.     List: array of Integer;
  183.     InOut: Text;
  184.     I, J, Count, DupCount: Integer;
  185.   begin
  186.     routineName := {$I %currentroutine%};
  187.     Assign(InOut, inFilename);
  188.     Reset(InOut);
  189.     SetLength(List, 262144);
  190.     I := 0;
  191.     while not EOF(InOut) do
  192.       begin
  193.         ReadLn(InOut, J);
  194.         Inc(Total);
  195.         if Length(List) = I then
  196.           SetLength(List, I * 2);
  197.         List[I] := J;
  198.         Inc(I);
  199.       end;
  200.     Close(InOut);
  201.     SetLength(List, I);
  202.     if List = nil then
  203.       exit;
  204.     TGOrdinalArrayHelper<Integer>.Sort(List);
  205.     Count := I;
  206.     DupCount := 0;
  207.     I := 0;
  208.     Assign(InOut, outFilename);
  209.     Rewrite(InOut);
  210.     repeat
  211.       J := List[I];
  212.       while (I < Count) and (List[I] = J) do
  213.         begin
  214.           Inc(DupCount);
  215.           Inc(I);
  216.         end;
  217.       WriteLn(InOut, J, ' - ', DupCount);
  218.       Inc(Unique);
  219.       DupCount := 0;
  220.     until I = Count;
  221.     Close(InOut);
  222.   end;
  223.  
  224.   procedure SortCountJulkas1;
  225.   type
  226.     TIntLess = TLess<LongInt>;
  227.     TIntVect = TVector<LongInt>;
  228.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  229.   const
  230.     bsz = 1 shl 17;
  231.   var
  232.     sc: array[0..2147483647 shr 17] of TIntVect;
  233.     i: LongInt;
  234.     pkey, key, cnt: LongInt;
  235.     offset: LongInt;
  236.     InOut: Text;
  237.   begin
  238.     routineName := {$I %currentroutine%};
  239.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  240.  
  241.     Assign(InOut, inFilename);
  242.     Reset(InOut);
  243.     while not EOF(InOut) do
  244.       begin
  245.         ReadLn(InOut, key);
  246.         Inc(Total);
  247.         sc[key shr 17].PushBack(key and $1FFFF);
  248.       end;
  249.     Close(InOut);
  250.  
  251.     Assign(InOut, outFilename);
  252.     Rewrite(InOut);
  253.  
  254.     offset := -bsz;
  255.     for i := Low(sc) to High(sc) do
  256.     begin
  257.       Inc(offset, bsz);
  258.       pkey := -1;
  259.       cnt := 0;
  260.       if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  261.       for key in sc[i] do
  262.       begin
  263.         if pkey <> key then
  264.         begin
  265.           if cnt <> 0 then
  266.           begin
  267.             WriteLn(InOut, offset + pkey, ' - ', cnt);
  268.             Inc(Unique);
  269.           end;
  270.           pkey := key;
  271.           cnt := 0;
  272.         end;
  273.         Inc(cnt);
  274.       end;
  275.       if cnt <> 0 then
  276.       begin
  277.         WriteLn(InOut, offset + pkey, ' - ', cnt);
  278.         Inc(Unique);
  279.       end;
  280.     end;
  281.  
  282.     Close(InOut);
  283.     for i := Low(sc) to High(sc) do sc[i].Destroy;
  284.   end;
  285.  
  286.   procedure SortCountJulkas2;
  287.   type
  288.     TIntLess = TLess<LongInt>;
  289.     TIntVect = TVector<LongInt>;
  290.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  291.   var
  292.     sc: array[0..21474] of TIntVect;
  293.     i: LongInt;
  294.     pkey, key, cnt: LongInt;
  295.     offset: LongInt;
  296.     InOut: Text;
  297.   begin
  298.     routineName := {$I %currentroutine%};
  299.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  300.  
  301.     Assign(InOut, inFilename);
  302.     Reset(InOut);
  303.     while not EOF(InOut) do
  304.       begin
  305.         ReadLn(InOut, key);
  306.         Inc(Total);
  307.         sc[key div 100000].PushBack(key mod 100000);
  308.       end;
  309.     Close(InOut);
  310.  
  311.     Assign(InOut, outFilename);
  312.     Rewrite(InOut);
  313.  
  314.     for i := Low(sc) to High(sc) do if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  315.  
  316.     offset := -100000;
  317.     for i := Low(sc) to High(sc) do
  318.     begin
  319.       Inc(offset, 100000);
  320.       pkey := -1;
  321.       cnt := 0;
  322.       for key in sc[i] do
  323.       begin
  324.         if pkey <> key then
  325.         begin
  326.           if cnt <> 0 then
  327.           begin
  328.             WriteLn(InOut, offset + pkey, ' - ', cnt);
  329.             Inc(Unique);
  330.           end;
  331.           pkey := key;
  332.           cnt := 0;
  333.         end;
  334.         Inc(cnt);
  335.       end;
  336.       if cnt <> 0 then
  337.       begin
  338.         WriteLn(InOut, offset + pkey, ' - ', cnt);
  339.         Inc(Unique);
  340.       end;
  341.     end;
  342.  
  343.     Close(InOut);
  344.     for i := Low(sc) to High(sc) do
  345.       sc[i].Destroy;
  346.   end;
  347.  
  348.   procedure SortCount440bx;
  349.   begin
  350.     routineName := {$I %currentroutine%};
  351.     WinSortCount3.DataCount := 0;
  352.     WinSortCount3.Unique := 0;
  353.     WinSortCount3.InFileName := inFilename;
  354.     WinSortCount3.OutFileName := outFilename;
  355.     WinSortCount3.SortCount;
  356.     Total := WinSortCount3.DataCount;
  357.     Unique := WinSortCount3.Unique;
  358.   end;
  359.  
  360.   procedure SortCountBrunoK;  { Note : requires Classes }
  361.   const
  362.     cCR = $0D;
  363.     c0 = Ord('0');
  364.     function LoadStreamToList(aMemStream: TMemoryStream; aList: TFPList): integer;
  365.     var
  366.       { Parse lines }
  367.       lPByte, lPEndByte: PByte;
  368.  
  369.       { Values extraction }
  370.       lPByteTextStart: PByte = nil;
  371.       lValueStarted: boolean = False;
  372.       lDWORD: DWORD;
  373.  
  374.       lCntRec: integer;
  375.     begin
  376.       { Prepare aList }
  377.       lCntRec := aMemStream.Size;
  378.       if lCntRec <= 0 then // Stream empty ?
  379.         exit(0);
  380.       aList.Count := lCntRec div 10; // Setup approximative size
  381.       aList.Count := 0;
  382.  
  383.       lPByte := PByte(aMemStream.memory);
  384.       lPEndByte := lPByte + aMemStream.Size;
  385.  
  386.       while lPByte <= lPEndByte do begin
  387.         if (lPByte = lPEndByte) or (lPByte^ <= cCR) then begin
  388.           if lValueStarted then begin
  389.             lDWORD := 0;
  390.             while lPByteTextStart < lPByte do begin
  391.               lDWORD := lDWORD * 10 + lPByteTextStart^ - c0;
  392.               Inc(lPByteTextStart);
  393.             end;
  394.             aList.Add(Pointer(lDWORD));
  395.             lValueStarted := False;
  396.           end;
  397.         end
  398.         else if not lValueStarted then begin
  399.           lPByteTextStart := lPByte;
  400.           lValueStarted := True;
  401.         end;
  402.         Inc(lPByte);
  403.       end;
  404.       Result := aList.Count;
  405.     end;
  406.  
  407.     function BkCompare(Item1, Item2: Pointer): integer;
  408.     begin
  409.       Result := 1;
  410.       if Item1 < Item2 then
  411.         Result := -1
  412.       else if Item1 = Item2 then
  413.         Result := 0;
  414.     end;
  415.  
  416.   var
  417.     lFile: TextFile;
  418.     lMemStream: TMemoryStream;
  419.     lNbRecs: integer = 0;
  420.     lFPList: TFPList;
  421.     lIx: integer;
  422.     lLastValue: pointer;
  423.     lListCount: integer;
  424.     lLastValueCount: integer;
  425.   begin
  426.     routineName := 'SortCountBrunoK'; //{$I %currentroutine%};
  427.     lMemStream := TMemoryStream.Create;
  428.     lMemStream.LoadFromFile(inFileName);
  429.     lFPList := TFPList.Create;
  430.     lNbRecs := LoadStreamToList(lMemStream, lFPList);
  431.     lMemStream.Free; // Not needed anymore
  432.     if lNbRecs > 0 then begin
  433.       AssignFile(lFile, outFilename);
  434.       Rewrite(lFile);
  435.       lFPList.Sort(@BkCompare);
  436.       lIx := 0;
  437.       lLastValue := lFPList[lIx];
  438.       lLastValueCount := 1;
  439.       lListCount := lFPList.Count;
  440.       repeat
  441.         Inc(lIx);
  442.         if (lIx >= lListCount) or (lFPList[lIx] <> lLastValue) then begin
  443.           Inc(unique);
  444.           WriteLn(lFile, UINTPTR(lLastValue), ' - ', lLastValueCount);
  445.           if (lIx >= lListCount) then
  446.             Break;
  447.           lLastValue := lFPList[lIx];
  448.           lLastValueCount := 1;
  449.         end
  450.         else
  451.           Inc(lLastValueCount);
  452.       until False;
  453.       CloseFile(lFile);
  454.       Total := lIx;
  455.     end;
  456.     lFPList.Free;
  457.   end;
  458.  
  459.   procedure Run(aProc: TProcedure);
  460.   begin
  461.     Total := 0;
  462.     Unique := 0;
  463.     Start := Now;
  464.     try
  465.       aProc();
  466.       WriteLn(Copy(routineName, 10, 20):7,'''s time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4,#9'#unique: ',Unique,' #total: ',Total);
  467.     except
  468.       on e: Exception do
  469.         WriteLn('crashes with message "', e.Message, '"');
  470.     end;
  471.   end;
  472.  
  473. begin
  474.   Randomize;
  475.  
  476.   procedures := TProcedureArray.Create(
  477.                    SortCountJulkas1, SortCountJulkas2, SortCountAkira, SortCountHoward, SortCountAvk1,
  478.                    SortCountAvk2, SortCount440bx, SortCountBrunoK);
  479.  
  480.   for randomrange := 1 to 10 do
  481.     begin
  482.       GenerateData(randomrange, 10);
  483.       WriteLn(#10'RandomRange = ',randomrange);
  484.       for proc in procedures do
  485.         Run(proc);
  486.     end;
  487.  
  488.   for repeatCount := 1 to 10 do
  489.     begin
  490.       GenerateData(8, 2*repeatCount);
  491.       WriteLn(#10'repeatMillionsCount = ', 2*repeatCount);
  492.       for proc in procedures do
  493.         Run(proc);
  494.     end;
  495. end.
  496.  

At least on my machine, the performance of your code is second only to 440bx one.
« Last Edit: August 01, 2019, 06:52:38 am by avk »

440bx

  • Hero Member
  • *****
  • Posts: 1269
Re: Sorting and Counting
« Reply #100 on: August 01, 2019, 08:23:16 am »
At least on my machine, the performance of your code is second only to 440bx one.
My entry should be disqualified because the sort sequence it generates is not the sort sequence expected by the "user" (in this case the OP.)

In the real world the different sequence would not be acceptable.

ETA:

Bruno's implementation can be made a smidgen faster by changing the compare function to test for equality first (since there are more duplicate values than unique), that would lower the number of comparisons required to determine relative magnitudes.
« Last Edit: August 01, 2019, 08:51:16 am by 440bx »
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

MathMan

  • Full Member
  • ***
  • Posts: 166
Re: Sorting and Counting
« Reply #101 on: August 01, 2019, 08:58:58 am »
Quote
ETA:

Bruno's implementation can be made a smidgen faster by changing the compare function to test for equality first (since there are more duplicate values than unique), that would lower the number of comparisons required to determine relative magnitudes.

I think that the compare function can be speeded up a lot by "linearizing" the code to eliminate data dependend branches like

Code: [Select]
  Result := 1 - integer( a=b ) - 2*integer( a<b );

At least on my system (Core i6700k) the latter runs a roughly tripple speed of the former.

MathMan

440bx

  • Hero Member
  • *****
  • Posts: 1269
Re: Sorting and Counting
« Reply #102 on: August 01, 2019, 09:29:21 am »
Code: [Select]
  Result := 1 - integer( a=b ) - 2*integer( a<b );
At least on my system (Core i6700k) the latter runs a roughly tripple speed of the former.

MathMan
I am surprised that is faster because in order to calculate the result, the arithmetic expression, unlike a Boolean expression, must be fully evaluated which means in all cases, two (2) compares instead of possibly just one, will be necessary.
using FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

avk

  • Full Member
  • ***
  • Posts: 163
    • my self-education project
Re: Sorting and Counting
« Reply #103 on: August 01, 2019, 09:46:58 am »
My entry should be disqualified because the sort sequence it generates is not the sort sequence expected by the "user" (in this case the OP.)
And this can not be fixed?

@BrunoK, you really added fuel to the fire. :)
I wondered what would happen if I attached a similar trick to SortCountAvk2:
Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode delphi}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   Classes, SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, garrayutils, gvector, gmap,
  12.   WinSortCount3;
  13.  
  14. type
  15.   TIntPair = TPair<LongInt, LongInt>;
  16.   TProcedureArray = array of procedure;
  17.  
  18.   function ComparePairs(constref L, R: TIntPair): LongInt;
  19.   begin
  20.     if L.Key < R.Key then
  21.       Result := -1
  22.     else if L.Key = R.Key then
  23.       Result := 0
  24.     else
  25.       Result := 1;
  26.   end;
  27.  
  28. var
  29.   Total, Unique, repeatCount, randomrange: Integer;
  30.   Start: TDateTime;
  31.   inFilename: String = 'data.txt';
  32.   outFilename: String = 'sorted.txt';
  33.   routineName: String;
  34.   procedures: TProcedureArray;
  35.   proc: procedure;
  36.  
  37.   procedure GenerateData(randomRange: Integer=8; repeatMillionsCount: Integer=2);
  38.   var
  39.     InFile: Text;
  40.     I: LongInt;
  41.   begin
  42.     Assign(InFile, inFilename);
  43.     Rewrite(InFile);
  44.     for I := 1 to repeatMillionsCount * 1000000 do
  45.       WriteLn(InFile, 1500000000 + Random(randomRange * 100000));
  46.     Close(InFile);
  47.   end;
  48.  
  49.   procedure SortCountAkira;
  50.   var
  51.     I: LongInt;
  52.     InOut: Text;
  53.     Map: TDictionary<LongInt, LongInt>;
  54.     Pair: TIntPair;
  55.     Pairs: TArray<TIntPair>;
  56.   begin
  57.     routineName := {$I %currentroutine%};
  58.     Map := TDictionary<LongInt, LongInt>.Create();
  59.     //Map.Capacity := 10000000;
  60.     Assign(InOut, inFilename);
  61.     Reset(InOut);
  62.     while not EOF(InOut) do begin
  63.       ReadLn(InOut, I);
  64.       Inc(Total);
  65.       if not Map.ContainsKey(I) then
  66.         begin
  67.           Map.Add(I, 1);
  68.           Inc(Unique);
  69.         end
  70.       else
  71.         Map[I] := Map[I] + 1;
  72.     end;
  73.     Close(InOut);
  74.     Pairs := Map.ToArray();
  75.     Map.Free();
  76.     TArrayHelper<TIntPair>.Sort(
  77.       Pairs,
  78.       TComparer<TIntPair>.Construct(ComparePairs)
  79.     );
  80.     Assign(InOut, outFilename);
  81.     Rewrite(InOut);
  82.     for Pair in Pairs do with Pair do
  83.       WriteLn(InOut, Key, ' - ', Value);
  84.     Close(InOut);
  85.   end;
  86.  
  87.   procedure SortCountHoward;
  88.   var
  89.     arr: array of Integer;
  90.     textf: TextFile;
  91.     min: Integer = High(Integer);
  92.     max: Integer = -1;
  93.     i: Integer;
  94.   begin
  95.     routineName := {$I %currentroutine%};
  96.     AssignFile(textf, inFilename);
  97.     Reset(textf);
  98.     while not EOF(textf) do
  99.       begin
  100.         ReadLn(textf, i);
  101.         Inc(Total);
  102.         if i < min then
  103.           min := i;
  104.         if i > max then
  105.           max := i;
  106.       end;
  107.     SetLength(arr, max-min+1);
  108.  
  109.     Reset(textf);
  110.     while not EOF(textf) do
  111.       begin
  112.         ReadLn(textf, i);
  113.         Dec(i, min);
  114.         Inc(arr[i]);
  115.       end;
  116.     CloseFile(textf);
  117.  
  118.     AssignFile(textf, outFilename);
  119.     Rewrite(textf);
  120.     for i := Low(arr) to High(arr) do
  121.       case (arr[i] > 0) of
  122.         True:
  123.           begin
  124.             WriteLn(textf, i+min, ' - ', arr[i]);
  125.             Inc(Unique);
  126.           end;
  127.       end;
  128.     CloseFile(textf);
  129.     SetLength(arr, 0);
  130.   end;
  131.  
  132.   procedure SortCountAvk1;
  133.   type
  134.     TCounter  = TGHashMultiSetLP<Integer>;
  135.     TCountRef = TGAutoRef<TCounter>;
  136.     TEntry    = TCounter.TEntry;
  137.  
  138.     function EntryCmp(constref L, R: TEntry): SizeInt;
  139.     begin
  140.       if L.Key > R.Key then
  141.         Result := 1
  142.       else
  143.         if L.Key < R.Key then
  144.           Result := -1
  145.         else
  146.           Result := 0;
  147.     end;
  148.  
  149.   var
  150.     CountRef: TCountRef;
  151.     InOut: Text;
  152.     Counter: TCounter;
  153.     e: TEntry;
  154.     I: Integer;
  155.   begin
  156.     routineName := {$I %currentroutine%};
  157.     Counter := CountRef;
  158.     //Counter.LoadFactor := 0.7;
  159.     Assign(InOut, inFilename);
  160.     Reset(InOut);
  161.     while not EOF(InOut) do
  162.       begin
  163.         ReadLn(InOut, I);
  164.         Counter.Add(I);
  165.       end;
  166.     Close(InOut);
  167.     Total := Counter.Count;
  168.     Unique := Counter.EntryCount;
  169.     if Counter.NonEmpty then
  170.       begin
  171.         Assign(InOut, outFilename);
  172.         Rewrite(InOut);
  173.         for e in Counter.Entries.Sorted(EntryCmp) do
  174.           with e do
  175.             WriteLn(InOut, Key, ' - ', Count);
  176.         Close(InOut);
  177.       end;
  178.   end;
  179.  
  180.   procedure SortCountAvk2;
  181.   type
  182.     TIntArray = array of Integer;
  183.  
  184.     function LoadData: TIntArray;
  185.     var
  186.       PCurr, PLast: PByte;
  187.       DataSize, CurrValue, I: Integer;
  188.       DoReading: Boolean = False;
  189.     begin
  190.       Result := nil;
  191.       I := 0;
  192.       with TMemoryStream.Create do
  193.         try
  194.           LoadFromFile(inFileName);
  195.           DataSize := Size;
  196.           if DataSize <= 0 then
  197.             exit;
  198.           SetLength(Result, DataSize div 10);
  199.           PCurr := Memory;
  200.           PLast := PCurr + Size;
  201.           CurrValue := 0;
  202.           repeat
  203.             if PCurr^ > $0D then
  204.               begin
  205.                 DoReading := True;
  206.                 CurrValue := CurrValue * 10 + PCurr^ - Ord('0');
  207.               end
  208.             else
  209.               if DoReading then
  210.                 begin
  211.                   if Length(Result) = I then
  212.                     SetLength(Result, I * 2);
  213.                   Result[I] := CurrValue;
  214.                   Inc(I);
  215.                   CurrValue := 0;
  216.                   DoReading := False;
  217.                 end;
  218.             Inc(PCurr);
  219.           until PCurr > PLast;
  220.         finally
  221.           Free;
  222.         end;
  223.       SetLength(Result, I);
  224.     end;
  225.   var
  226.     List: TIntArray;
  227.     InOut: Text;
  228.     I, J, Count, DupCount: Integer;
  229.   begin
  230.     routineName := {$I %currentroutine%};
  231.     List := LoadData;
  232.     if List = nil then
  233.       exit;
  234.     TGOrdinalArrayHelper<Integer>.Sort(List);
  235.     Total := Length(List);
  236.     Count := Total;
  237.     DupCount := 0;
  238.     I := 0;
  239.     Assign(InOut, outFilename+'1');
  240.     Rewrite(InOut);
  241.     repeat
  242.       J := List[I];
  243.       while (I < Count) and (List[I] = J) do
  244.         begin
  245.           Inc(DupCount);
  246.           Inc(I);
  247.         end;
  248.       WriteLn(InOut, J, ' - ', DupCount);
  249.       Inc(Unique);
  250.       DupCount := 0;
  251.     until I = Count;
  252.     Close(InOut);
  253.   end;
  254.  
  255.   procedure SortCountJulkas1;
  256.   type
  257.     TIntLess = TLess<LongInt>;
  258.     TIntVect = TVector<LongInt>;
  259.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  260.   const
  261.     bsz = 1 shl 17;
  262.   var
  263.     sc: array[0..2147483647 shr 17] of TIntVect;
  264.     i: LongInt;
  265.     pkey, key, cnt: LongInt;
  266.     offset: LongInt;
  267.     InOut: Text;
  268.   begin
  269.     routineName := {$I %currentroutine%};
  270.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  271.  
  272.     Assign(InOut, inFilename);
  273.     Reset(InOut);
  274.     while not EOF(InOut) do
  275.       begin
  276.         ReadLn(InOut, key);
  277.         Inc(Total);
  278.         sc[key shr 17].PushBack(key and $1FFFF);
  279.       end;
  280.     Close(InOut);
  281.  
  282.     Assign(InOut, outFilename);
  283.     Rewrite(InOut);
  284.  
  285.     offset := -bsz;
  286.     for i := Low(sc) to High(sc) do
  287.     begin
  288.       Inc(offset, bsz);
  289.       pkey := -1;
  290.       cnt := 0;
  291.       if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  292.       for key in sc[i] do
  293.       begin
  294.         if pkey <> key then
  295.         begin
  296.           if cnt <> 0 then
  297.           begin
  298.             WriteLn(InOut, offset + pkey, ' - ', cnt);
  299.             Inc(Unique);
  300.           end;
  301.           pkey := key;
  302.           cnt := 0;
  303.         end;
  304.         Inc(cnt);
  305.       end;
  306.       if cnt <> 0 then
  307.       begin
  308.         WriteLn(InOut, offset + pkey, ' - ', cnt);
  309.         Inc(Unique);
  310.       end;
  311.     end;
  312.  
  313.     Close(InOut);
  314.     for i := Low(sc) to High(sc) do sc[i].Destroy;
  315.   end;
  316.  
  317.   procedure SortCountJulkas2;
  318.   type
  319.     TIntLess = TLess<LongInt>;
  320.     TIntVect = TVector<LongInt>;
  321.     TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  322.   var
  323.     sc: array[0..21474] of TIntVect;
  324.     i: LongInt;
  325.     pkey, key, cnt: LongInt;
  326.     offset: LongInt;
  327.     InOut: Text;
  328.   begin
  329.     routineName := {$I %currentroutine%};
  330.     for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  331.  
  332.     Assign(InOut, inFilename);
  333.     Reset(InOut);
  334.     while not EOF(InOut) do
  335.       begin
  336.         ReadLn(InOut, key);
  337.         Inc(Total);
  338.         sc[key div 100000].PushBack(key mod 100000);
  339.       end;
  340.     Close(InOut);
  341.  
  342.     Assign(InOut, outFilename);
  343.     Rewrite(InOut);
  344.  
  345.     for i := Low(sc) to High(sc) do if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  346.  
  347.     offset := -100000;
  348.     for i := Low(sc) to High(sc) do
  349.     begin
  350.       Inc(offset, 100000);
  351.       pkey := -1;
  352.       cnt := 0;
  353.       for key in sc[i] do
  354.       begin
  355.         if pkey <> key then
  356.         begin
  357.           if cnt <> 0 then
  358.           begin
  359.             WriteLn(InOut, offset + pkey, ' - ', cnt);
  360.             Inc(Unique);
  361.           end;
  362.           pkey := key;
  363.           cnt := 0;
  364.         end;
  365.         Inc(cnt);
  366.       end;
  367.       if cnt <> 0 then
  368.       begin
  369.         WriteLn(InOut, offset + pkey, ' - ', cnt);
  370.         Inc(Unique);
  371.       end;
  372.     end;
  373.  
  374.     Close(InOut);
  375.     for i := Low(sc) to High(sc) do
  376.       sc[i].Destroy;
  377.   end;
  378.  
  379.   procedure SortCount440bx;
  380.   begin
  381.     routineName := {$I %currentroutine%};
  382.     WinSortCount3.DataCount := 0;
  383.     WinSortCount3.Unique := 0;
  384.     WinSortCount3.InFileName := inFilename;
  385.     WinSortCount3.OutFileName := outFilename;
  386.     WinSortCount3.SortCount;
  387.     Total := WinSortCount3.DataCount;
  388.     Unique := WinSortCount3.Unique;
  389.   end;
  390.  
  391.   procedure SortCountBrunoK;  { Note : requires Classes }
  392.   const
  393.     cCR = $0D;
  394.     c0 = Ord('0');
  395.     function LoadStreamToList(aMemStream: TMemoryStream; aList: TFPList): integer;
  396.     var
  397.       { Parse lines }
  398.       lPByte, lPEndByte: PByte;
  399.  
  400.       { Values extraction }
  401.       lPByteTextStart: PByte = nil;
  402.       lValueStarted: boolean = False;
  403.       lDWORD: DWORD;
  404.  
  405.       lCntRec: integer;
  406.     begin
  407.       { Prepare aList }
  408.       lCntRec := aMemStream.Size;
  409.       if lCntRec <= 0 then // Stream empty ?
  410.         exit(0);
  411.       aList.Count := lCntRec div 10; // Setup approximative size
  412.       aList.Count := 0;
  413.  
  414.       lPByte := PByte(aMemStream.memory);
  415.       lPEndByte := lPByte + aMemStream.Size;
  416.  
  417.       while lPByte <= lPEndByte do begin
  418.         if (lPByte = lPEndByte) or (lPByte^ <= cCR) then begin
  419.           if lValueStarted then begin
  420.             lDWORD := 0;
  421.             while lPByteTextStart < lPByte do begin
  422.               lDWORD := lDWORD * 10 + lPByteTextStart^ - c0;
  423.               Inc(lPByteTextStart);
  424.             end;
  425.             aList.Add(Pointer(lDWORD));
  426.             lValueStarted := False;
  427.           end;
  428.         end
  429.         else if not lValueStarted then begin
  430.           lPByteTextStart := lPByte;
  431.           lValueStarted := True;
  432.         end;
  433.         Inc(lPByte);
  434.       end;
  435.       Result := aList.Count;
  436.     end;
  437.  
  438.     function BkCompare(Item1, Item2: Pointer): integer;
  439.     begin
  440.       Result := 1;
  441.       if Item1 < Item2 then
  442.         Result := -1
  443.       else if Item1 = Item2 then
  444.         Result := 0;
  445.     end;
  446.  
  447.   var
  448.     lFile: TextFile;
  449.     lMemStream: TMemoryStream;
  450.     lNbRecs: integer = 0;
  451.     lFPList: TFPList;
  452.     lIx: integer;
  453.     lLastValue: pointer;
  454.     lListCount: integer;
  455.     lLastValueCount: integer;
  456.   begin
  457.     routineName := 'SortCountBrunoK'; //{$I %currentroutine%};
  458.     lMemStream := TMemoryStream.Create;
  459.     lMemStream.LoadFromFile(inFileName);
  460.     lFPList := TFPList.Create;
  461.     lNbRecs := LoadStreamToList(lMemStream, lFPList);
  462.     lMemStream.Free; // Not needed anymore
  463.     if lNbRecs > 0 then begin
  464.       AssignFile(lFile, outFilename);
  465.       Rewrite(lFile);
  466.       lFPList.Sort(@BkCompare);
  467.       lIx := 0;
  468.       lLastValue := lFPList[lIx];
  469.       lLastValueCount := 1;
  470.       lListCount := lFPList.Count;
  471.       repeat
  472.         Inc(lIx);
  473.         if (lIx >= lListCount) or (lFPList[lIx] <> lLastValue) then begin
  474.           Inc(unique);
  475.           WriteLn(lFile, UINTPTR(lLastValue), ' - ', lLastValueCount);
  476.           if (lIx >= lListCount) then
  477.             Break;
  478.           lLastValue := lFPList[lIx];
  479.           lLastValueCount := 1;
  480.         end
  481.         else
  482.           Inc(lLastValueCount);
  483.       until False;
  484.       CloseFile(lFile);
  485.       Total := lIx;
  486.     end;
  487.     lFPList.Free;
  488.   end;
  489.  
  490.   procedure Run(aProc: TProcedure);
  491.   begin
  492.     Total := 0;
  493.     Unique := 0;
  494.     Start := Now;
  495.     try
  496.       aProc();
  497.       WriteLn(Copy(routineName, 10, 20):7,'''s time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4,#9'#unique: ',Unique,' #total: ',Total);
  498.     except
  499.       on e: Exception do
  500.         WriteLn('crashes with message "', e.Message, '"');
  501.     end;
  502.   end;
  503.  
  504. begin
  505.   Randomize;
  506.  
  507.   procedures := TProcedureArray.Create(
  508.                    SortCountJulkas1, SortCountJulkas2, SortCountAkira, SortCountHoward, SortCountAvk1,
  509.                    SortCountAvk2, SortCount440bx, SortCountBrunoK);
  510.  
  511.   for randomrange := 1 to 10 do
  512.     begin
  513.       GenerateData(randomrange, 10);
  514.       WriteLn(#10'RandomRange = ',randomrange);
  515.       for proc in procedures do
  516.         Run(proc);
  517.     end;
  518.  
  519.   for repeatCount := 1 to 10 do
  520.     begin
  521.       GenerateData(8, 2*repeatCount);
  522.       WriteLn(#10'repeatMillionsCount = ', 2*repeatCount);
  523.       for proc in procedures do
  524.         Run(proc);
  525.     end;
  526. end.
  527.  

Results in attachment.
« Last Edit: August 01, 2019, 02:14:28 pm by avk »

MathMan

  • Full Member
  • ***
  • Posts: 166
Re: Sorting and Counting
« Reply #104 on: August 01, 2019, 10:00:58 am »
Code: [Select]
  Result := 1 - integer( a=b ) - 2*integer( a<b );
At least on my system (Core i6700k) the latter runs a roughly tripple speed of the former.

MathMan
I am surprised that is faster because in order to calculate the result, the arithmetic expression, unlike a Boolean expression, must be fully evaluated which means in all cases, two (2) compares instead of possibly just one, will be necessary.

I did measure with 1 mio random pair entries (which might not be 100% equivalent to the calling patterns of the sort function) and the executon time were 16 mio clock cycles vs. 5.5 mio clock cycles on my system. The branch misprediction penalty on modern CPUs is quite high - in the range of 20-50 clock cycles - and for random entries you'll have around 50% misprediction.

MathMan