Lazarus

Programming => General => Topic started by: mpknap on July 16, 2019, 07:35:20 am

Title: Sorting and Counting
Post by: mpknap on July 16, 2019, 07:35:20 am
Hello.
I have a text file and in it unixtime times,
this is how it looks:

file.txt
1562551080

1562551140

1562551260

1562551260

1562551260

1562551260

1562551320

1562551380
.......


The file is very large. about 20MB.

Many lines are repeated, others are only once. The task that I want to do is to calculate how many times it is the same. So make an array with Unix time and the number of repetitions for a given time.

After sorting and calculation it should be like this:
1562551080 - 1

1562551140 - 1

1562551260 - 4

1562551320 -1

1562551380 -1
......


Any ideas?
Title: Re: Sorting and Counting
Post by: mangakissa on July 16, 2019, 08:17:48 am
You give the solution by yourself; a two dimensional array.
Title: Re: Sorting and Counting
Post by: julkas on July 16, 2019, 09:07:00 am
Possible approach - ordered associative array with uniq keys.
GMap from fcl-stl - https://github.com/graemeg/freepascal/blob/master/packages/fcl-stl/doc/main.pdf
Title: Re: Sorting and Counting
Post by: avk on July 16, 2019, 09:11:17 am
A data structure called "multiset" is well suited to your task.
Title: Re: Sorting and Counting
Post by: mangakissa on July 16, 2019, 04:20:30 pm
Possible approach - ordered associative array with uniq keys.
GMap from fcl-stl - https://github.com/graemeg/freepascal/blob/master/packages/fcl-stl/doc/main.pdf
FGL is standard with TFPGMap in it.

But this is easy to do with an 2 dimensional array. After finding the doubles, you can sort in every way yo want.
Title: Re: Sorting and Counting
Post by: jamie on July 16, 2019, 10:47:22 pm
Why do I think this is home work ?

In any case I would use a TStringList, load it, sort it using the features of the TStringList and maybe even use the object of each as a counter.

Title: Re: Sorting and Counting
Post by: winni on July 16, 2019, 11:29:05 pm
Yes, as Jamie said:

* load it into a stringlist
* set the stringlist.sorted to true - and wait, 20 MB mlight take some time
* count the duplicates
* and now: write them with your format: ( time - count  ) into a second stringlist
* save it to disk - done

In case of Linux/Unix: this goes really faster with the shell and sort and uniq …….....

Winni
Title: Re: Sorting and Counting
Post by: mpknap on July 17, 2019, 07:00:05 am
Why do I think this is home work ?

In any case I would use a TStringList, load it, sort it using the features of the TStringList and maybe even use the object of each as a counter.

In a sense, it's homework, but it's for me. I'm not a programmer by education, just a hobby. In life I am Bob the Builder :). I like to write something in my spare time, probably only to prove myself. I am interested in the CREDO project, I use their data. This time I want to juxtapose them with the detection of global earthquakes, I wonder if it has any connection, although nobody writes about it ;).

Here are some tools for the CREDO project https://github.com/credo-science/Windows-Tools

And the CREDO project itself here: https://credo.science/


Yes, I started, but I do not know how to count duplicates and how to do an array with the results. I have so much code for now. Good start?

Code: Pascal  [Select]
  1. procedure TForm1.Button1Click(Sender: TObject);
  2. const
  3.   BLOCK_SIZE = 10000;
  4.  
  5. var
  6.   f: textfile;
  7.   Times: TStringList;
  8.   det, separator: string;
  9.   i: integer;
  10.  
  11. begin
  12.   czas := TStringList.Create;
  13.   czas.Duplicates := dupIgnore;
  14.   czas.Sorted := True;
  15.  
  16.   AssignFile(f, 'Base_total.txt');
  17.   reset(f);
  18.  
  19.   SetLength(det, 0);
  20.   FillChar(det, SizeOf(det), 0);
  21.   while not EOF(f) do
  22.   begin
  23.  
  24.     if i mod BLOCK_SIZE = 0 then
  25.       SetLength(det, Length(det) + BLOCK_SIZE);
  26.  
  27.     readln(f, det);
  28.     czas.Add(det);
  29.     readln(f, separator);
  30.     Inc(i);
  31.   end;
  32.   closefile(f);
  33. end;    
Title: Re: Sorting and Counting
Post by: engkin on July 17, 2019, 08:47:31 am
Maybe this:
Code: Pascal  [Select]
  1. procedure CountDuplicates(AFileName, AResultFileName: String);
  2. var
  3.   List, Res: TStringList;
  4.   Item: String = '';
  5.   Count: integer = 0;
  6.   i: Integer = 0;
  7. begin
  8.   List := TStringList.Create;
  9.   Res := TStringList.Create;
  10.   try
  11.     List.LoadFromFile(AFileName);
  12.     List.Sorted := True;
  13.     while i<List.Count do
  14.     begin
  15.       // Get current item
  16.       Item := List[i];
  17.  
  18.       // Count similar items
  19.       Count := 0;
  20.       while (i<List.Count) and (List[i]=Item) do
  21.       begin
  22.         inc(i);
  23.         inc(count);
  24.       end;
  25.  
  26.       // Add the result
  27.       Res.Add(Format('%s - %d', [Item, Count]));
  28.     end;
  29.  
  30.     Res.SaveToFile(AResultFileName);
  31.   finally
  32.     List.Free;
  33.     Res.Free;
  34.   end;
  35. end;
  36.  

To use it:
Code: Pascal  [Select]
  1.   CountDuplicates('in.txt','out.txt');

The result would be saved in out.txt file.
Notice that the first result is going to be the number of empty lines, if any.
Title: Re: Sorting and Counting
Post by: mangakissa on July 17, 2019, 08:53:44 am
Code: Pascal  [Select]
  1. type
  2.  
  3.   TMyTime = packed record
  4.     Unixtime : string;
  5.     Counter  : word;
  6.   end;
  7.  
  8. implementation
  9.  
  10. procedure TForm1.Button1Click(Sender: TObject);
  11. var f      : textfile;
  12.     myline : string;
  13.     MyTime : array of TMyTime;
  14.     index  : integer;
  15. begin
  16.   assignfile(f,'file.txt');
  17.   reset(f);
  18.   index := 0;
  19.   while not eof(f) do
  20.   begin
  21.     readln(f,myline);
  22.     if not find(MyTime, myline) then
  23.     begin
  24.       index := index + 1;
  25.       setlength(MyTime,index);
  26.       MyTime[index - 1].Unixtime := myLine;
  27.       MyTime[index - 1].Counter := 1;
  28.     end;
  29.   end;
  30.   closefile(f);
  31.   //use bubblesort, quicksort, heapsort or other sort
  32.   for index := 0 to length(myTime) - 1 do
  33.     memo1.Lines.add(format('time : %s   duplicates : %3d',[myTime[index].Unixtime,myTime[index].Counter]));
  34. end;
  35.  
  36. function TForm1.Find(var aMyTime : array of TMytime; const aLine : string) : boolean;
  37. var index : integer;
  38. begin
  39.   result := false;
  40.   if length(aMyTime) > 0 then
  41.   begin
  42.     for index := low(aMyTime) to high(aMyTime) do
  43.     begin
  44.       if aMyTime[index].Unixtime = aLine then
  45.       begin
  46.         aMyTime[index].Counter := aMyTime[index].Counter + 1;
  47.         result := true;
  48.         break;
  49.       end;
  50.     end;
  51.   end;
  52. end;
  53.  
  54. end.
  55.  
stringlist is nice but for small files. and uses a lot of resources for nothing. Okay, things already built in like sort, but in this way you actually see what you're doing.
Title: Re: Sorting and Counting
Post by: avk on July 17, 2019, 01:32:40 pm
Well, I'm glad to see that I'm not the only one who believes that TStringList is not quite suitable for this task (if only because unixtime is a number, and sorting numbers is much faster than sorting strings). However, the task of counting duplicate values ​​is fairly common, and the standard library should have a ready solution for it.
Title: Re: Sorting and Counting
Post by: howardpc on July 17, 2019, 02:13:18 pm
I think TStringList is eminently suitable for this task.
Here's an alternative solution, which may use less resources.
Code: Pascal  [Select]
  1. unit mainSortCount;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. interface
  6.  
  7. uses
  8.    Classes, SysUtils, Forms, StdCtrls;
  9.  
  10. type
  11.    TForm1 = class(TForm)
  12.      Memo1: TMemo;
  13.      procedure FormCreate(Sender: TObject);
  14.    end;
  15.  
  16. var
  17.    Form1: TForm1;
  18.  
  19.   procedure SortCount(const anInFile: String; out aList: TStringList);
  20.  
  21.   procedure ShowListInMemo(constref aList: TStringList; aMemo: TMemo);
  22.  
  23. implementation
  24.  
  25. {$R *.lfm}
  26.  
  27. { TForm1 }
  28.  
  29. procedure SortCount(const anInFile: String; out aList: TStringList);
  30. const
  31.   one = PtrUInt(1);
  32. var
  33.   textf: TextFile;
  34.   s: String;
  35.   idx: Integer;
  36.  
  37.   function GetSuccObj(anIntObj: TObject): TObject;
  38.   var
  39.     i: PtrUInt absolute anIntObj;
  40.   begin
  41.     Inc(i);
  42.     Exit(anIntObj);
  43.   end;
  44.  
  45. begin
  46.   Assert(FileExists(anInFile), 'cannot find file "'+anInFile+'"');
  47.   aList := TStringList.Create;
  48.   aList.Duplicates := dupError;
  49.   aList.Sorted := True;
  50.   AssignFile(textf, anInFile);
  51.   try
  52.     Reset(textf);
  53.     while not EOF(textf) do
  54.       begin
  55.         ReadLn(textf, s);
  56.         s := Trim(s);
  57.         idx := aList.IndexOf(s);
  58.         case idx of
  59.           -1: aList.AddObject(s, TObject(one));
  60.           else
  61.             aList.Objects[idx] := GetSuccObj(aList.Objects[idx]);
  62.         end;
  63.       end;
  64.   finally
  65.     CloseFile(textf);
  66.   end;
  67. end;
  68.  
  69. procedure ShowListInMemo(constref aList: TStringList; aMemo: TMemo);
  70. var
  71.   i: Integer;
  72. begin
  73.   if Assigned(aList) and Assigned(aMemo) then
  74.     begin
  75.       aMemo.Clear;
  76.       for i := 0 to aList.Count-1 do
  77.         aMemo.Lines.Add('%s - %d', [aList[i], PtrUInt(aList.Objects[i])]);
  78.     end;
  79. end;
  80.  
  81. procedure TForm1.FormCreate(Sender: TObject);
  82. var
  83.   sl: TStringList;
  84. begin
  85.   SortCount('infile.txt', sl);
  86.   try
  87.     ShowListInMemo(sl, Memo1);
  88.     Memo1.Lines.SaveToFile('outfile.txt');
  89.   finally
  90.     sl.Free;
  91.   end;
  92. end;
  93.  
  94. end.
Title: Re: Sorting and Counting
Post by: avk on July 17, 2019, 03:20:20 pm
Unfortunately, it’s not only a matter of resources: on a dataset of the size specified above, your solution is 4 times(90 s.) slower than engkin's  one(22 s.)  and it is prohibitively slow for a task of this size.
Title: Re: Sorting and Counting
Post by: julkas on July 17, 2019, 05:29:50 pm
GMap is fast enough. I will post my random test tomorrow.
Title: Re: Sorting and Counting
Post by: marcov on July 17, 2019, 05:56:51 pm
gmap might depend on the number of unique values. If that gets large it might also slow down.
Title: Re: Sorting and Counting
Post by: julkas on July 17, 2019, 06:07:38 pm
gmap might depend on the number of unique values. If that gets large it might also slow down.
About ~ 10000000 random longint values.
Title: Re: Sorting and Counting
Post by: avk on July 17, 2019, 06:40:45 pm
@julkas, to generate a test file, I used this:
Code: Pascal  [Select]
  1. procedure CreateTestFile(const aFileName: string);
  2. var
  3.   I: Integer;
  4. const
  5.   TestSize = 1750000;
  6. begin
  7.   with TStringList.Create do
  8.     try
  9.       for I := 1 to TestSize do
  10.         Add(IntToStr(1500000000 + Random(800000)));
  11.       SaveToFile(aFileName);
  12.     finally
  13.       Free;
  14.     end;
  15. end;
  16.  
The output is a ~20MB text file, containing ~700000 unique values.
Title: Re: Sorting and Counting
Post by: Akira1364 on July 17, 2019, 08:42:08 pm
Here's a example / benchmark of a version that uses Generics.Collections (available by default in trunk FPC, but also compatible with 3.0.4 if you just copy the sources to somewhere in whatever your unit search path is.)

Performance seems to be quite good (runs in around 0.8 - 1.1 seconds on average for me.)

Code: Pascal  [Select]
  1. program Example;
  2.  
  3. {$mode Delphi}
  4.  
  5. uses
  6.   SysUtils, DateUtils,
  7.   Generics.Defaults, Generics.Collections;
  8.  
  9. type
  10.   TIntPair = TPair<LongInt, LongInt>;
  11.   TIntMap = TDictionary<LongInt, LongInt>;
  12.  
  13.   function ComparePairs(constref L, R: TIntPair): LongInt;
  14.   begin
  15.     if L.Key < R.Key then
  16.       Result := -1
  17.     else if L.Key = R.Key then
  18.       Result := 0
  19.     else
  20.       Result := 1;
  21.   end;
  22.  
  23. var
  24.   I: LongInt;
  25.   Start: TDateTime;
  26.   InFile, OutFile: Text;
  27.   Map: TIntMap;
  28.   Pair: TIntPair;
  29.   Pairs: TArray<TIntPair>;
  30.  
  31. begin
  32.   // Generate a random test file first, with an adaptation of avk's method.
  33.   Randomize();
  34.   Assign(InFile, 'data.txt');
  35.   Rewrite(InFile);
  36.   for I := 1 to 1750000 do
  37.     WriteLn(InFile, 1500000000 + Random(800000));
  38.   Close(InFile);
  39.   I := 0;
  40.   // Start the timer only now, as this is where the real work starts.
  41.   Start := Now();
  42.   Map := TIntMap.Create();
  43.   // Allocate a big chunk of memory here in advance, for performance's sake.
  44.   // Doesn't matter if it's more than you end up needing, as Capacity is separate from Count.
  45.   Map.Capacity := 1750000;
  46.   Assign(InFile, 'data.txt');
  47.   Reset(InFile);
  48.   while not EOF(InFile) do begin
  49.     ReadLn(InFile, I);
  50.     if not Map.ContainsKey(I) then
  51.       Map.Add(I, 1)
  52.     else
  53.       Map[I] := Map[I] + 1;
  54.   end;
  55.   Close(InFile);
  56.   Pairs := Map.ToArray();
  57.   TArrayHelper<TIntPair>.Sort(
  58.     Pairs,
  59.     TComparer<TIntPair>.Construct(ComparePairs)
  60.   );
  61.   Assign(OutFile, 'output.txt');
  62.   Rewrite(OutFile);
  63.   for Pair in Pairs do with Pair do
  64.     WriteLn(OutFile, Key, ' - ', Value);
  65.   Close(OutFile);
  66.   Map.Free();
  67.   WriteLn(MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
  68. end.
Title: Re: Sorting and Counting
Post by: howardpc on July 17, 2019, 08:47:16 pm
Unfortunately, it’s not only a matter of resources: on a dataset of the size specified above, your solution is 4 times(90 s.) slower than engkin's  one(22 s.)  and it is prohibitively slow for a task of this size.
I have changed my mind.

A generic container like TStringList is not well suited to this task on large files.
Here follows a simple dynamic array solution. The following routine executes in less than 2s on my average 5-year-old machine, using a test file generated from your code.
Code: Pascal  [Select]
  1. program SortCountlpi;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. uses
  6.   Classes, sysutils;
  7.  
  8. procedure CreateTestFile(const aFileName: string);
  9. var
  10.   I: Integer;
  11. const
  12.   TestSize = 1750000;
  13. begin
  14.   with TStringList.Create do
  15.     try
  16.       for I := 1 to TestSize do
  17.         Add(IntToStr(1500000000 + Random(800000)));
  18.       SaveToFile(aFileName);
  19.     finally
  20.       Free;
  21.     end;
  22. end;
  23.  
  24. procedure SortCountViaArray(const anInfile, anOutFile: String);
  25. var
  26.   arr: array of Integer;
  27.   textf: TextFile;
  28.   min: Integer = High(Integer);
  29.   max: Integer = -1;
  30.   i: Integer;
  31. begin
  32.   if FileExists(anInfile) then begin
  33.     AssignFile(textf, anInfile);
  34.     Reset(textf);
  35.     while not EOF(textf) do
  36.       begin
  37.         ReadLn(textf, i);
  38.         if i < min then
  39.           min := i;
  40.         if i > max then
  41.           max := i;
  42.       end;
  43.     SetLength(arr, max-min+1);
  44.  
  45.     Reset(textf);
  46.     while not EOF(textf) do
  47.       begin
  48.         ReadLn(textf, i);
  49.         Dec(i, min);
  50.         Inc(arr[i]);
  51.       end;
  52.     CloseFile(textf);
  53.  
  54.     AssignFile(textf, anOutFile);
  55.     Rewrite(textf);
  56.     for i := Low(arr) to High(arr) do
  57.       case (arr[i] > 0) of
  58.         True: WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
  59.       end;
  60.     CloseFile(textf);
  61.     SetLength(arr, 0);
  62.   end;
  63. end;
  64.  
  65. var
  66.   t: TDateTime;
  67.  
  68. begin
  69.   CreateTestFile('infile.txt');
  70.   t := Time;
  71.   SortCountViaArray('infile.txt', 'outfile.txt');
  72.   Writeln('Elapsed time: ',DateTimeToTimeStamp(Time -t).Time,' ms');
  73. end.
Title: Re: Sorting and Counting
Post by: mpknap on July 17, 2019, 09:28:26 pm
Thank you gentlemen. At the moment I'm testing the Engkin algorithm. The TXT file is large, sorting takes more than an hour. Tomorrow I will check your other suggestions. I do not really care about speed, because it's not for the program user only for me. It is important that the result is correct.

:)
Title: Re: Sorting and Counting
Post by: jamie on July 17, 2019, 11:13:25 pm
If you want faster you need to create a single chuck of memory that will house all of the file at once..

Load the file into that memory and write some code that shuffles the lines around. The idea is to not allow the memory manager to keep fragmentating the operation.

 Also putting it in a more simpler way, your fields are all the same size which makes this easy and since they are numbers only they can be converted into a Binary number and placed in a array to suites your needs..

  Once in the array you sort the array with simple algorithms.
 
 So step one is the read the file in line by line, convert each entry to a binary number and then store it in the
array..

 Once you have this data stored you then sort the array using something like a bubble sort.

 The array size can be calculated a head of time so you can dynamically create it..

 ArraySize := FileSize Div_NUmber_Chars_PerEntry_Plus2_For CRLF;

 So your array could be this
Array of Int64;
SetLength(MyArray, ArraySize);

etc.
the rest is just code...

Title: Re: Sorting and Counting
Post by: Akira1364 on July 18, 2019, 12:30:37 am
A generic container like TStringList is not well suited to this task on large files.

Well, an actually-generic hashmap that deals directly with the relevant type as opposed to unnecessarily stringifying the values is definitely well-suited to it. Your array solution is certainly quite fast though.

Here's both of ours wrapped up into a single command-line app with an option to tell it which to use:

Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode Delphi}
  4. {$ImplicitExceptions Off}
  5.  
  6. uses
  7.   SysUtils, DateUtils,
  8.   Generics.Defaults, Generics.Collections;
  9.  
  10. type
  11.   TIntPair = TPair<LongInt, LongInt>;
  12.  
  13.   function ComparePairs(constref L, R: TIntPair): LongInt;
  14.   begin
  15.     if L.Key < R.Key then
  16.       Result := -1
  17.     else if L.Key = R.Key then
  18.       Result := 0
  19.     else
  20.       Result := 1;
  21.   end;
  22.  
  23.   procedure SortCountAkira;
  24.   var
  25.     I: LongInt;
  26.     InOut: Text;
  27.     Map: TDictionary<LongInt, LongInt>;
  28.     Pair: TIntPair;
  29.     Pairs: TArray<TIntPair>;
  30.   begin
  31.     if FileExists(ParamStr(2)) then begin
  32.       Map := TDictionary<LongInt, LongInt>.Create();
  33.       Map.Capacity := 10000000;
  34.       Assign(InOut, ParamStr(2));
  35.       Reset(InOut);
  36.       while not EOF(InOut) do begin
  37.         ReadLn(InOut, I);
  38.         if not Map.ContainsKey(I) then
  39.           Map.Add(I, 1)
  40.         else
  41.           Map[I] := Map[I] + 1;
  42.       end;
  43.       Close(InOut);
  44.       Pairs := Map.ToArray();
  45.       Map.Free();
  46.       TArrayHelper<TIntPair>.Sort(
  47.         Pairs,
  48.         TComparer<TIntPair>.Construct(ComparePairs)
  49.       );
  50.       Assign(InOut, ParamStr(3));
  51.       Rewrite(InOut);
  52.       for Pair in Pairs do with Pair do
  53.         WriteLn(InOut, Key, ' - ', Value);
  54.       Close(InOut);
  55.     end;
  56.   end;
  57.  
  58.   procedure SortCountHoward;
  59.   var
  60.     arr: array of Integer;
  61.     textf: TextFile;
  62.     min: Integer = High(Integer);
  63.     max: Integer = -1;
  64.     i: Integer;
  65.   begin
  66.     if FileExists(ParamStr(2)) then begin
  67.       AssignFile(textf, ParamStr(2));
  68.       Reset(textf);
  69.       while not EOF(textf) do
  70.         begin
  71.           ReadLn(textf, i);
  72.           if i < min then
  73.             min := i;
  74.           if i > max then
  75.             max := i;
  76.         end;
  77.       SetLength(arr, max-min+1);
  78.  
  79.       Reset(textf);
  80.       while not EOF(textf) do
  81.         begin
  82.           ReadLn(textf, i);
  83.           Dec(i, min);
  84.           Inc(arr[i]);
  85.         end;
  86.       CloseFile(textf);
  87.  
  88.       AssignFile(textf, ParamStr(3));
  89.       Rewrite(textf);
  90.       for i := Low(arr) to High(arr) do
  91.         case (arr[i] > 0) of
  92.           True: WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
  93.         end;
  94.       CloseFile(textf);
  95.       SetLength(arr, 0);
  96.     end;
  97.   end;
  98.  
  99. var
  100.   Start: TDateTime;
  101.  
  102. begin
  103.   if ParamCount() <> 3 then
  104.     WriteLn('Usage: occurrencecounter (-akira | -howard) infilename outfilename')
  105.   else if ParamStr(1) = '-akira' then
  106.   begin
  107.     Start := Now();
  108.     SortCountAkira();
  109.     WriteLn(MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
  110.   end
  111.   else if ParamStr(1) = '-howard' then
  112.   begin
  113.     Start := Now();
  114.     SortCountHoward();
  115.     WriteLn(MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
  116.   end
  117.   else
  118.     WriteLn('Usage: occurrencecounter (-akira | -howard) infilename outfilename');  
  119. end.

Also a generator program for the input file (which gives something a little more heavyweight than what people have been using so far):

Code: Pascal  [Select]
  1. program InputGenerator;
  2.  
  3. var
  4.   InFile: Text;
  5.   I: LongInt;
  6.  
  7. begin
  8.   Randomize();
  9.   Assign(InFile, 'infile.txt');
  10.   Rewrite(InFile);
  11.   for I := 1 to 10000000 do
  12.     WriteLn(InFile, 1500000000 + Random(800000));
  13.   Close(InFile);
  14. end.

The TXT file is large, sorting takes more than an hour.

Wait, really? That's thousands and thousands of times too long if it's only 20MB or so.
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 03:41:54 am
@Akira1364, it seems your InputGenerator creates more than 99% of unique values.
Title: Re: Sorting and Counting
Post by: Akira1364 on July 18, 2019, 04:13:47 am
@Akira1364, it seems your InputGenerator creates more than 99% of unique values.

I was kind of going for testing the "worst case scenario", but it does seem a bit too worse case now that you point it out. I just edited it a bit in the comment.
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 07:43:50 am
I slightly changed the Akira1364's program:
Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode Delphi}
  4. {$ImplicitExceptions Off}
  5.  
  6. uses
  7.   SysUtils, DateUtils,
  8.   Generics.Defaults, Generics.Collections,
  9.   LGUtils, , LGAbstractContainer, LGHashMultiSet, LGArrayHelpers;
  10.  
  11. type
  12.   TIntPair = TPair<LongInt, LongInt>;
  13.  
  14.   function ComparePairs(constref L, R: TIntPair): LongInt;
  15.   begin
  16.     if L.Key < R.Key then
  17.       Result := -1
  18.     else if L.Key = R.Key then
  19.       Result := 0
  20.     else
  21.       Result := 1;
  22.   end;
  23.  
  24. var
  25.   Total, Unique: Integer;
  26.   Start: TDateTime;
  27.  
  28.   procedure SortCountAkira;
  29.   var
  30.     I: LongInt;
  31.     InOut: Text;
  32.     Map: TDictionary<LongInt, LongInt>;
  33.     Pair: TIntPair;
  34.     Pairs: TArray<TIntPair>;
  35.   begin
  36.     Map := TDictionary<LongInt, LongInt>.Create();
  37.     Map.Capacity := 10000000;
  38.     Assign(InOut, ParamStr(1));
  39.     Reset(InOut);
  40.     while not EOF(InOut) do begin
  41.       ReadLn(InOut, I);
  42.       Inc(Total);
  43.       if not Map.ContainsKey(I) then
  44.         begin
  45.           Map.Add(I, 1);
  46.           Inc(Unique);
  47.         end
  48.       else
  49.         Map[I] := Map[I] + 1;
  50.     end;
  51.     Close(InOut);
  52.     Pairs := Map.ToArray();
  53.     Map.Free();
  54.     TArrayHelper<TIntPair>.Sort(
  55.       Pairs,
  56.       TComparer<TIntPair>.Construct(ComparePairs)
  57.     );
  58.     Assign(InOut, ParamStr(2));
  59.     Rewrite(InOut);
  60.     for Pair in Pairs do with Pair do
  61.       WriteLn(InOut, Key, ' - ', Value);
  62.     Close(InOut);
  63.   end;
  64.  
  65.   procedure SortCountHoward;
  66.   var
  67.     arr: array of Integer;
  68.     textf: TextFile;
  69.     min: Integer = High(Integer);
  70.     max: Integer = -1;
  71.     i: Integer;
  72.   begin
  73.     AssignFile(textf, ParamStr(1));
  74.     Reset(textf);
  75.     while not EOF(textf) do
  76.       begin
  77.         ReadLn(textf, i);
  78.         Inc(Total);
  79.         if i < min then
  80.           min := i;
  81.         if i > max then
  82.           max := i;
  83.       end;
  84.     SetLength(arr, max-min+1);
  85.  
  86.     Reset(textf);
  87.     while not EOF(textf) do
  88.       begin
  89.         ReadLn(textf, i);
  90.         Dec(i, min);
  91.         Inc(arr[i]);
  92.       end;
  93.     CloseFile(textf);
  94.  
  95.     AssignFile(textf, ParamStr(2));
  96.     Rewrite(textf);
  97.     for i := Low(arr) to High(arr) do
  98.       case (arr[i] > 0) of
  99.         True:
  100.           begin
  101.             WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
  102.             Inc(Unique);
  103.           end;
  104.       end;
  105.     CloseFile(textf);
  106.     SetLength(arr, 0);
  107.   end;
  108.  
  109.   function EntryCmp(constref L, R: TGMultisetEntry<Integer>): SizeInt;
  110.   begin
  111.     if L.Key > R.Key then
  112.       Result := 1
  113.     else
  114.       if L.Key < R.Key then
  115.         Result := -1
  116.       else
  117.         Result := 0;
  118.   end;  
  119.  
  120.   procedure SortCountAvk1;
  121.   type
  122.     TCounter  = TGHashMultiSetLP<Integer>;
  123.     TCountRef = TGAutoRef<TCounter>;
  124.     TEntry    = TCounter.TEntry;
  125.   var
  126.     CountRef: TCountRef;
  127.     InOut: Text;
  128.     Counter: TCounter;
  129.     e: TEntry;
  130.     I: Integer;
  131.   begin
  132.     Counter := CountRef;
  133.     Assign(InOut, ParamStr(1));
  134.     Reset(InOut);
  135.     while not EOF(InOut) do
  136.       begin
  137.         ReadLn(InOut, I);
  138.         Counter.Add(I);
  139.       end;
  140.     Close(InOut);
  141.     Total := Counter.Count;
  142.     Unique := Counter.EntryCount;
  143.     if Counter.NonEmpty then
  144.       begin
  145.         Assign(InOut, ParamStr(2));
  146.         Rewrite(InOut);
  147.         for e in Counter.Entries.Sorted(@EntryCmp) do
  148.           with e do
  149.              WriteLn(InOut, Key, ' - ', Count);
  150.         Close(InOut);
  151.       end;
  152.   end;
  153.  
  154.   procedure SortCountAvk2;
  155.   var
  156.     List: array of Integer;
  157.     InOut: Text;
  158.     I, J, Count, DupCount: Integer;
  159.   begin
  160.     Assign(InOut, ParamStr(1));
  161.     Reset(InOut);
  162.     SetLength(List, 4096);
  163.     I := 0;
  164.     while not EOF(InOut) do
  165.       begin
  166.         ReadLn(InOut, J);
  167.         Inc(Total);
  168.         if Length(List) = I then
  169.           SetLength(List, I * 2);
  170.         List[I] := J;
  171.         Inc(I);
  172.       end;
  173.     Close(InOut);
  174.     SetLength(List, I);
  175.     if List = nil then
  176.       exit;
  177.     TGOrdinalArrayHelper<Integer>.Sort(List);
  178.     Count := I;
  179.     DupCount := 0;
  180.     I := 0;
  181.     Assign(InOut, ParamStr(2));
  182.     Rewrite(InOut);
  183.     repeat
  184.       J := List[I];
  185.       while (I < Count) and (List[I] = J) do
  186.         begin
  187.           Inc(DupCount);
  188.           Inc(I);
  189.         end;
  190.       WriteLn(InOut, J, ' - ', DupCount);
  191.       Inc(Unique);
  192.       DupCount := 0;
  193.     until I = Count;
  194.     Close(InOut);
  195.   end;
  196.  
  197.   procedure Run(aProc: TProcedure);
  198.   begin
  199.     Total := 0;
  200.     Unique := 0;
  201.     Start := Now;
  202.     try
  203.       aProc();
  204.       WriteLn('elapsed time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
  205.       WriteLn('total values: ', Total, ', unique values: ', Unique);
  206.     except
  207.       on e: Exception do
  208.         WriteLn('crashes with message "', e.Message, '"');
  209.     end;
  210.   end;
  211.  
  212. begin
  213.   if ParamCount <> 2 then
  214.     begin
  215.       WriteLn('Usage: occurrencecounter infilename outfilename');
  216.       exit;
  217.     end;
  218.   if not FileExists(ParamStr(1)) then
  219.     begin
  220.       WriteLn('Input file "', ParamStr(1), '" not found');
  221.       exit;
  222.     end;
  223.  
  224.   WriteLn('running SortCountAkira:');
  225.   Run(@SortCountAkira);
  226.   WriteLn;
  227.  
  228.   WriteLn('running SortCountHoward:');
  229.   Run(@SortCountHoward);
  230.   WriteLn;
  231.  
  232.  
  233.   WriteLn('running SortCountAvk1:');
  234.   Run(@SortCountAvk1);
  235.   WriteLn;
  236.  
  237.   WriteLn('running SortCountAvk2:');
  238.   Run(@SortCountAvk2);
  239. end.
  240.  
using current version of his InputGenerator output is(32-bit compiler):
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 4.7730
  3. total values: 10000000, unique values: 799999
  4.  
  5. running SortCountHoward:
  6. elapsed time: 6.0220
  7. total values: 10000000, unique values: 799999
  8.  
  9. running SortCountAvk1:
  10. elapsed time: 3.7910
  11. total values: 10000000, unique values: 799999
  12.  
  13. running SortCountAvk2:
  14. elapsed time: 2.9480
  15. total values: 10000000, unique values: 799999
  16.  
using previous version:
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 14.5700
  3. total values: 10000000, unique values: 9976599
  4.  
  5. running SortCountHoward:
  6. crashes with message "Invalid pointer operation"
  7.  
  8. running SortCountAvk1:
  9. elapsed time: 8.1750
  10. total values: 10000000, unique values: 9976599
  11.  
  12. running SortCountAvk2:
  13. elapsed time: 6.6450
  14. total values: 10000000, unique values: 9976599
  15.  
Title: Re: Sorting and Counting
Post by: Thaddy on July 18, 2019, 08:37:51 am
@ AVK
You can optimize a little with {$I-}{$H-} or did you already do that? {$I-} speeds up writeln consideably.
Also I wouldn't use Randomize() but use a fixed seed, so the file becomes reproducable.
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 09:25:34 am
@Thaddy, I think it does not make much sense. The idea was to show that for this task there are ways to solve without shamanic dances with a tambourine and in a reasonable time.
Title: Re: Sorting and Counting
Post by: julkas on July 18, 2019, 10:00:23 am
GMap random test. Out file size ~ 146 MB
Code: Pascal  [Select]
  1. program sc;
  2. {$mode delphi}
  3. uses gmap, gutil, SysUtils;
  4. const
  5.   keyNum = 10000000;
  6. type
  7.   TIntLess = TLess<LongInt>;
  8.   TDict = TMap<LongInt, LongInt, TIntLess>;
  9. var
  10.   sc: TDict;
  11.   scit: TDict.TIterator;
  12.   i: LongInt;
  13.   key, cnt: LongInt;
  14.   start: QWord;
  15.   outFile: Text;
  16. begin
  17.   start := GetTickCount64();
  18.   sc := TDict.Create;
  19.   for i := 0 to keyNum do
  20.   begin
  21.     key := Random(2147483647);
  22.     cnt := 0;
  23.     sc.TryGetValue(key, cnt);
  24.     sc[key] := cnt + 1;
  25.   end;
  26.   WriteLn('Populated (ticks) - ', GetTickCount64() - start);
  27.   WriteLn('Uniq keys - ', sc.Size, ', out of - ', keyNum);
  28.   Assign(outFile, 'out.txt');
  29.   Rewrite(outFile);
  30.   scit := sc.Min;
  31.   repeat
  32.     WriteLn(outFile, scit.Key, ' - ', scit.Value)
  33.   until not scit.Next;
  34.   Close(outFile);
  35.   scit.Destroy;
  36.   sc.Destroy;
  37.   WriteLn('Total (ticks) - ', GetTickCount64() - start);
  38.   ReadLn;
  39. end.
  40.  
Console output -
Code: Text  [Select]
  1. Populated (ticks) - 17297
  2. Uniq keys - 9976566, out of - 10000000
  3. Total (ticks) - 23906
Title: Re: Sorting and Counting
Post by: julkas on July 18, 2019, 11:06:58 am
Thank you gentlemen. At the moment I'm testing the Engkin algorithm. The TXT file is large, sorting takes more than an hour. Tomorrow I will check your other suggestions. I do not really care about speed, because it's not for the program user only for me. It is important that the result is correct.
Can you share input file?
Title: Re: Sorting and Counting
Post by: SymbolicFrank on July 18, 2019, 12:39:15 pm
I did this for more than 25 million postal codes. It was really slow with the default containers and took a lot of memory. I made a custom class that used a smart way to insert new values, which reduced it to less than 15 minutes and 2 GB memory usage. But I stored more than one number in more than one container. Requesting a postal code was nearly instant.

I kept a small, loose index and remembered the last few insertions. After that, I just tried to insert a new entry halfway between the two nearest indexed items, and then repeatedly halfway until a hit was found. Which took less than 5 tries on average.
Title: Re: Sorting and Counting
Post by: Thaddy on July 18, 2019, 12:49:02 pm
It may be a lot quicker with a memory mapped file approach. Especially on very long files. Then the readln is from memory most of the time, not from disk.
Title: Re: Sorting and Counting
Post by: SymbolicFrank on July 18, 2019, 01:22:15 pm
I made my own SaveToFile and LoadFromFile, which created a text file that was 4.5 GB in size, IIRC. It took 2 minutes to load that into memory. But less than a millisecond to query.

Btw, this was on a fast Linux sever with lots of memory and SSDs.
Title: Re: Sorting and Counting
Post by: julkas on July 18, 2019, 01:48:32 pm
In a sense, it's homework, but it's for me. I'm not a programmer by education, just a hobby.
Check also this - https://en.wikipedia.org/wiki/External_sorting.
Title: Re: Sorting and Counting
Post by: mangakissa on July 18, 2019, 02:19:49 pm
My sample with quicksort and a 30 mb file size takes 69 seconds WITHOUT ANY TSTRINGLIST.
Just old program developping.
Title: Re: Sorting and Counting
Post by: SymbolicFrank on July 18, 2019, 02:37:03 pm
Storage in memory is also interesting. A linked list allows fast inserts, but is slow to access. An array has to be shifted. And indexes consist of arrays as well.

IIRC, I ended up with 100 separate arrays that all did a subset. That was fastest overall. And finding the right one was based on quicksort.
Title: Re: Sorting and Counting
Post by: Akira1364 on July 18, 2019, 02:49:45 pm
I slightly changed the Akira1364's program:

Yeah, LGenerics is really good. The MultiSet is definitely an even more appropriate data structure than a hashmap (which TDictionary is) for this.
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 04:05:49 pm
Yeah, LGenerics is really good...
Nice to hear kindly words from you. :)
Title: Re: Sorting and Counting
Post by: Akira1364 on July 18, 2019, 04:19:41 pm
Nice to hear kindly words from you. :)

Oh, I didn't realize you were the author! Great job on it, again. It's by far the most advanced / feature-complete generics library ever written in Object Pascal.

On another note, I'm still extremely curious as to what "mpknap" is doing with his file currently that takes over an hour... is he using BogoSort or something? That's just crazily long.
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 04:28:05 pm
Thank you very much.
@mangakissa, can you submit full version of your solution?
Title: Re: Sorting and Counting
Post by: mangakissa on July 18, 2019, 04:52:20 pm
the whole project  :D
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 06:33:03 pm
@julkas, @mangakissa
I changed your code a bit and pasted it into the Akira1364's app:
Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode Delphi}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, gmap;
  12.  
  13. type
  14.   TIntPair = TPair<LongInt, LongInt>;
  15.  
  16.   function ComparePairs(constref L, R: TIntPair): LongInt;
  17.   begin
  18.     if L.Key < R.Key then
  19.       Result := -1
  20.     else if L.Key = R.Key then
  21.       Result := 0
  22.     else
  23.       Result := 1;
  24.   end;
  25.  
  26. var
  27.   Total, Unique: Integer;
  28.   Start: TDateTime;
  29.  
  30.   procedure SortCountAkira;
  31.   var
  32.     I: LongInt;
  33.     InOut: Text;
  34.     Map: TDictionary<LongInt, LongInt>;
  35.     Pair: TIntPair;
  36.     Pairs: TArray<TIntPair>;
  37.   begin
  38.     Map := TDictionary<LongInt, LongInt>.Create();
  39.     Map.Capacity := 10000000;
  40.     Assign(InOut, ParamStr(1));
  41.     Reset(InOut);
  42.     while not EOF(InOut) do begin
  43.       ReadLn(InOut, I);
  44.       Inc(Total);
  45.       if not Map.ContainsKey(I) then
  46.         begin
  47.           Map.Add(I, 1);
  48.           Inc(Unique);
  49.         end
  50.       else
  51.         Map[I] := Map[I] + 1;
  52.     end;
  53.     Close(InOut);
  54.     Pairs := Map.ToArray();
  55.     Map.Free();
  56.     TArrayHelper<TIntPair>.Sort(
  57.       Pairs,
  58.       TComparer<TIntPair>.Construct(ComparePairs)
  59.     );
  60.     Assign(InOut, ParamStr(2));
  61.     Rewrite(InOut);
  62.     for Pair in Pairs do with Pair do
  63.       WriteLn(InOut, Key, ' - ', Value);
  64.     Close(InOut);
  65.   end;
  66.  
  67.   procedure SortCountHoward;
  68.   var
  69.     arr: array of Integer;
  70.     textf: TextFile;
  71.     min: Integer = High(Integer);
  72.     max: Integer = -1;
  73.     i: Integer;
  74.   begin
  75.     AssignFile(textf, ParamStr(1));
  76.     Reset(textf);
  77.     while not EOF(textf) do
  78.       begin
  79.         ReadLn(textf, i);
  80.         Inc(Total);
  81.         if i < min then
  82.           min := i;
  83.         if i > max then
  84.           max := i;
  85.       end;
  86.     SetLength(arr, max-min+1);
  87.  
  88.     Reset(textf);
  89.     while not EOF(textf) do
  90.       begin
  91.         ReadLn(textf, i);
  92.         Dec(i, min);
  93.         Inc(arr[i]);
  94.       end;
  95.     CloseFile(textf);
  96.  
  97.     AssignFile(textf, ParamStr(2));
  98.     Rewrite(textf);
  99.     for i := Low(arr) to High(arr) do
  100.       case (arr[i] > 0) of
  101.         True:
  102.           begin
  103.             WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
  104.             Inc(Unique);
  105.           end;
  106.       end;
  107.     CloseFile(textf);
  108.     SetLength(arr, 0);
  109.   end;
  110.  
  111.   procedure SortCountAvk1;
  112.   type
  113.     TCounter  = TGHashMultiSetLP<Integer>;
  114.     TCountRef = TGAutoRef<TCounter>;
  115.     TEntry    = TCounter.TEntry;
  116.  
  117.     function EntryCmp(constref L, R: TEntry): SizeInt;
  118.     begin
  119.       if L.Key > R.Key then
  120.         Result := 1
  121.       else
  122.         if L.Key < R.Key then
  123.           Result := -1
  124.         else
  125.           Result := 0;
  126.     end;
  127.  
  128.   var
  129.     CountRef: TCountRef;
  130.     InOut: Text;
  131.     Counter: TCounter;
  132.     e: TEntry;
  133.     I: Integer;
  134.   begin
  135.     Counter := CountRef;
  136.     Counter.LoadFactor := 0.7;
  137.     Assign(InOut, ParamStr(1));
  138.     Reset(InOut);
  139.     while not EOF(InOut) do
  140.       begin
  141.         ReadLn(InOut, I);
  142.         Counter.Add(I);
  143.       end;
  144.     Close(InOut);
  145.     Total := Counter.Count;
  146.     Unique := Counter.EntryCount;
  147.     if Counter.NonEmpty then
  148.       begin
  149.         Assign(InOut, ParamStr(2));
  150.         Rewrite(InOut);
  151.         for e in Counter.Entries.Sorted(EntryCmp) do
  152.           with e do
  153.             WriteLn(InOut, Key, ' - ', Count);
  154.         Close(InOut);
  155.       end;
  156.   end;
  157.  
  158.   procedure SortCountAvk2;
  159.   var
  160.     List: array of Integer;
  161.     InOut: Text;
  162.     I, J, Count, DupCount: Integer;
  163.   begin
  164.     Assign(InOut, ParamStr(1));
  165.     Reset(InOut);
  166.     SetLength(List, 4096);
  167.     I := 0;
  168.     while not EOF(InOut) do
  169.       begin
  170.         ReadLn(InOut, J);
  171.         Inc(Total);
  172.         if Length(List) = I then
  173.           SetLength(List, I * 2);
  174.         List[I] := J;
  175.         Inc(I);
  176.       end;
  177.     Close(InOut);
  178.     SetLength(List, I);
  179.     if List = nil then
  180.       exit;
  181.     TGOrdinalArrayHelper<Integer>.Sort(List);
  182.     Count := I;
  183.     DupCount := 0;
  184.     I := 0;
  185.     Assign(InOut, ParamStr(2));
  186.     Rewrite(InOut);
  187.     repeat
  188.       J := List[I];
  189.       while (I < Count) and (List[I] = J) do
  190.         begin
  191.           Inc(DupCount);
  192.           Inc(I);
  193.         end;
  194.       WriteLn(InOut, J, ' - ', DupCount);
  195.       Inc(Unique);
  196.       DupCount := 0;
  197.     until I = Count;
  198.     Close(InOut);
  199.   end;
  200.  
  201.   procedure SortCountJulkas;
  202.   type
  203.     TIntLess = TLess<LongInt>;
  204.     TDict = TMap<LongInt, LongInt, TIntLess>;
  205.   var
  206.     sc: TDict;
  207.     scit: TDict.TIterator;
  208.     InOut: Text;
  209.     key, cnt: LongInt;
  210.   begin
  211.     sc := TDict.Create;
  212.     Assign(InOut, ParamStr(1));
  213.     Reset(InOut);
  214.     while not EOF(InOut) do
  215.       begin
  216.         ReadLn(InOut, key);
  217.         Inc(Total);
  218.         cnt := 0;
  219.         sc.TryGetValue(key, cnt);
  220.         sc[key] := cnt + 1;
  221.       end;
  222.     Close(InOut);
  223.     Unique := sc.Size;
  224.     if Unique > 0 then
  225.       begin
  226.         Assign(InOut, ParamStr(2));
  227.         Rewrite(InOut);
  228.         scit := sc.Min;
  229.         repeat
  230.           WriteLn(InOut, scit.Key, ' - ', scit.Value);
  231.         until not scit.Next;
  232.         Close(InOut);
  233.         scit.Free;
  234.       end;
  235.     sc.Free;
  236.   end;
  237.  
  238.   procedure SortCountMangakissa;
  239.   type
  240.     TMyTime = packed record
  241.       Unixtime : Integer;
  242.       Counter  : word;
  243.     end;
  244.  
  245.     function Find(var aMyTime : array of TMytime; aLine : Integer) : boolean;
  246.     var index : integer;
  247.     begin
  248.       result := false;
  249.       if length(aMyTime) > 0 then
  250.       begin
  251.         for index := low(aMyTime) to high(aMyTime) do
  252.         begin
  253.           if aMyTime[index].Unixtime = aLine then
  254.           begin
  255.             aMyTime[index].Counter := aMyTime[index].Counter + 1;
  256.             result := true;
  257.             break;
  258.           end;
  259.         end;
  260.       end;
  261.     end;
  262.  
  263.     procedure QuickSort(var A: array of tMytime; iLo, iHi: Integer) ;
  264.      var
  265.        Lo, Hi, Pivot : Integer;
  266.        T             : TMyTime;
  267.      begin
  268.        Lo := iLo;
  269.        Hi := iHi;
  270.        Pivot := A[(Lo + Hi) div 2].Unixtime;
  271.        repeat
  272.          while A[Lo].Unixtime < Pivot do Inc(Lo) ;
  273.          while A[Hi].Unixtime > Pivot do Dec(Hi) ;
  274.          if Lo <= Hi then
  275.          begin
  276.            T := A[Lo];
  277.            A[Lo] := A[Hi];
  278.            A[Hi] := T;
  279.            Inc(Lo) ;
  280.            Dec(Hi) ;
  281.          end;
  282.        until Lo > Hi;
  283.        if Hi > iLo then QuickSort(A, iLo, Hi) ;
  284.        if Lo < iHi then QuickSort(A, Lo, iHi) ;
  285.      end;
  286.   var
  287.     InOut         : Text;
  288.     MyTime        : array of TMyTime = nil;
  289.     Item          : TMyTime;
  290.     myline, index : integer;
  291.   begin
  292.     Assign(InOut, ParamStr(1));
  293.     Reset(InOut);
  294.     index := 0;
  295.     while not EOF(InOut) do
  296.       begin
  297.         ReadLn(InOut, myline);
  298.         Inc(Total);
  299.         if not find(MyTime, myline) then
  300.         begin
  301.           index := index + 1;
  302.           setlength(MyTime,index);
  303.           MyTime[index - 1].Unixtime := myLine;
  304.           MyTime[index - 1].Counter := 1;
  305.         end;
  306.       end;
  307.     Close(InOut);
  308.     if MyTime = nil then
  309.       exit;
  310.     Unique := Length(MyTime);
  311.     QuickSort(Mytime, low(MyTime), high(mytime));
  312.     Assign(InOut, ParamStr(2));
  313.     Rewrite(InOut);
  314.     for Item in Mytime do
  315.       with Item do
  316.         WriteLn(InOut, Unixtime, ' - ', Counter);
  317.     Close(InOut);
  318.   end;
  319.  
  320.   procedure Run(aProc: TProcedure);
  321.   begin
  322.     Total := 0;
  323.     Unique := 0;
  324.     Start := Now;
  325.     try
  326.       aProc();
  327.       WriteLn('elapsed time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
  328.       WriteLn('#total: ', Total, ', #unique: ', Unique);
  329.     except
  330.       on e: Exception do
  331.         WriteLn('crashes with message "', e.Message, '"');
  332.     end;
  333.   end;
  334.  
  335. begin
  336.   if ParamCount <> 2 then
  337.     begin
  338.       WriteLn('Usage: OccurrenceCounter infilename outfilename');
  339.       exit;
  340.     end;
  341.   if not FileExists(ParamStr(1)) then
  342.     begin
  343.       WriteLn('Input file "', ParamStr(1), '" not found');
  344.       exit;
  345.     end;
  346.  
  347.   WriteLn('running SortCountAkira:');
  348.   Run(SortCountAkira);
  349.   WriteLn;
  350.  
  351.   WriteLn('running SortCountHoward:');
  352.   Run(@SortCountHoward);
  353.   WriteLn;
  354.  
  355.  
  356.   WriteLn('running SortCountAvk1:');
  357.   Run(SortCountAvk1);
  358.   WriteLn;
  359.  
  360.   WriteLn('running SortCountAvk2:');
  361.   Run(SortCountAvk2);
  362.   WriteLn;
  363.  
  364.   WriteLn('running SortCountJulkas:');
  365.   Run(SortCountJulkas);
  366.   WriteLn;
  367.  
  368.   WriteLn('running SortCountMangakissa:');
  369.   Run(SortCountMangakissa);
  370. end.
  371.  
Results for 20MB file:
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 1.6700
  3. #total: 1750000, #unique: 710248
  4.  
  5. running SortCountHoward:
  6. elapsed time: 1.5500
  7. #total: 1750000, #unique: 710248
  8.  
  9. running SortCountAvk1:
  10. elapsed time: 1.0100
  11. #total: 1750000, #unique: 710248
  12.  
  13. running SortCountAvk2:
  14. elapsed time: 0.7200
  15. #total: 1750000, #unique: 710248
  16.  
  17. running SortCountJulkas:
  18. elapsed time: 2.6600
  19. #total: 1750000, #unique: 710248
  20.  
  21. running SortCountMangakissa:
  22. elapsed time: 346.6510
  23. #total: 1750000, #unique: 710248
  24.  
  25.  
Title: Re: Sorting and Counting
Post by: Akira1364 on July 18, 2019, 07:44:00 pm
@Avk:

What do you get if you change the

Code: Pascal  [Select]
  1. Map.Capacity := 10000000;

in mine to something closer to the expected total for the 20MB file (as opposed to the 100+ MB one.)

Like:

Code: Pascal  [Select]
  1. Map.Capacity := 1750000;

or whatever. Or even if you just comment out that line completely, as for that much smaller expected input it may not be worth it at all over how TDictionary pre-allocates internally already.

Also FWIW, I'm quite sure that for Mangakissa's version, it's this part:

Code: Pascal  [Select]
  1. index := index + 1;
  2. setlength(MyTime,index);

that hurts the performance.
Title: Re: Sorting and Counting
Post by: julkas on July 18, 2019, 07:44:44 pm
@avk thanks for benchmark.
It's old but very good problem. You can learn something new - sorting algorithm, hashing, counting, ...
My approach is not fastest, but clean and short. I use only one class from FPC 3.0.4. May be tomorrow I will post faster variant.
Title: Re: Sorting and Counting
Post by: 440bx on July 18, 2019, 07:56:39 pm
@avk

did you use Akira's InputGenerator program to generate the input file ?

One reason I'm asking is because the code I copied from Akira's post generates a 120MB file.  Did you use a modified version ?  If so, could you also post whatever program you used to generate the input file ?

Thank you.


Title: Re: Sorting and Counting
Post by: Akira1364 on July 18, 2019, 08:12:57 pm
did you use Akira's InputGenerator program to generate the input file ?

I imagine he used the original "20MB" code of his from an earlier comment for that one.

If you change my generator program to look like this:

Code: Pascal  [Select]
  1. program InputGenerator;
  2.  
  3. var
  4.   InFile: Text;
  5.   I: LongInt;
  6.  
  7. begin
  8.   Randomize();
  9.   Assign(InFile, 'infile.txt');
  10.   Rewrite(InFile);
  11.   for I := 1 to 1750000 do
  12.     WriteLn(InFile, 1500000000 + Random(800000));
  13.   Close(InFile);
  14. end.

It will create the smaller 20MB file (with 1750000 total entries, versus the 10000000 total entries that the 100+ MB file has.)
Title: Re: Sorting and Counting
Post by: 440bx on July 18, 2019, 08:25:35 pm
I imagine he used the original "20MB" code of his from an earlier comment for that one.
I will use that. Thank you Akira.
Title: Re: Sorting and Counting
Post by: avk on July 18, 2019, 08:30:02 pm
What do you get if you change the ...

just commented:
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 1.3890
  3. #total: 1750000, #unique: 710248
  4.  
  5. running SortCountHoward:
  6. elapsed time: 1.5600
  7. #total: 1750000, #unique: 710248
  8.  
  9. running SortCountAvk1:
  10. elapsed time: 0.9980
  11. #total: 1750000, #unique: 710248
  12.  
  13. running SortCountAvk2:
  14. elapsed time: 0.7490
  15. #total: 1750000, #unique: 710248
  16.  
  17. running SortCountJulkas:
  18. elapsed time: 2.6210
  19. #total: 1750000, #unique: 710248
  20.  
  21. running SortCountMangakissa:
  22. elapsed time: 342.3730
  23. #total: 1750000, #unique: 710248
  24.  

Map.Capacity := 1750000:
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 1.4040
  3. #total: 1750000, #unique: 710248
  4.  
  5. running SortCountHoward:
  6. elapsed time: 1.6130
  7. #total: 1750000, #unique: 710248
  8.  
  9. running SortCountAvk1:
  10. elapsed time: 1.0440
  11. #total: 1750000, #unique: 710248
  12.  
  13. running SortCountAvk2:
  14. elapsed time: 0.7430
  15. #total: 1750000, #unique: 710248
  16.  
  17. running SortCountJulkas:
  18. elapsed time: 2.7800
  19. #total: 1750000, #unique: 710248
  20.  

Also FWIW, I'm quite sure that for Mangakissa's version, it's this part ...
It also seems that the array search has a quadratic complexity.

@440bx
These results was for 20MB file because of SortCountMangakissa.

For current version Akira's InputGenerator:
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 5.0910
  3. #total: 10000000, #unique: 799999
  4.  
  5. running SortCountHoward:
  6. elapsed time: 6.3260
  7. #total: 10000000, #unique: 799999
  8.  
  9. running SortCountAvk1:
  10. elapsed time: 4.1010
  11. #total: 10000000, #unique: 799999
  12.  
  13. running SortCountAvk2:
  14. elapsed time: 3.0760
  15. #total: 10000000, #unique: 799999
  16.  
  17. running SortCountJulkas:
  18. elapsed time: 14.3140
  19. #total: 10000000, #unique: 799999
  20.  
Title: Re: Sorting and Counting
Post by: mpknap on July 18, 2019, 08:41:06 pm
Gentlemen, I do not have time to deal with testing your algorithms. I will deal with Monday, on holidays;).
I am also glad that the topic is interesting.

Yesterday I wrote that the Engkin function has been working for an hour. She worked longer until the night was blue screen because the laptop was reset. But the fault was the TXT file. There was an empty line between the lines with the time Stamp. Today I made a TXT file without empty lines and after a few minutes the Out.txt file appeared.
I'm happy because that's what I meant :)

Soon I will check your other ideas. I was unsure that I will get such an encyclopedia from you;)
Title: Re: Sorting and Counting
Post by: 440bx on July 18, 2019, 08:54:57 pm
@440bx
For current version Akira's InputGenerator:
Excellent!.  Thank you Avk.
Title: Re: Sorting and Counting
Post by: howardpc on July 19, 2019, 12:14:51 am
To see what effect varying the size of the data file, and varying the number of unique values to be counted might have on the faster algorithms presented here I adapted Akira1364's program as follows.
Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode delphi}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, gmap;
  12.  
  13. type
  14.   TIntPair = TPair<LongInt, LongInt>;
  15.   TProcedureArray = array of procedure;
  16.  
  17.   function ComparePairs(constref L, R: TIntPair): LongInt;
  18.   begin
  19.     if L.Key < R.Key then
  20.       Result := -1
  21.     else if L.Key = R.Key then
  22.       Result := 0
  23.     else
  24.       Result := 1;
  25.   end;
  26.  
  27. var
  28.   Total, Unique, repeatCount, randomrange: Integer;
  29.   Start: TDateTime;
  30.   inFilename: String = 'data.txt';
  31.   outFilename: String = 'sorted.txt';
  32.   routineName: String;
  33.   procedures: TProcedureArray;
  34.   proc: procedure;
  35.  
  36.   procedure GenerateData(randomRange: Integer=8; repeatMillionsCount: Integer=2);
  37.   var
  38.     InFile: Text;
  39.     I: LongInt;
  40.   begin
  41.     Assign(InFile, inFilename);
  42.     Rewrite(InFile);
  43.     for I := 1 to repeatMillionsCount * 1000000 do
  44.       WriteLn(InFile, 1500000000 + Random(randomRange * 100000));
  45.     Close(InFile);
  46.   end;
  47.  
  48.   procedure SortCountAkira;
  49.   var
  50.     I: LongInt;
  51.     InOut: Text;
  52.     Map: TDictionary<LongInt, LongInt>;
  53.     Pair: TIntPair;
  54.     Pairs: TArray<TIntPair>;
  55.   begin
  56.     routineName := {$I %currentroutine%};
  57.     Map := TDictionary<LongInt, LongInt>.Create();
  58.     Map.Capacity := 10000000;
  59.     Assign(InOut, inFilename);
  60.     Reset(InOut);
  61.     while not EOF(InOut) do begin
  62.       ReadLn(InOut, I);
  63.       Inc(Total);
  64.       if not Map.ContainsKey(I) then
  65.         begin
  66.           Map.Add(I, 1);
  67.           Inc(Unique);
  68.         end
  69.       else
  70.         Map[I] := Map[I] + 1;
  71.     end;
  72.     Close(InOut);
  73.     Pairs := Map.ToArray();
  74.     Map.Free();
  75.     TArrayHelper<TIntPair>.Sort(
  76.       Pairs,
  77.       TComparer<TIntPair>.Construct(ComparePairs)
  78.     );
  79.     Assign(InOut, outFilename);
  80.     Rewrite(InOut);
  81.     for Pair in Pairs do with Pair do
  82.       WriteLn(InOut, Key, ' - ', Value);
  83.     Close(InOut);
  84.   end;
  85.  
  86.   procedure SortCountHoward;
  87.   var
  88.     arr: array of Integer;
  89.     textf: TextFile;
  90.     min: Integer = High(Integer);
  91.     max: Integer = -1;
  92.     i: Integer;
  93.   begin
  94.     routineName := {$I %currentroutine%};
  95.     AssignFile(textf, inFilename);
  96.     Reset(textf);
  97.     while not EOF(textf) do
  98.       begin
  99.         ReadLn(textf, i);
  100.         Inc(Total);
  101.         if i < min then
  102.           min := i;
  103.         if i > max then
  104.           max := i;
  105.       end;
  106.     SetLength(arr, max-min+1);
  107.  
  108.     Reset(textf);
  109.     while not EOF(textf) do
  110.       begin
  111.         ReadLn(textf, i);
  112.         Dec(i, min);
  113.         Inc(arr[i]);
  114.       end;
  115.     CloseFile(textf);
  116.  
  117.     AssignFile(textf, outFilename);
  118.     Rewrite(textf);
  119.     for i := Low(arr) to High(arr) do
  120.       case (arr[i] > 0) of
  121.         True:
  122.           begin
  123.             WriteLn(textf, i+min, ' - ', arr[i]);
  124.             Inc(Unique);
  125.           end;
  126.       end;
  127.     CloseFile(textf);
  128.     SetLength(arr, 0);
  129.   end;
  130.  
  131.   procedure SortCountAvk1;
  132.   type
  133.     TCounter  = TGHashMultiSetLP<Integer>;
  134.     TCountRef = TGAutoRef<TCounter>;
  135.     TEntry    = TCounter.TEntry;
  136.  
  137.     function EntryCmp(constref L, R: TEntry): SizeInt;
  138.     begin
  139.       if L.Key > R.Key then
  140.         Result := 1
  141.       else
  142.         if L.Key < R.Key then
  143.           Result := -1
  144.         else
  145.           Result := 0;
  146.     end;
  147.  
  148.   var
  149.     CountRef: TCountRef;
  150.     InOut: Text;
  151.     Counter: TCounter;
  152.     e: TEntry;
  153.     I: Integer;
  154.   begin
  155.     routineName := {$I %currentroutine%};
  156.     Counter := CountRef;
  157.     Counter.LoadFactor := 0.7;
  158.     Assign(InOut, inFilename);
  159.     Reset(InOut);
  160.     while not EOF(InOut) do
  161.       begin
  162.         ReadLn(InOut, I);
  163.         Counter.Add(I);
  164.       end;
  165.     Close(InOut);
  166.     Total := Counter.Count;
  167.     Unique := Counter.EntryCount;
  168.     if Counter.NonEmpty then
  169.       begin
  170.         Assign(InOut, outFilename);
  171.         Rewrite(InOut);
  172.         for e in Counter.Entries.Sorted(EntryCmp) do
  173.           with e do
  174.             WriteLn(InOut, Key, ' - ', Count);
  175.         Close(InOut);
  176.       end;
  177.   end;
  178.  
  179.   procedure SortCountAvk2;
  180.   var
  181.     List: array of Integer;
  182.     InOut: Text;
  183.     I, J, Count, DupCount: Integer;
  184.   begin
  185.     routineName := {$I %currentroutine%};
  186.     Assign(InOut, inFilename);
  187.     Reset(InOut);
  188.     SetLength(List, 4096);
  189.     I := 0;
  190.     while not EOF(InOut) do
  191.       begin
  192.         ReadLn(InOut, J);
  193.         Inc(Total);
  194.         if Length(List) = I then
  195.           SetLength(List, I * 2);
  196.         List[I] := J;
  197.         Inc(I);
  198.       end;
  199.     Close(InOut);
  200.     SetLength(List, I);
  201.     if List = nil then
  202.       exit;
  203.     TGOrdinalArrayHelper<Integer>.Sort(List);
  204.     Count := I;
  205.     DupCount := 0;
  206.     I := 0;
  207.     Assign(InOut, outFilename);
  208.     Rewrite(InOut);
  209.     repeat
  210.       J := List[I];
  211.       while (I < Count) and (List[I] = J) do
  212.         begin
  213.           Inc(DupCount);
  214.           Inc(I);
  215.         end;
  216.       WriteLn(InOut, J, ' - ', DupCount);
  217.       Inc(Unique);
  218.       DupCount := 0;
  219.     until I = Count;
  220.     Close(InOut);
  221.   end;
  222.  
  223.   procedure SortCountJulkas;
  224.   type
  225.     TIntLess = TLess<LongInt>;
  226.     TDict = TMap<LongInt, LongInt, TIntLess>;
  227.   var
  228.     sc: TDict;
  229.     scit: TDict.TIterator;
  230.     InOut: Text;
  231.     key, cnt: LongInt;
  232.   begin
  233.     routineName := {$I %currentroutine%};
  234.     sc := TDict.Create;
  235.     Assign(InOut, inFilename);
  236.     Reset(InOut);
  237.     while not EOF(InOut) do
  238.       begin
  239.         ReadLn(InOut, key);
  240.         Inc(Total);
  241.         cnt := 0;
  242.         sc.TryGetValue(key, cnt);
  243.         sc[key] := cnt + 1;
  244.       end;
  245.     Close(InOut);
  246.     Unique := sc.Size;
  247.     if Unique > 0 then
  248.       begin
  249.         Assign(InOut, outFilename);
  250.         Rewrite(InOut);
  251.         scit := sc.Min;
  252.         repeat
  253.           WriteLn(InOut, scit.Key, ' - ', scit.Value);
  254.         until not scit.Next;
  255.         Close(InOut);
  256.         scit.Free;
  257.       end;
  258.     sc.Free;
  259.   end;
  260.  
  261.   procedure Run(aProc: TProcedure);
  262.   begin
  263.     Total := 0;
  264.     Unique := 0;
  265.     Start := Now;
  266.     try
  267.       aProc();
  268.       WriteLn(Copy(routineName, 10, 20):7,'''s time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4,' #unique: ',Unique,' #total: ',Total);
  269.     except
  270.       on e: Exception do
  271.         WriteLn('crashes with message "', e.Message, '"');
  272.     end;
  273.   end;
  274.  
  275. begin
  276.   Randomize;
  277.  
  278.   procedures := TProcedureArray.Create(SortCountJulkas, SortCountAkira, SortCountHoward, SortCountAvk1, SortCountAvk2);
  279.  
  280.   for randomrange := 1 to 10 do
  281.     begin
  282.       GenerateData(randomrange);
  283.       WriteLn(#10'RandomRange = ',randomrange);
  284.       for proc in procedures do
  285.         Run(proc);
  286.     end;
  287.  
  288.   for repeatCount := 1 to 10 do
  289.     begin
  290.       GenerateData(8, 2*repeatCount);
  291.       WriteLn(#10'repeatMillionsCount = ', 2*repeatCount);
  292.       for proc in procedures do
  293.         Run(proc);
  294.     end;
  295. end.
Some typical output is as follows:
Code: Pascal  [Select]
  1. repeatMillionsCount = 16
  2.  Julkas's time: 15.8210 #unique: 800000 #total: 16000000
  3.  Akira's time: 4.7230 #unique: 800000 #total: 16000000
  4.  Howard's time: 4.4930 #unique: 800000 #total: 16000000
  5.   Avk1's time: 3.9390 #unique: 800000 #total: 16000000
  6.    Avk2's time: 2.4480 #unique: 800000 #total: 16000000
  7.  
  8. repeatMillionsCount = 18
  9. Julkas's time: 17.4640 #unique: 800000 #total: 18000000
  10.   Akira's time: 5.2670 #unique: 800000 #total: 18000000
  11. Howard's time: 5.0560 #unique: 800000 #total: 18000000
  12.    Avk1's time: 4.4020 #unique: 800000 #total: 18000000
  13.   Avk2's time: 2.7890 #unique: 800000 #total: 18000000
  14.  
  15. repeatMillionsCount = 20
  16.  Julkas's time: 19.2770 #unique: 800000 #total: 20000000
  17.  Akira's time: 5.7910 #unique: 800000 #total: 20000000
  18.  Howard's time: 5.6010 #unique: 800000 #total: 20000000
  19.   Avk1's time: 4.8510 #unique: 800000 #total: 20000000
  20.    Avk2's time: 3.0690 #unique: 800000 #total: 20000000
For avk to design an algorithm that can analyse a 120 MB text file of 20,000,000 items, sort and count its data and and write out the result to a second file in 3 seconds on my ageing machine is pretty impressive.
Title: Re: Sorting and Counting
Post by: mangakissa on July 19, 2019, 08:36:47 am
Can this be done with {$mode objfpc}?
Title: Re: Sorting and Counting
Post by: Thaddy on July 19, 2019, 09:20:20 am
Can this be done with {$mode objfpc}?
Obviously, yes!, See Howards comments above.
Title: Re: Sorting and Counting
Post by: avk on July 19, 2019, 09:27:31 am
@howardpc, nice, very suitable.
Heh, highly likely my machine is even older than yours(comparing runtimes).
But as for my "algorithm", it seems that you are flattering me, it's just sorting an array and then counting duplicates.
Array helpers do about the same to extract unique(distinct) values.
Your counting sort does the same thing, only at the same time.
I'm more glad for the multiset, it seems it shows itself well.

@mangakissa, sure, just changed my version:
Code: Pascal  [Select]
  1. program OccurrenceCounter;
  2.  
  3. {$mode objfpc}{$H+}
  4. {$ImplicitExceptions Off}
  5. {$MODESWITCH NESTEDPROCVARS}
  6.  
  7. uses
  8.   SysUtils, DateUtils,
  9.   Generics.Defaults, Generics.Collections,
  10.   LGUtils, LGHashMultiSet, LGArrayHelpers,
  11.   gutil, gmap;
  12.  
  13. type
  14.   TIntPair = specialize TPair<LongInt, LongInt>;
  15.  
  16.   function ComparePairs(constref L, R: TIntPair): LongInt;
  17.   begin
  18.     if L.Key < R.Key then
  19.       Result := -1
  20.     else if L.Key = R.Key then
  21.       Result := 0
  22.     else
  23.       Result := 1;
  24.   end;
  25.  
  26. var
  27.   Total, Unique: Integer;
  28.   Start: TDateTime;
  29.  
  30.   procedure SortCountAkira;
  31.   var
  32.     I: LongInt;
  33.     InOut: Text;
  34.     Map: specialize TDictionary<LongInt, LongInt>;
  35.     Pair: TIntPair;
  36.     Pairs: specialize TArray<TIntPair>;
  37.   begin
  38.     Map := specialize TDictionary<LongInt, LongInt>.Create();
  39.     Map.Capacity := 10000000;
  40.     Assign(InOut, ParamStr(1));
  41.     Reset(InOut);
  42.     while not EOF(InOut) do begin
  43.       ReadLn(InOut, I);
  44.       Inc(Total);
  45.       if not Map.ContainsKey(I) then
  46.         begin
  47.           Map.Add(I, 1);
  48.           Inc(Unique);
  49.         end
  50.       else
  51.         Map[I] := Map[I] + 1;
  52.     end;
  53.     Close(InOut);
  54.     Pairs := Map.ToArray();
  55.     Map.Free();
  56.     specialize TArrayHelper<TIntPair>.Sort(
  57.       Pairs,
  58.       specialize TComparer<TIntPair>.Construct(@ComparePairs)
  59.     );
  60.     Assign(InOut, ParamStr(2));
  61.     Rewrite(InOut);
  62.     for Pair in Pairs do with Pair do
  63.       WriteLn(InOut, Key, ' - ', Value);
  64.     Close(InOut);
  65.   end;
  66.  
  67.   procedure SortCountHoward;
  68.   var
  69.     arr: array of Integer;
  70.     textf: TextFile;
  71.     min: Integer = High(Integer);
  72.     max: Integer = -1;
  73.     i: Integer;
  74.   begin
  75.     AssignFile(textf, ParamStr(1));
  76.     Reset(textf);
  77.     while not EOF(textf) do
  78.       begin
  79.         ReadLn(textf, i);
  80.         Inc(Total);
  81.         if i < min then
  82.           min := i;
  83.         if i > max then
  84.           max := i;
  85.       end;
  86.     SetLength(arr, max-min+1);
  87.  
  88.     Reset(textf);
  89.     while not EOF(textf) do
  90.       begin
  91.         ReadLn(textf, i);
  92.         Dec(i, min);
  93.         Inc(arr[i]);
  94.       end;
  95.     CloseFile(textf);
  96.  
  97.     AssignFile(textf, ParamStr(2));
  98.     Rewrite(textf);
  99.     for i := Low(arr) to High(arr) do
  100.       case (arr[i] > 0) of
  101.         True:
  102.           begin
  103.             WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
  104.             Inc(Unique);
  105.           end;
  106.       end;
  107.     CloseFile(textf);
  108.     SetLength(arr, 0);
  109.   end;
  110.  
  111.   procedure SortCountAvk1;
  112.   type
  113.     TCounter  = specialize TGHashMultiSetLP<Integer>;
  114.     TCountRef = specialize TGAutoRef<TCounter>;
  115.     TEntry    = TCounter.TEntry;
  116.  
  117.     function EntryCmp(constref L, R: TEntry): SizeInt;
  118.     begin
  119.       if L.Key > R.Key then
  120.         Result := 1
  121.       else
  122.         if L.Key < R.Key then
  123.           Result := -1
  124.         else
  125.           Result := 0;
  126.     end;
  127.  
  128.   var
  129.     CountRef: TCountRef;
  130.     InOut: Text;
  131.     Counter: TCounter;
  132.     e: TEntry;
  133.     I: Integer;
  134.   begin
  135.     Counter := CountRef;
  136.     //Counter.LoadFactor := 0.7;
  137.     Assign(InOut, ParamStr(1));
  138.     Reset(InOut);
  139.     while not EOF(InOut) do
  140.       begin
  141.         ReadLn(InOut, I);
  142.         Counter.Add(I);
  143.       end;
  144.     Close(InOut);
  145.     Total := Counter.Count;
  146.     Unique := Counter.EntryCount;
  147.     if Counter.NonEmpty then
  148.       begin
  149.         Assign(InOut, ParamStr(2));
  150.         Rewrite(InOut);
  151.         for e in Counter.Entries.Sorted(@EntryCmp) do
  152.           with e do
  153.             WriteLn(InOut, Key, ' - ', Count);
  154.         Close(InOut);
  155.       end;
  156.   end;
  157.  
  158.   procedure SortCountAvk2;
  159.   var
  160.     List: array of Integer = nil;
  161.     InOut: Text;
  162.     I, J, Count, DupCount: Integer;
  163.   begin
  164.     Assign(InOut, ParamStr(1));
  165.     Reset(InOut);
  166.     SetLength(List, 4096);
  167.     I := 0;
  168.     while not EOF(InOut) do
  169.       begin
  170.         ReadLn(InOut, J);
  171.         Inc(Total);
  172.         if Length(List) = I then
  173.           SetLength(List, I * 2);
  174.         List[I] := J;
  175.         Inc(I);
  176.       end;
  177.     Close(InOut);
  178.     SetLength(List, I);
  179.     if List = nil then
  180.       exit;
  181.     specialize TGOrdinalArrayHelper<Integer>.Sort(List);
  182.     Count := I;
  183.     DupCount := 0;
  184.     I := 0;
  185.     Assign(InOut, ParamStr(2));
  186.     Rewrite(InOut);
  187.     repeat
  188.       J := List[I];
  189.       while (I < Count) and (List[I] = J) do
  190.         begin
  191.           Inc(DupCount);
  192.           Inc(I);
  193.         end;
  194.       WriteLn(InOut, J, ' - ', DupCount);
  195.       Inc(Unique);
  196.       DupCount := 0;
  197.     until I = Count;
  198.     Close(InOut);
  199.   end;
  200.  
  201.   procedure SortCountJulkas;
  202.   type
  203.     TIntLess = specialize TLess<LongInt>;
  204.     TDict = specialize TMap<LongInt, LongInt, TIntLess>;
  205.   var
  206.     sc: TDict;
  207.     scit: TDict.TIterator;
  208.     InOut: Text;
  209.     key, cnt: LongInt;
  210.   begin
  211.     sc := TDict.Create;
  212.     Assign(InOut, ParamStr(1));
  213.     Reset(InOut);
  214.     while not EOF(InOut) do
  215.       begin
  216.         ReadLn(InOut, key);
  217.         Inc(Total);
  218.         cnt := 0;
  219.         sc.TryGetValue(key, cnt);
  220.         sc[key] := cnt + 1;
  221.       end;
  222.     Close(InOut);
  223.     Unique := sc.Size;
  224.     if Unique > 0 then
  225.       begin
  226.         Assign(InOut, ParamStr(2));
  227.         Rewrite(InOut);
  228.         scit := sc.Min;
  229.         repeat
  230.           WriteLn(InOut, scit.Key, ' - ', scit.Value);
  231.         until not scit.Next;
  232.         Close(InOut);
  233.         scit.Free;
  234.       end;
  235.     sc.Free;
  236.   end;
  237.  
  238.   procedure SortCountMangakissa;
  239.   type
  240.     TMyTime = packed record
  241.       Unixtime : Integer;
  242.       Counter  : word;
  243.     end;
  244.  
  245.     function Find(var aMyTime : array of TMytime; aLine : Integer) : boolean;
  246.     var index : integer;
  247.     begin
  248.       result := false;
  249.       if length(aMyTime) > 0 then
  250.       begin
  251.         for index := low(aMyTime) to high(aMyTime) do
  252.         begin
  253.           if aMyTime[index].Unixtime = aLine then
  254.           begin
  255.             aMyTime[index].Counter := aMyTime[index].Counter + 1;
  256.             result := true;
  257.             break;
  258.           end;
  259.         end;
  260.       end;
  261.     end;
  262.  
  263.     procedure QuickSort(var A: array of tMytime; iLo, iHi: Integer) ;
  264.      var
  265.        Lo, Hi, Pivot : Integer;
  266.        T             : TMyTime;
  267.      begin
  268.        Lo := iLo;
  269.        Hi := iHi;
  270.        Pivot := A[(Lo + Hi) div 2].Unixtime;
  271.        repeat
  272.          while A[Lo].Unixtime < Pivot do Inc(Lo) ;
  273.          while A[Hi].Unixtime > Pivot do Dec(Hi) ;
  274.          if Lo <= Hi then
  275.          begin
  276.            T := A[Lo];
  277.            A[Lo] := A[Hi];
  278.            A[Hi] := T;
  279.            Inc(Lo) ;
  280.            Dec(Hi) ;
  281.          end;
  282.        until Lo > Hi;
  283.        if Hi > iLo then QuickSort(A, iLo, Hi) ;
  284.        if Lo < iHi then QuickSort(A, Lo, iHi) ;
  285.      end;
  286.   var
  287.     InOut         : Text;
  288.     MyTime        : array of TMyTime = nil;
  289.     Item          : TMyTime;
  290.     myline, index : integer;
  291.   begin
  292.     Assign(InOut, ParamStr(1));
  293.     Reset(InOut);
  294.     index := 0;
  295.     while not EOF(InOut) do
  296.       begin
  297.         ReadLn(InOut, myline);
  298.         Inc(Total);
  299.         if not find(MyTime, myline) then
  300.         begin
  301.           index := index + 1;
  302.           setlength(MyTime,index);
  303.           MyTime[index - 1].Unixtime := myLine;
  304.           MyTime[index - 1].Counter := 1;
  305.         end;
  306.       end;
  307.     Close(InOut);
  308.     if MyTime = nil then
  309.       exit;
  310.     Unique := Length(MyTime);
  311.     QuickSort(Mytime, low(MyTime), high(mytime));
  312.     Assign(InOut, ParamStr(2));
  313.     Rewrite(InOut);
  314.     for Item in Mytime do
  315.       with Item do
  316.         WriteLn(InOut, Unixtime, ' - ', Counter);
  317.     Close(InOut);
  318.   end;
  319.  
  320.   procedure Run(aProc: TProcedure);
  321.   begin
  322.     Total := 0;
  323.     Unique := 0;
  324.     Start := Now;
  325.     try
  326.       aProc();
  327.       WriteLn('elapsed time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
  328.       WriteLn('#total: ', Total, ', #unique: ', Unique);
  329.     except
  330.       on e: Exception do
  331.         WriteLn('crashes with message "', e.Message, '"');
  332.     end;
  333.   end;
  334.  
  335. begin
  336.   if ParamCount <> 2 then
  337.     begin
  338.       WriteLn('Usage: OccurrenceCounter infilename outfilename');
  339.       exit;
  340.     end;
  341.   if not FileExists(ParamStr(1)) then
  342.     begin
  343.       WriteLn('Input file "', ParamStr(1), '" not found');
  344.       exit;
  345.     end;
  346.  
  347.   WriteLn('running SortCountAkira:');
  348.   Run(@SortCountAkira);
  349.   WriteLn;
  350.  
  351.   WriteLn('running SortCountHoward:');
  352.   Run(@SortCountHoward);
  353.   WriteLn;
  354.  
  355.  
  356.   WriteLn('running SortCountAvk1:');
  357.   Run(@SortCountAvk1);
  358.   WriteLn;
  359.  
  360.   WriteLn('running SortCountAvk2:');
  361.   Run(@SortCountAvk2);
  362.   WriteLn;
  363.  
  364.   WriteLn('running SortCountJulkas:');
  365.   Run(@SortCountJulkas);
  366.   WriteLn;
  367.  
  368.   WriteLn('running SortCountMangakissa:');
  369.   Run(@SortCountMangakissa);
  370. end.  
  371.  

output:
Code: Text  [Select]
  1. running SortCountAkira:
  2. elapsed time: 1.4130
  3. #total: 1750000, #unique: 710248
  4.  
  5. running SortCountHoward:
  6. elapsed time: 1.5600
  7. #total: 1750000, #unique: 710248
  8.  
  9. running SortCountAvk1:
  10. elapsed time: 1.0770
  11. #total: 1750000, #unique: 710248
  12.  
  13. running SortCountAvk2:
  14. elapsed time: 0.7170
  15. #total: 1750000, #unique: 710248
  16.  
  17. running SortCountJulkas:
  18. elapsed time: 2.6370
  19. #total: 1750000, #unique: 710248
  20.  
  21. running SortCountMangakissa:
  22. elapsed time: 375.5530
  23. #total: 1750000, #unique: 710248
  24.  
Title: Re: Sorting and Counting
Post by: 440bx on July 19, 2019, 09:31:39 am
Here are a couple of implementations using the Windows API.

The first version does not use Readln.

The second version does not use Readln nor Writeln except for 2 writeln(s) to the console.

NOTE: the input file name is hardcoded to be "infile.txt" for convenience (easy to change that if needed.)
Title: Re: Sorting and Counting
Post by: julkas on July 19, 2019, 10:28:35 am
Here is my faster (more compicated) version with TVector. Output file ~ 147 MB. Uniq count fixed.
Code: Pascal  [Select]
  1. program sc2;
  2. {$mode delphi}
  3. uses gvector, gutil, garrayutils, SysUtils;
  4. const
  5.   keyNum = 10000000;
  6. type
  7.   TIntLess = TLess<LongInt>;
  8.   TIntVect = TVector<LongInt>;
  9.   TOrd = TOrderingArrayUtils<TIntVect, LongInt, TIntLess>;
  10. var
  11.   sc: array[0..21474] of TIntVect;
  12.   i: LongInt;
  13.   pkey, key, cnt, uniq: LongInt;
  14.   offset: LongInt;
  15.   start: QWord;
  16.   outFile: Text;
  17. begin
  18.   start := GetTickCount64();
  19.   for i := Low(sc) to High(sc) do sc[i] := TIntVect.Create;
  20.  
  21.   for i := 0 to keyNum do
  22.   begin
  23.     key := Random(2147483647);
  24.     sc[key div 100000].PushBack(key mod 100000);
  25.   end;
  26.   WriteLn('Populated (ticks) - ', GetTickCount64() - start);
  27.  
  28.   Assign(outFile, 'out.txt');
  29.   Rewrite(outFile);
  30.  
  31.   for i := Low(sc) to High(sc) do if sc[i].Size > 1 then TOrd.Sort(sc[i], sc[i].Size);
  32.  
  33.   offset := -100000;
  34.   uniq := 0;
  35.   for i := Low(sc) to High(sc) do
  36.   begin
  37.     Inc(offset, 100000);
  38.     pkey := -1;
  39.     cnt := 0;
  40.     for key in sc[i] do
  41.     begin
  42.       if pkey <> key then
  43.       begin
  44.         if cnt <> 0 then
  45.         begin
  46.           WriteLn(outFile, offset + pkey, ' - ', cnt);
  47.           Inc(uniq);
  48.         end;
  49.         pkey := key;
  50.         cnt := 0;
  51.       end;
  52.       Inc(cnt);
  53.     end;
  54.     if cnt <> 0 then
  55.     begin
  56.       WriteLn(outFile, offset + pkey, ' - ', cnt);
  57.       Inc(uniq);
  58.     end;
  59.   end;
  60.  
  61.   Close(outFile);
  62.   for i := Low(sc) to High(sc) do
  63.     sc[i].Destroy;
  64.  
  65.   WriteLn('Total (ticks) - ', GetTickCount64() - start);
  66.   WriteLn('Uniq keys - ', uniq, ', out of - ', keyNum);
  67.   ReadLn;
  68. end.
Console output.
Code: Text  [Select]
  1. Populated (ticks) - 1312
  2. Total (ticks) - 4922
  3. Uniq keys - 9976566, out of - 10000000
Title: Re: Sorting and Counting
Post by: julkas on July 19, 2019, 02:43:43 pm
@Akira1364, @avk can you include my second algo in benchmark and run again? I have 3.0.4 and can't use LGenerics.
Thanks.
Title: Re: Sorting and Counting
Post by: avk on July 19, 2019, 08:10:44 pm
@julkas, done
@440bx:
So, do you offer to compete with the Windows file mapping and ntdll? Okay, why not. :)
To run your SortCount (from SortCount2) inside a Howard's benchmark, I put the SortCount2 code in a separate unit:
Code: Pascal  [Select]
  1. // LONGSTRINGS ON needed for ParamStr to return a null terminated string
  2.  
  3. {$LONGSTRINGS ON}
  4.  
  5. unit WinSortCount2;
  6.  
  7. interface
  8. uses
  9.   windows,
  10.   sysutils,
  11.   dateutils
  12.   ;
  13.  
  14. const
  15.   ntdll            = 'ntdll.dll';
  16.   kernel32         = 'kernel32.dll';
  17.  
  18. var
  19.   // constants to replace ParamStr(1) and ParamStr(2) respectively
  20.  
  21.   InFileName: string = '';
  22.   OutFileName: string = '';
  23.  
  24.   DataCount : ptruint;
  25.   Unique    : ptruint;
  26.  
  27.   procedure SortCount;
  28.  
  29. implementation
  30. // -----------------------------------------------------------------------------
  31. // input file related types
  32.  
  33. type
  34.   // the input file is made up of unixtimes occupying 10 bytes followed by
  35.   // a CR/LF (on Windows)  To avoid comparing strings (which is slow) we define
  36.   // the unixtime characters as being composed on a qword and a word followed
  37.   // the the CR/LF ending.
  38.  
  39.   TINPUT_FILE_ELEMENT = packed record
  40.     case boolean of
  41.     0 : (
  42.          ife_hi                : qword;       // first 8 bytes of unixtime
  43.          ife_lo                : word;        // trailing 2 bytes of unixtime
  44.  
  45.          ife_LineEnding        : word;        // CRLF
  46.         );
  47.     1 : (
  48.           ife_unixtime         : packed array[0..9] of char;
  49.           ife_crlf             : word;
  50.         );
  51.   end;
  52.   PINPUT_FILE_ELEMENT = ^TINPUT_FILE_ELEMENT;
  53.  
  54.  
  55.  
  56. // -----------------------------------------------------------------------------
  57. // kernel32 related types and functions
  58.  
  59. type
  60.   PSECURITY_ATTRIBUTES = ^TSECURITY_ATTRIBUTES;
  61.   TSECURITY_ATTRIBUTES = record
  62.     Length                      : DWORD;
  63.     PointerToSecurityDescriptor : pointer;
  64.     InheritHandle               : boolean32;
  65.   end;
  66.  
  67.  
  68. function GetFileSizeEx(    FileHandle        : THANDLE;
  69.                        var PointerToFileSize : qword)
  70.          : boolean32; stdcall; external kernel32;
  71.  
  72. function CreateFileA(PointerToFileName           : pchar;
  73.                      DesiredAccess               : DWORD;
  74.                      ShareMode                   : DWORD;
  75.                      PointerToSecurityAttributes : PSECURITY_ATTRIBUTES;
  76.                      CreationDisposition         : DWORD;
  77.                      FlagsAndAttributes          : DWORD;
  78.                      TemplateFileHandle          : THANDLE)
  79.          : THANDLE; stdcall; external kernel32;
  80.  
  81.  
  82. // -----------------------------------------------------------------------------
  83. // ntdll related types and functions
  84.  
  85. type
  86.   TCompareFunction = function (key : pointer; data : pointer) : ptrint; cdecl;
  87.  
  88. const
  89.   COMPARE_EQUAL    =  0;
  90.   COMPARE_GREATER  =  1;
  91.   COMPARE_LESS     = -1;
  92.  
  93.  
  94.  
  95. procedure qsort(Base            : pointer;
  96.                 ElementCount    : ptruint;
  97.                 ElementSize     : ptruint;
  98.                 CompareFunction : TCompareFunction);
  99.           cdecl; external ntdll;
  100.  
  101. procedure RtlMoveMemory(Destination : pointer;
  102.                         Source      : pointer;
  103.                         BytesToCopy : ptruint);
  104.           stdcall; external ntdll;
  105.  
  106.  
  107. const
  108.   CRLF                      = #13#10;
  109.   FORMAT            : pchar = '%10.10s  -  %d' + CRLF;
  110.  
  111.   // if there are no duplicates at all, then the output file will be the size
  112.   // of the input file plus 6 additional characters (see FORMAT above)
  113.  
  114.   ADDITIONAL_OUTPUT         = 6;
  115.  
  116. // NOTE : don't user user32.dll wsprintf, it's a paraplegic dog.
  117.  
  118. function sprintf(OutputDestination : pchar;
  119.                  Format            : pchar;
  120.                  UnixTime          : pchar;
  121.                  Count             : integer) : integer; cdecl;    { CDECL !!}
  122.   external ntdll name 'sprintf';
  123.  
  124. // -----------------------------------------------------------------------------
  125.  
  126. function CompareUnixTimes(EntryA, EntryB : PINPUT_FILE_ELEMENT)
  127.          : ptrint; cdecl;
  128. begin
  129.   if EntryA^.ife_hi > EntryB^.ife_hi then exit(COMPARE_GREATER);
  130.   if EntryA^.ife_hi < EntryB^.ife_hi then exit(COMPARE_LESS);
  131.  
  132.   // the first qword of both entries is the same, use the last 2 bytes of the
  133.   // unixtime
  134.  
  135.   if EntryA^.ife_lo > EntryB^.ife_lo then exit(COMPARE_GREATER);
  136.   if EntryA^.ife_lo < EntryB^.ife_lo then exit(COMPARE_LESS);
  137.  
  138.   // they are the same
  139.  
  140.   result := COMPARE_EQUAL;
  141. end;
  142.  
  143. // -----------------------------------------------------------------------------
  144.  
  145. procedure Error(Id : ptruint);
  146. begin
  147.   write('FATAL : ');
  148.   case Id of
  149.     1 : writeln('LoadInputFileIntoMemory failed');
  150.     2 : writeln('WriteOutputFile - failed to create output file');
  151.     3 : writeln('WriteOutputFile - failed to write to output file');
  152.   end;
  153.  
  154.   halt(Id);
  155. end;
  156.  
  157. // -----------------------------------------------------------------------------
  158.  
  159. function LoadInputFileIntoMemory (Filename     : pchar;
  160.                               var Filesize     : qword;
  161.                               var OutputBuffer : pchar)
  162.          : PINPUT_FILE_ELEMENT;
  163.   // maps the input file in memory, determines its size and copies the input
  164.   // file into a separate memory block (because the input file is not supposed
  165.   // to be sorted.)  Also allocates a buffer for the output file.
  166. var
  167.   FileHandle           : THANDLE =   0;
  168.   FileMapping          : THANDLE =   0;
  169.   FileMapAddress       : pointer = nil;
  170.  
  171.   // variables that will be returned upon success
  172.  
  173.   FileData             : pointer = nil;
  174.   Size                 : qword   =   0;
  175.   FileOutBuffer        : pointer = nil;
  176.  
  177.   // to create a local scope (strictly local)
  178.  
  179.   SCOPE                : integer =   0;
  180.  
  181.   UnixTimesCount       : qword   =   0;
  182.  
  183. const
  184.   // constants used by CreateFile
  185.  
  186.   NO_TEMPLATE_FILE     =       0;
  187.  
  188.   // constants used by CreateFileMapping
  189.  
  190.   NO_MAXIMUM_SIZE_HIGH =       0;      // 0 indicates to use the size of the
  191.   NO_MAXIMUM_SIZE_LOW  =       0;      //   file
  192.  
  193.   // constants used by MapViewOfFileEx
  194.  
  195.   FILE_OFFSET_HIGH     =       0;      //   file offset to map from
  196.   FILE_OFFSET_LOW      =       0;
  197.  
  198.   BEGINNING_TO_END     =       0;
  199.  
  200. begin
  201.   // initialize return values
  202.  
  203.   result       := nil;
  204.   Filesize     := 0;
  205.   OutputBuffer := nil;
  206.  
  207.   // map the input file and allocate necessary resources.
  208.  
  209.   for SCOPE := 1 to 1 do       // trick to create a scope one can break out of
  210.   begin
  211.     FileHandle := CreateFileA(Filename,
  212.                               GENERIC_READ,
  213.                               FILE_SHARE_READ,
  214.                               nil,
  215.                               OPEN_EXISTING,
  216.                               FILE_ATTRIBUTE_NORMAL,
  217.                               NO_TEMPLATE_FILE);
  218.  
  219.     if FileHandle = INVALID_HANDLE_VALUE   then break;
  220.  
  221.     if not GetFileSizeEx(FileHandle, Size) then break;
  222.     if Size = 0                            then break;       // empty file
  223.  
  224.     // with the file handle, create a mapping for it
  225.  
  226.     FileMapping := CreateFileMappingA(FileHandle,
  227.                                       nil,
  228.                                       PAGE_READONLY,
  229.                                       NO_MAXIMUM_SIZE_HIGH,  // use file size
  230.                                       NO_MAXIMUM_SIZE_LOW,
  231.                                       nil);
  232.  
  233.     if (FileMapping = 0)                   then break;
  234.  
  235.     FileMapAddress := MapViewOfFileEx(FileMapping,
  236.                                       FILE_MAP_READ,
  237.                                       FILE_OFFSET_HIGH,      // from beginning
  238.                                       FILE_OFFSET_LOW,
  239.                                       BEGINNING_TO_END,      // to end
  240.                                       nil);                  // map anywhere
  241.  
  242.     if FileMapAddress = nil                then break;
  243.  
  244.     // allocate a memory block to hold the file data since the file itself
  245.     // won't be sorted
  246.  
  247.     FileData := HeapAlloc(GetProcessHeap(),
  248.                           0,
  249.                           Size);
  250.  
  251.     if FileData = nil                      then break;
  252.  
  253.  
  254.     // copy the data in the file to the block we just allocated
  255.  
  256.     RtlMoveMemory(FileData, FileMapAddress, Size);
  257.  
  258.     // allocate a buffer for the output file
  259.  
  260.     UnixTimesCount := Size div sizeof(TINPUT_FILE_ELEMENT);
  261.     FileOutBuffer  := HeapAlloc(GetProcessHeap(),
  262.                                 0,
  263.                                 Size + (UnixTimesCount * ADDITIONAL_OUTPUT));
  264.  
  265.     if FileOutBuffer = nil then break;        // just in case additional
  266.                                               // instructions are added at a
  267.                                               // later time.
  268.  
  269.   end;
  270.  
  271.   if (FileHandle     <> INVALID_HANDLE_VALUE) then CloseHandle(FileHandle);
  272.   if (FileMapping    <>                    0) then CloseHandle(FileMapping);
  273.  
  274.   if (FileMapAddress <> nil) then UnmapViewOfFile(FileMapAddress);
  275.  
  276.   if (FileData <> nil) then
  277.   begin
  278.     Filesize     := Size;
  279.     OutputBuffer := FileOutBuffer;
  280.     result       := FileData;
  281.   end;
  282. end;
  283.  
  284. // -----------------------------------------------------------------------------
  285.  
  286. procedure WriteOutputFile(OutputFilename               : pchar;
  287.                           DataOutputBuffer, DataOutPtr : pchar);
  288. const
  289.   FILE_NO_SHARE    = 0;
  290.   NO_TEMPLATE_FILE = 0;
  291.  
  292. var
  293.   FileHandle : THANDLE;
  294.   ByteCount  : ptruint;
  295.  
  296.   Ok           : BOOL  = FALSE;
  297.   BytesWritten : DWORD = 0;
  298.  
  299. begin
  300.   FileHandle := CreateFileA(OutputFilename,
  301.                             GENERIC_READ or GENERIC_WRITE,
  302.                             FILE_NO_SHARE,
  303.                             nil,
  304.                             CREATE_ALWAYS,
  305.                             FILE_ATTRIBUTE_NORMAL,
  306.                             NO_TEMPLATE_FILE);
  307.  
  308.   if FileHandle = INVALID_HANDLE_VALUE then Error(2);
  309.  
  310.   ByteCount := DataOutPtr - DataOutputBuffer;
  311.  
  312.  
  313.   Ok := WriteFile(FileHandle,
  314.                   DataOutputBuffer^,
  315.                   ByteCount,
  316.                   BytesWritten,
  317.                   nil);
  318.  
  319.   if not Ok then Error(3);
  320.  
  321.   CloseHandle(FileHandle);
  322. end;
  323.  
  324. // -----------------------------------------------------------------------------