### Bookstore

 Computer Math and Games in Pascal (preview) Lazarus Handbook (preview only)

### Author Topic: Sorting and Counting  (Read 4435 times)

#### julkas

• Sr. Member
• Posts: 314
• KISS principle / Lazarus 2.0.0 / FPC 3.0.4
##### Re: Sorting and Counting
« Reply #15 on: July 17, 2019, 06:07:38 pm »
gmap might depend on the number of unique values. If that gets large it might also slow down.
About ~ 10000000 random longint values.
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
mov rax, a
mov rdx, b
mul rdx
mov [clo], rax
mov [chi], rdx
end;
(* Pointer game *) Inc(ptr, 1); (* vs *) ptr := ptr + 1;

#### avk

• Jr. Member
• Posts: 95
##### Re: Sorting and Counting
« Reply #16 on: July 17, 2019, 06:40:45 pm »
@julkas, to generate a test file, I used this:
Code: Pascal  [Select]
1. procedure CreateTestFile(const aFileName: string);
2. var
3.   I: Integer;
4. const
5.   TestSize = 1750000;
6. begin
7.   with TStringList.Create do
8.     try
9.       for I := 1 to TestSize do
11.       SaveToFile(aFileName);
12.     finally
13.       Free;
14.     end;
15. end;
16.
The output is a ~20MB text file, containing ~700000 unique values.

#### Akira1364

• Hero Member
• Posts: 522
##### Re: Sorting and Counting
« Reply #17 on: July 17, 2019, 08:42:08 pm »
Here's a example / benchmark of a version that uses Generics.Collections (available by default in trunk FPC, but also compatible with 3.0.4 if you just copy the sources to somewhere in whatever your unit search path is.)

Performance seems to be quite good (runs in around 0.8 - 1.1 seconds on average for me.)

Code: Pascal  [Select]
1. program Example;
2.
3. {\$mode Delphi}
4.
5. uses
6.   SysUtils, DateUtils,
7.   Generics.Defaults, Generics.Collections;
8.
9. type
10.   TIntPair = TPair<LongInt, LongInt>;
11.   TIntMap = TDictionary<LongInt, LongInt>;
12.
13.   function ComparePairs(constref L, R: TIntPair): LongInt;
14.   begin
15.     if L.Key < R.Key then
16.       Result := -1
17.     else if L.Key = R.Key then
18.       Result := 0
19.     else
20.       Result := 1;
21.   end;
22.
23. var
24.   I: LongInt;
25.   Start: TDateTime;
26.   InFile, OutFile: Text;
27.   Map: TIntMap;
28.   Pair: TIntPair;
29.   Pairs: TArray<TIntPair>;
30.
31. begin
32.   // Generate a random test file first, with an adaptation of avk's method.
33.   Randomize();
34.   Assign(InFile, 'data.txt');
35.   Rewrite(InFile);
36.   for I := 1 to 1750000 do
37.     WriteLn(InFile, 1500000000 + Random(800000));
38.   Close(InFile);
39.   I := 0;
40.   // Start the timer only now, as this is where the real work starts.
41.   Start := Now();
42.   Map := TIntMap.Create();
43.   // Allocate a big chunk of memory here in advance, for performance's sake.
44.   // Doesn't matter if it's more than you end up needing, as Capacity is separate from Count.
45.   Map.Capacity := 1750000;
46.   Assign(InFile, 'data.txt');
47.   Reset(InFile);
48.   while not EOF(InFile) do begin
50.     if not Map.ContainsKey(I) then
52.     else
53.       Map[I] := Map[I] + 1;
54.   end;
55.   Close(InFile);
56.   Pairs := Map.ToArray();
57.   TArrayHelper<TIntPair>.Sort(
58.     Pairs,
59.     TComparer<TIntPair>.Construct(ComparePairs)
60.   );
61.   Assign(OutFile, 'output.txt');
62.   Rewrite(OutFile);
63.   for Pair in Pairs do with Pair do
64.     WriteLn(OutFile, Key, ' - ', Value);
65.   Close(OutFile);
66.   Map.Free();
67.   WriteLn(MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
68. end.
« Last Edit: July 18, 2019, 12:18:05 am by Akira1364 »

#### howardpc

• Hero Member
• Posts: 3075
##### Re: Sorting and Counting
« Reply #18 on: July 17, 2019, 08:47:16 pm »
Unfortunately, it’s not only a matter of resources: on a dataset of the size specified above, your solution is 4 times(90 s.) slower than engkin's  one(22 s.)  and it is prohibitively slow for a task of this size.
I have changed my mind.

A generic container like TStringList is not well suited to this task on large files.
Here follows a simple dynamic array solution. The following routine executes in less than 2s on my average 5-year-old machine, using a test file generated from your code.
Code: Pascal  [Select]
1. program SortCountlpi;
2.
3. {\$mode objfpc}{\$H+}
4.
5. uses
6.   Classes, sysutils;
7.
8. procedure CreateTestFile(const aFileName: string);
9. var
10.   I: Integer;
11. const
12.   TestSize = 1750000;
13. begin
14.   with TStringList.Create do
15.     try
16.       for I := 1 to TestSize do
18.       SaveToFile(aFileName);
19.     finally
20.       Free;
21.     end;
22. end;
23.
24. procedure SortCountViaArray(const anInfile, anOutFile: String);
25. var
26.   arr: array of Integer;
27.   textf: TextFile;
28.   min: Integer = High(Integer);
29.   max: Integer = -1;
30.   i: Integer;
31. begin
32.   if FileExists(anInfile) then begin
33.     AssignFile(textf, anInfile);
34.     Reset(textf);
35.     while not EOF(textf) do
36.       begin
38.         if i < min then
39.           min := i;
40.         if i > max then
41.           max := i;
42.       end;
43.     SetLength(arr, max-min+1);
44.
45.     Reset(textf);
46.     while not EOF(textf) do
47.       begin
49.         Dec(i, min);
50.         Inc(arr[i]);
51.       end;
52.     CloseFile(textf);
53.
54.     AssignFile(textf, anOutFile);
55.     Rewrite(textf);
56.     for i := Low(arr) to High(arr) do
57.       case (arr[i] > 0) of
58.         True: WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
59.       end;
60.     CloseFile(textf);
61.     SetLength(arr, 0);
62.   end;
63. end;
64.
65. var
66.   t: TDateTime;
67.
68. begin
69.   CreateTestFile('infile.txt');
70.   t := Time;
71.   SortCountViaArray('infile.txt', 'outfile.txt');
72.   Writeln('Elapsed time: ',DateTimeToTimeStamp(Time -t).Time,' ms');
73. end.
« Last Edit: July 17, 2019, 08:58:41 pm by howardpc »

#### mpknap

• Jr. Member
• Posts: 76
##### Re: Sorting and Counting
« Reply #19 on: July 17, 2019, 09:28:26 pm »
Thank you gentlemen. At the moment I'm testing the Engkin algorithm. The TXT file is large, sorting takes more than an hour. Tomorrow I will check your other suggestions. I do not really care about speed, because it's not for the program user only for me. It is important that the result is correct.

#### jamie

• Hero Member
• Posts: 1779
##### Re: Sorting and Counting
« Reply #20 on: July 17, 2019, 11:13:25 pm »
If you want faster you need to create a single chuck of memory that will house all of the file at once..

Load the file into that memory and write some code that shuffles the lines around. The idea is to not allow the memory manager to keep fragmentating the operation.

Also putting it in a more simpler way, your fields are all the same size which makes this easy and since they are numbers only they can be converted into a Binary number and placed in a array to suites your needs..

Once in the array you sort the array with simple algorithms.

So step one is the read the file in line by line, convert each entry to a binary number and then store it in the
array..

Once you have this data stored you then sort the array using something like a bubble sort.

The array size can be calculated a head of time so you can dynamically create it..

ArraySize := FileSize Div_NUmber_Chars_PerEntry_Plus2_For CRLF;

So your array could be this
Array of Int64;
SetLength(MyArray, ArraySize);

etc.
the rest is just code...

#### Akira1364

• Hero Member
• Posts: 522
##### Re: Sorting and Counting
« Reply #21 on: July 18, 2019, 12:30:37 am »
A generic container like TStringList is not well suited to this task on large files.

Well, an actually-generic hashmap that deals directly with the relevant type as opposed to unnecessarily stringifying the values is definitely well-suited to it. Your array solution is certainly quite fast though.

Here's both of ours wrapped up into a single command-line app with an option to tell it which to use:

Code: Pascal  [Select]
1. program OccurrenceCounter;
2.
3. {\$mode Delphi}
4. {\$ImplicitExceptions Off}
5.
6. uses
7.   SysUtils, DateUtils,
8.   Generics.Defaults, Generics.Collections;
9.
10. type
11.   TIntPair = TPair<LongInt, LongInt>;
12.
13.   function ComparePairs(constref L, R: TIntPair): LongInt;
14.   begin
15.     if L.Key < R.Key then
16.       Result := -1
17.     else if L.Key = R.Key then
18.       Result := 0
19.     else
20.       Result := 1;
21.   end;
22.
23.   procedure SortCountAkira;
24.   var
25.     I: LongInt;
26.     InOut: Text;
27.     Map: TDictionary<LongInt, LongInt>;
28.     Pair: TIntPair;
29.     Pairs: TArray<TIntPair>;
30.   begin
31.     if FileExists(ParamStr(2)) then begin
32.       Map := TDictionary<LongInt, LongInt>.Create();
33.       Map.Capacity := 10000000;
34.       Assign(InOut, ParamStr(2));
35.       Reset(InOut);
36.       while not EOF(InOut) do begin
38.         if not Map.ContainsKey(I) then
40.         else
41.           Map[I] := Map[I] + 1;
42.       end;
43.       Close(InOut);
44.       Pairs := Map.ToArray();
45.       Map.Free();
46.       TArrayHelper<TIntPair>.Sort(
47.         Pairs,
48.         TComparer<TIntPair>.Construct(ComparePairs)
49.       );
50.       Assign(InOut, ParamStr(3));
51.       Rewrite(InOut);
52.       for Pair in Pairs do with Pair do
53.         WriteLn(InOut, Key, ' - ', Value);
54.       Close(InOut);
55.     end;
56.   end;
57.
58.   procedure SortCountHoward;
59.   var
60.     arr: array of Integer;
61.     textf: TextFile;
62.     min: Integer = High(Integer);
63.     max: Integer = -1;
64.     i: Integer;
65.   begin
66.     if FileExists(ParamStr(2)) then begin
67.       AssignFile(textf, ParamStr(2));
68.       Reset(textf);
69.       while not EOF(textf) do
70.         begin
72.           if i < min then
73.             min := i;
74.           if i > max then
75.             max := i;
76.         end;
77.       SetLength(arr, max-min+1);
78.
79.       Reset(textf);
80.       while not EOF(textf) do
81.         begin
83.           Dec(i, min);
84.           Inc(arr[i]);
85.         end;
86.       CloseFile(textf);
87.
88.       AssignFile(textf, ParamStr(3));
89.       Rewrite(textf);
90.       for i := Low(arr) to High(arr) do
91.         case (arr[i] > 0) of
92.           True: WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
93.         end;
94.       CloseFile(textf);
95.       SetLength(arr, 0);
96.     end;
97.   end;
98.
99. var
100.   Start: TDateTime;
101.
102. begin
103.   if ParamCount() <> 3 then
104.     WriteLn('Usage: occurrencecounter (-akira | -howard) infilename outfilename')
105.   else if ParamStr(1) = '-akira' then
106.   begin
107.     Start := Now();
108.     SortCountAkira();
109.     WriteLn(MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
110.   end
111.   else if ParamStr(1) = '-howard' then
112.   begin
113.     Start := Now();
114.     SortCountHoward();
115.     WriteLn(MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
116.   end
117.   else
118.     WriteLn('Usage: occurrencecounter (-akira | -howard) infilename outfilename');
119. end.

Also a generator program for the input file (which gives something a little more heavyweight than what people have been using so far):

Code: Pascal  [Select]
1. program InputGenerator;
2.
3. var
4.   InFile: Text;
5.   I: LongInt;
6.
7. begin
8.   Randomize();
9.   Assign(InFile, 'infile.txt');
10.   Rewrite(InFile);
11.   for I := 1 to 10000000 do
12.     WriteLn(InFile, 1500000000 + Random(800000));
13.   Close(InFile);
14. end.

The TXT file is large, sorting takes more than an hour.

Wait, really? That's thousands and thousands of times too long if it's only 20MB or so.
« Last Edit: July 18, 2019, 04:16:25 am by Akira1364 »

#### avk

• Jr. Member
• Posts: 95
##### Re: Sorting and Counting
« Reply #22 on: July 18, 2019, 03:41:54 am »
@Akira1364, it seems your InputGenerator creates more than 99% of unique values.

#### Akira1364

• Hero Member
• Posts: 522
##### Re: Sorting and Counting
« Reply #23 on: July 18, 2019, 04:13:47 am »
@Akira1364, it seems your InputGenerator creates more than 99% of unique values.

I was kind of going for testing the "worst case scenario", but it does seem a bit too worse case now that you point it out. I just edited it a bit in the comment.
« Last Edit: July 18, 2019, 04:17:31 am by Akira1364 »

#### avk

• Jr. Member
• Posts: 95
##### Re: Sorting and Counting
« Reply #24 on: July 18, 2019, 07:43:50 am »
I slightly changed the Akira1364's program:
Code: Pascal  [Select]
1. program OccurrenceCounter;
2.
3. {\$mode Delphi}
4. {\$ImplicitExceptions Off}
5.
6. uses
7.   SysUtils, DateUtils,
8.   Generics.Defaults, Generics.Collections,
9.   LGUtils, , LGAbstractContainer, LGHashMultiSet, LGArrayHelpers;
10.
11. type
12.   TIntPair = TPair<LongInt, LongInt>;
13.
14.   function ComparePairs(constref L, R: TIntPair): LongInt;
15.   begin
16.     if L.Key < R.Key then
17.       Result := -1
18.     else if L.Key = R.Key then
19.       Result := 0
20.     else
21.       Result := 1;
22.   end;
23.
24. var
25.   Total, Unique: Integer;
26.   Start: TDateTime;
27.
28.   procedure SortCountAkira;
29.   var
30.     I: LongInt;
31.     InOut: Text;
32.     Map: TDictionary<LongInt, LongInt>;
33.     Pair: TIntPair;
34.     Pairs: TArray<TIntPair>;
35.   begin
36.     Map := TDictionary<LongInt, LongInt>.Create();
37.     Map.Capacity := 10000000;
38.     Assign(InOut, ParamStr(1));
39.     Reset(InOut);
40.     while not EOF(InOut) do begin
42.       Inc(Total);
43.       if not Map.ContainsKey(I) then
44.         begin
46.           Inc(Unique);
47.         end
48.       else
49.         Map[I] := Map[I] + 1;
50.     end;
51.     Close(InOut);
52.     Pairs := Map.ToArray();
53.     Map.Free();
54.     TArrayHelper<TIntPair>.Sort(
55.       Pairs,
56.       TComparer<TIntPair>.Construct(ComparePairs)
57.     );
58.     Assign(InOut, ParamStr(2));
59.     Rewrite(InOut);
60.     for Pair in Pairs do with Pair do
61.       WriteLn(InOut, Key, ' - ', Value);
62.     Close(InOut);
63.   end;
64.
65.   procedure SortCountHoward;
66.   var
67.     arr: array of Integer;
68.     textf: TextFile;
69.     min: Integer = High(Integer);
70.     max: Integer = -1;
71.     i: Integer;
72.   begin
73.     AssignFile(textf, ParamStr(1));
74.     Reset(textf);
75.     while not EOF(textf) do
76.       begin
78.         Inc(Total);
79.         if i < min then
80.           min := i;
81.         if i > max then
82.           max := i;
83.       end;
84.     SetLength(arr, max-min+1);
85.
86.     Reset(textf);
87.     while not EOF(textf) do
88.       begin
90.         Dec(i, min);
91.         Inc(arr[i]);
92.       end;
93.     CloseFile(textf);
94.
95.     AssignFile(textf, ParamStr(2));
96.     Rewrite(textf);
97.     for i := Low(arr) to High(arr) do
98.       case (arr[i] > 0) of
99.         True:
100.           begin
101.             WriteLn(textf, Format('%d - %d',[i+min, arr[i]]));
102.             Inc(Unique);
103.           end;
104.       end;
105.     CloseFile(textf);
106.     SetLength(arr, 0);
107.   end;
108.
109.   function EntryCmp(constref L, R: TGMultisetEntry<Integer>): SizeInt;
110.   begin
111.     if L.Key > R.Key then
112.       Result := 1
113.     else
114.       if L.Key < R.Key then
115.         Result := -1
116.       else
117.         Result := 0;
118.   end;
119.
120.   procedure SortCountAvk1;
121.   type
122.     TCounter  = TGHashMultiSetLP<Integer>;
123.     TCountRef = TGAutoRef<TCounter>;
124.     TEntry    = TCounter.TEntry;
125.   var
126.     CountRef: TCountRef;
127.     InOut: Text;
128.     Counter: TCounter;
129.     e: TEntry;
130.     I: Integer;
131.   begin
132.     Counter := CountRef;
133.     Assign(InOut, ParamStr(1));
134.     Reset(InOut);
135.     while not EOF(InOut) do
136.       begin
139.       end;
140.     Close(InOut);
141.     Total := Counter.Count;
142.     Unique := Counter.EntryCount;
143.     if Counter.NonEmpty then
144.       begin
145.         Assign(InOut, ParamStr(2));
146.         Rewrite(InOut);
147.         for e in Counter.Entries.Sorted(@EntryCmp) do
148.           with e do
149.              WriteLn(InOut, Key, ' - ', Count);
150.         Close(InOut);
151.       end;
152.   end;
153.
154.   procedure SortCountAvk2;
155.   var
156.     List: array of Integer;
157.     InOut: Text;
158.     I, J, Count, DupCount: Integer;
159.   begin
160.     Assign(InOut, ParamStr(1));
161.     Reset(InOut);
162.     SetLength(List, 4096);
163.     I := 0;
164.     while not EOF(InOut) do
165.       begin
167.         Inc(Total);
168.         if Length(List) = I then
169.           SetLength(List, I * 2);
170.         List[I] := J;
171.         Inc(I);
172.       end;
173.     Close(InOut);
174.     SetLength(List, I);
175.     if List = nil then
176.       exit;
177.     TGOrdinalArrayHelper<Integer>.Sort(List);
178.     Count := I;
179.     DupCount := 0;
180.     I := 0;
181.     Assign(InOut, ParamStr(2));
182.     Rewrite(InOut);
183.     repeat
184.       J := List[I];
185.       while (I < Count) and (List[I] = J) do
186.         begin
187.           Inc(DupCount);
188.           Inc(I);
189.         end;
190.       WriteLn(InOut, J, ' - ', DupCount);
191.       Inc(Unique);
192.       DupCount := 0;
193.     until I = Count;
194.     Close(InOut);
195.   end;
196.
197.   procedure Run(aProc: TProcedure);
198.   begin
199.     Total := 0;
200.     Unique := 0;
201.     Start := Now;
202.     try
203.       aProc();
204.       WriteLn('elapsed time: ', MilliSecondsBetween(Now(), Start) / 1000.0 : 0 : 4);
205.       WriteLn('total values: ', Total, ', unique values: ', Unique);
206.     except
207.       on e: Exception do
208.         WriteLn('crashes with message "', e.Message, '"');
209.     end;
210.   end;
211.
212. begin
213.   if ParamCount <> 2 then
214.     begin
215.       WriteLn('Usage: occurrencecounter infilename outfilename');
216.       exit;
217.     end;
218.   if not FileExists(ParamStr(1)) then
219.     begin
221.       exit;
222.     end;
223.
224.   WriteLn('running SortCountAkira:');
225.   Run(@SortCountAkira);
226.   WriteLn;
227.
228.   WriteLn('running SortCountHoward:');
229.   Run(@SortCountHoward);
230.   WriteLn;
231.
232.
233.   WriteLn('running SortCountAvk1:');
234.   Run(@SortCountAvk1);
235.   WriteLn;
236.
237.   WriteLn('running SortCountAvk2:');
238.   Run(@SortCountAvk2);
239. end.
240.
using current version of his InputGenerator output is(32-bit compiler):
Code: Text  [Select]
1. running SortCountAkira:
2. elapsed time: 4.7730
3. total values: 10000000, unique values: 799999
4.
5. running SortCountHoward:
6. elapsed time: 6.0220
7. total values: 10000000, unique values: 799999
8.
9. running SortCountAvk1:
10. elapsed time: 3.7910
11. total values: 10000000, unique values: 799999
12.
13. running SortCountAvk2:
14. elapsed time: 2.9480
15. total values: 10000000, unique values: 799999
16.
using previous version:
Code: Text  [Select]
1. running SortCountAkira:
2. elapsed time: 14.5700
3. total values: 10000000, unique values: 9976599
4.
5. running SortCountHoward:
6. crashes with message "Invalid pointer operation"
7.
8. running SortCountAvk1:
9. elapsed time: 8.1750
10. total values: 10000000, unique values: 9976599
11.
12. running SortCountAvk2:
13. elapsed time: 6.6450
14. total values: 10000000, unique values: 9976599
15.
« Last Edit: July 18, 2019, 11:34:51 am by avk »

• Hero Member
• Posts: 8521
##### Re: Sorting and Counting
« Reply #25 on: July 18, 2019, 08:37:51 am »
@ AVK
You can optimize a little with {\$I-}{\$H-} or did you already do that? {\$I-} speeds up writeln consideably.
Also I wouldn't use Randomize() but use a fixed seed, so the file becomes reproducable.
« Last Edit: July 18, 2019, 08:48:36 am by Thaddy »
Read the manuals and if you are a professional get a proper education in computer science. Makes the forum a lot cleaner.

#### avk

• Jr. Member
• Posts: 95
##### Re: Sorting and Counting
« Reply #26 on: July 18, 2019, 09:25:34 am »
@Thaddy, I think it does not make much sense. The idea was to show that for this task there are ways to solve without shamanic dances with a tambourine and in a reasonable time.

#### julkas

• Sr. Member
• Posts: 314
• KISS principle / Lazarus 2.0.0 / FPC 3.0.4
##### Re: Sorting and Counting
« Reply #27 on: July 18, 2019, 10:00:23 am »
GMap random test. Out file size ~ 146 MB
Code: Pascal  [Select]
1. program sc;
2. {\$mode delphi}
3. uses gmap, gutil, SysUtils;
4. const
5.   keyNum = 10000000;
6. type
7.   TIntLess = TLess<LongInt>;
8.   TDict = TMap<LongInt, LongInt, TIntLess>;
9. var
10.   sc: TDict;
11.   scit: TDict.TIterator;
12.   i: LongInt;
13.   key, cnt: LongInt;
14.   start: QWord;
15.   outFile: Text;
16. begin
17.   start := GetTickCount64();
18.   sc := TDict.Create;
19.   for i := 0 to keyNum do
20.   begin
21.     key := Random(2147483647);
22.     cnt := 0;
23.     sc.TryGetValue(key, cnt);
24.     sc[key] := cnt + 1;
25.   end;
26.   WriteLn('Populated (ticks) - ', GetTickCount64() - start);
27.   WriteLn('Uniq keys - ', sc.Size, ', out of - ', keyNum);
28.   Assign(outFile, 'out.txt');
29.   Rewrite(outFile);
30.   scit := sc.Min;
31.   repeat
32.     WriteLn(outFile, scit.Key, ' - ', scit.Value)
33.   until not scit.Next;
34.   Close(outFile);
35.   scit.Destroy;
36.   sc.Destroy;
37.   WriteLn('Total (ticks) - ', GetTickCount64() - start);
39. end.
40.
Console output -
Code: Text  [Select]
1. Populated (ticks) - 17297
2. Uniq keys - 9976566, out of - 10000000
3. Total (ticks) - 23906
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
mov rax, a
mov rdx, b
mul rdx
mov [clo], rax
mov [chi], rdx
end;
(* Pointer game *) Inc(ptr, 1); (* vs *) ptr := ptr + 1;

#### julkas

• Sr. Member
• Posts: 314
• KISS principle / Lazarus 2.0.0 / FPC 3.0.4
##### Re: Sorting and Counting
« Reply #28 on: July 18, 2019, 11:06:58 am »
Thank you gentlemen. At the moment I'm testing the Engkin algorithm. The TXT file is large, sorting takes more than an hour. Tomorrow I will check your other suggestions. I do not really care about speed, because it's not for the program user only for me. It is important that the result is correct.
Can you share input file?
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
mov rax, a
mov rdx, b
mul rdx
mov [clo], rax
mov [chi], rdx
end;
(* Pointer game *) Inc(ptr, 1); (* vs *) ptr := ptr + 1;

#### SymbolicFrank

• Hero Member
• Posts: 635
##### Re: Sorting and Counting
« Reply #29 on: July 18, 2019, 12:39:15 pm »
I did this for more than 25 million postal codes. It was really slow with the default containers and took a lot of memory. I made a custom class that used a smart way to insert new values, which reduced it to less than 15 minutes and 2 GB memory usage. But I stored more than one number in more than one container. Requesting a postal code was nearly instant.

I kept a small, loose index and remembered the last few insertions. After that, I just tried to insert a new entry halfway between the two nearest indexed items, and then repeatedly halfway until a hit was found. Which took less than 5 tries on average.