### Bookstore

 Computer Math and Games in Pascal (preview) Lazarus Handbook

### Author Topic: Sorting and Counting  (Read 21098 times)

#### mpknap

• Full Member
• Posts: 155
##### Sorting and Counting
« on: July 16, 2019, 07:35:20 am »
Hello.
I have a text file and in it unixtime times,
this is how it looks:

file.txt
1562551080

1562551140

1562551260

1562551260

1562551260

1562551260

1562551320

1562551380
.......

The file is very large. about 20MB.

Many lines are repeated, others are only once. The task that I want to do is to calculate how many times it is the same. So make an array with Unix time and the number of repetitions for a given time.

After sorting and calculation it should be like this:
1562551080 - 1

1562551140 - 1

1562551260 - 4

1562551320 -1

1562551380 -1
......

Any ideas?

#### mangakissa

• Hero Member
• Posts: 1113
##### Re: Sorting and Counting
« Reply #1 on: July 16, 2019, 08:17:48 am »
You give the solution by yourself; a two dimensional array.
Lazarus 2.06 (64b) / FPC 3.0.4 / Windows 10
stucked on Delphi 10.3.1

#### julkas

• Guest
##### Re: Sorting and Counting
« Reply #2 on: July 16, 2019, 09:07:00 am »
Possible approach - ordered associative array with uniq keys.
GMap from fcl-stl - https://github.com/graemeg/freepascal/blob/master/packages/fcl-stl/doc/main.pdf

#### avk

• Sr. Member
• Posts: 407
##### Re: Sorting and Counting
« Reply #3 on: July 16, 2019, 09:11:17 am »

#### mangakissa

• Hero Member
• Posts: 1113
##### Re: Sorting and Counting
« Reply #4 on: July 16, 2019, 04:20:30 pm »
Possible approach - ordered associative array with uniq keys.
GMap from fcl-stl - https://github.com/graemeg/freepascal/blob/master/packages/fcl-stl/doc/main.pdf
FGL is standard with TFPGMap in it.

But this is easy to do with an 2 dimensional array. After finding the doubles, you can sort in every way yo want.
Lazarus 2.06 (64b) / FPC 3.0.4 / Windows 10
stucked on Delphi 10.3.1

#### jamie

• Hero Member
• Posts: 4342
##### Re: Sorting and Counting
« Reply #5 on: July 16, 2019, 10:47:22 pm »
Why do I think this is home work ?

In any case I would use a TStringList, load it, sort it using the features of the TStringList and maybe even use the object of each as a counter.

The only true wisdom is knowing you know nothing

#### winni

• Hero Member
• Posts: 2280
##### Re: Sorting and Counting
« Reply #6 on: July 16, 2019, 11:29:05 pm »
Yes, as Jamie said:

* load it into a stringlist
* set the stringlist.sorted to true - and wait, 20 MB mlight take some time
* count the duplicates
* and now: write them with your format: ( time - count  ) into a second stringlist
* save it to disk - done

In case of Linux/Unix: this goes really faster with the shell and sort and uniq …….....

Winni

#### mpknap

• Full Member
• Posts: 155
##### Re: Sorting and Counting
« Reply #7 on: July 17, 2019, 07:00:05 am »
Why do I think this is home work ?

In any case I would use a TStringList, load it, sort it using the features of the TStringList and maybe even use the object of each as a counter.

In a sense, it's homework, but it's for me. I'm not a programmer by education, just a hobby. In life I am Bob the Builder . I like to write something in my spare time, probably only to prove myself. I am interested in the CREDO project, I use their data. This time I want to juxtapose them with the detection of global earthquakes, I wonder if it has any connection, although nobody writes about it .

Here are some tools for the CREDO project https://github.com/credo-science/Windows-Tools

And the CREDO project itself here: https://credo.science/

Yes, I started, but I do not know how to count duplicates and how to do an array with the results. I have so much code for now. Good start?

Code: Pascal  [Select][+][-]
1. procedure TForm1.Button1Click(Sender: TObject);
2. const
3.   BLOCK_SIZE = 10000;
4.
5. var
6.   f: textfile;
7.   Times: TStringList;
8.   det, separator: string;
9.   i: integer;
10.
11. begin
12.   czas := TStringList.Create;
13.   czas.Duplicates := dupIgnore;
14.   czas.Sorted := True;
15.
16.   AssignFile(f, 'Base_total.txt');
17.   reset(f);
18.
19.   SetLength(det, 0);
20.   FillChar(det, SizeOf(det), 0);
21.   while not EOF(f) do
22.   begin
23.
24.     if i mod BLOCK_SIZE = 0 then
25.       SetLength(det, Length(det) + BLOCK_SIZE);
26.
30.     Inc(i);
31.   end;
32.   closefile(f);
33. end;
« Last Edit: July 17, 2019, 07:12:51 am by mpknap »

#### engkin

• Hero Member
• Posts: 2659
##### Re: Sorting and Counting
« Reply #8 on: July 17, 2019, 08:47:31 am »
Maybe this:
Code: Pascal  [Select][+][-]
1. procedure CountDuplicates(AFileName, AResultFileName: String);
2. var
3.   List, Res: TStringList;
4.   Item: String = '';
5.   Count: integer = 0;
6.   i: Integer = 0;
7. begin
8.   List := TStringList.Create;
9.   Res := TStringList.Create;
10.   try
12.     List.Sorted := True;
13.     while i<List.Count do
14.     begin
15.       // Get current item
16.       Item := List[i];
17.
18.       // Count similar items
19.       Count := 0;
20.       while (i<List.Count) and (List[i]=Item) do
21.       begin
22.         inc(i);
23.         inc(count);
24.       end;
25.
27.       Res.Add(Format('%s - %d', [Item, Count]));
28.     end;
29.
30.     Res.SaveToFile(AResultFileName);
31.   finally
32.     List.Free;
33.     Res.Free;
34.   end;
35. end;
36.

To use it:
Code: Pascal  [Select][+][-]
1.   CountDuplicates('in.txt','out.txt');

The result would be saved in out.txt file.
Notice that the first result is going to be the number of empty lines, if any.
« Last Edit: July 17, 2019, 08:51:37 am by engkin »

#### mangakissa

• Hero Member
• Posts: 1113
##### Re: Sorting and Counting
« Reply #9 on: July 17, 2019, 08:53:44 am »
Code: Pascal  [Select][+][-]
1. type
2.
3.   TMyTime = packed record
4.     Unixtime : string;
5.     Counter  : word;
6.   end;
7.
8. implementation
9.
10. procedure TForm1.Button1Click(Sender: TObject);
11. var f      : textfile;
12.     myline : string;
13.     MyTime : array of TMyTime;
14.     index  : integer;
15. begin
16.   assignfile(f,'file.txt');
17.   reset(f);
18.   index := 0;
19.   while not eof(f) do
20.   begin
22.     if not find(MyTime, myline) then
23.     begin
24.       index := index + 1;
25.       setlength(MyTime,index);
26.       MyTime[index - 1].Unixtime := myLine;
27.       MyTime[index - 1].Counter := 1;
28.     end;
29.   end;
30.   closefile(f);
31.   //use bubblesort, quicksort, heapsort or other sort
32.   for index := 0 to length(myTime) - 1 do
33.     memo1.Lines.add(format('time : %s   duplicates : %3d',[myTime[index].Unixtime,myTime[index].Counter]));
34. end;
35.
36. function TForm1.Find(var aMyTime : array of TMytime; const aLine : string) : boolean;
37. var index : integer;
38. begin
39.   result := false;
40.   if length(aMyTime) > 0 then
41.   begin
42.     for index := low(aMyTime) to high(aMyTime) do
43.     begin
44.       if aMyTime[index].Unixtime = aLine then
45.       begin
46.         aMyTime[index].Counter := aMyTime[index].Counter + 1;
47.         result := true;
48.         break;
49.       end;
50.     end;
51.   end;
52. end;
53.
54. end.
55.
stringlist is nice but for small files. and uses a lot of resources for nothing. Okay, things already built in like sort, but in this way you actually see what you're doing.
Lazarus 2.06 (64b) / FPC 3.0.4 / Windows 10
stucked on Delphi 10.3.1

#### avk

• Sr. Member
• Posts: 407
##### Re: Sorting and Counting
« Reply #10 on: July 17, 2019, 01:32:40 pm »
Well, I'm glad to see that I'm not the only one who believes that TStringList is not quite suitable for this task (if only because unixtime is a number, and sorting numbers is much faster than sorting strings). However, the task of counting duplicate values ​​is fairly common, and the standard library should have a ready solution for it.

#### howardpc

• Hero Member
• Posts: 3672
##### Re: Sorting and Counting
« Reply #11 on: July 17, 2019, 02:13:18 pm »
I think TStringList is eminently suitable for this task.
Here's an alternative solution, which may use less resources.
Code: Pascal  [Select][+][-]
1. unit mainSortCount;
2.
3. {\$mode objfpc}{\$H+}
4.
5. interface
6.
7. uses
8.    Classes, SysUtils, Forms, StdCtrls;
9.
10. type
11.    TForm1 = class(TForm)
12.      Memo1: TMemo;
13.      procedure FormCreate(Sender: TObject);
14.    end;
15.
16. var
17.    Form1: TForm1;
18.
19.   procedure SortCount(const anInFile: String; out aList: TStringList);
20.
21.   procedure ShowListInMemo(constref aList: TStringList; aMemo: TMemo);
22.
23. implementation
24.
25. {\$R *.lfm}
26.
27. { TForm1 }
28.
29. procedure SortCount(const anInFile: String; out aList: TStringList);
30. const
31.   one = PtrUInt(1);
32. var
33.   textf: TextFile;
34.   s: String;
35.   idx: Integer;
36.
37.   function GetSuccObj(anIntObj: TObject): TObject;
38.   var
39.     i: PtrUInt absolute anIntObj;
40.   begin
41.     Inc(i);
42.     Exit(anIntObj);
43.   end;
44.
45. begin
46.   Assert(FileExists(anInFile), 'cannot find file "'+anInFile+'"');
47.   aList := TStringList.Create;
48.   aList.Duplicates := dupError;
49.   aList.Sorted := True;
50.   AssignFile(textf, anInFile);
51.   try
52.     Reset(textf);
53.     while not EOF(textf) do
54.       begin
56.         s := Trim(s);
57.         idx := aList.IndexOf(s);
58.         case idx of
60.           else
61.             aList.Objects[idx] := GetSuccObj(aList.Objects[idx]);
62.         end;
63.       end;
64.   finally
65.     CloseFile(textf);
66.   end;
67. end;
68.
69. procedure ShowListInMemo(constref aList: TStringList; aMemo: TMemo);
70. var
71.   i: Integer;
72. begin
73.   if Assigned(aList) and Assigned(aMemo) then
74.     begin
75.       aMemo.Clear;
76.       for i := 0 to aList.Count-1 do
77.         aMemo.Lines.Add('%s - %d', [aList[i], PtrUInt(aList.Objects[i])]);
78.     end;
79. end;
80.
81. procedure TForm1.FormCreate(Sender: TObject);
82. var
83.   sl: TStringList;
84. begin
85.   SortCount('infile.txt', sl);
86.   try
87.     ShowListInMemo(sl, Memo1);
88.     Memo1.Lines.SaveToFile('outfile.txt');
89.   finally
90.     sl.Free;
91.   end;
92. end;
93.
94. end.

#### avk

• Sr. Member
• Posts: 407
##### Re: Sorting and Counting
« Reply #12 on: July 17, 2019, 03:20:20 pm »
Unfortunately, it’s not only a matter of resources: on a dataset of the size specified above, your solution is 4 times(90 s.) slower than engkin's  one(22 s.)  and it is prohibitively slow for a task of this size.

#### julkas

• Guest
##### Re: Sorting and Counting
« Reply #13 on: July 17, 2019, 05:29:50 pm »
GMap is fast enough. I will post my random test tomorrow.

#### marcov

• Global Moderator
• Hero Member
• Posts: 9210
• FPC developer.
##### Re: Sorting and Counting
« Reply #14 on: July 17, 2019, 05:56:51 pm »
gmap might depend on the number of unique values. If that gets large it might also slow down.