### Bookstore

 Computer Math and Games in Pascal (preview) Lazarus Handbook

### Author Topic: Count occurrences of elements in a stringlist  (Read 1858 times)

#### maurobio

• Sr. Member
• Posts: 304
• Ecology is everything.
##### Count occurrences of elements in a stringlist
« on: September 23, 2021, 02:45:47 pm »
Dear ALL,

Given a list of names (eg. MARY, JANE, LUCY, MARY, PAM, ALIX, JANE) I would like to create another list where each element would be a name/value pair comprising the name and the number of times it appears in the list, eg. MARY=2, JANE=2, LUCY=1, PAM=1, ALIX=1.

Here it my code to do it:

Code: Pascal  [Select][+][-]
1. program Tally;
2.
3. {\$APPTYPE CONSOLE}
4. {\$mode objfpc}{\$H+}
5.
6. uses SysUtils, Classes;
7.
8. var
9.   names, counts: TStringList;
10.   i, t: integer;
11.
12. function search(arr: TStringList; s: string; n: integer): integer;
13. var
14.   j, counter: integer;
15. begin
16.   counter := 0;
17.   for j := 0 to n - 1 do
18.         if arr[j] = s then
19.                 Inc(counter);
20.   Result := counter;
21. end;
22.
23. begin
24.   counts := TStringList.Create;
25.   counts.Sorted := True;
26.   counts.Duplicates := dupIgnore;
27.   names := TStringList.Create;
35.   for i := 0 to names.Count - 1 do
36.   begin
37.     t := search(names, names[i], names.Count);
38.         counts.Add(names[i] + '=' + IntToStr(t));
39.   end;
40.   WriteLn(counts.DelimitedText);
41.   names.Free;
42.   counts.Free;
43. end.
44.
The code above works correctly, but it looks a little bit ugly. Could someone suggest a more elegrant (and possibly shorter) way of doing this?

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

#### avk

• Hero Member
• Posts: 505
##### Re: Count occurrences of elements in a stringlist
« Reply #1 on: September 23, 2021, 03:28:12 pm »
Don't know if you will like something like this:
Code: Pascal  [Select][+][-]
1. program Tally;
2. {\$mode objfpc}{\$H+}
3. uses
4.   Classes, SysUtils, lgUtils, lgHashMultiSet;
5. type
6.   TCounter = specialize TGLiteChainHashMultiSet<string, string>.TMultiSet;
7. var
8.   List: specialize TGAutoRef<TStringList>;
9.   Counter: TCounter;
10.   s: string;
11.   e: TCounter.TEntry;
12. begin
13.   with List.Instance do
14.     begin
22.     end;
23.   for s in List.Instance do
25.   for e in Counter.Entries do
26.     WriteLn(e.Key, ', ', e.Count);
27. end.
28.

Just add LGenerics dependency to the project.

#### maurobio

• Sr. Member
• Posts: 304
• Ecology is everything.
##### Re: Count occurrences of elements in a stringlist
« Reply #2 on: September 23, 2021, 04:08:57 pm »
Hi,@avk!

Thanks for your suggestion. It is shorter, but not necessarily simpler, than my own solution. Really, I would prefer working only with TStringLists...

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

#### Zvoni

• Hero Member
• Posts: 809
##### Re: Count occurrences of elements in a stringlist
« Reply #3 on: September 23, 2021, 04:09:19 pm »
Another way would be before adding an item to use TStringList.Find to lookup if it exists, and then use AddPair (inherited vom TStrings)
Aircode
Code: Pascal  [Select][+][-]
1. If MyStringList.Find('MARY', OutIndex) Then
2.    MyStringList.AddPair('MARY',MyStringList.ValueFromIndex[OutIndex]+1)  //Second Param has to be casted to String
3. Else
5.
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

#### maurobio

• Sr. Member
• Posts: 304
• Ecology is everything.
##### Re: Count occurrences of elements in a stringlist
« Reply #4 on: September 23, 2021, 04:15:13 pm »
Hi, @Zvoni!

Thanks, that seems to be the sort of 'minimalist' solution I was looking for! I'll try it ASAP.

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

#### Zvoni

• Hero Member
• Posts: 809
##### Re: Count occurrences of elements in a stringlist
« Reply #5 on: September 23, 2021, 04:17:52 pm »
Maybe even derive your own class from TStringList, and Override the Add-Function.
Inside that Function you call inherited Find, if Yes/No then call inherited AddPair with the Params above
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

#### avk

• Hero Member
• Posts: 505
##### Re: Count occurrences of elements in a stringlist
« Reply #6 on: September 23, 2021, 04:26:59 pm »
...
Thanks for your suggestion. It is shorter, but not necessarily simpler, than my own solution. Really, I would prefer working only with TStringLists...
...

It could be made even shorter
Code: Pascal  [Select][+][-]
1. program Tally;
2. {\$mode objfpc}{\$H+}
3. uses
4.   Classes, SysUtils, lgUtils, lgHashMultiSet;
5. type
6.   TCounter = specialize TGLiteChainHashMultiSet<string, string>.TMultiSet;
7. var
8.   Counter: TCounter;
9.   e: TCounter.TEntry;
10. begin
11.   Counter.AddAll(['MARY', 'JANE', 'LUCY', 'MARY', 'ALIX', 'JANE']);
12.   for e in Counter.Entries do
13.     WriteLn(e.Key, ', ', e.Count);
14. end.
15.

In fact, sooner or later, the data size usually becomes unacceptable for TStringLists, so ...

#### maurobio

• Sr. Member
• Posts: 304
• Ecology is everything.
##### Re: Count occurrences of elements in a stringlist
« Reply #7 on: September 23, 2021, 04:51:40 pm »
Dear @avk,

At first, I would avoid the use of generics (as I do not have much experience with using them anyway), but you made an important point by calling attention to the degradation in performance of stringlists for larger lists. As I will be analyzing short DNA sequences (on the order of 500-800 base pairs), where the 'words' are codons (DNA triplets), the stringlists limits may indeed become apparent. I will have to run tests with real data to be sure.

Thanks a lot for your time!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

#### maurobio

• Sr. Member
• Posts: 304
• Ecology is everything.
##### Re: Count occurrences of elements in a stringlist
« Reply #8 on: September 23, 2021, 05:00:51 pm »
Another way would be before adding an item to use TStringList.Find to lookup if it exists, and then use AddPair (inherited vom TStrings)
Aircode
Code: Pascal  [Select][+][-]
1. If MyStringList.Find('MARY', OutIndex) Then
2.    MyStringList.AddPair('MARY',MyStringList.ValueFromIndex[OutIndex]+1)  //Second Param has to be casted to String
3. Else
5.

@Zvoni,

I got a compilation error (Error: identifier idents no member "AddPair") when trying your suggestion. As you pointed out, this method is inherited from TStrings, so what may be wrong?

Best regards,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

#### Zvoni

• Hero Member
• Posts: 809
##### Re: Count occurrences of elements in a stringlist
« Reply #9 on: September 23, 2021, 05:50:45 pm »
1) i have to ammend my solution
Line 2 above should be something like
Code: Pascal  [Select][+][-]
1. MyStringList.ValueFromIndex[OutIndex]:= MyStringList.ValueFromIndex[OutIndex]+1;
2.
2) Strange. AddPair is definitely a public function of TStrings, so No idea
One System to rule them all, One Code to find them,
One IDE to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
Code is like a joke: If you have to explain it, it's bad

#### skalogryz

• Global Moderator
• Hero Member
• Posts: 2718
##### Re: Count occurrences of elements in a stringlist
« Reply #10 on: September 23, 2021, 06:38:12 pm »
Search is sorting, sorting is search:
Code: Pascal  [Select][+][-]
1. procedure DoCount(names, count: TStringList);
2. var
3.   t : TStringList;
4.   i : integer;
5.   c : integer;
6.   n : string;
7. begin
8.   if names.Count=0 then Exit;
9.   t := TStringList.Create;
10.   try
11.     t.Assign(names);
12.     t.Sort;
13.     n:=t[0];
14.     c:=1;
15.     for i:=1 to t.Count-1 do begin
16.       if n <> t[i] then begin
18.         n:=t[i];
19.         c:=1;
20.       end else
21.         inc(c);
22.     end;
24.   finally
25.     t.Free;
26.   end;
27. end;
28.
29. begin
30.   counts := TStringList.Create;
31.   names := TStringList.Create;
39.   DoCount(names, counts);
40.   WriteLn(counts.DelimitedText);
41.   names.Free;
42.   counts.Free;
43. end.
44.

#### wp

• Hero Member
• Posts: 9051
##### Re: Count occurrences of elements in a stringlist
« Reply #11 on: September 23, 2021, 11:54:33 pm »
At first, I would avoid the use of generics (as I do not have much experience with using them anyway), but you made an important point by calling attention to the degradation in performance of stringlists for larger lists. As I will be analyzing short DNA sequences (on the order of 500-800 base pairs), where the 'words' are codons (DNA triplets), the stringlists limits may indeed become apparent. I will have to run tests with real data to be sure.
Yes, I try to keep away from generics, too.

The attached demo (not as elegant as skalogryz's) shows that a standard stringlist can count 1,000,000 words of your first commit within less than a second. Sorting the destination list is much faster in case of a long word list due to enabling the quick sort algorithm for finding duplicates, but it is hardly noticeable with the short word list of this example.

In the example the word count is stuffed into the Objects of the stringlist items after casting the pointer to PtrInt integers. I agree that this is an ugly hack which I should not show you. But something like that is used quite often for quick-and-dirty solutions.
« Last Edit: September 23, 2021, 11:58:31 pm by wp »
Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...

#### maurobio

• Sr. Member
• Posts: 304
• Ecology is everything.
##### Re: Count occurrences of elements in a stringlist
« Reply #12 on: September 24, 2021, 02:09:23 am »
Hi, @skalogryz, @wp!

@skalogryz's solution is in fact elegant and easier to understand (I like the idea of "Search is sorting, sorting is search").

In another application, I have been able to handle successfully stringlists of about two thousands records, so I do agree to @wp that such lists can in fact be used in real applications without any (or too much) degradation in performance,

Thank you ALL very much!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

#### avk

• Hero Member
• Posts: 505
##### Re: Count occurrences of elements in a stringlist
« Reply #13 on: September 24, 2021, 03:41:43 pm »
At first, I would avoid the use of generics (as I do not have much experience with using them anyway), but you made an important point by calling attention to the degradation in performance of stringlists for larger lists. As I will be analyzing short DNA sequences (on the order of 500-800 base pairs), where the 'words' are codons (DNA triplets), the stringlists limits may indeed become apparent. I will have to run tests with real data to be sure.
Yes, I try to keep away from generics, too.
...

Why?
I would like to believe that generics is one of the best things that has happened to Pascal in recent times.

As for using TStringList for such tasks, it may seem that the elapsed time of a little less than a second is a good result, but it is not.
I slightly changed your example(in the attachment) added multiset and DoCount() from skalogryz and the ability to set different sizes of the dictionary.
Here is an example of the output on my machine:
Code: Text  [Select][+][-]
1. Dictionary size: 5
2. List with 1000000 words created, duration 0.070 s
3. TStringList(wp): 5 distinct words were found, duration 0.00.840 s
4. DoCount(): 5 distinct words were found, duration 06.755 s
5. HashMultiset: 5 distinct words were found, duration 0.030 s
6.
7. Dictionary size: 500
8. List with 1000000 words created, duration 0.070 s
9. TStringList(wp): 500 distinct words were found, duration 0.03.180 s
10. DoCount(): 500 distinct words were found, duration 08.085 s
11. HashMultiset: 500 distinct words were found, duration 0.040 s
12.
13. Dictionary size: 500000
14. List with 1000000 words created, duration 0.200 s
15. TStringList(wp): 432189 distinct words were found, duration 1.09.537 s
16. DoCount(): 432189 distinct words were found, duration 13.411 s
17. HashMultiset: 432189 distinct words were found, duration 0.250 s
18.

#### y.ivanov

• Sr. Member
• Posts: 306
##### Re: Count occurrences of elements in a stringlist
« Reply #14 on: September 24, 2021, 05:28:22 pm »
Why?
I would like to believe that generics is one of the best things that has happened to Pascal in recent times.
I'm sharing the same belief, generics rocks!

Anyway, if you want to avoid them, the same is achievable with the containers library:
Code: Pascal  [Select][+][-]
1. program Project1;
2.
3. uses
4.   Contnrs;
5.
6. var
7.   HT: TFPDataHashTable;
8.
10.   begin
11.     HT[AKey] := Pointer(PtrUInt(HT[AKey]) + 1);
12.   end;
13.
14.   procedure Print(Item: Pointer; const Key: string; var Continue: Boolean);
15.   begin
16.     WriteLn(Key, ', ', PtrUInt(Item));
17.   end;
18.
19. begin
20.   HT := TFPDataHashTable.Create;
21.