Recent

Author Topic: Count occurrences of elements in a stringlist  (Read 1615 times)

maurobio

  • Sr. Member
  • ****
  • Posts: 294
  • Ecology is everything.
    • GitHub
Count occurrences of elements in a stringlist
« on: September 23, 2021, 02:45:47 pm »
Dear ALL,

Given a list of names (eg. MARY, JANE, LUCY, MARY, PAM, ALIX, JANE) I would like to create another list where each element would be a name/value pair comprising the name and the number of times it appears in the list, eg. MARY=2, JANE=2, LUCY=1, PAM=1, ALIX=1.

Here it my code to do it:

Code: Pascal  [Select][+][-]
  1. program Tally;
  2.  
  3. {$APPTYPE CONSOLE}
  4. {$mode objfpc}{$H+}
  5.  
  6. uses SysUtils, Classes;
  7.  
  8. var
  9.   names, counts: TStringList;
  10.   i, t: integer;
  11.  
  12. function search(arr: TStringList; s: string; n: integer): integer;
  13. var
  14.   j, counter: integer;
  15. begin
  16.   counter := 0;
  17.   for j := 0 to n - 1 do
  18.         if arr[j] = s then
  19.                 Inc(counter);
  20.   Result := counter;
  21. end;  
  22.  
  23. begin
  24.   counts := TStringList.Create;
  25.   counts.Sorted := True;
  26.   counts.Duplicates := dupIgnore;
  27.   names := TStringList.Create;
  28.   names.Add('MARY');
  29.   names.Add('JANE');
  30.   names.Add('LUCY');
  31.   names.Add('MARY');
  32.   names.Add('PAM');
  33.   names.Add('ALIX');
  34.   names.Add('JANE');
  35.   for i := 0 to names.Count - 1 do
  36.   begin
  37.     t := search(names, names[i], names.Count);
  38.         counts.Add(names[i] + '=' + IntToStr(t));
  39.   end;
  40.   WriteLn(counts.DelimitedText);
  41.   names.Free;
  42.   counts.Free;
  43. end.
  44.  
The code above works correctly, but it looks a little bit ugly. Could someone suggest a more elegrant (and possibly shorter) way of doing this?

Thanks in advance!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

avk

  • Sr. Member
  • ****
  • Posts: 489
    • my self-education project
Re: Count occurrences of elements in a stringlist
« Reply #1 on: September 23, 2021, 03:28:12 pm »
Don't know if you will like something like this:
Code: Pascal  [Select][+][-]
  1. program Tally;
  2. {$mode objfpc}{$H+}
  3. uses
  4.   Classes, SysUtils, lgUtils, lgHashMultiSet;
  5. type
  6.   TCounter = specialize TGLiteChainHashMultiSet<string, string>.TMultiSet;
  7. var
  8.   List: specialize TGAutoRef<TStringList>;
  9.   Counter: TCounter;
  10.   s: string;
  11.   e: TCounter.TEntry;
  12. begin
  13.   with List.Instance do
  14.     begin
  15.       Add('MARY');
  16.       Add('JANE');
  17.       Add('LUCY');
  18.       Add('MARY');
  19.       Add('PAM');
  20.       Add('ALIX');
  21.       Add('JANE');
  22.     end;
  23.   for s in List.Instance do
  24.     Counter.Add(s);
  25.   for e in Counter.Entries do
  26.     WriteLn(e.Key, ', ', e.Count);
  27. end.
  28.  

Just add LGenerics dependency to the project.

maurobio

  • Sr. Member
  • ****
  • Posts: 294
  • Ecology is everything.
    • GitHub
Re: Count occurrences of elements in a stringlist
« Reply #2 on: September 23, 2021, 04:08:57 pm »
Hi,@avk!

Thanks for your suggestion. It is shorter, but not necessarily simpler, than my own solution. Really, I would prefer working only with TStringLists...  :-[

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

Zvoni

  • Hero Member
  • *****
  • Posts: 732
Re: Count occurrences of elements in a stringlist
« Reply #3 on: September 23, 2021, 04:09:19 pm »
Another way would be before adding an item to use TStringList.Find to lookup if it exists, and then use AddPair (inherited vom TStrings)
Aircode
Code: Pascal  [Select][+][-]
  1. If MyStringList.Find('MARY', OutIndex) Then
  2.    MyStringList.AddPair('MARY',MyStringList.ValueFromIndex[OutIndex]+1)  //Second Param has to be casted to String
  3. Else
  4.    MyStringList.AddPair('MARY','1');
  5.  
One System to rule them all, One IDE to find them,
One Code to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
People call me crazy, because i'm jumping out of perfectly fine aircraft

maurobio

  • Sr. Member
  • ****
  • Posts: 294
  • Ecology is everything.
    • GitHub
Re: Count occurrences of elements in a stringlist
« Reply #4 on: September 23, 2021, 04:15:13 pm »
Hi, @Zvoni!

Thanks, that seems to be the sort of 'minimalist' solution I was looking for! I'll try it ASAP.

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

Zvoni

  • Hero Member
  • *****
  • Posts: 732
Re: Count occurrences of elements in a stringlist
« Reply #5 on: September 23, 2021, 04:17:52 pm »
Maybe even derive your own class from TStringList, and Override the Add-Function.
Inside that Function you call inherited Find, if Yes/No then call inherited AddPair with the Params above
One System to rule them all, One IDE to find them,
One Code to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
People call me crazy, because i'm jumping out of perfectly fine aircraft

avk

  • Sr. Member
  • ****
  • Posts: 489
    • my self-education project
Re: Count occurrences of elements in a stringlist
« Reply #6 on: September 23, 2021, 04:26:59 pm »
...
Thanks for your suggestion. It is shorter, but not necessarily simpler, than my own solution. Really, I would prefer working only with TStringLists...  :-[
...

It could be made even shorter
Code: Pascal  [Select][+][-]
  1. program Tally;
  2. {$mode objfpc}{$H+}
  3. uses
  4.   Classes, SysUtils, lgUtils, lgHashMultiSet;
  5. type
  6.   TCounter = specialize TGLiteChainHashMultiSet<string, string>.TMultiSet;
  7. var
  8.   Counter: TCounter;
  9.   e: TCounter.TEntry;
  10. begin
  11.   Counter.AddAll(['MARY', 'JANE', 'LUCY', 'MARY', 'ALIX', 'JANE']);
  12.   for e in Counter.Entries do
  13.     WriteLn(e.Key, ', ', e.Count);
  14. end.
  15.  

In fact, sooner or later, the data size usually becomes unacceptable for TStringLists, so ...

maurobio

  • Sr. Member
  • ****
  • Posts: 294
  • Ecology is everything.
    • GitHub
Re: Count occurrences of elements in a stringlist
« Reply #7 on: September 23, 2021, 04:51:40 pm »
Dear @avk,

At first, I would avoid the use of generics (as I do not have much experience with using them anyway), but you made an important point by calling attention to the degradation in performance of stringlists for larger lists. As I will be analyzing short DNA sequences (on the order of 500-800 base pairs), where the 'words' are codons (DNA triplets), the stringlists limits may indeed become apparent. I will have to run tests with real data to be sure.

Thanks a lot for your time!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

maurobio

  • Sr. Member
  • ****
  • Posts: 294
  • Ecology is everything.
    • GitHub
Re: Count occurrences of elements in a stringlist
« Reply #8 on: September 23, 2021, 05:00:51 pm »
Another way would be before adding an item to use TStringList.Find to lookup if it exists, and then use AddPair (inherited vom TStrings)
Aircode
Code: Pascal  [Select][+][-]
  1. If MyStringList.Find('MARY', OutIndex) Then
  2.    MyStringList.AddPair('MARY',MyStringList.ValueFromIndex[OutIndex]+1)  //Second Param has to be casted to String
  3. Else
  4.    MyStringList.AddPair('MARY','1');
  5.  

@Zvoni,

I got a compilation error (Error: identifier idents no member "AddPair") when trying your suggestion. As you pointed out, this method is inherited from TStrings, so what may be wrong?

Best regards,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

Zvoni

  • Hero Member
  • *****
  • Posts: 732
Re: Count occurrences of elements in a stringlist
« Reply #9 on: September 23, 2021, 05:50:45 pm »
1) i have to ammend my solution
Line 2 above should be something like
Code: Pascal  [Select][+][-]
  1. MyStringList.ValueFromIndex[OutIndex]:= MyStringList.ValueFromIndex[OutIndex]+1;
  2.  
2) Strange. AddPair is definitely a public function of TStrings, so No idea
One System to rule them all, One IDE to find them,
One Code to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
People call me crazy, because i'm jumping out of perfectly fine aircraft

skalogryz

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2710
    • havefunsoft.com
Re: Count occurrences of elements in a stringlist
« Reply #10 on: September 23, 2021, 06:38:12 pm »
Search is sorting, sorting is search:
Code: Pascal  [Select][+][-]
  1. procedure DoCount(names, count: TStringList);
  2. var
  3.   t : TStringList;
  4.   i : integer;
  5.   c : integer;
  6.   n : string;
  7. begin
  8.   if names.Count=0 then Exit;
  9.   t := TStringList.Create;
  10.   try
  11.     t.Assign(names);
  12.     t.Sort;
  13.     n:=t[0];
  14.     c:=1;
  15.     for i:=1 to t.Count-1 do begin
  16.       if n <> t[i] then begin
  17.         count.Add( n+'='+IntToStr(c));
  18.         n:=t[i];
  19.         c:=1;
  20.       end else
  21.         inc(c);
  22.     end;
  23.     count.Add( n+'='+IntToStr(c));
  24.   finally
  25.     t.Free;
  26.   end;
  27. end;
  28.  
  29. begin
  30.   counts := TStringList.Create;
  31.   names := TStringList.Create;
  32.   names.Add('MARY');
  33.   names.Add('JANE');
  34.   names.Add('LUCY');
  35.   names.Add('MARY');
  36.   names.Add('PAM');
  37.   names.Add('ALIX');
  38.   names.Add('JANE');
  39.   DoCount(names, counts);
  40.   WriteLn(counts.DelimitedText);
  41.   names.Free;
  42.   counts.Free;
  43. end.
  44.  

wp

  • Hero Member
  • *****
  • Posts: 8873
Re: Count occurrences of elements in a stringlist
« Reply #11 on: September 23, 2021, 11:54:33 pm »
At first, I would avoid the use of generics (as I do not have much experience with using them anyway), but you made an important point by calling attention to the degradation in performance of stringlists for larger lists. As I will be analyzing short DNA sequences (on the order of 500-800 base pairs), where the 'words' are codons (DNA triplets), the stringlists limits may indeed become apparent. I will have to run tests with real data to be sure.
Yes, I try to keep away from generics, too.

The attached demo (not as elegant as skalogryz's) shows that a standard stringlist can count 1,000,000 words of your first commit within less than a second. Sorting the destination list is much faster in case of a long word list due to enabling the quick sort algorithm for finding duplicates, but it is hardly noticeable with the short word list of this example.

In the example the word count is stuffed into the Objects of the stringlist items after casting the pointer to PtrInt integers. I agree that this is an ugly hack which I should not show you. But something like that is used quite often for quick-and-dirty solutions.
« Last Edit: September 23, 2021, 11:58:31 pm by wp »
Mainly Lazarus trunk / fpc 3.2.0 / all 32-bit on Win-10, but many more...

maurobio

  • Sr. Member
  • ****
  • Posts: 294
  • Ecology is everything.
    • GitHub
Re: Count occurrences of elements in a stringlist
« Reply #12 on: September 24, 2021, 02:09:23 am »
Hi, @skalogryz, @wp!

Thanks for your helpful suggestions!

@skalogryz's solution is in fact elegant and easier to understand (I like the idea of "Search is sorting, sorting is search").

In another application, I have been able to handle successfully stringlists of about two thousands records, so I do agree to @wp that such lists can in fact be used in real applications without any (or too much) degradation in performance,

Thank you ALL very much!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

avk

  • Sr. Member
  • ****
  • Posts: 489
    • my self-education project
Re: Count occurrences of elements in a stringlist
« Reply #13 on: September 24, 2021, 03:41:43 pm »
At first, I would avoid the use of generics (as I do not have much experience with using them anyway), but you made an important point by calling attention to the degradation in performance of stringlists for larger lists. As I will be analyzing short DNA sequences (on the order of 500-800 base pairs), where the 'words' are codons (DNA triplets), the stringlists limits may indeed become apparent. I will have to run tests with real data to be sure.
Yes, I try to keep away from generics, too.
...

Why?
I would like to believe that generics is one of the best things that has happened to Pascal in recent times.

As for using TStringList for such tasks, it may seem that the elapsed time of a little less than a second is a good result, but it is not.
I slightly changed your example(in the attachment) added multiset and DoCount() from skalogryz and the ability to set different sizes of the dictionary.
Here is an example of the output on my machine:
Code: Text  [Select][+][-]
  1. Dictionary size: 5
  2. List with 1000000 words created, duration 0.070 s
  3. TStringList(wp): 5 distinct words were found, duration 0.00.840 s
  4. DoCount(): 5 distinct words were found, duration 06.755 s
  5. HashMultiset: 5 distinct words were found, duration 0.030 s
  6.  
  7. Dictionary size: 500
  8. List with 1000000 words created, duration 0.070 s
  9. TStringList(wp): 500 distinct words were found, duration 0.03.180 s
  10. DoCount(): 500 distinct words were found, duration 08.085 s
  11. HashMultiset: 500 distinct words were found, duration 0.040 s
  12.  
  13. Dictionary size: 500000
  14. List with 1000000 words created, duration 0.200 s
  15. TStringList(wp): 432189 distinct words were found, duration 1.09.537 s
  16. DoCount(): 432189 distinct words were found, duration 13.411 s
  17. HashMultiset: 432189 distinct words were found, duration 0.250 s
  18.  

y.ivanov

  • Sr. Member
  • ****
  • Posts: 281
Re: Count occurrences of elements in a stringlist
« Reply #14 on: September 24, 2021, 05:28:22 pm »
Why?
I would like to believe that generics is one of the best things that has happened to Pascal in recent times.
I'm sharing the same belief, generics rocks!

Anyway, if you want to avoid them, the same is achievable with the containers library:
Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. uses
  4.   Contnrs;
  5.  
  6. var
  7.   HT: TFPDataHashTable;
  8.  
  9.   procedure Add(AKey: String); inline;
  10.   begin
  11.     HT[AKey] := Pointer(PtrUInt(HT[AKey]) + 1);
  12.   end;
  13.  
  14.   procedure Print(Item: Pointer; const Key: string; var Continue: Boolean);
  15.   begin
  16.     WriteLn(Key, ', ', PtrUInt(Item));
  17.   end;
  18.  
  19. begin
  20.   HT := TFPDataHashTable.Create;
  21.  
  22.   Add('MARY');
  23.   Add('JANE');
  24.   Add('LUCY');
  25.   Add('MARY');
  26.   Add('PAM');
  27.   Add('ALIX');
  28.   Add('JANE');
  29.  
  30.   HT.Iterate(@Print);
  31.   HT.Free;
  32. end.

 

TinyPortal © 2005-2018