
Topic: [SOLVED] Iterating Stringlist and totalling occurrences of same words

Gizmo

Hi all

Long story, so I'll try to keep it to the relevant bit.

I have code that links with the API of a mainstream program. My code iterates over thousands of items from the mainstream program and returns the type category of each file, e.g. pictures, documents, text, programs. Currently, for each item it examines, I store the type category in a stringlist, so the finished list might look like this:

documents
text
text
programs
text
pictures
programs
documents
documents
pictures
programs
pictures
documents
....
and so on, hundreds or thousands of times.

What I need to do next is count all the occurrences of each type. There are hundreds of actual types, not just those I have listed above, so I can't create a simple text lookup from a known, hardcoded list. My code needs to work out how many times each category is mentioned.

So it needs to read the list and count each time a value appears.


Example :

Documents : 4
Text : 3
Programs : 3
Pictures : 3


How might I achieve this? Please?
« Last Edit: February 05, 2019, 03:33:17 pm by Gizmo »

lainz

« Reply #1 on: February 04, 2019, 11:58:23 pm »
I used a Memo1 with input data (your TStringList), and Memo2 to show data.

Code: Pascal
var
  x, i: Integer;
  s: TStringList;
begin
  s := TStringList.Create;
  try
    for i := 0 to Memo1.Lines.Count - 1 do
    begin
      // Values[] returns '' when the name is not in the list yet
      if s.Values[Memo1.Lines[i]] <> '' then
      begin
        x := StrToInt(s.Values[Memo1.Lines[i]]);
        s.Values[Memo1.Lines[i]] := IntToStr(x + 1);
      end
      else
        s.AddPair(Memo1.Lines[i], '1'); // first occurrence counts as 1
    end;

    // show each name with its total
    for i := 0 to s.Count - 1 do
      Memo2.Lines.Add(s.Names[i] + ' ' + s.ValueFromIndex[i]);
  finally
    s.Free;
  end;
end;

If you want, you can skip the first TStringList (Memo1 in my case) altogether and do the counting directly at the point where you currently add each item to it.
« Last Edit: February 05, 2019, 12:01:37 am by lainz »

Gizmo

« Reply #2 on: February 05, 2019, 12:43:20 am »
That looks superb, thank you, and I can see how it would work with a memo list. But I am actually making a DLL which has no GUI elements.

So how would I tweak your code for, say, two stringlists? I have the originating stringlist mentioned above (let's call it SL1), so I guess I could create a second stringlist (SL2) and add the results of the iteration to it. But I'm unclear how I would modify your memo example to do that.

e.g.

In SL1 there is the word 'document' on line 1. So see if that is in SL2. If not, add it and set the counter to 1 for document.
In SL1 there is the word 'text' on line 2. So see if that is in SL2. If not, add it and set the counter to 1 for text.
In SL1 there is the word 'document' on line 3. Document is in SL2, currently at counter 1, so now make the counter 2 for document.
In SL1 there is the word 'text' again on line 4. Text is in SL2, currently at counter 1, so now make the counter 2 for text.
and so on...

This seems quite tricky to me, but you seem to have a good grasp of how this works.


Josh

« Reply #3 on: February 05, 2019, 12:51:41 am »
Hi

You could sort the stringlist with sl.Sort, then loop through the list once, comparing each entry with the previous one and noting when the value changes.

Something like the attached.

Just an idea.
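
A minimal sketch of the sort-then-scan idea, not the attached project itself. The names CountSorted, SL and Totals are made up for illustration, and the sketch assumes it is acceptable to reorder the source list and that counting should be case-sensitive.

```pascal
program SortAndCount;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: sorts SL in place, then adds one "name : count"
// line to Totals for each distinct value.
procedure CountSorted(SL: TStringList; Totals: TStrings);
var
  i, RunLength: Integer;
begin
  Totals.Clear;
  if SL.Count = 0 then
    Exit;
  SL.CaseSensitive := True; // keep sort order consistent with the '=' test below
  SL.Sort;                  // equal strings are now adjacent
  RunLength := 1;
  for i := 1 to SL.Count - 1 do
  begin
    if SL[i] = SL[i - 1] then
      Inc(RunLength)
    else
    begin
      // the value changed: record the run that just ended
      Totals.Add(SL[i - 1] + ' : ' + IntToStr(RunLength));
      RunLength := 1;
    end;
  end;
  // record the final run
  Totals.Add(SL[SL.Count - 1] + ' : ' + IntToStr(RunLength));
end;

var
  SL, Totals: TStringList;
  i: Integer;
begin
  SL := TStringList.Create;
  Totals := TStringList.Create;
  try
    SL.CommaText := 'documents,text,text,programs,text,pictures';
    CountSorted(SL, Totals);
    for i := 0 to Totals.Count - 1 do
      WriteLn(Totals[i]);
  finally
    Totals.Free;
    SL.Free;
  end;
end.
```

The list is walked only once after sorting, so the cost is dominated by the sort itself.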

Martin_fr

« Reply #4 on: February 05, 2019, 12:59:15 am »
If this is time-critical and your input is large (tens or hundreds of thousands of items), you may want to use some sort of hash map. (FPC may have some; you would need to search.)

SynEdit also has an example of this; search for TSynPluginSyncroEditWordsHash.

If the input comes from a database, use the database to do the job. (And ensure you have the right index.)

lainz

« Reply #5 on: February 05, 2019, 02:29:56 am »

A memo exposes its text through the Lines property, which is a TStrings, the base class of TStringList. So the code I provided is exactly what you need: just replace Memo1.Lines with your TStringList SL1. In my code, the variable s is your SL2.
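
As a rough, GUI-free sketch of that substitution: the procedure name CountOccurrences is invented here, SL1 and SL2 follow the naming used in the question, and the sketch assumes category names are non-empty and never contain '=' (the default name/value separator).

```pascal
program CountWithTwoLists;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: SL1 holds the raw categories,
// SL2 receives one "name=count" pair per distinct category.
procedure CountOccurrences(SL1, SL2: TStrings);
var
  i: Integer;
  Current: string;
begin
  SL2.Clear;
  for i := 0 to SL1.Count - 1 do
  begin
    Current := SL2.Values[SL1[i]]; // '' when the name is not in SL2 yet
    if Current <> '' then
      SL2.Values[SL1[i]] := IntToStr(StrToInt(Current) + 1)
    else
      SL2.Add(SL1[i] + '=1'); // first occurrence counts as 1
  end;
end;

var
  SL1, SL2: TStringList;
  i: Integer;
begin
  SL1 := TStringList.Create;
  SL2 := TStringList.Create;
  try
    SL1.CommaText := 'documents,text,text,programs,text,pictures';
    CountOccurrences(SL1, SL2);
    for i := 0 to SL2.Count - 1 do
      WriteLn(SL2.Names[i], ' : ', SL2.ValueFromIndex[i]);
  finally
    SL2.Free;
    SL1.Free;
  end;
end.
```

Each Values[] lookup scans SL2 from the start, so this stays simple but slows down as the number of distinct categories grows.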


jamie

« Reply #6 on: February 05, 2019, 02:43:29 am »
You should be able to use AddObject: add the string and use the object slot as a counter instead of a real object. A little casting will be needed, but the counter stays attached to the newly added string, and you can set it to one when the string is first added.

When you search the list for a match and find one, you can increment that object value.

When all of this is complete, you have a list of the words found along with the number of times each was found. It would most likely be faster than altering the text in the list to show a value on the line.
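
One way this might look, as a sketch only. CountIntoObjects, Source and Counts are hypothetical names, and the sketch assumes the list keeps its default OwnsObjects = False so that it never tries to free the fake "objects".

```pascal
program CountWithObjects;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: stores each distinct string once in Counts and keeps
// its running total in the Objects[] slot, cast to and from PtrInt.
procedure CountIntoObjects(Source: TStrings; Counts: TStringList);
var
  i, Idx: Integer;
begin
  Counts.Clear;
  Counts.Sorted := True; // a sorted list lets IndexOf use a binary search
  for i := 0 to Source.Count - 1 do
  begin
    Idx := Counts.IndexOf(Source[i]);
    if Idx = -1 then
      // store the integer 1 in the object slot; no real object is created
      Counts.AddObject(Source[i], TObject(PtrInt(1)))
    else
      Counts.Objects[Idx] := TObject(PtrInt(Counts.Objects[Idx]) + 1);
  end;
end;

var
  Source, Counts: TStringList;
  i: Integer;
begin
  Source := TStringList.Create;
  Counts := TStringList.Create;
  try
    Source.CommaText := 'documents,text,text,programs,text,pictures';
    CountIntoObjects(Source, Counts);
    for i := 0 to Counts.Count - 1 do
      WriteLn(Counts[i], ' : ', PtrInt(Counts.Objects[i]));
  finally
    Counts.Free;
    Source.Free;
  end;
end.
```

Because TStringList compares case-insensitively by default, 'Text' and 'text' would share one counter in this sketch.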

balazsszekely

« Reply #7 on: February 05, 2019, 06:56:47 am »
Hi Gizmo,

I would also go with a hash list; the speed difference is noticeable above 100,000 items. For more details, please download the attached project:
Code: Pascal
uses contnrs, md5;

type
  PData = ^TData;
  TData = record
    FName: String;
    FCount: Int64;
  end;

var
  HashList: TFPHashList;

// create the hash list
HashList := TFPHashList.Create;

// count occurrences in the stringlist (SL)
var
  I: Integer;
  Data: PData;
  Hash: ShortString;
  Index: Integer;
begin
  for I := 0 to SL.Count - 1 do
  begin
    // lowercasing first makes the count case-insensitive
    Hash := MD5Print(MD5String(LowerCase(SL.Strings[I])));
    Index := HashList.FindIndexOf(Hash); // way faster than a loop through a string list
    if Index = -1 then // new occurrence, add to the hash list
    begin
      New(Data);
      Data^.FName := SL.Strings[I];
      Data^.FCount := 1;
      HashList.Add(Hash, Data);
    end
    else // already in the list, increment the count
      Inc(PData(HashList[Index])^.FCount);
  end;
end;

// clear and free the list
var
  I: Integer;
  Data: PData;
begin
  // release every record before freeing the list itself
  for I := 0 to HashList.Count - 1 do
  begin
    Data := HashList.Items[I];
    Dispose(Data);
  end;
  HashList.Clear;
  HashList.Free;
end;
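
The fragments above do not show how to read the totals back. A possible end-to-end sketch, assembled from those fragments rather than taken from the attached project, is given below. The sample input and the program name are assumptions; the counting and clean-up steps mirror the code above.

```pascal
program HashCount;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, contnrs, md5;

type
  PData = ^TData;
  TData = record
    FName: String;
    FCount: Int64;
  end;

var
  SL: TStringList;
  HashList: TFPHashList;
  Data: PData;
  Hash: ShortString;
  I, Index: Integer;
begin
  SL := TStringList.Create;
  HashList := TFPHashList.Create;
  try
    // sample input, standing in for the list built from the API
    SL.CommaText := 'documents,text,text,programs,text,pictures';

    for I := 0 to SL.Count - 1 do
    begin
      Hash := MD5Print(MD5String(LowerCase(SL[I])));
      Index := HashList.FindIndexOf(Hash);
      if Index = -1 then
      begin
        // first occurrence: allocate a record and start at 1
        New(Data);
        Data^.FName := SL[I];
        Data^.FCount := 1;
        HashList.Add(Hash, Data);
      end
      else
        Inc(PData(HashList[Index])^.FCount);
    end;

    // report each total, then release the record that held it
    for I := 0 to HashList.Count - 1 do
    begin
      Data := PData(HashList[I]);
      WriteLn(Data^.FName, ' : ', Data^.FCount);
      Dispose(Data);
    end;
  finally
    HashList.Free;
    SL.Free;
  end;
end.
```

FName keeps the spelling of the first occurrence, while the lowercased hash key means later occurrences with different casing add to the same counter.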

PS: To really speed up things, the list returned by the mainstream application should also be a hash list.
« Last Edit: February 05, 2019, 09:30:15 am by GetMem »

Gizmo

« Reply #8 on: February 05, 2019, 03:33:07 pm »
Awesome, GetMem and lainz. Super kind of you both to help me out. I've plumped for a slightly tweaked version of GetMem's solution because this code will be dealing with very high volumes of data. Thank you both very much for your time and contribution.

 
