Lazarus

Programming => General => Topic started by: maurobio on September 16, 2021, 03:06:01 pm

Title: Comparing a string to the next in a list
Post by: maurobio on September 16, 2021, 03:06:01 pm
Dear ALL,

I want to compare each string in a list with the next string in the same list, printing it if they are identical.

I tried two versions, as shown below:

version 1

Code: Pascal  [Select][+][-]
  1. program comp1;
  2.  
  3. {$APPTYPE CONSOLE}
  4.  
  5. uses Classes;
  6.  
  7. var
  8.   s: TStringList;
  9.   i: integer;
  10.  
  11. begin
  12.   s := TStringList.Create;
  13.   s.Add('this');
  14.   s.Add('this');
  15.   s.Add('and');
  16.   s.Add('that');
  17.   for i := 1 to s.Count do
  18.   begin
  19.     if s[i] = s[i - 1] then
  20.           WriteLn(s[i]);
  21.   end;
  22.   s.Free;
  23. end.
  24.  

version 2:

Code: Pascal  [Select][+][-]
  1. program comp2;
  2.  
  3. {$APPTYPE CONSOLE}
  4.  
  5. uses Classes;
  6.  
  7. var
  8.   s: TStringList;
  9.   i: integer;
  10.  
  11. begin
  12.   s := TStringList.Create;
  13.   s.Add('this');
  14.   s.Add('this');
  15.   s.Add('and');
  16.   s.Add('that');
  17.   for i := 0 to s.Count - 1 do
  18.   begin
  19.     if s[i] = s[i + 1] then
  20.           WriteLn(s[i]);
  21.   end;
  22.   s.Free;
  23. end.
  24.  
Both versions give me out-of-bounds errors!  %)

Could someone point out what is wrong? I feel rather dumb today.

Thanks in advance!
Title: Re: Comparing a string to the next in a list
Post by: Zvoni on September 16, 2021, 03:45:04 pm
Version 1:
You run i from 1 to s.count which is 4 in your sample. You compare s[ i] with s[i-1] --> There is no Member with Index 4

The same in Version 2:
you run i from 0 to s.count-1 which equals 3. You compare s[ i] with s[i+1] --> There is no Member with Index 4
Title: Re: Comparing a string to the next in a list
Post by: marcov on September 16, 2021, 03:46:04 pm
first can do a s[ i] with i=s.count, which is out of bounds.

second can do s[i+1] with i=s.count-1 which is effectively also  s[ i] with i=s.count, which is out of bounds.
Title: Re: Comparing a string to the next in a list
Post by: Zvoni on September 16, 2021, 03:48:40 pm
No Out-Of-Bounds-Error:
Code: Pascal  [Select][+][-]
  1. program Project1;
  2.  
  3. Uses
  4.   Classes;
  5.  
  6. Var
  7.   s:TStringList;
  8.   i:Integer;
  9.  
  10. begin
  11.   s:=TStringList.Create;
  12.   s.Add('this');
  13.   s.add('this');
  14.   s.add('and');
  15.   s.add('that');
  16.   For i:=s.count-1 DownTo 1 Do
  17.       If s[i]=s[i-1] Then WriteLn(s[i]);
  18. //The other direction
  19.   For i:=0 To s.count-2 Do
  20.       If s[i]=s[i+1] Then WriteLn(s[i]);
  21.   s.Free;
  22. end.
  23.  

EDIT: What's the purpose of this comparison?
Title: Re: Comparing a string to the next in a list
Post by: maurobio on September 16, 2021, 09:37:37 pm
@Zvoni,

Thanks a lot for your suggestions, they were quite helpful.

That sample code had no other purpose than illustrate my problem. But what I really want is to perform a statistical comparison of each string in a list with all others in the same list, using one of the indexes provided in the FuzzyWuzzy library (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas)).

Here is another sample, in which the list contains botanical names:

Code: Pascal  [Select][+][-]
  1. program comp;
  2.  
  3. {$APPTYPE CONSOLE}
  4.  
  5. uses Classes, FuzzyWuzzy;
  6.  
  7. var
  8.   s: TStringList;
  9.   i: integer;
  10.  
  11. begin
  12.   s := TStringList.Create;
  13.   s.Add('Anacardium occidentale');
  14.   s.Add('Unknown 1');
  15.   s.Add('Unknown 2');
  16.   s.Add('Eriotheca sp.');
  17.   s.Add('Dead dead');
  18.   s.Add('Myrcia lengua');
  19.   s.Add('Myrcia lingua');
  20.   s.Add('Myrcia sp1');
  21.   s.Add('Myrcia sp2');
  22.   s.Add('Myrcia sp3');
  23.   s.Add('Pseudobombax sp.');
  24.   s.Add('Trema micranthum');
  25.  
  26.   for i := 0 to s.Count - 2 do
  27.   begin
  28.     if PartialRatio(s[i], s[i + 1]) > 90 then
  29.           WriteLn(s[i]);
  30.   end;
  31.   s.Free;
  32. end.
  33.  

It works, incorporating one of your suggestions.

With best wishes,
Title: Re: Comparing a string to the next in a list
Post by: Zvoni on September 17, 2021, 11:57:14 am
That sample code had no other purpose than illustrate my problem. But what I really want is to perform a statistical comparison of each string in a list with all others in the same list, using one of the indexes provided in the FuzzyWuzzy library (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas)).
Uh, you do realize, that's not what you're doing with your simple loop?
Title: Re: [SOLVED] Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 01:29:50 pm
@Zvoni,

Of course, as I wrote in my previous message, it is working fine.

Thank you!

With best wishes,
Title: Re: [SOLVED] Comparing a string to the next in a list
Post by: Zvoni on September 17, 2021, 01:31:02 pm
You sure?
In your sample above, you never compare "Myrcia lengua" with "Myrcia sp3"
Title: Re: [SOLVED] Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 01:34:05 pm
@Zvoni,

Why not? As far as I can see, the algorithm seems to be working fine - "Myrcia lengua" is correctly identified as an ortographic error.

Regards,
Title: Re: [SOLVED] Comparing a string to the next in a list
Post by: Zvoni on September 17, 2021, 01:50:52 pm
You misunderstood:
You compare "Myrcia lengua" with "Dead dead" and "Myrcia lingua" but never with "Myrcia sp1"
It's contrary to what you wrote: to compare EACH string with EVERY other string of the List.
Nevermind sorted vs. unsorted List
Title: Re: [SOLVED] Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 01:55:09 pm
@Zvoni,

Perhaps you could provide an example of how the algorithm should better work?

The lists which I use are always sorted by species name.

Regards,
Title: Re: [SOLVED] Comparing a string to the next in a list
Post by: Zvoni on September 17, 2021, 02:16:23 pm
After thinking about it: The question is not sorted vs. unsorted List, but "Do you really need to compare EACH string with EVERY other String of the List"?
Basically it boils down to running through your List and compare each entry with every other entry.

Of the top of my head, i'd probably go along these lines:
2 Loops.
2 StringLists (initially identical)
Outer Loop: Take (current) entry of List1 and remove this entry from List2 (to avoid comparing it with itself)
Inner Loop: Run through List2 and do your comparisons. --> If you need to save the result you have to provide a place to store it (The associated Object of the StringList-Entry?) or you process the Result directly.

The result of the above is, that List2 shrinks consecutively, since you're removing the current entry from List2, and there is no difference in result for, e.g.
PartialRatio('Myrcia lengua','Myrcia sp3') and PartialRatio('Myrcia sp3,'Myrcia lengua') (commutative).

As i said: From the top of my head. Pretty sure someone else might chip in with something better.

btw: In my above example both Lists have to be in the same order (that's where sorted gives an advantage) and I'd run through both Lists Top-Down, since you'd always remove the last entry (no overhead moving memory around) in List2

EDIT: Maybe remove the "[SOLVED]" from the Subject?
Title: Re: Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 02:36:48 pm
@Zvoni,

Not sure if I understood correctly, anyway I removed the [SOLVED] tag from the message subject per your suggestion.

I will have to try the algorithm with a larger sample of names.

Regards,
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 02:42:19 pm
Boas Maurobio,

Attached is a sample project that will compare all the strings to each other, skipping the ones with the same index.

I made it visually appealing with two List Boxes that animate where the comparison is made.

Points of interest on the code:
Code: Pascal  [Select][+][-]
  1. procedure TfrmMain.FormCreate(Sender: TObject);
  2. begin
  3.   FWordList:= TStringList.Create;
  4.   FWordList.Add('Anacardium occidentale');
  5.   FWordList.Add('Unknown 1');
  6.   FWordList.Add('Unknown 2');
  7.   FWordList.Add('Eriotheca sp.');
  8.   FWordList.Add('Dead dead');
  9.   FWordList.Add('Myrcia lengua');
  10.   FWordList.Add('Myrcia lingua');
  11.   FWordList.Add('Myrcia sp1');
  12.   FWordList.Add('Myrcia sp2');
  13.   FWordList.Add('Myrcia sp3');
  14.   FWordList.Add('Pseudobombax sp.');
  15.   FWordList.Add('Trema micranthum');
  16.  
  17.   lbLeft.Items.Text:= FWordList.Text;
  18.   lbRight.Items.Text:= FWordList.Text;
  19. end;
  20.  
  21. procedure TfrmMain.actCompareGoExecute(Sender: TObject);
  22. var
  23.   indexLeft, indexRight: Integer;
  24. begin
  25.   actCompareGo.Enabled:= False;
  26.   Application.ProcessMessages;
  27.   try
  28.     for indexLeft:= 0 to Pred(FWordList.Count) do
  29.     begin
  30.       for indexRight:= 0 to Pred(FwordList.Count) do
  31.       begin
  32.         if indexLeft = indexRight then continue;
  33.         lbLeft.ItemIndex:= indexLeft;
  34.         lbRight.ItemIndex:= indexRight;
  35.         {
  36.           This is where you should perform your FuzzyWuzzy compare.
  37.           I'm just doing an equals.
  38.         }
  39.         if FWordList[indexLeft] = FWordList[indexRight] then
  40.         begin
  41.           memLog.Append(Format(
  42.             'String index %d is identical to String index %d ("%s" = "%s")',
  43.             [
  44.               indexLeft,
  45.               indexRight,
  46.               FWordList[indexLeft],
  47.               FWordList[indexRight]
  48.             ]
  49.           ));
  50.         end;
  51.         Application.ProcessMessages;
  52.         Sleep(80);
  53.       end;
  54.     end;
  55.   finally
  56.     lbLeft.ItemIndex:= -1;
  57.     lbRight.ItemIndex:= -1;
  58.     Application.ProcessMessages;
  59.     actCompareGo.Enabled:= True;
  60.   end;
  61. end;
  62.  

Hope this helps clarify the compare every string in a list with all the strings in the list challenge you were trying.

Grande abraço,
Gus
Title: Re: Comparing a string to the next in a list
Post by: Zvoni on September 17, 2021, 02:52:57 pm
Gus,
you do realize (i admit, haven't looked too closely) that you compare strings twice?
e.g.
in your left List (outer loop) you reach Member 4 and compare it with each Member of the right List (inner Loop) except Member 4, for example member 7
Now, you reach in your left List Member 7 and compare it with each Member of the right list except Member 7, but you compare it (again) with Member 4!

It's why i assumed, FuzzyWuzzy returns the same result for (Member4,Member7) and (Member7,Member4) --> commutative
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 03:17:56 pm
Hey Zvoni,

you do realize (i admit, haven't looked too closely) that you compare strings twice?

Yeap, you are completely right!

I opted for the lazy, brute force approach of just doing it with a simple 2 nested loops and skipping comparing the same indexes.

There is of course a more complex way of doing it that will not make reverse comparisons.

I guess it's the permutations versus (Crap, I always forget the other name, the one with repeats, or is permutations the one with repeats? ARGHHH brain is not working ATM, sorry!!!) dilemma.

I'll let the more complex routine with no repeats as an exercise to the person that has the brain more switched on than I do :)

Cheers,
Gus
Title: Re: Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 03:39:25 pm
@Gus, @Zvoni,

Thanks for your time... I didn't think the thing would turn out to be so complex!

The original FuzzyWuzzy library, written in Python (https://github.com/seatgeek/fuzzywuzzy (https://github.com/seatgeek/fuzzywuzzy)) provides a handy 'extract' function which select the best match in a list or dictionary of choices; however, the author of the Free Pascal translation of that library (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas)) did not provide a corresponding translation of said function, therefore what we are trying to achieve in practice is implementing it from scratch in Pascal!

With best wishes,
Title: Re: Comparing a string to the next in a list
Post by: avk on September 17, 2021, 04:08:11 pm
@maurobio, just in case, I would like to draw your attention to the fact that the mentioned Pascal implementation of FuzzyWuzzy seems to be one continuous memory leak.
Title: Re: Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 04:19:49 pm
@avk,

Continuous memory leak?  :o

In this case, is there an alternative? I really just need to detect 'suspected' names (ie, names which may be ortographically incorrect).

Regards,
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 04:31:14 pm
Boas Maurobio,

If you don't need the ratio and all the other stuff implemented and are just looking for the Levenshtein Distance (https://en.wikipedia.org/wiki/Levenshtein_distance), the Free Pascal wiki has an article with code to implement it:
https://wiki.freepascal.org/Levenshtein_distance

I'm not sure that this version is memory leak free, since I haven't tested it, but it could be another avenue of pursuit?

Sorry if this doesn't help and just adds to the noise, but at least you have one more implementation to look at ;)

Grande abraço,
Gus
Title: Re: Comparing a string to the next in a list
Post by: avk on September 17, 2021, 04:57:17 pm
Quote from: maurobio on: Today at 04:19:49 pm
Continuous memory leak?  :o

IIRC, the Ratio() and PartialRatio() functions do without leaks, but do they seem the most interesting?

Quote from:  Gustavo 'Gus' Carreno on: Today at 04:31:14 pm
If you don't need the ratio and all the other stuff implemented and are just looking for the Levenshtein Distance, the Free Pascal wiki has an article with code to implement it:
https://wiki.freepascal.org/Levenshtein_distance

To be honest, I think this is the worst example of the Levenshtein distance function.
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 05:03:11 pm
Hey AVK,

Quote from:  Gustavo 'Gus' Carreno on: Today at 04:31:14 pm
If you don't need the ratio and all the other stuff implemented and are just looking for the Levenshtein Distance, the Free Pascal wiki has an article with code to implement it:
https://wiki.freepascal.org/Levenshtein_distance

To be honest, I think this is the worst example of the Levenshtein distance function.

OOOOPPPSSS!!!

Please disregard my post then, LOL!!!

@avk:
On this subject, do you have a better example?
Is the implementation on the FuzzyWuzzy.pas/Levenshtein.pas (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas/blob/master/Levenshtein.pas) repo better?

Sorry for the noise @Maurobio!!

Cheers,
Gus
Title: Re: Comparing a string to the next in a list
Post by: avk on September 17, 2021, 05:44:09 pm
Quote from: Gustavo 'Gus' Carreno on: Today at 05:03:11 pm
Is the implementation on the FuzzyWuzzy.pas/Levenshtein.pas repo better?

The implementation you mentioned calls as the Levenshtein distance a slightly different distance function.

Quote from: maurobio on: Today at 04:19:49 pm
In this case, is there an alternative?

I tried to implement some string algorithms, for example, here (https://github.com/avk959/LGenerics/blob/3b3340008f6affd03611869e0c1f175389b9a78d/lgenerics/lgstrhelpers.pas#L330), but not all are implemented for unicode strings :(.
Title: Re: Comparing a string to the next in a list
Post by: Zvoni on September 17, 2021, 05:52:10 pm
Hey Zvoni,

you do realize (i admit, haven't looked too closely) that you compare strings twice?

Yeap, you are completely right!

I opted for the lazy, brute force approach of just doing it with a simple 2 nested loops and skipping comparing the same indexes.

There is of course a more complex way of doing it that will not make reverse comparisons.

I guess it's the permutations versus (Crap, I always forget the other name, the one with repeats, or is permutations the one with repeats? ARGHHH brain is not working ATM, sorry!!!) dilemma.

I'll let the more complex routine with no repeats as an exercise to the person that has the brain more switched on than I do :)

Cheers,
Gus
Permutation = Respect the order of draw with/without repeat
Combination = No respect for order of draw with/without repeat

Gus, it‘s the reason why i wrote to remove current member of list1 from list2
Basically, it boils down to combination without repeat, which btw is a Lottery („6 out of 49“ etc.)
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 06:08:56 pm
Hey AVK,

Quote from: Gustavo 'Gus' Carreno on: Today at 05:03:11 pm
Is the implementation on the FuzzyWuzzy.pas/Levenshtein.pas repo better?

The implementation you mentioned calls as the Levenshtein distance a slightly different distance function.

Hummm, was unaware of that. Thanks for the clarification, that's really helpfull!!

Quote from: maurobio on: Today at 04:19:49 pm
In this case, is there an alternative?

I tried to implement some string algorithms, for example, here (https://github.com/avk959/LGenerics/blob/3b3340008f6affd03611869e0c1f175389b9a78d/lgenerics/lgstrhelpers.pas#L330), but not all are implemented for unicode strings :(.

Would it be too much of an effort to make them UTF8 compatible, cuz you have an impressive array of functions there. (Completely forgot that I already starred you repo, LOL!!)

Cheers,
Gus
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 06:16:57 pm
Hey Zvoni,

Permutation = Respect the order of draw with/without repeat
Combination = No respect for order of draw with/without repeat

Combination, that's the one!!
Crap, such a well used word and I always forget about it and just remember Permutations, LOL!!
Thanks for the memory refresh!!

Gus, it‘s the reason why i wrote to remove current member of list1 from list2
Basically, it boils down to combination without repeat, which btw is a Lottery („6 out of 49“ etc.)

Right, eliminating the pair already compared on a second list makes it quite easy to implement.
I'll have a think about it and maybe update the example.

Thanks very much for the tip. I've thought about it in the past for something else, but my memory is getting really bad and these, most simple, solutions are now fleeing my mind :(

Cheers,
Gus
Title: Re: Comparing a string to the next in a list
Post by: avk on September 17, 2021, 06:24:47 pm
Quote from: Gustavo 'Gus' Carreno on: Today at 06:08:56 pm
Would it be too much of an effort to make them UTF8 compatible...

Of course not, I just don't have a dozen hands and heads.
Title: Re: Comparing a string to the next in a list
Post by: Gustavo 'Gus' Carreno on September 17, 2021, 06:27:28 pm
Hey AVK,

Of course not, I just don't have a dozen hands and heads.

The old problem of too many things to do and not that many hours in the day, gotcha :)

Nonetheless, many thanks for the code you already presented us!!!

Cheers,
Gus
Title: Re: Comparing a string to the next in a list
Post by: avk on September 17, 2021, 06:38:28 pm
You're welcome!
Title: Re: Comparing a string to the next in a list
Post by: maurobio on September 17, 2021, 08:02:20 pm
@Gus, @Zvoni, @avk,

Interesting discussion!


IIRC, the Ratio() and PartialRatio() functions do without leaks, but do they seem the most interesting?


Yes, either the Ratio() or the PartialRatio() functions seem to be working fine. I still have to test them with a larger sample of names, but (at lteast for the time being) they should do.

With best wishes,


TinyPortal © 2005-2018