Recent

Author Topic: Comparing a string to the next in a list  (Read 1925 times)

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 759
  • Professional amateur ;-P
Re: Comparing a string to the next in a list
« Reply #15 on: September 17, 2021, 03:17:56 pm »
Hey Zvoni,

you do realize (i admit, haven't looked too closely) that you compare strings twice?

Yeap, you are completely right!

I opted for the lazy, brute force approach of just doing it with a simple 2 nested loops and skipping comparing the same indexes.

There is of course a more complex way of doing it that will not make reverse comparisons.

I guess it's the permutations versus (Crap, I always forget the other name, the one with repeats, or is permutations the one with repeats? ARGHHH brain is not working ATM, sorry!!!) dilemma.

I'll let the more complex routine with no repeats as an exercise to the person that has the brain more switched on than I do :)

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.04 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.04 64b Dark Theme
http://github.com/gcarreno

maurobio

  • Sr. Member
  • ****
  • Posts: 301
  • Ecology is everything.
    • GitHub
Re: Comparing a string to the next in a list
« Reply #16 on: September 17, 2021, 03:39:25 pm »
@Gus, @Zvoni,

Thanks for your time... I didn't think the thing would turn out to be so complex!

The original FuzzyWuzzy library, written in Python (https://github.com/seatgeek/fuzzywuzzy) provides a handy 'extract' function which select the best match in a list or dictionary of choices; however, the author of the Free Pascal translation of that library (https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas) did not provide a corresponding translation of said function, therefore what we are trying to achieve in practice is implementing it from scratch in Pascal!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

avk

  • Sr. Member
  • ****
  • Posts: 499
    • my self-education project
Re: Comparing a string to the next in a list
« Reply #17 on: September 17, 2021, 04:08:11 pm »
@maurobio, just in case, I would like to draw your attention to the fact that the mentioned Pascal implementation of FuzzyWuzzy seems to be one continuous memory leak.

maurobio

  • Sr. Member
  • ****
  • Posts: 301
  • Ecology is everything.
    • GitHub
Re: Comparing a string to the next in a list
« Reply #18 on: September 17, 2021, 04:19:49 pm »
@avk,

Continuous memory leak?  :o

In this case, is there an alternative? I really just need to detect 'suspected' names (ie, names which may be ortographically incorrect).

Regards,
UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 759
  • Professional amateur ;-P
Re: Comparing a string to the next in a list
« Reply #19 on: September 17, 2021, 04:31:14 pm »
Boas Maurobio,

If you don't need the ratio and all the other stuff implemented and are just looking for the Levenshtein Distance, the Free Pascal wiki has an article with code to implement it:
https://wiki.freepascal.org/Levenshtein_distance

I'm not sure that this version is memory leak free, since I haven't tested it, but it could be another avenue of pursuit?

Sorry if this doesn't help and just adds to the noise, but at least you have one more implementation to look at ;)

Grande abraço,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.04 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.04 64b Dark Theme
http://github.com/gcarreno

avk

  • Sr. Member
  • ****
  • Posts: 499
    • my self-education project
Re: Comparing a string to the next in a list
« Reply #20 on: September 17, 2021, 04:57:17 pm »
Quote from: maurobio on: Today at 04:19:49 pm
Continuous memory leak?  :o

IIRC, the Ratio() and PartialRatio() functions do without leaks, but do they seem the most interesting?

Quote from:  Gustavo 'Gus' Carreno on: Today at 04:31:14 pm
If you don't need the ratio and all the other stuff implemented and are just looking for the Levenshtein Distance, the Free Pascal wiki has an article with code to implement it:
https://wiki.freepascal.org/Levenshtein_distance

To be honest, I think this is the worst example of the Levenshtein distance function.
« Last Edit: September 17, 2021, 05:00:49 pm by avk »

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 759
  • Professional amateur ;-P
Re: Comparing a string to the next in a list
« Reply #21 on: September 17, 2021, 05:03:11 pm »
Hey AVK,

Quote from:  Gustavo 'Gus' Carreno on: Today at 04:31:14 pm
If you don't need the ratio and all the other stuff implemented and are just looking for the Levenshtein Distance, the Free Pascal wiki has an article with code to implement it:
https://wiki.freepascal.org/Levenshtein_distance

To be honest, I think this is the worst example of the Levenshtein distance function.

OOOOPPPSSS!!!

Please disregard my post then, LOL!!!

@avk:
On this subject, do you have a better example?
Is the implementation on the FuzzyWuzzy.pas/Levenshtein.pas repo better?

Sorry for the noise @Maurobio!!

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.04 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.04 64b Dark Theme
http://github.com/gcarreno

avk

  • Sr. Member
  • ****
  • Posts: 499
    • my self-education project
Re: Comparing a string to the next in a list
« Reply #22 on: September 17, 2021, 05:44:09 pm »
Quote from: Gustavo 'Gus' Carreno on: Today at 05:03:11 pm
Is the implementation on the FuzzyWuzzy.pas/Levenshtein.pas repo better?

The implementation you mentioned calls as the Levenshtein distance a slightly different distance function.

Quote from: maurobio on: Today at 04:19:49 pm
In this case, is there an alternative?

I tried to implement some string algorithms, for example, here, but not all are implemented for unicode strings :(.

Zvoni

  • Hero Member
  • *****
  • Posts: 746
Re: Comparing a string to the next in a list
« Reply #23 on: September 17, 2021, 05:52:10 pm »
Hey Zvoni,

you do realize (i admit, haven't looked too closely) that you compare strings twice?

Yeap, you are completely right!

I opted for the lazy, brute force approach of just doing it with a simple 2 nested loops and skipping comparing the same indexes.

There is of course a more complex way of doing it that will not make reverse comparisons.

I guess it's the permutations versus (Crap, I always forget the other name, the one with repeats, or is permutations the one with repeats? ARGHHH brain is not working ATM, sorry!!!) dilemma.

I'll let the more complex routine with no repeats as an exercise to the person that has the brain more switched on than I do :)

Cheers,
Gus
Permutation = Respect the order of draw with/without repeat
Combination = No respect for order of draw with/without repeat

Gus, it‘s the reason why i wrote to remove current member of list1 from list2
Basically, it boils down to combination without repeat, which btw is a Lottery („6 out of 49“ etc.)
« Last Edit: September 17, 2021, 05:53:54 pm by Zvoni »
One System to rule them all, One IDE to find them,
One Code to bring them all, and to the Framework bind them,
in the Land of Redmond, where the Windows lie
---------------------------------------------------------------------
People call me crazy, because i'm jumping out of perfectly fine aircraft

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 759
  • Professional amateur ;-P
Re: Comparing a string to the next in a list
« Reply #24 on: September 17, 2021, 06:08:56 pm »
Hey AVK,

Quote from: Gustavo 'Gus' Carreno on: Today at 05:03:11 pm
Is the implementation on the FuzzyWuzzy.pas/Levenshtein.pas repo better?

The implementation you mentioned calls as the Levenshtein distance a slightly different distance function.

Hummm, was unaware of that. Thanks for the clarification, that's really helpfull!!

Quote from: maurobio on: Today at 04:19:49 pm
In this case, is there an alternative?

I tried to implement some string algorithms, for example, here, but not all are implemented for unicode strings :(.

Would it be too much of an effort to make them UTF8 compatible, cuz you have an impressive array of functions there. (Completely forgot that I already starred you repo, LOL!!)

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.04 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.04 64b Dark Theme
http://github.com/gcarreno

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 759
  • Professional amateur ;-P
Re: Comparing a string to the next in a list
« Reply #25 on: September 17, 2021, 06:16:57 pm »
Hey Zvoni,

Permutation = Respect the order of draw with/without repeat
Combination = No respect for order of draw with/without repeat

Combination, that's the one!!
Crap, such a well used word and I always forget about it and just remember Permutations, LOL!!
Thanks for the memory refresh!!

Gus, it‘s the reason why i wrote to remove current member of list1 from list2
Basically, it boils down to combination without repeat, which btw is a Lottery („6 out of 49“ etc.)

Right, eliminating the pair already compared on a second list makes it quite easy to implement.
I'll have a think about it and maybe update the example.

Thanks very much for the tip. I've thought about it in the past for something else, but my memory is getting really bad and these, most simple, solutions are now fleeing my mind :(

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.04 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.04 64b Dark Theme
http://github.com/gcarreno

avk

  • Sr. Member
  • ****
  • Posts: 499
    • my self-education project
Re: Comparing a string to the next in a list
« Reply #26 on: September 17, 2021, 06:24:47 pm »
Quote from: Gustavo 'Gus' Carreno on: Today at 06:08:56 pm
Would it be too much of an effort to make them UTF8 compatible...

Of course not, I just don't have a dozen hands and heads.

Gustavo 'Gus' Carreno

  • Hero Member
  • *****
  • Posts: 759
  • Professional amateur ;-P
Re: Comparing a string to the next in a list
« Reply #27 on: September 17, 2021, 06:27:28 pm »
Hey AVK,

Of course not, I just don't have a dozen hands and heads.

The old problem of too many things to do and not that many hours in the day, gotcha :)

Nonetheless, many thanks for the code you already presented us!!!

Cheers,
Gus
Lazarus 2.3.0(trunk) FPC 3.3.1(trunk) Ubuntu 21.04 64b Dark Theme
Lazarus 2.0.12(stable) FPC 3.2.2(stable) Ubuntu 21.04 64b Dark Theme
http://github.com/gcarreno

avk

  • Sr. Member
  • ****
  • Posts: 499
    • my self-education project
Re: Comparing a string to the next in a list
« Reply #28 on: September 17, 2021, 06:38:28 pm »
You're welcome!

maurobio

  • Sr. Member
  • ****
  • Posts: 301
  • Ecology is everything.
    • GitHub
Re: Comparing a string to the next in a list
« Reply #29 on: September 17, 2021, 08:02:20 pm »
@Gus, @Zvoni, @avk,

Interesting discussion!


IIRC, the Ratio() and PartialRatio() functions do without leaks, but do they seem the most interesting?


Yes, either the Ratio() or the PartialRatio() functions seem to be working fine. I still have to test them with a larger sample of names, but (at lteast for the time being) they should do.

With best wishes,


UCSD Pascal / Burroughs 6700 / Master Control Program
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19, GNU/Linux Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

 

TinyPortal © 2005-2018