Author Topic: regex/and  (Read 1889 times)

BubikolRamios

  • Sr. Member
  • ****
  • Posts: 305
regex/and
« on: December 07, 2023, 04:36:52 pm »
I need the user to be able to combine multiple regex matches on a single text.

https://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator
So it is difficult to near impossible.

Since one can't, I guess, invent a delimiter in ordinary text such as 'a;b' (i.e. search for regex match 'a' and regex match 'b', and report a find only if both are positive) ...
Is there a control in Lazarus, table-like, where you can enter text as you would into a TEdit?
Where the first column would be 'and' or 'or' and the second a regex?

TValueListEditor at first look ... ? Any suggestions?


Edit: possibly with 4 columns, so that math-like logic could be built, e.g. with R = regex:
(R1) AND (R2 OR R3) .....
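
One way to read the idea above is to keep each regex separate and combine the individual match results with and/or, instead of building one big regex. A minimal sketch in Python (not Lazarus); the nested-tuple rule format and the name `evaluate` are made up for illustration:

```python
import re

def evaluate(expr, text):
    """expr is either a regex string or a tuple ("and" | "or", sub1, sub2, ...)."""
    if isinstance(expr, str):
        # Leaf: a single regex, true if it matches anywhere in the text.
        return re.search(expr, text) is not None
    op, *operands = expr
    results = (evaluate(e, text) for e in operands)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")

# (R1) AND (R2 OR R3), with hypothetical regexes standing in for R1..R3
rule = ("and", r"\bcat\b", ("or", r"dog", r"bird"))

print(evaluate(rule, "the cat chased a bird"))  # True
print(evaluate(rule, "the cat sleeps"))         # False
```

A table-like control would then only need to collect the operator and the regex per row; the evaluation itself stays trivial.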
« Last Edit: December 07, 2023, 04:40:47 pm by BubikolRamios »
lazarus 3.2-fpc-3.2.2-win32/win64

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10542
  • Debugger - SynEdit - and more
    • wiki
Re: regex/and
« Reply #1 on: December 07, 2023, 06:47:48 pm »
The answer is on the stackoverflow page.

Use lookaheads.

^(?=.*?PATTERN1)(?=.*?PATTERN2)

You need the .* so they can each start at a different pos. That is missed by some of the answers, but at least one answer has it.
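
A rough sketch of the lookahead approach, written in Python only because it is easy to try out; the helper name `matches_all` is made up, and wrapping each pattern in `(?:...)` is an added precaution so that a pattern containing `|` stays inside its own lookahead:

```python
import re

def matches_all(text, patterns):
    """Return True if every pattern matches somewhere in text."""
    # One lookahead per pattern, all anchored at the start of the text.
    # The leading .*? lets each pattern match at its own position.
    combined = "^" + "".join(f"(?=.*?(?:{p}))" for p in patterns)
    # re.DOTALL lets .*? cross line breaks (an assumption about the input).
    return re.search(combined, text, re.DOTALL) is not None

print(matches_all("This is word1 and this is word2", ["word1", "word2"]))  # True
print(matches_all("This is word1 only", ["word1", "word2"]))               # False
```

Patterns that use numbered groups or backreferences would need renumbering when combined this way.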

BubikolRamios

  • Sr. Member
  • ****
  • Posts: 305
Re: regex/and
« Reply #2 on: December 08, 2023, 01:04:22 am »
Quote
(?=.*word1)(?=.*word2)(?=.*word3)

Trying to test against "word1 word2" in EditPad Pro, which has match highlighting and usually works (I use it to test regexes for Lazarus). It does not highlight.
« Last Edit: December 08, 2023, 01:10:44 am by BubikolRamios »
lazarus 3.2-fpc-3.2.2-win32/win64

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10542
  • Debugger - SynEdit - and more
    • wiki
Re: regex/and
« Reply #3 on: December 08, 2023, 02:17:13 am »
Quote
(?=.*word1)(?=.*word2)(?=.*word3)

Trying to test against "word1 word2" in EditPad Pro, which has match highlighting and usually works (I use it to test regexes for Lazarus). It does not highlight.

Oh, OK. That pattern is only set up to return true if both match.

The above pattern matches a zero-length text.

If you have the text "This is word1 and this is word2" and you match it with "(?=.*word1)(?=.*word2)", then it will match right at the first pos (or any pos before the words). But it would not highlight anything because the match is zero length.

To explain:
- If you just match with the pattern "word1", then the match len is 5.
- If you match with an empty pattern, then the match len is zero (not all regex engines allow that, but some do).
- If the pattern is "word1(?=...)" then it will look ahead and ensure that there are at least 3 chars after word1. Those chars are not part of the match. But if they are not present then "word1" would not match. If they are present then you just matched "word1" with len 5.

A pattern with just lookaheads and nothing else is an empty pattern, but with the condition that the lookaheads must match.
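
The three cases above can be checked quickly. This is a small Python sketch (Python's `re` is assumed to behave like most engines here), showing that a lookahead-only pattern succeeds with an empty match:

```python
import re

text = "This is word1 and this is word2"

# Only lookaheads: the match succeeds but consumes no characters.
only_lookaheads = re.search(r"(?=.*word1)(?=.*word2)", text)
print(only_lookaheads.span())  # (0, 0): zero length, nothing to highlight

# A plain pattern consumes what it matches.
plain = re.search(r"word1", text)
print(plain.end() - plain.start())  # 5

# Lookahead after a consuming part: "..." demands three more characters,
# but they do not become part of the match.
guarded = re.search(r"word1(?=...)", text)
print(guarded.group())  # word1

# At the very end of a text the three characters are missing: no match.
at_end = re.search(r"word1(?=...)", "ends with word1")
print(at_end)  # None
```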


If you want to match the entire substring containing both words it gets complex.
For 2 substrings:
Code: Text
  (?=word1).*?word2|(?=word2).*?word1

The look ahead says "find the position with word1" (or for the 2nd lookahead word2). Then from that position (including overlap) you match (normal match, not look ahead) up to and including word2.

You need to cover both orders.

For more words that gets complex, due to the number of different orders in which they can appear.
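
To see what the two-word alternation `(?=word1).*?word2|(?=word2).*?word1` actually returns, here is a short Python sketch; the helper name `span_with_both` is made up for illustration:

```python
import re

# The lookahead pins the start at one word, then a lazy, consuming match
# runs up to and including the other word. Both orders are listed.
BOTH = re.compile(r"(?=word1).*?word2|(?=word2).*?word1")

def span_with_both(text):
    """Return the substring running from one word to the other, or None."""
    m = BOTH.search(text)
    return m.group() if m else None

print(span_with_both("This is word1 and this is word2"))  # word1 and this is word2
print(span_with_both("first word2 then word1 end"))       # word2 then word1
print(span_with_both("only word1 here"))                  # None
```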



For 3 words, if words can occur as duplicates:
Code: Text
  (?=(word1|word2|word3)).*?(?!\1)((?1)).*?(?!\1)(?!\2)((?1))
https://regex101.com/r/XRHvf7/1

(?1) matches the "(word1|word2|word3)" again.
There are 2 (?1) so altogether 3 words must be matched.

(?!\1)  makes sure that you are at a point that is not identical to what the first (word1|word2|word3) matched.

So if (word1|word2|word3) matched word3 then (?!\1)(?1)  makes sure it does not match word3, but it matches one of the others.
The (?1) is in additional brackets so it can be referred to as \2

(?!\1)(?!\2)(?1)   does not match the 2 already matched words, but matches a word from the list. So that must then be the 3rd word.
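
Engines without subroutine calls need the `(?1)` written out. Python's built-in `re` module is one of them, so the sketch below repeats the word list by hand; apart from that it is meant to be the same pattern. The sample texts are made up and contain no overlapping words.

```python
import re

WORDS = "word1|word2|word3"

# (?1) is expanded by hand because Python's re has no subroutine calls.
THREE = re.compile(
    rf"(?=({WORDS}))"                 # group 1: the word at the start position
    rf".*?(?!\1)((?:{WORDS}))"        # group 2: a later word that is not \1
    rf".*?(?!\1)(?!\2)((?:{WORDS}))"  # group 3: neither \1 nor \2
)

m = THREE.search("word3 a word1 b word3 c word2 d")
print(m.group())   # word3 a word1 b word3 c word2
print(m.groups())  # ('word3', 'word1', 'word2')

# Only two distinct words present: no match.
print(THREE.search("word1 word2 word1"))  # None
```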



If you don't want duplicate occurrences of a word:

Code: Text
  (?=(word1|word2|word3))(?:.(?!\1))*?(?!\1)((?1))(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)((?1))
https://regex101.com/r/0NZqM2/1
(Mind that "word 2" with space is not a duplicate, it is random text in-between)

Now, instead of a plain .*? between the words, each dot is guarded by a negative lookahead, ensuring that none of the already matched words is matched by the .*?


And that can be simplified to:
Code: Text
  (?=(word1|word2|word3))(?:.(?!\1))+?((?1))(?:.(?!\1)(?!\2))+?((?1))
https://regex101.com/r/eYujKG/1
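
A sketch of the simplified no-duplicates pattern in Python, again with `(?1)` written out by hand because Python's `re` does not support it. The sample texts are made up; note how the match starts at the second "word1", since starting at the first one would put a duplicate inside the match.

```python
import re

WORDS = "word1|word2|word3"

NO_DUP = re.compile(
    rf"(?=({WORDS}))"                         # group 1: word at the start
    rf"(?:.(?!\1))+?((?:{WORDS}))"            # no char may be followed by \1
    rf"(?:.(?!\1)(?!\2))+?((?:{WORDS}))"      # nor by \1 or \2 in the second gap
)

m = NO_DUP.search("word1 x word1 y word2 z word3")
print(m.group())  # word1 y word2 z word3

m2 = NO_DUP.search("word2 a word3 b word1")
print(m2.group())  # word2 a word3 b word1
```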

And similar patterns can be built for more... Not sure how efficient they are.





The patterns are built so the sub-patterns can overlap.
If you want to include the 2 terms "mate" and "team", then the text "mateam" will be enough (it contains both).

If you don't want to allow overlaps, the patterns will be similar, but you don't need the positive lookaheads (you still need the negative ones, I think).


Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10542
  • Debugger - SynEdit - and more
    • wiki
Re: regex/and
« Reply #4 on: December 08, 2023, 10:54:49 am »
Small mistake: for 3 words, if overlaps are allowed, the 2nd word must be matched in a lookahead:
Code: Text
  (?=(word1|word2|word3)).*?(?!\1)(?=((?1))).*?(?!\1)(?!\2)((?1))

Also it has a flaw.

if your first word is a pattern like "ma
e" and the 2nd is "at" then the text "mate" contains both, but the match will be "mat".
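
The same kind of flaw can be reproduced with the simpler two-word form. In this Python sketch the literal words "mate" and "at" are stand-ins chosen for illustration (they are not the exact patterns from the post), and `two_word_pattern` is a made-up helper:

```python
import re

def two_word_pattern(a, b):
    """Build (?=a).*?b|(?=b).*?a, each part wrapped in a non-capturing group."""
    return re.compile(rf"(?=(?:{a})).*?(?:{b})|(?=(?:{b})).*?(?:{a})")

m = two_word_pattern("mate", "at").search("mate")

# The lookahead confirms "mate" at position 0, but the consuming part stops
# as soon as "at" has been matched, so the reported match is cut short.
print(m.group())  # mat
```

So the pattern still answers "are both present?" correctly; it is only the reported match text that comes out too short.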

rens

  • Newbie
  • Posts: 2
Re: regex/and
« Reply #5 on: February 03, 2024, 04:16:40 am »

Uhm... the Masks unit can do this:

*ex1*ex2*;*ex2*ex1*

this will match on
thisex1thatex2
and also
thisex2thatex1

but not on thisex1
and not on thisex2

MatchesMaskList(name,srchitem,';',ignoCase)

Is that what you are looking for?
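
For anyone who wants to try the mask idea outside Lazarus, the behaviour can be approximated in Python with `fnmatch`. This is a swapped-in stand-in, not the Masks unit itself; `matches_mask_list` below is a made-up helper that is always case-sensitive.

```python
from fnmatch import fnmatchcase

def matches_mask_list(name, masks, sep=";"):
    """True if name matches at least one of the sep-separated wildcard masks."""
    return any(fnmatchcase(name, mask) for mask in masks.split(sep))

MASKS = "*ex1*ex2*;*ex2*ex1*"

print(matches_mask_list("thisex1thatex2", MASKS))  # True
print(matches_mask_list("thisex2thatex1", MASKS))  # True
print(matches_mask_list("thisex1", MASKS))         # False
```

Like the regex alternation, this needs one mask per ordering, so it grows quickly with more than two terms.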

jamie

  • Hero Member
  • *****
  • Posts: 6733
Re: regex/and
« Reply #6 on: February 03, 2024, 03:45:40 pm »
You know, I never have a problem with RegEx  :D
The only true wisdom is knowing you know nothing

dsiders

  • Hero Member
  • *****
  • Posts: 1280
Re: regex/and
« Reply #7 on: February 03, 2024, 05:20:22 pm »
You know, I never have a problem with RegEx  :D

"Just say NO"? ;)
Preview the next Lazarus documentation release at: https://dsiders.gitlab.io/lazdocsnext

 
