regex/and

BubikolRamios:
I need the user to be able to combine multiple regex matches on a single text.

https://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator
So it is somewhere between difficult and nearly impossible.

Since one can't, I guess, invent a delimiter in ordinary text such as 'a;b' (i.e. search for regex match 'a' and regex match 'b', and report a find only if both are positive) ...
Is there a table-like control in Lazarus where you can enter text as you would into a TEdit?
The first column would hold 'and' or 'or', and the second the regex.

TValueListEditor, at first look ...? Any suggestions?


Edit: possibly with 4 columns, so that math-like logic could be built, e.g. (with R = regex):
(R1) AND (R2 OR R3) .....
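
A minimal sketch of that idea without any special control: run each regex separately and combine the boolean results in code. The helper names `matches` and `and_or_match` are hypothetical, and Python is used here only for illustration; the same logic should carry over to Pascal with any regex library.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the regex occurs anywhere in the text."""
    return re.search(pattern, text) is not None

def and_or_match(text: str, r1: str, r2: str, r3: str) -> bool:
    """Evaluate (R1) AND (R2 OR R3) with three separate regex searches."""
    return matches(r1, text) and (matches(r2, text) or matches(r3, text))

print(and_or_match("red apple pie", r"apple", r"pie", r"cake"))  # True
print(and_or_match("red apple", r"apple", r"pie", r"cake"))      # False
```

This keeps every regex simple, at the cost of scanning the text once per regex.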

Martin_fr:
The answer is on the stackoverflow page.

Use lookaheads.

^(?=.*?PATTERN1)(?=.*?PATTERN2)

You need the .*? so that each pattern can start at a different position. That is missed by some of the answers, but at least one answer has it.
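
As a rough illustration of the lookahead approach, the sketch below builds such a pattern from a list of sub-patterns. The function name `contains_all` is hypothetical, and wrapping each sub-pattern in `(?:...)` is an added precaution so that alternations inside a sub-pattern stay contained.

```python
import re

def contains_all(text: str, *patterns: str) -> bool:
    """True if every pattern matches somewhere in text (regex AND)."""
    # One lookahead per pattern, all evaluated from the start of the text.
    combined = "^" + "".join(f"(?=.*?(?:{p}))" for p in patterns)
    return re.search(combined, text) is not None

print(contains_all("This is word1 and this is word2", "word1", "word2"))  # True
print(contains_all("This is word2 and this is word1", "word1", "word2"))  # True
print(contains_all("This is word1 only", "word1", "word2"))               # False
```

Note that `.` does not cross line breaks by default, so multi-line text would need the engine's "dot matches newline" option.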

BubikolRamios:

--- Quote ---(?=.*word1)(?=.*word2)(?=.*word3)
--- End quote ---

Trying to test this against "word1 word2" in EditPadPro, which has match highlighting and usually works (I use it to test regexes for Lazarus). It does not highlight anything.

Martin_fr:

--- Quote from: BubikolRamios on December 08, 2023, 01:04:22 am ---
--- Quote ---(?=.*word1)(?=.*word2)(?=.*word3)
--- End quote ---

Trying to test this against "word1 word2" in EditPadPro, which has match highlighting and usually works (I use it to test regexes for Lazarus). It does not highlight anything.

--- End quote ---

Oh, ok. I assumed you just wanted to get true if both match.

The above pattern matches a zero-length text.

If you have the text "This is word1 and this is word2" and you match it with "(?=.*word1)(?=.*word2)", then it will match right at the first position (or at any position before the words). But it will not highlight anything, because the match has zero length.

To explain:
- If you just match with the pattern "word1", then the match length is 5.
- If you match with an empty pattern, then the match length is zero (not all regex engines allow that, but some do).
- If the pattern is "word1(?=...)", then it will look ahead and ensure that there are at least 3 chars after word1. Those chars are not part of the match. But if they are not present, then "word1" does not match. If they are present, then you have just matched "word1" with length 5.

A pattern with just lookaheads and nothing else is an empty pattern, but with the condition that the lookaheads must match.
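
The zero-length behaviour can be checked directly. This is a small sketch using Python's `re` module purely for illustration; the highlighting in other tools may differ, but the match spans should behave the same way.

```python
import re

text = "This is word1 and this is word2"

# Lookaheads only: the match succeeds but consumes nothing.
m = re.search(r"(?=.*word1)(?=.*word2)", text)
print(m.span())          # (0, 0)
print(repr(m.group(0)))  # ''

# A consuming part plus a lookahead: only "word1" is part of the match.
m2 = re.search(r"word1(?=...)", text)
print(m2.group(0), m2.span())  # word1 (8, 13)

# Fewer than 3 chars after word1: the lookahead fails, so there is no match.
print(re.search(r"word1(?=...)", "xx word1ab"))  # None

# A third lookahead for a word that is absent makes the whole pattern fail.
print(re.search(r"(?=.*word1)(?=.*word2)(?=.*word3)", "word1 word2"))  # None
```

The last line also suggests that a test text of "word1 word2" cannot match a pattern that additionally requires word3, regardless of highlighting.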


If you want to match the entire substring containing both words, it gets complex.
For 2 substrings:

--- Code: Text ---
(?=word1).*?word2|(?=word2).*?word1
--- End code ---
The look ahead says "find the position with word1" (or for the 2nd lookahead word2). Then from that position (including overlap) you match (normal match, not look ahead) up to and including word2.

You need to cover both orders.
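
A quick sketch of the two-word pattern in action, written here without the stray closing bracket. The helper name `span_text` is made up for this example.

```python
import re

# Either word1 ... word2 or word2 ... word1, starting at whichever comes first.
BOTH = re.compile(r"(?=word1).*?word2|(?=word2).*?word1")

def span_text(text: str):
    """Return the shortest stretch from the first word found up to the other word."""
    m = BOTH.search(text)
    return m.group(0) if m else None

print(span_text("This is word1 and this is word2"))  # word1 and this is word2
print(span_text("first word2, then word1 here"))     # word2, then word1
print(span_text("only word1 here"))                  # None
```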

For more words that gets complex, due to the number of different orders in which they can appear.

For 3 words, if words can occur as duplicates:

--- Code: Text ---
(?=(word1|word2|word3)).*?(?!\1)((?1)).*?(?!\1)(?!\2)((?1))
--- End code ---
https://regex101.com/r/XRHvf7/1

(?1) matches the "(word1|word2|word3)" again.
There are 2 (?1) so altogether 3 words must be matched.

(?!\1)  makes sure that you are at a point that is not identical to what the first (word1|word2|word3) matched.

So if (word1|word2|word3) matched word3 then (?!\1)(?1)  makes sure it does not match word3, but it matches one of the others.
The (?1) is in additional brackets so it can be referred to as \2

(?!\1)(?!\2)(?1)   does not match the 2 already matched words, but matches a word from the list. So that must then be the 3rd word.
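
Python's built-in `re` module has no `(?1)` subroutine call, so the sketch below spells the alternation out again instead. That substitution is an assumption of this example and not part of the pattern above; the group numbers stay the same because the repeated alternation is non-capturing.

```python
import re

ALT = r"(?:word1|word2|word3)"  # stands in for each (?1) call

# Group 1: first word (lookahead), group 2: a different word, group 3: the remaining word.
THREE = re.compile(rf"(?=({ALT})).*?(?!\1)({ALT}).*?(?!\1)(?!\2)({ALT})")

m = THREE.search("word3 x word1 y word2")
print(m.group(0))  # word3 x word1 y word2
print(m.groups())  # ('word3', 'word1', 'word2')

# A duplicate of the first word in between is skipped by .*?
m_dup = THREE.search("word1 word1 word2 word3")
print(m_dup.group(0))  # word1 word1 word2 word3

# Only two distinct words present: no match.
print(THREE.search("word1 word2 word1"))  # None
```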


If you don't want duplicate occurrences of the words:


--- Code: Text ---
(?=(word1|word2|word3))(?:.(?!\1))*?(?!\1)((?1))(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)((?1))
--- End code ---
https://regex101.com/r/0NZqM2/1
(Mind that "word 2" with space is not a duplicate, it is random text in-between)

Now, instead of a plain .*? between the words, each dot is guarded by negative lookaheads, ensuring that none of the already matched words is matched by the dot.
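
A sketch of the guarded variant, again with the alternation written out in place of `(?1)` because Python's `re` lacks subroutine calls (an assumption of this example). With a duplicated first word in the text, the match starts at the last copy that is not followed by another one.

```python
import re

ALT = r"(?:word1|word2|word3)"  # stands in for each (?1) call

NO_DUP = re.compile(
    rf"(?=({ALT}))(?:.(?!\1))*?(?!\1)({ALT})"
    rf"(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)({ALT})"
)

m = NO_DUP.search("word1 word1 word2 word3")
print(m.group(0))  # word1 word2 word3
print(m.start())   # 6

m_plain = NO_DUP.search("word3 x word1 y word2")
print(m_plain.group(0))  # word3 x word1 y word2
```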


And that can be simplified to

--- Code: Text ---
(?=(word1|word2|word3))(?:.(?!\1))+?((?1))(?:.(?!\1)(?!\2))+?((?1))
--- End code ---
https://regex101.com/r/eYujKG/1

And similar patterns can be built for more words. I am not sure how efficient they are.



The patterns are built so that the sub-patterns can overlap.
If you want to include the 2 terms "mate" and "team", then the text "mateam" will be enough (it contains both).

If you don't want to allow overlaps, the patterns will be similar, but you don't need the positive lookaheads (you still need the negative ones, I think).
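
The overlap case can be sketched as follows. The last pattern, without positive lookaheads, is only an assumed two-word illustration of the non-overlapping variant and is not taken from the patterns above.

```python
import re

# Lookahead-only AND: overlapping terms both count.
print(bool(re.search(r"(?=.*mate)(?=.*team)", "mateam")))  # True

# Spanning pattern with a lookahead for the first term: overlap is still accepted.
m = re.search(r"(?=mate).*?team|(?=team).*?mate", "mateam")
print(m.group(0))  # mateam

# Both terms consumed one after the other: an overlap is not enough.
print(re.search(r"mate.*?team|team.*?mate", "mateam"))       # None
print(bool(re.search(r"mate.*?team|team.*?mate", "mate team")))  # True
```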

Martin_fr:
Small mistake: for 3 words, if overlaps are allowed, the 2nd word must be matched in a lookahead.

--- Code: Text ---
(?=(word1|word2|word3)).*?(?!\1)(?=((?1))).*?(?!\1)(?!\2)((?1))
--- End code ---
Also it has a flaw.

If your first word is a pattern like "mae" and the 2nd is "at" then the text "mate" contains both, but the match will be "mat".
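
A small sketch of that flaw, using the two-word form of the pattern for brevity. The first sub-pattern is assumed here to be something like `ma.e`, which matches "mate"; that exact pattern is a guess made for this example. Because the first term is only checked in a lookahead, the reported match ends where the second term ends.

```python
import re

# First term "ma.e" is only looked ahead; second term "at" is consumed.
m = re.search(r"(?=ma.e).*?at|(?=at).*?ma.e", "mate")
print(m.group(0))  # mat

# The first term on its own would cover the whole word.
print(re.search(r"ma.e", "mate").group(0))  # mate
```

So the text does contain both terms, but the match is one character shorter than the stretch they cover.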