(?=.*word1)(?=.*word2)(?=.*word3)
I'm trying to test it against "word1 word2" in EditPadPro, which has match highlighting and usually works (I use it to test regexes for Lazarus). It does not highlight anything.
Oh, OK. So you just want to get true if both words match.
The above pattern matches a zero-length string.
If you have the text "This is word1 and this is word2" and you match it with "(?=.*word1)(?=.*word2)", then it will match right at the first position (or at any position before the words). But it would not highlight anything, because the match has zero length.
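To see the zero-length match outside EditPadPro, here is a small sketch using Python's `re` module. This is an assumption for illustration only: Python is a different regex flavor from EditPadPro and from the engine Lazarus uses, but plain lookaheads like these should behave the same way.

```python
import re

text = "This is word1 and this is word2"

# Only lookaheads, nothing that consumes characters.
two = re.compile(r"(?=.*word1)(?=.*word2)")
m = two.search(text)

print(m is not None)     # True: both words are present, usable as a yes/no test
print(m.span())          # (0, 0): the match starts and ends at the first position
print(repr(m.group(0)))  # '': nothing to highlight

# The three-word pattern from the question also requires word3,
# so on "word1 word2" it finds no match at all.
three = re.compile(r"(?=.*word1)(?=.*word2)(?=.*word3)")
print(three.search("word1 word2"))  # None
```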
To explain:
- If you just match with the pattern "word1", then the match length is 5.
- If you match with an empty pattern, then the match length is zero (not all regex engines allow that, but some do).
- If the pattern is "word1(?=...)", then it will look ahead and ensure that there are at least 3 characters after word1. Those characters are not part of the match. If they are not present, "word1" does not match; if they are present, you have matched just "word1", with length 5.
A pattern with just lookaheads and nothing else is an empty pattern, but with the condition that the lookaheads must match.
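The three cases from the list can be checked the same way. Again a sketch in Python's `re` (an assumed stand-in for the Lazarus engine), with made-up sample strings:

```python
import re

text = "word1 abc"

# Plain pattern: the match is the word itself.
plain = re.search(r"word1", text)
print(len(plain.group(0)))  # 5

# Empty pattern: matches at the first position with length 0.
empty = re.search(r"", text)
print(empty.span())  # (0, 0)

# word1 plus a lookahead for any 3 characters: " ab" must follow,
# but it is not part of the match.
guarded = re.search(r"word1(?=...)", text)
print(guarded.group(0))  # word1

# Only 2 characters follow word1 here, so the lookahead fails and word1 is not matched.
too_short = re.search(r"word1(?=...)", "word1 a")
print(too_short)  # None
```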
If you want to match the entire substring containing both words, it gets more complex.
For 2 substrings:
(?=word1).*?word2|(?=word2).*?word1
The lookahead says "find the position where word1 starts" (or, in the second alternative, word2). Then, from that position (so overlap is included), you match (a normal match, not a lookahead) up to and including word2 (or word1 in the second alternative).
You need to cover both orders.
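A sketch of the two-word pattern in Python's `re` (assumed here as a stand-in engine; the sample strings are invented). It returns the whole substring in either order, and nothing when one of the words is missing:

```python
import re

# Two alternatives, one per order of the words.
two_words = re.compile(r"(?=word1).*?word2|(?=word2).*?word1")

samples = ["xx word1 yy word2 zz", "xx word2 yy word1 zz", "xx word1 yy"]
results = []
for text in samples:
    m = two_words.search(text)
    results.append(m.group(0) if m else None)

print(results)  # ['word1 yy word2', 'word2 yy word1', None]
```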
For more words it gets complex, due to the number of orders in which they can appear.
For 3 words, if the words can also occur as duplicates:
(?=(word1|word2|word3)).*?(?!\1)((?1)).*?(?!\1)(?!\2)((?1))
https://regex101.com/r/XRHvf7/1
(?1) matches the subpattern "(word1|word2|word3)" again. It re-runs the pattern, not the captured text, so it can match any of the three words.
There are two (?1), so together with the first group, three words must be matched.
(?!\1) makes sure that the text at the current position does not start with what the first (word1|word2|word3) matched.
So if (word1|word2|word3) matched word3, then (?!\1)(?1) makes sure it does not match word3 again, but one of the others.
The (?1) is in additional brackets so it can be referred to as \2.
(?!\1)(?!\2)(?1) does not match the 2 already matched words, but matches a word from the list. So that must be the 3rd word.
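Python's `re` module has no `(?1)` subroutine call, so this sketch writes each `((?1))` out as another copy of `(word1|word2|word3)`. That keeps the group numbering (`\1`, `\2`, and a third group), and since the words are fixed strings it should behave like the original pattern for these inputs. The sample texts are made up.

```python
import re

WORDS = r"(word1|word2|word3)"

# (?=(word1|word2|word3)).*?(?!\1)((?1)).*?(?!\1)(?!\2)((?1))
# with each ((?1)) replaced by a copy of the group, because re has no (?1).
three_dup = re.compile(
    r"(?=" + WORDS + r").*?(?!\1)" + WORDS + r".*?(?!\1)(?!\2)" + WORDS
)

# The second word1 is rejected by (?!\1) and skipped by .*?,
# so the match starts at the first word1 and keeps the duplicate inside.
m1 = three_dup.search("word1 word1 word2 word3")
print(m1.group(0))  # word1 word1 word2 word3
print(m1.groups())  # ('word1', 'word2', 'word3')

# Any order works.
m2 = three_dup.search("word3 x word1 y word2")
print(m2.groups())  # ('word3', 'word1', 'word2')

# word3 is missing; the repeated word1 cannot count as the third word.
print(three_dup.search("word1 word2 word1"))  # None
```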
If you don't want duplicate occurrences of the words:
(?=(word1|word2|word3))(?:.(?!\1))*?(?!\1)((?1))(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)((?1))
https://regex101.com/r/0NZqM2/1
(Mind that "word 2" with a space is not a duplicate; it is just random text in between.)
Now, instead of a plain .*? between the words, each dot is guarded by negative lookaheads, ensuring that none of the already matched words occurs again in the text matched between the words.
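The same substitution (each `((?1))` spelled out, since Python's `re` lacks `(?1)`) applied to the no-duplicates pattern. This is a sketch with invented sample texts: with a repeated word1, the match now starts at the second word1, so no duplicate ends up between the words.

```python
import re

WORDS = r"(word1|word2|word3)"

# (?=(word1|word2|word3))(?:.(?!\1))*?(?!\1)((?1))(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)((?1))
# with each ((?1)) written out as a copy of the group.
three_nodup = re.compile(
    r"(?=" + WORDS + r")(?:.(?!\1))*?(?!\1)" + WORDS
    + r"(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)" + WORDS
)

# Starting at the first word1 fails: the guarded gap may not run into another word1.
m1 = three_nodup.search("word1 word1 word2 word3")
print(m1.group(0))  # word1 word2 word3

# "word 2" (with a space) is not one of the words, so it is just text in the gap.
m2 = three_nodup.search("word1 word 2 word2 word3")
print(m2.group(0))  # word1 word 2 word2 word3
```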
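And that can be shortened to the following (with one difference: the second +? needs at least one character, so the third word cannot directly follow the second one):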
(?=(word1|word2|word3))(?:.(?!\1))+?((?1))(?:.(?!\1)(?!\2))+?((?1))
https://regex101.com/r/eYujKG/1
Similar patterns can be built for more words, though I'm not sure how efficient they are.
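A sketch comparing the `*?` version with the shortened `+?` version, again in Python's `re` with `(?1)` spelled out (an assumption, as above); `found` is just a small helper for this example and the texts are invented. With a separator between every pair of words both agree, but the second `+?` requires at least one character after the second word, so a third word glued directly to it is only found by the longer version.

```python
import re

WORDS = r"(word1|word2|word3)"

# Longer version, with *? loops and explicit (?!...) checks before each word.
long_pat = re.compile(
    r"(?=" + WORDS + r")(?:.(?!\1))*?(?!\1)" + WORDS
    + r"(?:.(?!\1)(?!\2))*?(?!\1)(?!\2)" + WORDS
)
# Shortened version, with +? loops.
short_pat = re.compile(
    r"(?=" + WORDS + r")(?:.(?!\1))+?" + WORDS
    + r"(?:.(?!\1)(?!\2))+?" + WORDS
)

def found(pattern, text):
    """Return the matched substring, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(found(long_pat, "word1 word1 word2 word3"))   # word1 word2 word3
print(found(short_pat, "word1 word1 word2 word3"))  # word1 word2 word3

print(found(long_pat, "word1 word2word3"))   # word1 word2word3
print(found(short_pat, "word1 word2word3"))  # None
```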
The patterns are built so that the subpatterns can overlap.
If you want to include the 2 terms "mate" and "team", then the text "mateam" is enough (it contains both).
If you don't want to allow overlaps, the patterns will be similar, but you don't need the positive lookaheads (you still need the negative ones, I think).
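For the two-word case, this sketch (Python's `re`, assumed as a stand-in engine) contrasts the overlapping pattern with the same pattern minus the positive lookaheads. Only the first one finds both terms in "mateam":

```python
import re

# The lookahead consumes nothing, so the match can start at "mate"
# and still find "team" inside it.
overlap = re.compile(r"(?=mate).*?team|(?=team).*?mate")
# Without the lookahead, "mate" is consumed first, so "team" has to come after it.
no_overlap = re.compile(r"mate.*?team|team.*?mate")

print(overlap.search("mateam").group(0))            # mateam
print(no_overlap.search("mateam"))                  # None
print(no_overlap.search("mate and team").group(0))  # mate and team
```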