
Author Topic: Understanding TRegExpr  (Read 1316 times)

pmesquita

  • Jr. Member
  • **
  • Posts: 62
Understanding TRegExpr
« on: August 28, 2022, 07:17:17 pm »
Guys,
Does anyone know TRegExpr (https://github.com/andgineer/TRegExpr)?

I'm playing with TRegExpr. According to https://regex101.com/, the expression ^[a-z]([\\.](?![\\.])|[a-z]){5,50}[a-z]$ validates the text foo.bar.

The site reports a match, but if I pass the same expression to FPC, it "screams" that the position is unknown...

Note: the backslashes are doubled because the pattern comes from the "schema" field, whose content goes through GetJSON.

Validation code:

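// Needs the RegExpr unit (TRegExpr) and SysUtils (FreeAndNil) in the uses clause.
function CheckPattern(AValue, APattern: String): Boolean;
var
  LRegex: TRegExpr;
begin
  LRegex := TRegExpr.Create;
  try
    LRegex.Expression := APattern;
    LRegex.ModifierI := False; // case-sensitive matching
    LRegex.ModifierS := True;  // "." also matches line separators
    LRegex.ModifierX := True;  // extended syntax: whitespace in the pattern is ignored
    Result := LRegex.Exec(AValue);
  finally
    FreeAndNil(LRegex);
  end;
end;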
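
On the note about the doubled backslashes: inside a JSON string, \\ is the escape for one backslash, so after parsing the regex engine should receive [\.] rather than [\\.]. A minimal sketch in Python, where json.loads stands in for FPC's GetJSON and the layout of the JSON document is hypothetical (only the doubled backslashes mirror the post):

```python
import json

# Hypothetical JSON text; the "schema"/"pattern" layout is an assumption.
# The raw string keeps the two backslashes exactly as they appear in the JSON.
raw = r'{"schema": {"pattern": "^[a-z]([\\.](?![\\.])|[a-z]){5,50}[a-z]$"}}'

pattern = json.loads(raw)["schema"]["pattern"]

# After parsing, every "\\" from the JSON text is a single backslash.
print(pattern)
```

If the pattern were instead pasted with doubled backslashes directly into Pascal source, [\\.] would presumably be read as a character class containing a backslash and a dot, which is a slightly different pattern.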


AlexTP

  • Hero Member
  • *****
  • Posts: 2557
    • UVviewsoft
Re: Understanding TRegExpr
« Reply #1 on: August 28, 2022, 07:19:52 pm »
I am the maintainer.
FPC does not recognize this part: (?![\\.])
It is called an 'assertion' (here, a negative lookahead). It was added to FPC 'trunk' but is not YET in the latest FPC release.

Also, an 'assertion' must be at the very end of the expression.
« Last Edit: August 28, 2022, 07:21:46 pm by AlexTP »

pmesquita

Re: Understanding TRegExpr
« Reply #2 on: August 28, 2022, 08:51:23 pm »
So I'm talking to the right person. :D

Understood, but I'm using the master version from Git.
So should the expression look like this: ^[a-z][\\.][a-z]{5,50}(?![\\.])$ ?

AlexTP

Re: Understanding TRegExpr
« Reply #3 on: August 28, 2022, 08:55:59 pm »
That is also 'not at the very end': notice the $ that follows it. BTW, you can test this in CudaText (with the latest TRegExpr). The editor shows detailed regex errors.

pmesquita

Re: Understanding TRegExpr
« Reply #4 on: August 29, 2022, 12:19:32 am »
What's up..

I think I understand: 'assertions' must always be declared either at the beginning or at the end of the expression. Is that a current limitation of the implementation?

Based on that understanding, and using CudaText, I arrived at the expression ^[a-z]+[\.][a-z]{1,50}$(?![\.]), which works for what I need. However, a question arose:

Would the assertion (?![\.]) at the end of the text (after the $) be a regular expression for the "[\.]" declared earlier in order to avoid repetition of the dot (.) ?

AlexTP

Re: Understanding TRegExpr
« Reply #5 on: August 29, 2022, 10:20:17 am »
Yes, it is a limitation; the docs say so: https://regex.sorokin.engineer/en/latest/regular_expressions.html#assertions

Quote
Would the assertion (?![\.]) at the end of the text (after the $) be a regular expression for the "[\.]" declared earlier in order to avoid repetition of the dot (.) ?
I did not understand. English is not my native language.
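
On the quoted question: a lookahead only inspects the text that follows its own position, so it has no link to a "[\.]" written earlier in the pattern. Placed after $, there is nothing left that could be a dot, so it should always succeed. A small sketch with Python's re module (an assumption: it is used here as a stand-in, and TRegExpr itself is not exercised) compares the expression with and without the trailing assertion:

```python
import re

# The expression from Reply #4, with and without the assertion after "$".
WITH_ASSERTION = re.compile(r'^[a-z]+[\.][a-z]{1,50}$(?![\.])')
WITHOUT_ASSERTION = re.compile(r'^[a-z]+[\.][a-z]{1,50}$')

samples = ['foo.bar', 'foo..bar', 'foo.bar.baz', 'foo.', '.bar', 'foobar']

# For each sample: (result with assertion, result without assertion).
outcome = {
    s: (bool(WITH_ASSERTION.search(s)), bool(WITHOUT_ASSERTION.search(s)))
    for s in samples
}

for text, pair in outcome.items():
    print(text, pair)
```

In this sketch both variants give the same result for every sample. The doubled dot in foo..bar is rejected because [a-z]{1,50} has to follow the single [\.], not because of the assertion.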

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11050
  • Debugger - SynEdit - and more
    • wiki
Re: Understanding TRegExpr
« Reply #6 on: August 29, 2022, 12:16:51 pm »
In this case the regex may be "fixable".

The original regex matches any text that
- starts and ends in a letter
- is between 7 and 52 chars long
- can have any number of single dots, each between two letters.


Since the text contains only dots and letters, the lookahead (?!\\.) means that a letter must follow the dot, and hence there is a word boundary right after the dot.

I did not test, but I would expect that this should work:
Code:
  ^[a-z](\\.\\b|[a-z]){5,50}[a-z]$
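
The claimed equivalence can be sanity-checked outside FPC. The sketch below uses Python's re module as a stand-in (an assumption: TRegExpr is not exercised here, and the sample strings are made up for illustration). The patterns are written with single backslashes, as the engine would see them after JSON unescaping:

```python
import re

# Original pattern with the negative lookahead after the dot.
LOOKAHEAD = re.compile(r'^[a-z]([\.](?![\.])|[a-z]){5,50}[a-z]$')
# Proposed replacement: a word boundary after the dot.
BOUNDARY = re.compile(r'^[a-z](\.\b|[a-z]){5,50}[a-z]$')

# Illustrative inputs, chosen by hand.
samples = ['foo.bar', 'foo..bar', 'foobar.', '.foobar',
           'a.b.c.d', 'abcdefg', 'foo.ba', 'foo.bar1']

def agree(text):
    """True when both patterns give the same verdict for text."""
    return bool(LOOKAHEAD.search(text)) == bool(BOUNDARY.search(text))

results = {s: bool(BOUNDARY.search(s)) for s in samples}

for text, matched in results.items():
    print(text, matched, agree(text))
```

On these samples the two patterns agree. A handful of strings is a spot check, not a proof of equivalence.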


Depending on what "only at the end of an expression" means exactly (i.e. only at the very end of the entire regex, or of any sub-expression?), there may be other options.
The condition is that the overall length should be between 7 and 52 chars.
If we temporarily ignore the length, the regex becomes really simple:
Code:
  ^[a-z]+(\\.[a-z]+)*$
- starts with one or more letters
- followed by zero, one or more: dot with at least one letter after it
  (that also ensures the last char is a letter)
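
A quick sketch of how this length-free pattern behaves, again using Python's re module as a stand-in for TRegExpr (an assumption) and a hypothetical helper name:

```python
import re

# The simplified pattern, with single backslashes as the engine sees them.
SIMPLE = re.compile(r'^[a-z]+(\.[a-z]+)*$')

def is_dotted_name(text):
    """Hypothetical helper: True if text is letters separated by single dots."""
    return SIMPLE.search(text) is not None

for text in ['foo.bar', 'foo', 'a.b', 'foo..bar', 'foo.', '.foo']:
    print(text, is_dotted_name(text))
```

Short inputs such as a.b are accepted here, which is why the length still has to be checked separately.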

If we can place the lookahead at the very start, we can check the length first:
Code:
  ^(?=.{7,52}$)[a-z]+(\\.[a-z]+)+$
That performs two matches, both from the start, and both are required to succeed:
- check the length
- check the content as explained
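
The two-step check can be tried with Python's re module, which accepts a lookahead at the start of a pattern. Whether TRegExpr accepts it there is exactly the open question above, so this is only an illustration of the regex semantics:

```python
import re

# Pattern as posted, with single backslashes.
# (?=.{7,52}$) checks the total length, the rest checks the content.
CHECKED = re.compile(r'^(?=.{7,52}$)[a-z]+(\.[a-z]+)+$')

longest_ok = 'a' * 26 + '.' + 'b' * 25   # 52 characters in total
too_long = 'a' * 26 + '.' + 'b' * 26     # 53 characters in total

for text in ['foo.bar', 'a.b', 'abcdefg', 'foo..bar', longest_ok, too_long]:
    print(len(text), bool(CHECKED.search(text)))
```

Note that the last group in this variant ends in + rather than *, so at least one dot is required: abcdefg is rejected here even though it has a valid length. With * instead, dot-free texts would be accepted as in the simplified pattern.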


 
