Author Topic: TRegExpr - Difficulty using expressions found from elsewhere  (Read 10842 times)

Gizmo

Hi

Using TRegExpr (http://regexpstudio.com), I am struggling to use some of the example REs found on the net, in this case for validating strong passwords (i.e. a mix of upper and lower case with at least one number, etc.). I have struggled with some other examples too, as there seem to be so many variants of what I thought was a fairly standard 'language'. Anyway, I have found several examples for strong passwords but cannot get any of them to work with TRegExpr. For example:

^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{6,50}$ (http://regexhero.net/library/35/strong-password)
or
 /^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$/ (http://regexhero.net/library/35/strong-password)

When I run the program I usually get 'Unrecognised modifier (Pos X)'

Many thanks for any advice

Ted

BigChimp

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #1 on: June 16, 2012, 11:09:09 am »
To put it bluntly: split out the regexes into the smallest parts that don't work. Compare with documentation (e.g. http://regexpstudio.com/TRegExpr/Help/RegExp_Syntax.html) to see if that syntax is supported. File a bug if it should be.

ludob

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #2 on: June 16, 2012, 11:45:24 am »
There are several incompatible regex dialects. The (?= is a lookahead assertion, found in Perl and in PCRE-based dialects such as PHP's. TRegExpr follows the Perl regex syntax closely but implements only a subset of it, and doesn't support (?=

The document mentioned by BigChimp shows that (? is interpreted as an inline modifier which explains the error message.

Gizmo

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #3 on: June 16, 2012, 11:08:03 pm »
I've decided to try and start from scratch following the link provided by BC.

So, the following works quite well : [A-Za-z]{6,}[0-9]{1,}

It will find things like TedSmith28 but not Ted-Smith28 or T3dSm1th28... will keep trying :-) It's like another language in itself!

Martin_fr

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #4 on: June 17, 2012, 12:03:00 am »
I haven't tested it, and I hope I did not put typos in there.


But the below (all on ONE line / NO spaces) should test that lower, upper and digits are all present (in any order)

It does NOT check any minimum length. (Can be added, but will make the pattern much more complex)


(.* [A-Z] .* ( ([a-z].*[0-9]) | ([0-9].* [a-z]) ) .* ) |
(.* [a-z] .*  ( ([A-Z].*[0-9]) | ([0-9].* [A-Z]) ) .* ) |
(.* [0-9] .*  ( ([a-z].*[A-Z]) | ([A-Z].* [a-z]) ) .* )

Gizmo

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #5 on: June 17, 2012, 10:29:50 pm »
Thanks a lot Martin for contributing.

I gave your expression a whirl and it is much easier to understand than other examples I read. In practice, though, it seems to be quite slow. So I played around some more and wrote this (an expansion of my earlier one above) and have found it to work well, apart from the fact that it won't search for special characters like '@', '!', '"', '£' etc. that people might also throw in:

[A-Za-z0-9].{6,25}[A-Za-z0-9]   (finds upper, lower, numeric and other chars like '-' etc.: one alphanumeric character, then 6 to 25 characters of any kind, then another alphanumeric character). It works very fast.
« Last Edit: June 17, 2012, 11:03:19 pm by tedsmith »

Martin_fr

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #7 on: June 18, 2012, 12:16:33 am »
Quote
[A-Za-z0-9].{6,25}[A-Za-z0-9]   (finds upper, lower, numeric

"finds" maybe. But does not enforce. The above will also match: aaaaaaaa

I haven't tested my example. But if it really is that noticeably slow, then either the regex engine is crap or you are using it in a wrong way.

I have used patterns that were more than 10 times as complex (spreading over half a page) in Perl, and they worked fast.

It is important that you create the TRegExpr object once (and only once), and set the expression only once too. Then you can use it as often as you want.

Though of course, if you use it in a web server, you may not be able to keep the object.


Anyway, for the task you want to achieve, I would suggest a few lines of Pascal.
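
The "few lines of Pascal" could look roughly like the sketch below. It is not code from this thread: the function name HasUpperLowerDigit and the MinLen parameter are made up for illustration, and it only considers the ASCII ranges A-Z, a-z and 0-9.

```pascal
program PasswordCheck;
{$mode objfpc}{$H+}

// Hypothetical helper: True if S is at least MinLen characters long and
// contains at least one ASCII upper case letter, one lower case letter
// and one digit, in any order. Other characters are allowed but ignored.
function HasUpperLowerDigit(const S: string; MinLen: Integer): Boolean;
var
  i: Integer;
  HasUpper, HasLower, HasDigit: Boolean;
begin
  HasUpper := False;
  HasLower := False;
  HasDigit := False;
  for i := 1 to Length(S) do
    case S[i] of
      'A'..'Z': HasUpper := True;
      'a'..'z': HasLower := True;
      '0'..'9': HasDigit := True;
    end;
  Result := (Length(S) >= MinLen) and HasUpper and HasLower and HasDigit;
end;

begin
  WriteLn(HasUpperLowerDigit('TedSmith28', 6));   // TRUE
  WriteLn(HasUpperLowerDigit('Ted-Smith28', 6));  // TRUE
  WriteLn(HasUpperLowerDigit('aaaaaaaa', 6));     // FALSE
end.
```

A single pass over the string does no backtracking at all, which is why this tends to be both simpler and faster than a regex for this particular check.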

avra

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #8 on: June 18, 2012, 08:45:44 am »
Quote
There are several incompatible regex dialects.

+1 for that

AnSo

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #9 on: June 18, 2012, 12:59:32 pm »
I am sorry, but TRegExpr supports only two kinds of
Code: [Select]
(?..)
These are modifiers and comments:
Code: [Select]
(?#comment)
(?imsxr-imsxr)

Look-ahead and other extensions have not been implemented in the engine.
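
Since look-ahead is not available in the engine discussed here, one way to get the effect of the three (?=...) assertions is to run one simple expression per requirement and combine the results. A minimal sketch, assuming Free Pascal's RegExpr unit and its ExecRegExpr helper function; IsStrongPassword is a hypothetical name, and the 6 to 50 length limits are taken from the first expression quoted at the top of the thread.

```pascal
program StrongPassword;
{$mode objfpc}{$H+}

uses
  RegExpr;

// Hypothetical helper: each ExecRegExpr call stands in for one of the
// look-ahead assertions that TRegExpr rejects in the original pattern.
function IsStrongPassword(const S: string): Boolean;
begin
  Result := ExecRegExpr('^.{6,50}$', S)   // overall length 6..50
        and ExecRegExpr('[A-Z]', S)       // at least one upper case letter
        and ExecRegExpr('[a-z]', S)       // at least one lower case letter
        and ExecRegExpr('[0-9]', S);      // at least one digit
end;

begin
  WriteLn(IsStrongPassword('TedSmith28'));  // TRUE
  WriteLn(IsStrongPassword('aaaaaaaa'));    // FALSE: no upper case, no digit
  WriteLn(IsStrongPassword('Ab1'));         // FALSE: too short
end.
```

ExecRegExpr is convenient for a one-off check, but it sets up the expression again on every call. For scanning large amounts of data it is probably better to keep a few TRegExpr objects around and reuse them.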


Gizmo

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #10 on: June 18, 2012, 02:27:52 pm »
Martin:
Quote
"finds" maybe. But does not enforce. The above will also match: aaaaaaaa

You're quite right. I found this to be the case soon after having thrown it at some "real" data as opposed to engineered test data. So, whilst it will find "TedSmith28", for example, it also finds "tedsmithIsHairy" or something.

I would like an expression that strictly searches for a capital letter at the start, followed by a combination of upper or lower case letters in the middle, followed by at least 1 digit, such as 'Tedsmith28'. I think that's what your suggestion does, Martin (?), but when I ran it across 20 MB of source data it took over 10 minutes and hadn't returned anything. Maybe I had coded it wrong though.

FYI, I have (from memory)

MyRegExpr.Exec := 'YourExpressionAbove';

try
  for i := 0 to StringList -1 do
    if MyRegExpr.Exec then do (if the expression found a hit in the StringList line)
      Populate a StringGrid using MyRegExpr.Match[0];


Martin_fr

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #11 on: June 18, 2012, 04:37:49 pm »
Quote
I would like an expression that strictly searches for a capital letter at the start, followed by a combination of upper or lower case letters in the middle, followed by at least 1 digit, such as 'Tedsmith28'. I think that's what your suggestion does, Martin (?), but when I ran it across 20 MB of source data it took over 10 minutes and hadn't returned anything. Maybe I had coded it wrong though.

Mine did more:
It checked for the presence of the 3 (upper, lower, digit) in any order, and with any others in between.

For what you describe:
  ^[A-Z].*[a-z].*[0-9]$

or more optimized
  ^[A-Z][^a-z]*[a-z].*[0-9]$


- First must be upper
- then anything can follow
- somewhere in the middle must be a lower
- after the lower anything can follow
- at the end must be a digit

Actually you can optimize my initial one

^[^a-zA-Z0-9]* (
( [A-Z]  [^a-z0-9]*  ( ([a-z].*?[0-9]) | ([0-9].*? [a-z]) )  ) |
( [a-z]  [^A-Z0-9]*  ( ([A-Z].*?[0-9]) | ([0-9].*? [A-Z]) )  ) |
( [0-9]  [^a-zA-Z]*  ( ([a-z].*?[A-Z]) | ([A-Z].*? [a-z]) )  )
)

And the [^a-zA-Z0-9] is only needed if it can start with .,-=^*&...
But do keep the ^ at the very start.

Out of interest, compare the time for the optimized version above with that of a non-greedy version:
^[^a-zA-Z0-9]*? (
( [A-Z]  [^a-z0-9]*?  ( ([a-z].*?[0-9]) | ([0-9].*? [a-z]) )  ) |
( [a-z]  [^A-Z0-9]*?  ( ([A-Z].*?[0-9]) | ([0-9].*? [A-Z]) )  ) |
( [0-9]  [^a-zA-Z]*?  ( ([a-z].*?[A-Z]) | ([A-Z].*? [a-z]) )  )
)


This will require less backtracking in the regex engine.

[^a-zA-Z0-9]*  skips all characters that are not letters or digits, e.g. £$%^&-.,

Then it hits one of the 3 subexpressions (one of the 3 lines). The password that is tested must (after any leading punctuation) have either an upper, a lower, or a digit.

example 1st line:
   $$ABC-Xdef!1

The $$ is skipped
The A is matched by [A-Z]
Now everything can be skipped that is NOT lower or digit [^a-z0-9]* / That includes skipping further uppers (here BC-X)
The inner or ("|") subexpression is hit
- either a lower [a-z], followed by whatever, and eventually a digit
- or a digit [0-9], followed by whatever, and eventually a lower




Quote

FYI, I have (from memory)

MyRegExpr.Exec := 'YourExpressionAbove';

You mean MyRegExpr.Expression?
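
For reference, the loop sketched from memory above might look like this once the expression is assigned to the Expression property and the object is created only once. This is a sketch, assuming Free Pascal's Classes and RegExpr units; CollectMatches and the sample lines are invented for illustration, and a TStrings list stands in for the StringGrid.

```pascal
program ScanLines;
{$mode objfpc}{$H+}

uses
  Classes, RegExpr;

// Hypothetical helper: adds the matched text of every line in Source
// that matches Pattern to Hits.
procedure CollectMatches(Source, Hits: TStrings; const Pattern: string);
var
  Re: TRegExpr;
  i: Integer;
begin
  Re := TRegExpr.Create;            // created once, outside the loop
  try
    Re.Expression := Pattern;       // assigned once, outside the loop
    for i := 0 to Source.Count - 1 do
      if Re.Exec(Source[i]) then    // Exec takes the string to search
        Hits.Add(Re.Match[0]);      // Match[0] is the whole match
  finally
    Re.Free;
  end;
end;

var
  Lines, Hits: TStringList;
begin
  Lines := TStringList.Create;
  Hits := TStringList.Create;
  try
    Lines.Add('Tedsmith28');
    Lines.Add('tedsmithIsHairy');
    Lines.Add('aaaaaaaa');
    CollectMatches(Lines, Hits, '^[A-Z].*[a-z].*[0-9]$');
    Write(Hits.Text);               // only Tedsmith28 is expected here
  finally
    Hits.Free;
    Lines.Free;
  end;
end.
```

Because the pattern is anchored with ^ and $, each line is either accepted or rejected as a whole, so this form suits data with one candidate per line.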

« Last Edit: June 18, 2012, 05:00:18 pm by Martin_fr »

Martin_fr

Re: TRegExpr - Difficulty using expressions found from elsewhere
« Reply #12 on: June 18, 2012, 04:43:00 pm »
Quote
[A-Za-z0-9].{6,25}[A-Za-z0-9]   (finds upper, lower, numeric

Quote
I haven't tested my example. But if it really is that noticeably slow, then the regex engine is crap.

I correct myself. The initial pattern allowed for a lot of backtracking, so it would have used more time. I wasn't aware you needed to scan such large data.

How long does your pattern take?
« Last Edit: June 18, 2012, 04:55:48 pm by Martin_fr »

 
