
Topic: Regex - Unrecognized modifier at pos (Read 6702 times)

funk247

  • New Member
  • Posts: 33
Regex - Unrecognized modifier at pos
« on: August 19, 2014, 05:30:56 pm »
I'm trying to split address strings into a building number and the rest of the address. To do this I'm using a simple regex. I've tested it on regexr.com and it seems to work fine, but Lazarus has a few issues with it, and as a regex noob I'm completely at a loss as to where I've gone wrong.

This is the regex I was hoping to use. Basically it looks for digits and forward slashes, so it should match the building number in any of the following:

Code: [Select]
121 Street address
1/3 Street address
House name, 112 Street Address

Code: [Select]
(?:\d*\u002F)?\d+
This throws an "unrecognized modifier" error (pos 12).

I also tried:

Code: [Select]
(?:\d*\/)?\d+
Again, this works on regexr.com but not in my app; this time the unrecognized modifier is at pos 8.

Where am I going wrong and where do I go from here?

engkin

  • Hero Member
  • Posts: 3112
Re: Regex - Unrecognized modifier at pos
« Reply #1 on: August 19, 2014, 07:09:27 pm »
Maybe:
((?:\d*\/)?\d+)

howardpc

  • Hero Member
  • Posts: 4144
Re: Regex - Unrecognized modifier at pos
« Reply #2 on: August 19, 2014, 07:14:14 pm »
For simple parsing tasks I find it quicker and less taxing on the brain to write a simple routine myself, rather than struggling with regex syntax.

Code: [Select]
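// Needs SysUtils (Trim, TSysCharSet, AppendStr).
// Returns every '/' and digit found in anAddress as the house number,
// and puts all remaining characters in numberless.
function ExtractedHouseNumber(const anAddress: string; out numberless: string): string;
const
  // '/' (#47) sits directly before '0' (#48), so this range is '/' plus '0'..'9'
  Numchar: TSysCharSet=['/'..'9'];
var
  s: string;
  c: Char;
begin
  s:=Trim(anAddress);
  numberless:='';
  Result:='';
  for c in s do
  begin
    if (c in Numchar) then
      AppendStr(Result, c)      // digit or slash: part of the house number
    else
      AppendStr(numberless, c); // anything else: part of the street address
  end;
  numberless:=Trim(numberless);
end;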
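The routine above collects every digit and slash anywhere in the string, so a second number or the slash in "C/O" ends up in the result as well. A possible variant, sketched here as a hypothetical FirstHouseNumber (not howardpc's code), keeps only the first run of digits and slashes that starts with a digit. It needs no units.

```pascal
program FirstNumberDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: returns the first run of digits and slashes that
// starts with a digit, e.g. '1/3' in '1/3 Street address'; '' if none.
function FirstHouseNumber(const anAddress: string): string;
var
  i, start: Integer;
begin
  i := 1;
  // Skip forward to the first digit; slashes before it (as in 'C/O') are ignored.
  // Short-circuit evaluation ({$B-}, the default) keeps anAddress[i] in range.
  while (i <= Length(anAddress)) and not (anAddress[i] in ['0'..'9']) do
    Inc(i);
  start := i;
  // Consume the run of digits and slashes ('/'..'9') that begins there.
  while (i <= Length(anAddress)) and (anAddress[i] in ['/'..'9']) do
    Inc(i);
  // If no digit was found, start = Length + 1 and Copy returns ''.
  Result := Copy(anAddress, start, i - start);
end;

begin
  WriteLn(FirstHouseNumber('121 Street address'));             // 121
  WriteLn(FirstHouseNumber('1/3 Street address'));             // 1/3
  WriteLn(FirstHouseNumber('House name, 112 Street Address')); // 112
  WriteLn(FirstHouseNumber('C/O J Smith, 12 High Street'));    // 12
end.
```

The address 'C/O J Smith, 12 High Street' is made up for illustration. Like any hand-written parser, this only covers the formats it was written for; a trailing slash such as '12/' would still be included.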


funk247

  • New Member
  • Posts: 33
Re: Regex - Unrecognized modifier at pos
« Reply #3 on: August 19, 2014, 09:19:00 pm »
Regex is a PITA. It would help if the expressions were universal instead of having quirks depending on the language you're using to parse.

That's a nice, simple-looking function. I wish I'd been able to find out about Numchar, as it's exactly the kind of thing I was looking for. I'll post back tomorrow with my solution.

engkin

  • Hero Member
  • Posts: 3112
Re: Regex - Unrecognized modifier at pos
« Reply #4 on: August 19, 2014, 09:53:49 pm »
Quote from: funk247
It would help if the expressions were universal instead of having quirks depending on the language you're using to parse.
In this case it has nothing to do with the language. It all comes down to one file, probably regexpr.pas. Since it is open source, it is possible to enhance it and eliminate any "quirks".

You can use ((\d+/)?\d+). When it matches, the building number should be in Match[1].
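A minimal Free Pascal sketch of this suggestion, assuming the RegExpr unit that ships with FPC/Lazarus (TRegExpr with its Expression property, Exec and Match[]). The wrapper name ExtractBuildingNumber is hypothetical. The pattern avoids the (?:...) non-capturing group, which the RegExpr version used in this thread apparently did not support, and writes the forward slash without an escape.

```pascal
program BuildingNumber;

{$mode objfpc}{$H+}

uses
  RegExpr;

// Hypothetical wrapper: returns the first building number in AnAddress,
// or '' if the pattern does not match.
function ExtractBuildingNumber(const AnAddress: string): string;
var
  Re: TRegExpr;
begin
  Result := '';
  Re := TRegExpr.Create;
  try
    // Optional "digits/" prefix (group 2) followed by digits; group 1 is the whole number.
    Re.Expression := '((\d+/)?\d+)';
    if Re.Exec(AnAddress) then
      Result := Re.Match[1];
  finally
    Re.Free;
  end;
end;

begin
  WriteLn(ExtractBuildingNumber('121 Street address'));             // 121
  WriteLn(ExtractBuildingNumber('1/3 Street address'));             // 1/3
  WriteLn(ExtractBuildingNumber('House name, 112 Street Address')); // 112
end.
```

Because the outer parentheses wrap the whole pattern, Match[1] and Match[0] hold the same text here.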


eny

  • Hero Member
  • Posts: 1634
Re: Regex - Unrecognized modifier at pos
« Reply #5 on: August 19, 2014, 11:49:28 pm »
Quote from: engkin
Since it is open source, it is possible to enhance it and eliminate any "quirks".
Messing around with the source of regexpr is not for the faint-hearted.

Anyway, regular expressions were invented to keep programs from getting polluted with dozens of so-called 'easy to understand parsing routines' that need to be changed every time the input changes, not to mention all the extra time wasted testing and debugging them.

Make sure to use RegExpr (Andrey V. Sorokin's regex library, which was added to Lazarus in one of the recent versions) and not regex.
It comes with an excellent help file (at least the separate download; not sure if it's included in the Lazarus install).
The simple expression is: ([\d/]{1,})
« Last Edit: August 19, 2014, 11:51:58 pm by eny »
All posts based on: Win10 (Win64); Lazarus 2.0.10 'stable' (x64) unless specified otherwise...

engkin

  • Hero Member
  • Posts: 3112
Re: Regex - Unrecognized modifier at pos
« Reply #6 on: August 20, 2014, 12:26:54 am »
Quote from: eny
It comes with an excellent help file (at least the separate download; not sure if it's included in the Lazarus install).
The one in the trunk does not come with a help file. I wonder if you mean this help file.

@funk247
You might want to try the BeRo RegularExpression Engine, which seems to support non-capturing groups: (?:regex)

I don't remember trying it before, but I did keep its link.

eny

  • Hero Member
  • Posts: 1634
Re: Regex - Unrecognized modifier at pos
« Reply #7 on: August 20, 2014, 01:19:55 am »
Quote from: engkin
The one in the trunk does not come with a help file. I wonder if you mean this help file.
That's indeed the location where the source and help ended up, after having disappeared from the radar some years ago.

funk247

  • New Member
  • Posts: 33
Re: Regex - Unrecognized modifier at pos
« Reply #8 on: August 20, 2014, 01:36:28 pm »
Ah lots of useful info in these replies :D

@engkin, this:
Code: [Select]
((\d+/)?\d+)
worked perfectly, thanks very much :D

@eny, I'm already using RegExpr. It looks like it didn't need the forward slash to be escaped; I'd probably have known that if I'd RTFM, so thanks for pointing me toward it.
Code: [Select]
([\d/]{1,})
works great, but in instances where there is a C/O in the address it picks out the slash.
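The C/O problem is easy to reproduce. The sketch below, again assuming FPC's RegExpr unit and a hypothetical helper FirstMatch, runs both patterns on a made-up address: ([\d/]{1,}) accepts a lone slash, so its first match is the "/" in "C/O", while ((\d+/)?\d+) only allows a slash after at least one digit and therefore skips ahead to "12".

```pascal
program CompareRegexes;

{$mode objfpc}{$H+}

uses
  RegExpr;

// Hypothetical helper: returns the first match of Pattern in S, or '' if none.
function FirstMatch(const Pattern, S: string): string;
var
  Re: TRegExpr;
begin
  Result := '';
  Re := TRegExpr.Create;
  try
    Re.Expression := Pattern;
    if Re.Exec(S) then
      Result := Re.Match[0]; // Match[0] is the whole matched text
  finally
    Re.Free;
  end;
end;

const
  Address = 'C/O J Smith, 12 High Street'; // made-up example address

begin
  WriteLn('([\d/]{1,})  -> ', FirstMatch('([\d/]{1,})', Address));  // /
  WriteLn('((\d+/)?\d+) -> ', FirstMatch('((\d+/)?\d+)', Address)); // 12
end.
```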

I'm trying not to use external units in my program, as this is as much an exercise in learning Pascal as anything else. It's much easier than C++, which is a major point in its favour AFAIC. I'm just trying not to dive too deep into the rabbit hole until I've got the basics down :)

eny

  • Hero Member
  • Posts: 1634
Re: Regex - Unrecognized modifier at pos
« Reply #9 on: August 20, 2014, 05:39:41 pm »
Quote from: funk247
@eny, I'm already using RegExpr. It looks like it didn't need the forward slash to be escaped; I'd probably have known that if I'd RTFM, so thanks for pointing me toward it.
Code: [Select]
([\d/]{1,})
works great, but in instances where there is a C/O in the address it picks out the slash.
Yeah, exactly my point 8-)
I gave the simple example because it works for all your initial examples.
And it's very easy to tweak for more sophisticated scenarios, simply by expanding the expression.
If you were to code all of that by hand, it would take a lot more time.
And if you have more numbering scenarios, you can easily set up more than one expression.
Good luck!

 
