Author Topic: Using regular expressions to parse numeric sequences  (Read 1374 times)

maurobio

  • Hero Member
  • *****
  • Posts: 623
  • Ecology is everything.
    • GitHub
Using regular expressions to parse numeric sequences
« on: May 28, 2020, 04:16:32 pm »
Dear ALL,

I have sequences of numbers like these:

Quote
,(3-)5-100
8-250(-430)
60-110
(200-)500-3500
I want to get the numbers outside the parentheses and split them into their minimum and maximum values, like this:

Quote
,(3-)5-100 --> 5 100
8-250(-430) --> 8 250
60-110 --> 60 110
(200-)500-3500 --> 500 3500
Is there a compact way of achieving this using regular expressions? Examples will be welcome!

Thanks in advance!

With best wishes,
UCSD Pascal / Burroughs 6700 / Master Control Program
Delphi 7.0 Personal Edition
Lazarus 2.0.12 - FPC 3.2.0 on GNU/Linux Mint 19.1, Lubuntu 18.04, Windows XP SP3, Windows 7 Professional, Windows 10 Home

jamie

  • Hero Member
  • *****
  • Posts: 6131
Re: Using regular expressions to parse numeric sequences
« Reply #1 on: May 28, 2020, 04:36:18 pm »
Others will try to lure you into regular expressions, but they are so twisted and mind boggling that they just aren't worth it to me.

Write a simple little parser that scans the string. It needs to look for opening and closing parentheses, numbers, math signs, etc.

I also notice you may have some commas in there. You can use a Split function to break those up.
The only true wisdom is knowing you know nothing

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Using regular expressions to parse numeric sequences
« Reply #2 on: May 28, 2020, 04:57:23 pm »
If it's as simple as it seems (ignoring anything between parens and returning the rest, substituting operators by spaces and discarding other non-digits) I think a judicious use of Pos, PosEx(), Delete() and StringReplace() would do the trick quite nicely :)
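The Pos()/PosEx()/Delete()/StringReplace() approach suggested above might look roughly like the following. This is only a sketch: the helper name OutsideParens is hypothetical, and it assumes that parentheses are never nested and that only digits and '-' matter outside them.

```pascal
program StripParens;
{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

// Hypothetical helper: deletes every "(...)" group, keeps only digits and
// '-' from what is left (so debris such as a leading comma is dropped),
// and finally replaces the '-' separator with a space.
function OutsideParens(const S: string): string;
var
  Work: string;
  OpenPos, ClosePos, I: Integer;
begin
  Work := S;
  OpenPos := Pos('(', Work);
  while OpenPos > 0 do
  begin
    ClosePos := PosEx(')', Work, OpenPos + 1);
    if ClosePos = 0 then
      Break; // unbalanced input: stop deleting groups
    Delete(Work, OpenPos, ClosePos - OpenPos + 1);
    OpenPos := Pos('(', Work);
  end;

  Result := '';
  for I := 1 to Length(Work) do
    if Work[I] in ['0'..'9', '-'] then
      Result := Result + Work[I];

  Result := StringReplace(Result, '-', ' ', [rfReplaceAll]);
end;

begin
  WriteLn(OutsideParens(',(3-)5-100'));     // 5 100
  WriteLn(OutsideParens('8-250(-430)'));    // 8 250
  WriteLn(OutsideParens('60-110'));         // 60 110
  WriteLn(OutsideParens('(200-)500-3500')); // 500 3500
end.
```

The result is a single space-separated string, matching the output format asked for in the first post; splitting it further into two values is left out to keep the sketch short.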
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

jamie

Re: Using regular expressions to parse numeric sequences
« Reply #3 on: May 28, 2020, 05:50:44 pm »
Code: Pascal
// Splits Str at the first operator found outside parentheses.
// Anything inside (...) is skipped; other characters outside, including
// debris such as a leading comma, are copied through unchanged.
function GetPairValues(Str: String; out Left, Sep, Right: String): Boolean;
var
  I: Integer = 1;
  Q: Integer = 0;  // current parenthesis depth
  D: Integer = 1;  // 1 = filling Left, 2 = filling Right
begin
  Left := ''; Sep := ''; Right := '';
  while I <= Length(Str) do
  begin
    if Str[I] = '(' then
      Inc(Q)
    else if Str[I] = ')' then
      Dec(Q)  // raise an error if Q goes negative...
    else if Q = 0 then
      if not (Str[I] in ['-', '+', '*', '/']) then
        case D of
          1: Left += Str[I];
          2: Right += Str[I];
        end
      else
      begin
        Sep := Str[I];
        Inc(D);
      end;
    Inc(I);
  end;
  // Report success when both sides were found and the parentheses balanced.
  Result := (Left <> '') and (Right <> '') and (Q = 0);
end;

procedure TForm1.Button2Click(Sender: TObject);
var
  L, S, R: String;
begin
  GetPairValues('400-678(-50)', L, S, R);
  Caption := L + S + R;
end;

You can improve on that if you wish  :D

MarkMLl

  • Hero Member
  • *****
  • Posts: 6692
Re: Using regular expressions to parse numeric sequences
« Reply #4 on: May 28, 2020, 06:23:48 pm »
I broadly agree with what everybody else has already said, except that unlike Jamie I'd suggest calling the solution a state machine rather than a parser (which term is loaded with many implications).

The problem with a regex here is that it's difficult- if not impossible- to say "if this bit doesn't exist, return it as an empty string". The result of this is that you'd be starting off with at least five separate regexes to cover cases that look approximately like this:

Code:
12345
(12)345
12(34)5
123(45)
(12345)

then for each of those you'd be extracting the matched portions by index number. Of course, the regexes themselves would have both literal parentheses and grouping parentheses, and you'll notice that the above don't allow for any of the multiple-parentheses cases.

So you have variables called something like

Code:
var
  leftOfParentheses: string= '';
  insideParentheses: string= '';
  rightOfParentheses: string= '';
  parenthesesDepth: integer= 0;

You parse each string from the start, accumulating the characters into leftOfParentheses until you hit a left-parenthesis at which point you increment parenthesesDepth. You now continue, accumulating characters into insideParentheses and incrementing/decrementing parenthesesDepth until it becomes zero, at which point you accumulate into rightOfParentheses instead.

Then you look to see which of leftOfParentheses and rightOfParentheses is non-blank, and you can process those in a very similar fashion.
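The state machine described above could be sketched as follows. The record type, the function names (SplitAtParentheses, ExtractMinMax) and the decision to drop any character other than digits and '-' outside parentheses are assumptions made for illustration, not part of the original suggestion.

```pascal
program RangeStateMachine;
{$mode objfpc}{$H+}

type
  TRangeParts = record
    LeftOfParentheses: string;
    InsideParentheses: string;
    RightOfParentheses: string;
  end;

// One pass over the string; the depth counter decides which bucket
// each character is accumulated into.
function SplitAtParentheses(const S: string): TRangeParts;
var
  ParenthesesDepth: Integer;
  SeenParentheses: Boolean;
  C: Char;
begin
  Result.LeftOfParentheses := '';
  Result.InsideParentheses := '';
  Result.RightOfParentheses := '';
  ParenthesesDepth := 0;
  SeenParentheses := False;
  for C in S do
    if C = '(' then
    begin
      Inc(ParenthesesDepth);
      SeenParentheses := True;
    end
    else if C = ')' then
    begin
      if ParenthesesDepth > 0 then
        Dec(ParenthesesDepth);
    end
    else if ParenthesesDepth > 0 then
      Result.InsideParentheses := Result.InsideParentheses + C
    else if C in ['0'..'9', '-'] then  // assumption: anything else is debris
    begin
      if SeenParentheses then
        Result.RightOfParentheses := Result.RightOfParentheses + C
      else
        Result.LeftOfParentheses := Result.LeftOfParentheses + C;
    end;
end;

// Takes whichever outer part is non-blank and splits it at the first '-'.
function ExtractMinMax(const S: string; out MinStr, MaxStr: string): Boolean;
var
  Parts: TRangeParts;
  Range: string;
  DashPos: Integer;
begin
  MinStr := '';
  MaxStr := '';
  Parts := SplitAtParentheses(S);
  if Parts.LeftOfParentheses <> '' then
    Range := Parts.LeftOfParentheses
  else
    Range := Parts.RightOfParentheses;
  DashPos := Pos('-', Range);
  Result := (DashPos > 1) and (DashPos < Length(Range));
  if Result then
  begin
    MinStr := Copy(Range, 1, DashPos - 1);
    MaxStr := Copy(Range, DashPos + 1, Length(Range) - DashPos);
  end;
end;

const
  Samples: array[0..3] of string =
    (',(3-)5-100', '8-250(-430)', '60-110', '(200-)500-3500');
var
  Sample, VMin, VMax: string;
begin
  for Sample in Samples do
    if ExtractMinMax(Sample, VMin, VMax) then
      WriteLn(Sample, ' --> ', VMin, ' ', VMax);
end.
```

Because nothing here depends on a fixed pattern, an input with the parenthesised value in an unexpected place still ends up in one of the three buckets rather than failing to match.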

Unlike a solution that applies patterns (regexen or otherwise) to each card as it's read, that's bomb-proof: it doesn't rely on your knowing all possible patterns in advance.

So to summarise, there are times when pattern matching is very useful. This isn't one of them.

MarkMLl



MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

maurobio

Re: Using regular expressions to parse numeric sequences
« Reply #5 on: May 28, 2020, 08:59:48 pm »
Gentlemen,

Thank you very much for the insightful discussion and suggestions.

I fully agree with @jamie's remark that regular expressions are 'twisted and mind boggling'. Indeed! But they have the big advantage of being pretty concise, and once one is working, it can do a lot in fewer lines.

But I do understand that my problem is not amenable to treatment by a regex. I will try the code kindly provided by @jamie, and hope that it will be useful.

I noticed that both @jamie and @MarkMLl treated the problem as one of interpreting mathematical expressions, thereby considering more general solutions than is really necessary. These numeric sequences are not expressions; they are data and refer to measurements taken from several parts. For example, the sequence (200-)500-3500 represents the minimum (500) and the maximum (3500) values of a measurement, with (200-) representing an extreme lower value. Such sequences will therefore always be in that format (with just an occasional debris character, like a leading comma). No operator other than '-' will ever appear in them, and the value in parentheses will only appear at the start (as an extreme lower value) or at the end (as an extreme upper value) of a given sequence.
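Under the constraints just described (an optional parenthesised extreme at the start or at the end, '-' as the only separator, and possibly some leading debris), a single anchored expression with optional groups is one possible way to capture all four values. The sketch below uses TRegExpr from the RegExpr unit; the function name ParseRange and the pattern are illustrative assumptions, and it relies on Match[] yielding an empty string for a group that did not take part in the match.

```pascal
program RangeRegex;
{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // Hypothetical pattern. Capture groups:
  //   2 = extreme lower value, 3 = minimum, 4 = maximum, 6 = extreme upper value
  // (groups 1 and 5 only wrap the optional parenthesised parts).
  RangePattern =
    '^[^0-9(]*(\(([0-9]+)-\))?([0-9]+)-([0-9]+)(\(-([0-9]+)\))?$';

// Returns False when S does not follow the assumed format at all.
function ParseRange(const S: string;
  out ExtremeLow, MinStr, MaxStr, ExtremeHigh: string): Boolean;
var
  R: TRegExpr;
begin
  ExtremeLow := '';
  MinStr := '';
  MaxStr := '';
  ExtremeHigh := '';
  R := TRegExpr.Create;
  try
    R.Expression := RangePattern;
    Result := R.Exec(S);
    if Result then
    begin
      ExtremeLow := R.Match[2];
      MinStr := R.Match[3];
      MaxStr := R.Match[4];
      ExtremeHigh := R.Match[6];
    end;
  finally
    R.Free;
  end;
end;

const
  Samples: array[0..3] of string =
    (',(3-)5-100', '8-250(-430)', '60-110', '(200-)500-3500');
var
  Sample, ELow, VMin, VMax, EHigh: string;
begin
  for Sample in Samples do
    if ParseRange(Sample, ELow, VMin, VMax, EHigh) then
      WriteLn(Sample, ' --> ', VMin, ' ', VMax,
        ' (extremes: "', ELow, '", "', EHigh, '")');
end.
```

Anchoring with ^ and $ means that an input outside the assumed format is rejected instead of being partially matched, which may or may not be what is wanted for real data.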

With best wishes,

MarkMLl

Re: Using regular expressions to parse numeric sequences
« Reply #6 on: May 28, 2020, 09:57:39 pm »
Quote
I noticed that both @jamie and @MarkMLl treated the problem as one of interpreting mathematical expressions, thereby considering more general solutions than is really necessary. These numeric sequences are not expressions; they are data and refer to measurements taken from several parts. For example, the sequence (200-)500-3500 represents the minimum (500) and the maximum (3500) values of a measurement, with (200-) representing an extreme lower value. Such sequences will therefore always be in that format (with just an occasional debris character, like a leading comma). No operator other than '-' will ever appear in them, and the value in parentheses will only appear at the start (as an extreme lower value) or at the end (as an extreme upper value) of a given sequence.

No, I am specifically NOT treating it as a mathematical expression. If I were, a parser would be in order... and I specifically said that for this you need a state machine NOT a parser.

You are- of course- entirely at liberty to ignore my suggestion. But I'd much prefer that you didn't mischaracterise it as something it isn't.

MarkMLl

maurobio

Re: Using regular expressions to parse numeric sequences
« Reply #7 on: May 28, 2020, 10:19:23 pm »
@MarkMLl,

It was surely not my intention to mischaracterise your suggestion!  :o

In a rush, I didn't read your entire post (because, I admit, I tend to go straight for the code). Please accept my most sincere apologies :-\.

With best wishes,

Jurassic Pork

  • Hero Member
  • *****
  • Posts: 1228
Re: Using regular expressions to parse numeric sequences
« Reply #8 on: May 29, 2020, 12:13:13 am »
Hello,
maurobio, if your values always look like your examples, a regex isn't very complicated to use:

Code: Pascal
// Requires the RegExpr unit in the uses clause.
procedure TForm1.BtRegexClick(Sender: TObject);
const
  exptest: array[0..3] of string = (',(3-)5-100',
                                    '8-250(-430)',
                                    '60-110',
                                    '(200-)500-3500');
var
  R: TRegExpr;
  ex: string;
begin
  R := TRegExpr.Create;
  try
    // First "digits-digits" run; a parenthesised part never matches because
    // its dash is adjacent to a parenthesis.
    R.Expression := '([0-9]+\-[0-9]+)';
    for ex in exptest do
      if R.Exec(ex) then
        Memo1.Append(StringReplace(R.Match[1], '-', ' ', [rfReplaceAll]));
  finally
    R.Free;
  end;
end;

Result in attachment

Friendly, J.P
Jurassic computer : Sinclair ZX81 - Zilog Z80A à 3,25 MHz - RAM 1 Ko - ROM 8 Ko

maurobio

Re: Using regular expressions to parse numeric sequences
« Reply #9 on: May 29, 2020, 01:22:13 am »
@Jurassic Park,

Thank you very much for your code; it works nicely (just like @jamie's, although his does not use regular expressions).

With best wishes,

Jurassic Pork

Re: Using regular expressions to parse numeric sequences
« Reply #10 on: May 29, 2020, 08:16:27 am »
Hello,
Quote
@Jurassic Park,
I am Jurassic Pork, not Park; see my avatar  ;D
With a regex you can extract the min and max values at the same time.
Example:
Code: Pascal
// Requires the RegExpr unit in the uses clause.
procedure TForm1.BtGoRegexClick(Sender: TObject);
const
  exptest: array[0..3] of string = (',(3-n)5-100', '8-250(-430)', '60-110', '(200-)500-3500');
var
  R: TRegExpr;
  ex: string;
  row: Integer;
begin
  R := TRegExpr.Create;
  try
    // Group 1 captures the minimum, group 2 the maximum.
    R.Expression := '([0-9]+)\-([0-9]+)';
    row := 1;  // row 0 is left for the grid's header
    for ex in exptest do
      if R.Exec(ex) then
      begin
        StringGrid1.Cells[0, row] := ex;
        StringGrid1.Cells[1, row] := R.Match[1];
        StringGrid1.Cells[2, row] := R.Match[2];
        row += 1;
      end;
  finally
    R.Free;
  end;
end;

Result: see attachment.

Friendly, J.P
« Last Edit: May 29, 2020, 08:19:18 am by Jurassic Pork »

maurobio

Re: Using regular expressions to parse numeric sequences
« Reply #11 on: May 29, 2020, 01:55:42 pm »
Hi, @Jurassic PORK,

Thank you very much for that other fine example. I am glad to see that, after all, my problem has a simple solution with regular expressions!

Sorry for my mistake in your nickname - at least I placed you in the correct geological period (not in the Triassic, or the Silurian!)  :)

With best wishes,
« Last Edit: May 29, 2020, 08:25:00 pm by maurobio »

 
