Lazarus

Title: Regex question
Post by: justnewbie on July 15, 2019, 07:29:07 pm
Hi guys,

I have this text:
Code: Pascal
something retvalue -1 anything anyt=55; nothing var1=5;

Using RegExpr, I want to get the text in this form (i.e. anything and nothing should each start a new line):
Code: Pascal
something retvalue -1
anything anyt=55;
nothing var1=5;

The https://regex101.com/ site gives me the right result if I write \n\0 in the substitution field.
But Lazarus cannot handle it.

My code:
Code: Pascal
reBefore := '(anything|nothing)';
reAfter := '\n\0';
Memo1.Text := ReplaceRegExpr(reBefore, Memo1.Text, reAfter, true);

This code doesn't do what I need. How can I solve this?
Title: Re: Regex question
Post by: engkin on July 15, 2019, 07:35:37 pm
replace '\n\0' with LineEnding
Title: Re: Regex question
Post by: justnewbie on July 15, 2019, 07:41:47 pm
Quote
replace '\n\0' with LineEnding

Do you mean
Code: Pascal
reAfter := LineEnding;
?
This doesn't work.
Title: Re: Regex question
Post by: engkin on July 15, 2019, 07:49:37 pm
Code: Pascal
program Project1;

{$mode objfpc}{$H+}

uses
  RegExpr;

var
  s1:string='something retvalue -1 anything anyt=55; nothing var1=5;';
  s2, reBefore, reAfter: String;
begin
  reBefore := '(anything|nothing)';
  // LineEnding starts a new line; $0 re-inserts the matched text after it
  reAfter := LineEnding+'$0';
  s2 := ReplaceRegExpr(reBefore, s1, reAfter, true);
  WriteLn(s2);
  ReadLn;
end.
Title: Re: Regex question
Post by: justnewbie on July 15, 2019, 08:09:03 pm
Great, thank you, it works!
If I may ask, why didn't the original version ('\n\0') work?
What does '$0' mean?

Also, I tried it this way
Code: Pascal
reAfter := '\n$0';
and it worked as well.
Title: Re: Regex question
Post by: engkin on July 15, 2019, 08:41:21 pm
Quote
Why didn't the original version ('\n\0') work?

It did work, but did not give what you want: RegExpr refers to matches in the replacement as $0, $1 and so on, not as \0.

Quote
What does '$0' mean?

It refers to the whole matched text. In this expression the brackets enclose the entire pattern, so it is the same as the text inside the brackets ($1): it may be anything or nothing. To see it, try this change:
Code: Pascal
  reAfter := '-->$0<--';

Quote
Also, I tried it this way
Code: Pascal
reAfter := '\n$0';
and it worked as well.

I just checked the RegExpr code; it supports:
Code: Pascal
     't': Result := #$9;  // \t => tab (HT/TAB)
     'n': Result := #$a;  // \n => newline (NL)
     'r': Result := #$d;  // \r => carriage return (CR)
     'f': Result := #$c;  // \f => form feed (FF)
     'a': Result := #$7;  // \a => alarm (bell) (BEL)
     'e': Result := #$1b; // \e => escape (ESC)
     'x': begin // \x: hex char
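To see the replacement syntax in one place, here is a minimal sketch, assuming the RegExpr unit that ships with FPC. The program name and the constants Src and Pat are made up for illustration. It prints the same input three times: with $0 (the whole match), with $1 (the first bracketed group), and with the last parameter of ReplaceRegExpr set to False, in which case the replacement text is inserted literally.

```pascal
program SubstitutionDemo;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // Illustrative input and pattern, taken from the first post.
  Src = 'something retvalue -1 anything anyt=55; nothing var1=5;';
  Pat = '(anything|nothing)';

begin
  // $0 is the whole match; the arrows make each match visible.
  WriteLn(ReplaceRegExpr(Pat, Src, '-->$0<--', True));
  // $1 is the first bracketed group, which here spans the whole match.
  WriteLn(ReplaceRegExpr(Pat, Src, LineEnding + '$1', True));
  // With substitution switched off, "[$0]" is inserted as plain text.
  WriteLn(ReplaceRegExpr(Pat, Src, '[$0]', False));
end.
```

If a pattern has several groups, $1, $2 and so on pick them out individually, while $0 always stands for the complete match.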
Title: Re: Regex question
Post by: justnewbie on July 15, 2019, 08:47:32 pm
Thank you for your help, engkin!  :)
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 11:58:32 am
Now I have a slightly more complicated task.
In my example there is a loop "header" that is spread over 3 lines, and I need it on 1 line.
The original text:
Code: Pascal
   string something="";
   bool anything=true;
   for(int i=0;
   i<var1;
   i++)
     {
      string nothing=text[i];
     }
I want the for(int i=0; i<var1; i++) on 1 line.
Important: I need a general formula, so it can be e.g. either for(int var2=x; var2<var3; var2++) or for(int var4=100; var4>=var1; var4--) and similar ...

By using this regex for\s*\([^\n]+\n[^\n]+\n[^\n]+ I can get the loop "header", but I don't know how to get the result without those 2 \n.
Any help?
Title: Re: Regex question
Post by: engkin on July 16, 2019, 01:35:58 pm
To remove all line endings use:
Code: Pascal
  reBefore := '([^\r\n]+)[\r\n]+';
  reAfter := '$1';
  s2 := ReplaceRegExpr(reBefore, s1, reAfter, true);

To remove the two line endings in a for statement:
Code: Pascal
  reBefore := '(for\s*\([^\r\n]+)[\r\n]+([^\r\n]+)[\r\n]+([^\r\n]+\))';
  reAfter := '$1$2$3';
  s2 := ReplaceRegExpr(reBefore, s1, reAfter, true);
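With '$1$2$3' the indentation of the continuation lines stays inside the joined header. A possible variation, sketched here under the assumption that a single space between the three parts is wanted, lets \s* swallow the indentation after each line ending. The program name and sample text are illustrative only.

```pascal
program JoinForHeader;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // Sample input modelled on the text from the question.
  Src =
    '   bool anything=true;' + LineEnding +
    '   for(int i=0;' + LineEnding +
    '   i<var1;' + LineEnding +
    '   i++)' + LineEnding +
    '     {';
  // Same three groups as above, but \s* after each line ending
  // consumes the leading spaces of the following line.
  Pat = '(for\s*\([^\r\n]+)[\r\n]+\s*([^\r\n]+)[\r\n]+\s*([^\r\n]+\))';

begin
  // Join the three captured parts with single spaces.
  WriteLn(ReplaceRegExpr(Pat, Src, '$1 $2 $3', True));
end.
```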
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 02:06:33 pm
Thank you engkin, this is exactly what I need!
(Soon I will have a new question ...)  :)
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 04:19:45 pm
New issue:
I need the lines that do not start with the word 'something'.
I know what ^something does, but I need its opposite.

Edit: maybe I figured it out:
^[^\bsomething\b].*
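A note on the pattern above: [^...] is a negated character class, so it matches one single character that is not in the list. It therefore also rejects lines that merely begin with one of the letters s, o, m, e, t, h, i, n or g. One way around this without lookahead is to test each line with the positive pattern and keep the lines that fail it. The following is only a sketch; the program name and the sample lines are invented for illustration.

```pascal
program FilterLines;

{$mode objfpc}{$H+}

uses
  Classes, RegExpr;

var
  Lines: TStringList;
  Line: String;
begin
  Lines := TStringList.Create;
  try
    // Invented sample lines; only the first starts with the word "something".
    Lines.Text :=
      'something first' + LineEnding +
      'sunny second' + LineEnding +
      'somethingelse third' + LineEnding +
      'other fourth';
    for Line in Lines do
      // Keep the line unless it begins with the whole word "something".
      if not ExecRegExpr('^something\b', Line) then
        WriteLn(Line);
  finally
    Lines.Free;
  end;
end.
```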
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 05:09:51 pm
Next:

I have this text:
Quote
mytext mytext hjsdgfjsh 57574 //gkgaksg
//  ztwojkgk
fjfjaf//tjtrjs
nodelete fghfgkhag //zrwjsj
ghfhfhd ://uzrutziut

I want to delete the text after each // (together with the // itself), except in the following cases.
I.e., no deletion if
1./ a line starts with nodelete
or
2./ there is a colon (:) right before the //

So I want to get this:
Quote
mytext mytext hjsdgfjsh 57574

fjfjaf
nodelete fghfgkhag //zrwjsj
ghfhfhd ://uzrutziut

Update:
I found the solution for case 2:
(?<!:)\/\/[^\n]+[\n]+
I need to extend it to cover rule 1 as well.
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 06:52:22 pm
I found a solution:
(^(?!nodelete).*)(?<!:)\/\/[^\n]+[\n]+

It works well in the online regex tester https://regex101.com/, but Lazarus gives an error (unrecognized modifier).
Any help?
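The RegExpr version used here apparently reads (? as the start of an inline modifier, which would explain the "unrecognized modifier" error for (?! and (?<!. Both rules can still be expressed without lookaround if the text is processed line by line. The following is a sketch under that assumption, not a drop-in replacement for the pattern above; the program name is invented, and a space left in front of a removed comment is kept.

```pascal
program StripComments;

{$mode objfpc}{$H+}

uses
  Classes, RegExpr;

var
  Lines: TStringList;
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    // The sample text from the question.
    Lines.Text :=
      'mytext mytext hjsdgfjsh 57574 //gkgaksg' + LineEnding +
      '//  ztwojkgk' + LineEnding +
      'fjfjaf//tjtrjs' + LineEnding +
      'nodelete fghfgkhag //zrwjsj' + LineEnding +
      'ghfhfhd ://uzrutziut';
    for i := 0 to Lines.Count - 1 do
      // Rule 1: leave lines that start with "nodelete" untouched.
      if Pos('nodelete', Lines[i]) <> 1 then
        // Rule 2: "//" counts only at the line start or after a
        // non-colon character; $1 puts that character back.
        Lines[i] := ReplaceRegExpr('(^|[^:])//.*', Lines[i], '$1', True);
    Write(Lines.Text);
  finally
    Lines.Free;
  end;
end.
```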
Title: Re: Regex question
Post by: engkin on July 16, 2019, 07:07:23 pm
Probably time to consider using a different engine. For instance look for FLRE.
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 07:31:01 pm
Quote
Probably time to consider using a different engine. For instance look for FLRE.

What is it? I only found 'flare regular expr' on Google.
Can you provide me a link?
Title: Re: Regex question
Post by: engkin on July 16, 2019, 08:15:09 pm
https://github.com/BeRo1985/flre
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 09:01:51 pm
Quote
https://github.com/BeRo1985/flre

Uh, that's bad. I have never used GitHub and don't know what I should do with the downloaded zip.
I couldn't find install instructions.
Anyway, thanks.
Title: Re: Regex question
Post by: engkin on July 16, 2019, 10:51:12 pm
Two files, FLRE.pas and PUCU.pas, need to be visible to your project. Add their path to your project, or copy them to your project folder.

For frequently used units, create a package, add them to the package, and compile it. Later on, when you need any of these units, add the package to your project.
Title: Re: Regex question
Post by: justnewbie on July 16, 2019, 11:00:35 pm
I will give it a shot, thank you for your help!
Title: Re: Regex question
Post by: justnewbie on October 11, 2019, 10:23:32 am
Hi guys/girls,
I am stuck with a regex problem, cannot figure it out.
This is the text:
Quote
{
   First,//Something text
   Second,
   Small//Something text
   Big
}

I need to get First, Second, Small and Big.
There are 3 spaces before them here, but it can be any number, not just 3 (from 0 upwards).
I tried this regex:
(?<=(^\s\s\s))(.*)(?=(,|\/\/|\n))
It doesn't work. Can anyone help me?
Title: Re: Regex question
Post by: bytebites on October 11, 2019, 01:10:31 pm
(?<=(\n(\s*)))([a-zA-Z]*)(?=([,/\n]))
Title: Re: Regex question
Post by: justnewbie on October 11, 2019, 01:47:02 pm
Quote
(?<=(\n(\s*)))([a-zA-Z]*)(?=([,/\n]))

Thank you, but it gives a pattern error.
Title: Re: Regex question
Post by: bytebites on October 11, 2019, 02:01:21 pm
https://regexr.com gives only a warning  about positive lookbehind.
Title: Re: Regex question
Post by: justnewbie on October 11, 2019, 02:13:11 pm
Quote
https://regexr.com gives only a warning about positive lookbehind.

I am using https://regex101.com/, but I tried https://regexr.com too, and it gives no matches, only an error.
Title: Re: Regex question
Post by: bytebites on October 11, 2019, 03:01:53 pm
The "positive lookbehind" feature may not be supported in all browsers.
Title: Re: Regex question
Post by: justnewbie on October 11, 2019, 03:23:26 pm
Quote
The "positive lookbehind" feature may not be supported in all browsers.

I need this thing in my Lazarus program, not in a browser.
Title: Re: Regex question
Post by: Leledumbo on October 11, 2019, 04:04:41 pm
That looks like a JSON so a JSON parser should work better.
Title: Re: Regex question
Post by: justnewbie on October 11, 2019, 05:19:48 pm
Thank you, but your advice is not an answer to my question.
I need the proper pattern that can be used in Lazarus.
Title: Re: Regex question
Post by: howardpc on October 11, 2019, 06:42:41 pm
This quick hack solution does not answer your regex question either, but it does give a correct result.
Code: Pascal
function ParsedOK(aTxt: String; out s1, s2, s3, s4: String): Boolean;
var
  c: SizeInt;
  sl: TStringList;
begin
  Result := False;
  s1 := ''; s2:= ''; s3 := ''; s4 := '';
  aTxt := Trim(aTxt);
  if (Length(aTxt) < 12) or (aTxt[1] <> '{') or (aTxt[Length(aTxt)] <> '}') then
    Exit;
  sl := TStringList.Create;
  try
    sl.Text := aTxt;
    if sl.Count <> 6 then
      Exit;
    c := Pos(',', sl[1]);
    if c = 0 then
      Exit;
    Dec(c);
    s1 := Trim(Copy(sl[1], 1, c));

    c := Pos(',', sl[2]);
    if c = 0 then
      Exit;
    Dec(c);
    s2 := Trim(Copy(sl[2], 1, c));

    c := Pos('/', sl[3]);
    if c = 0 then
      Exit;
    Dec(c);
    s3 := Trim(Copy(sl[3], 1, c));

    s4 := Trim(sl[4]);
    Result := True;
  finally
    sl.Free;
  end;
end;
Title: Re: Regex question
Post by: justnewbie on October 11, 2019, 06:50:57 pm
Thank you, but unfortunately it is not good for me. I need REGEX.
(For example: I don't know in advance how many strings will be there, the 4 is just an example. So, I need a general solution, regex can do it.)
Title: Re: Regex question
Post by: ASerge on October 12, 2019, 01:06:24 am
Quote
Thank you, but unfortunately it is not good for me. I need REGEX.

Code: Pascal
{$MODE OBJFPC}
{$APPTYPE CONSOLE}
{$LONGSTRINGS ON}

uses RegExpr;

procedure Test(const S: string);
var
  R: TRegExpr;
begin
  R := TRegExpr.Create('');
  try
    // Multi-line mode: ^ matches at the start of every line
    R.ModifierM := True;
    // Skip leading non-word characters, then capture the first word
    R.Expression := '^\W*(\w+)';
    if R.Exec(S) then
      repeat
        Writeln('"', R.Match[1], '"');
      until not R.ExecNext;
  finally
    R.Free;
  end;
end;

const
  CSampleInputText =
    '{' + LineEnding +
    '   First,//Something text' + LineEnding +
    '   Second,' + LineEnding +
    '   Small//Something text' + LineEnding +
    '   Big' + LineEnding +
    '}';
begin
  Test(CSampleInputText);
  Readln;
end.
Title: Re: Regex question
Post by: howardpc on October 12, 2019, 11:19:09 am
Quote
I don't know in advance how many strings will be there, the 4 is just an example. So, I need a general solution, regex can do it.

A more general (non-regex) solution might be something like the following.
Code: Pascal
program ParseExample;

{$AppType console}
{$Mode objfpc}{$H+}

uses Classes, SysUtils;

const
  txt = '{' + LineEnding +
        '   First,//Something text' + LineEnding +
        '   Second,' + LineEnding +
        '                          extra                   ' + LineEnding +
        '   Small//Something text' + LineEnding +
        '   Big' + LineEnding + '}';

function ParsedCommentToFirstWords(const aTxt: String; list: TStrings): Boolean;

    function FirstWord(const s: String): String;
    var
      p, b: Integer;
    begin
      p := 1;
      // Test the index first so an empty line never reads s[1]
      while (p < Length(s)) and (s[p] in [' ', #9]) do
        Inc(p);
      b := p;
      while (p < Length(s)) and (s[p] in ['a'..'z','A'..'Z']) do
        Inc(p);
      case (p = Length(s)) and (s[p] in ['a'..'z','A'..'Z']) of
        True:  Result := Copy(s, b, p-b+1);
        False: Result := Copy(s, b, p-b);
      end;
    end;

var
  i, min: Integer;
begin
  list.Text := Trim(aTxt);
  if (list.Count < 3) or (Trim(list[0]) <> '{') or (Trim(list[list.Count-1]) <> '}') then
    Exit(False);
  list.Delete(0);
  list.Delete(list.Count-1);
  min := MaxInt;
  for i := 0 to list.Count-1 do
    begin
      list[i] := FirstWord(list[i]);
      if min > Length(list[i]) then
        min := Length(list[i]);
    end;
  // Valid only if every line yielded a non-empty first word
  Result := min > 0;
end;

var
  sl: TStringList;
  s: String;
begin
  sl := TStringList.Create;
  try
    if ParsedCommentToFirstWords(txt, sl) then
      for s in sl do
        WriteLn(s)
    else Writeln('invalid text format for "',txt,'"');
  finally
    sl.Free;
  end;
end.
Title: Re: Regex question
Post by: justnewbie on October 14, 2019, 10:33:10 pm
Guys, thank you very much!
Title: Re: Regex question
Post by: justnewbie on October 15, 2019, 03:49:45 pm
Hi ASerge, I wish you were somewhere nearby ...  :)
I have a task that is too complicated for me (again! LOL). I spent hours with it, but ...
I have a text and I need to get all the variable names (something, anything, nothing, anytext, mytext, poems, x, z, w, k, q, e).
What is the proper REGEX pattern to get them? Have you any idea?
Quote
int something;
double anything=2.15;
int nothing  = 7;
string anytext, mytext="",poems;
int   x=7;int z; int w=85;
int k,q,e=2;
Title: Re: Regex question
Post by: Thaddy on October 15, 2019, 04:02:43 pm
Regular expressions are powerful, but this looks like a job they are not suited to.
It is much simpler to compare against a list of reserved words (int, double etc., not something and family) and parse until white space or a control character is met.
Title: Re: Regex question
Post by: justnewbie on October 15, 2019, 04:10:36 pm
Quote
Regular expressions are powerful, but this looks like a job they are not suited to.
It is much simpler to compare against a list of reserved words (int, double etc., not something and family) and parse until white space or a control character is met.

I don't understand your post. I need to get the variable names (something, anything etc.).
Title: Re: Regex question
Post by: lucamar on October 15, 2019, 04:30:39 pm
It means parsing out the known reserved words, numbers, etc. so that you're left with the "unknown" ones, which are what you're after. For example, parsing the second line: double anything=2.15; you first get "double" which, being a reserved word, you can ignore; then you skip the space(s) and get the word up to the symbol "=": you get "anything" which is not a reserved word but comes after one, so it must be a variable name, which is what you're looking for. Keep going on, skipping the parts in which you're not interested, and you get your list of variables as result.

It might help if you draw a BNF diagram of your lines; that allows you to get a "feel" of what (and how) to parse "in" and what "out".
Title: Re: Regex question
Post by: justnewbie on October 15, 2019, 04:40:04 pm
To be honest, I cannot imagine that there is no much simpler way.
Think of it: there are hundreds of reserved words and characters (my example is heavily simplified).

This is where I am now: I can match the parts that contain those names with the pattern \b(int|double|string)(\s+)(.+)(;)
But I don't know how to get ONLY the names.
Title: Re: Regex question
Post by: Thaddy on October 15, 2019, 04:51:44 pm
Quote
To be honest, I cannot imagine that there is no much simpler way.
Think of it: there are hundreds of reserved words and characters (my example is heavily simplified).

It is NOT that easy! As Lucamar confirmed.
For such tasks I usually write a compiler...
Don't be afraid, I mean I write a grammar and use plex and pyacc or GoldParser to  generate the basic code.
In your case any C grammar would generate a correct lexer and parser for your problem so you don't even have to write a grammar......

Now that is a "nice" answer to anyone that thinks simple things must be easy to program.. O:-)

There is a good example: the h2pas sourcecode. (I mean that: you have to do all that)
Title: Re: Regex question
Post by: justnewbie on October 15, 2019, 07:20:18 pm
It can be done in 2 or 3 steps. :)
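Since the pictures are not available here, the following is one guess at what such a two-step approach could look like, not the poster's actual patterns. Step 1 matches each declaration up to its semicolon; step 2 picks the identifier at the start of the declaration list and after each comma. It assumes that string literals contain no commas or semicolons; the program name and both patterns are illustrative.

```pascal
program VarNames;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // The sample text from the question.
  Src =
    'int something;' + LineEnding +
    'double anything=2.15;' + LineEnding +
    'int nothing  = 7;' + LineEnding +
    'string anytext, mytext="",poems;' + LineEnding +
    'int   x=7;int z; int w=85;' + LineEnding +
    'int k,q,e=2;';

var
  Decl, Ident: TRegExpr;
begin
  // Step 1: a type keyword, then everything up to the next semicolon.
  Decl := TRegExpr.Create('\b(int|double|string)\s+([^;]+);');
  try
    // Step 2: an identifier at the start of the list or after a comma.
    Ident := TRegExpr.Create('(^|,)\s*([A-Za-z_]\w*)');
    try
      if Decl.Exec(Src) then
        repeat
          // Match[2] of step 1 is the declaration list without the type.
          if Ident.Exec(Decl.Match[2]) then
            repeat
              WriteLn(Ident.Match[2]);
            until not Ident.ExecNext;
        until not Decl.ExecNext;
    finally
      Ident.Free;
    end;
  finally
    Decl.Free;
  end;
end.
```

Supporting more types only means extending the alternation in the first pattern, but anything beyond flat declarations like these quickly runs into the parsing problems described above.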
Title: Re: Regex question
Post by: howardpc on October 15, 2019, 07:59:04 pm
A non-regex solution could be done by extending the list of allowed type names in the following example.
I don't know if digits are allowed in your variable names. If so, you'll need to adjust the parsing routine accordingly.
Code: Pascal
program project1;

{$mode objfpc}{$H+}
{$IfDef Windows}{$AppType console}{$EndIf}

uses
  SysUtils, Types;

function IsReserved(const aWord: String): Boolean; // extend this for needed keywords
begin
  // Brackets are required: "not" binds more tightly than "in"
  if not (Length(aWord) in [3, 6]) then
    Exit(False);
  case LowerCase(aWord) of
    'int',
    'string',
    'double': Exit(True);
    else
      Exit(False);
  end;
end;

function GetVarNames(aTxt: String): TStringDynArray;
var
  p: Integer = 1;
  index: Integer = 0;
  s: String;

  function GetNextWord: String; // assumes variable names must be alphabetical
  begin
    Result := '';
    while (p < Length(aTxt)) and not (aTxt[p] in ['A'..'Z','a'..'z']) do
      Inc(p);
    while (p < Length(aTxt)) and (aTxt[p] in ['A'..'Z','a'..'z']) do
      begin
        Result := Result + aTxt[p];
        Inc(p);
      end;
  end;

begin
  SetLength(Result, Length(aTxt) shr 1);
  aTxt := Trim(aTxt);
  repeat
    s := GetNextWord;
    case IsReserved(s) of
      True: ;
      False:
        begin
          Result[index] := s;
          Inc(index);
        end;
    end;
  until s = '';
  // The final empty word was stored too, so drop that last entry
  SetLength(Result, Pred(index));
end;

var
  txt: String = 'int something;'+LineEnding+
                'double anything=2.15;' + LineEnding +
                'int nothing  = 7;' + LineEnding +
                'string anytext, mytext="",poems;' + LineEnding +
                'int   x=7;int z; int w=85;' + LineEnding +
                'int k,q,e=2;';
  arr: TStringDynArray;
  s: String;

begin
  arr := GetVarNames(txt);
  for s in arr do
    WriteLn(s);
  WriteLn('Press [Enter] to finish');
  ReadLn;
end.
This outputs
Code: Pascal
something
anything
nothing
anytext
mytext
poems
x
z
w
k
q
e
Press [Enter] to finish
Title: Re: Regex question
Post by: Thaddy on October 15, 2019, 08:17:50 pm
That's extending white space... (but it is neat code and may be a solution)
The only proper solution is a parser.
Title: Re: Regex question
Post by: justnewbie on October 15, 2019, 08:36:19 pm
Thank you guys for the contribution!
Title: Re: Regex question
Post by: bytebites on October 15, 2019, 09:36:31 pm
Code: Pascal
Result := Result + aTxt[p];
This is slow.
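One common way to avoid the character-by-character concatenation is to remember where the word starts and copy it out in a single call. The following sketch shows the idea; the function NextWord and the variable names are hypothetical, not taken from the program above.

```pascal
program NextWordDemo;

{$mode objfpc}{$H+}

// Returns the next alphabetical word in S at or after position P and
// advances P past it; returns '' when no word is left.
function NextWord(const S: String; var P: Integer): String;
var
  Start: Integer;
begin
  // Skip everything that is not a letter.
  while (P <= Length(S)) and not (S[P] in ['A'..'Z', 'a'..'z']) do
    Inc(P);
  Start := P;
  // Advance to the end of the word without building it yet.
  while (P <= Length(S)) and (S[P] in ['A'..'Z', 'a'..'z']) do
    Inc(P);
  // One Copy call instead of growing Result a character at a time.
  Result := Copy(S, Start, P - Start);
end;

var
  Idx: Integer = 1;
  W: String;
begin
  repeat
    W := NextWord('int k,q,e=2;', Idx);
    if W <> '' then
      WriteLn(W);
  until W = '';
end.
```

For short inputs like the ones in this thread the difference is unlikely to be noticeable; it matters mainly for long texts.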
Title: Re: Regex question
Post by: howardpc on October 22, 2019, 04:32:28 pm
A non-regex solution:
Code: Pascal
program project1;

{$mode objfpc}{$H+}
{$IfDef Windows}{$AppType console}{$EndIf}

uses
  SysUtils;

var
  txt: String = 'Something (anything nothing="hey!", anything something, nothing="hola!", thing)' +
                'something nothing="aloha!"';
  sArr: TStringArray;
  s: String;
  b, e, i: Integer;

  function ExtractedBrackets(const aTxt: String; out Brackets: TStringArray): Boolean;
  var
    p, pb: Integer;
    bCount: Integer = 0;
  begin
    SetLength({%H-}Brackets, 0);
    Result := False;
    p := 0;
    while p < Length(aTxt) do
      begin
        Inc(p);
        if (aTxt[p] = '(') and (bCount = 0) then
          begin
            Inc(bCount);
            pb := Succ(p);
          end;
        if (aTxt[p] = ')') and (bCount > 0) then
          begin
            SetLength(Brackets, Length(Brackets)+1);
            Brackets[High(Brackets)] := Copy(aTxt, pb, p-pb);
            Dec(bCount);
            Result := True;
          end;
      end;
  end;

function FoundBetweenPatternAndSeparator(aBegin: Integer; const aTxt, aPattern: String; aSeparators: TSysCharSet; out Fragment: String; out EndPos: Integer): Boolean;
var
  p, b: Integer;
begin
  Fragment := '';
  Result := False;
  EndPos := 0;
  p := Pos(aPattern, aTxt, aBegin);
  if p > 0 then
    begin
      Inc(p, Length(aPattern));
      b := p;
      repeat
        Inc(p);
      until (p = Length(aTxt)) or (aTxt[p] in aSeparators);
      if aTxt[p] in aSeparators then
        begin
          Fragment := Copy(aTxt, b, p-b);
          EndPos := Succ(p);
          Exit(True);
        end;
    end;
end;

begin
  if ExtractedBrackets(txt, sArr) then
    for i := 0 to High(sArr) do
      begin
        b := 1; // restart the search for each bracketed fragment
        while FoundBetweenPatternAndSeparator(b, sArr[i], 'nothing=', [','], s, e) do
          begin
            WriteLn(s);
            b := e;
          end;
      end;
  ReadLn;
end.
Title: Re: Regex question
Post by: ASerge on October 22, 2019, 11:04:31 pm
Quote
I want to find all occurrences of a pattern in a text.
Requirements: follows the nothing= and has to be within brackets.

Code: Pascal
{$MODE OBJFPC}
{$APPTYPE CONSOLE}
{$LONGSTRINGS ON}

uses RegExpr;

procedure Test(const S: string);
var
  ROut, RIn: TRegExpr;
begin
  // Outer pattern: the text between a pair of brackets
  ROut := TRegExpr.Create('\(([^\)]*)\)');
  try
    if ROut.Exec(S) then
    begin
      // Inner pattern: a non-empty quoted value after nothing=
      RIn := TRegExpr.Create('nothing="([^"]+)"');
      try
        repeat
          if RIn.Exec(ROut.Match[1]) then
            repeat
              Writeln(RIn.Match[1]);
            until not RIn.ExecNext;
        until not ROut.ExecNext;
      finally
        RIn.Free;
      end;
    end;
  finally
    ROut.Free;
  end;
end;

const
  CSampleInputText =
    'Something (anything nothing="hey!", anything something, nothing="hola!", thing)' + LineEnding +
    'something nothing="aloha!"' + LineEnding +
    '(skip empty nothing="")' + LineEnding +
    '(other nothing="other!", nothing="again!", thing)';
begin
  Test(CSampleInputText);
  Readln;
end.
Title: Re: Regex question
Post by: maurobio on November 05, 2019, 11:46:14 am
Dear ALL,

I have strings with some embedded characters as follows:

s := '\i{}This is a string\i0{}'

I want to get rid of the \i{} and \i0{} using the following regex:

e := '\\i\d*{}'

But when I attempt this:

n := ReplaceRegExpr(e, s, '', True);

I get the error:

TRegExpr(comp): Nested *?+ (pos 6)

The above regex works in Python (as n = re.sub(e, '', s)), but obviously the FPC implementation is different.

Could someone give me a hand?

Thanks in advance!

Best regards,
Title: Re: Regex question
Post by: Thaddy on November 05, 2019, 11:51:45 am
I already replied in the other post: use an escape to escape a slash \\
Title: Re: Regex question
Post by: maurobio on November 05, 2019, 12:11:38 pm
Thanks, @Thaddy!
Title: Re: Regex question
Post by: maurobio on November 05, 2019, 12:18:40 pm
... but when I changed the regex to:

e := '\\i\\d*{}'

I still get the same error (just one position changed):

TRegExpr(comp): Nested *?+ (pos 7)

Any hints?
Title: Re: Regex question
Post by: bytebites on November 05, 2019, 01:44:02 pm
e:='\\i\d*\{}'

() is not needed.
Title: Re: Regex question
Post by: maurobio on November 05, 2019, 09:03:23 pm
@bytebites:

There are no parentheses ('()') in my regex, those are braces ('{}'), which are part of the string and that I want to get rid of.

Cheers,
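For readers who land here with the same "Nested *?+" error: in RegExpr an unescaped { after \d* is read as the start of a {n,m} repeat count, which is why escaping the brace is suggested above. Doubling the backslash in front of d does not help, because \\d then means a literal backslash followed by the letter d. Below is a small sketch with both braces escaped; the program name is made up, and escaping the closing brace as well is a cautious assumption rather than something confirmed in this thread.

```pascal
program BraceEscape;

{$mode objfpc}{$H+}

uses
  RegExpr;

const
  // The sample string from the question.
  S = '\i{}This is a string\i0{}';

begin
  // \\   one literal backslash
  // i    the letter i
  // \d*  optional digits
  // \{\} the literal braces (escaped so they are not read as a repeat count)
  WriteLn(ReplaceRegExpr('\\i\d*\{\}', S, ''));
end.
```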