
Author Topic: Parser - looking for solution  (Read 570 times)

bigeno

  • Full Member
  • ***
  • Posts: 248
Parser - looking for solution
« on: November 12, 2019, 12:04:12 pm »
Hi, I need to find and convert an expression in a text. For example:
Code: Pascal
text: max(u1:u3;u7:u9)
to
Code: Pascal
max(u1;u2;u3;u7;u8;u9)

In short, split the range u1:u3 into u1;u2;u3, and so on.

Is there an existing parser that can do this?

Thaddy

  • Hero Member
  • *****
  • Posts: 9285
Re: Parser - looking for solution
« Reply #1 on: November 12, 2019, 01:45:48 pm »
Well, yes, but what are the types? I do not see any type declarations nor variables?
also related to equus asinus.

PascalDragon

  • Hero Member
  • *****
  • Posts: 716
  • Compiler Developer
Re: Parser - looking for solution
« Reply #2 on: November 12, 2019, 03:18:21 pm »
Thaddy, that's not what they're asking about. They're not asking about parsing some Pascal code fragment, but some custom code and they want that parsed and converted.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5788
    • wiki
Re: Parser - looking for solution
« Reply #3 on: November 12, 2019, 03:49:58 pm »
What is your use case?

Is this a one-off conversion? That is, you want to load some file(s) into an editor, convert them once, save them, and that's it?

Or is it something that happens in an application you wrote (or are about to write)? So your app receives some data (file/stream/input) and needs to convert that data?


In either case, you can use a regex to find occurrences of this kind of data: (?<=[\(;])u(\d+):u(\d+)[;\)]
Or write a Pascal routine that can find it (i.e. find each "max" or other token that can contain such data, and then parse what follows it).
Once you have the data, you can write a "for" loop to expand it.
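
A rough sketch of the regex-plus-loop idea, assuming the TRegExpr class from the RegExpr unit that ships with FPC. The function name ExpandRanges is made up for this example. The pattern is simplified to u(\d+):u(\d+) without the lookbehind, since lookaround support may depend on the TRegExpr version, so it would also match inside a longer name such as "menu1:u3". Reversed ranges are left untouched here.

```pascal
program ExpandRangesRegex;

{$mode objfpc}{$H+}

uses
  SysUtils, RegExpr;

// Hypothetical helper: replaces every "uA:uB" in AText with "uA;...;uB".
// Assumes A <= B; a reversed range is copied through unchanged.
function ExpandRanges(const AText: String): String;
var
  re: TRegExpr;
  lastPos, n1, n2, j: Integer;
  expanded: String;
begin
  Result := '';
  lastPos := 1;
  re := TRegExpr.Create;
  try
    re.Expression := 'u(\d+):u(\d+)';
    if re.Exec(AText) then
      repeat
        n1 := StrToInt(re.Match[1]);
        n2 := StrToInt(re.Match[2]);
        if n1 <= n2 then
        begin
          // the "for" loop that expands the range
          expanded := '';
          for j := n1 to n2 do
          begin
            if expanded <> '' then
              expanded := expanded + ';';
            expanded := expanded + 'u' + IntToStr(j);
          end;
        end
        else
          expanded := re.Match[0];
        // copy the text between the previous match and this one, then the expansion
        Result := Result + Copy(AText, lastPos, re.MatchPos[0] - lastPos) + expanded;
        lastPos := re.MatchPos[0] + re.MatchLen[0];
      until not re.ExecNext;
    // copy whatever follows the last match
    Result := Result + Copy(AText, lastPos, Length(AText) - lastPos + 1);
  finally
    re.Free;
  end;
end;

begin
  WriteLn(ExpandRanges('max(u1:u3;u7:u9)'));  // prints max(u1;u2;u3;u7;u8;u9)
end.
```

The regex only locates the ranges; the expansion itself still needs ordinary code, because a regex replacement cannot count from one number to another.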

Thaddy

  • Hero Member
  • *****
  • Posts: 9285
Re: Parser - looking for solution
« Reply #4 on: November 12, 2019, 04:56:49 pm »
> Thaddy, that's not what they're asking about. They're not asking about parsing some Pascal code fragment, but some custom code and they want that parsed and converted.
So how would you approach that? without further info? max typeless? Max string length? Max start character?
There are obviously two ranges? But how do you interpret that?  Impossible even with type inference.
« Last Edit: November 12, 2019, 05:05:09 pm by Thaddy »

wp

  • Hero Member
  • *****
  • Posts: 6471
Re: Parser - looking for solution
« Reply #5 on: November 12, 2019, 05:38:40 pm »
Assuming the expression with the colon and semicolon is a string, it is easy to separate the components and to expand the colon (which I understand as some kind of range operator). Try this (not carefully tested and certainly not the best code, but just a motivation for you to write such simple analyses on your own instead of searching for ready-made code):

Code: Pascal
program Project1;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Splits an item such as 'u12' into its prefix ('u') and trailing number (12).
function ExtractParts(s: String; out APrefix: String; out ANumber: Integer): Boolean;
var
  numberStr: String;
  i: Integer;
begin
  numberStr := '';
  APrefix := '';
  for i:=Length(s) downto 1 do
  begin
    if s[i] = ' ' then
      Continue;
    if s[i] in ['0'..'9'] then
      numberStr := s[i] + numberStr
    else begin
      APrefix := trim(copy(s, 1, i));
      break;
    end;
  end;
  Result := (APrefix <> '') and TryStrToInt(numberStr, ANumber);
end;

// Adds every item of the semicolon-separated list s to AList, expanding ranges.
procedure ExpandList(s: String; AList: TStrings);
var
  L1, L2: TStrings;
  n1, n2: Integer;
  prefix1, prefix2: String;
  i, j: Integer;
begin
  L1 := TStringList.Create;
  L2 := TStringList.Create;
  try
    L1.Delimiter := ';';
    L1.StrictDelimiter := true;
    L1.DelimitedText := s;

    L2.Delimiter := ':';
    L2.StrictDelimiter := true;
    for i := 0 to L1.Count - 1 do begin
      L2.DelimitedText := L1[i];

      // single item, e.g. n3
      if L2.Count = 1 then begin
        AList.Add(trim(L1[i]));
        Continue;
      end;

      // range item, e.g. n3:n6
      if L2.Count = 2 then
      begin
        if ExtractParts(L2[0], prefix1, n1) and
           ExtractParts(L2[1], prefix2, n2) and
           (prefix1 = prefix2) and (n1 <= n2) then
        begin
          for j := n1 to n2 do
            AList.Add(prefix1 + IntToStr(j));
        end else
          raise Exception.Create('Incorrect syntax');
        Continue;
      end;

      raise Exception.Create('Incorrect syntax');
    end;

  finally
    L2.Free;
    L1.Free;
  end;
end;

const
  s = 'u1; u2;u5:u8;u10;u20: u22; u27 :u30';

var
  L: TStringList;
  i: Integer;
begin
  WriteLn(s);
  L := TStringList.Create;
  try
    ExpandList(s, L);
    for i:=0 to L.Count-1 do
      WriteLn(L[i])
  finally
    L.Free;
  end;

  ReadLn;
end.

Output:
Quote
u1; u2;u5:u8;u10;u20: u22; u27 :u30
u1
u2
u5
u6
u7
u8
u10
u20
u21
u22
u27
u28
u29
u30
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

bigeno

  • Full Member
  • ***
  • Posts: 248
Re: Parser - looking for solution
« Reply #6 on: November 12, 2019, 05:43:34 pm »
Yes, it's only a string in my app. For now I just search for a colon and then scan to the right and to the left.
Code: Pascal
// Fragment from a function body; z is set elsewhere (not shown).
for i:=2 to Length(mytext)-1 do begin
   if (mytext[i] = ':') and (mytext[i+1] = 'u') then begin
      m:=0;
      _left:=0;
      _right:=0;
      // count the digits to the right of ':u'
      for k:=i+2 to z-1 do begin
         if TryStrToInt(mytext[k],n) then
            Inc(m)
         else
            break;
      end;
      if m>0 then begin
         _right:=StrToInt(Copy(mytext,i+2,m));
         m:=0;
         // count the digits to the left of ':'
         for k:=i-1 downto 2 do begin
            if TryStrToInt(mytext[k],n) then
               Inc(m)
            else
               break;
         end;
         if (m > 0) and (mytext[i-m-1] = 'u') then
            _left:=StrToInt(Copy(mytext,i-m,m));
         // both ends found: return the first range
         if (_left>0) and (_right>0) then begin
            Result.rLeft:=_left;
            Result.rRight:=_right;
            exit;
         end;
      end;
   end;
end;

but I wonder whether something like a parser exists that works with a regex such as (?<=[\(;])u(\d+):u(\d+)[;\)], as @Martin mentioned.
« Last Edit: November 12, 2019, 05:45:26 pm by bigeno »

bigeno

  • Full Member
  • ***
  • Posts: 248
Re: Parser - looking for solution
« Reply #7 on: November 12, 2019, 05:57:21 pm »
> Assuming the expression with the colon and semicolon is a string it is easy to separate the components and to expand the colon (which I understand as some kind of range operator). Try this (not carefully tested and certainly not the best code at all, but just a motivation for you to write such simple analyses on your own instead of searching for ready-made code):

Thanks wp, very useful.
You guys misunderstood me ;) I'm not looking for a ready-made solution from you, I just don't want to write something that has already been written or that can be done more easily/safely with a regex parser.

MarkMLl

  • Sr. Member
  • ****
  • Posts: 285
Re: Parser - looking for solution
« Reply #8 on: November 12, 2019, 06:58:24 pm »
I'm not a computer science guy (and sneakily proud of that fact), but as I understand it you can't parse something like

> max(u1:u3;u7:u9)

using a regex since there's implicit context-sensitivity in there (u1 < u3 and so on :-)

OK, so there's ways you can hack it using a regex as the frontend and custom code to process each chunk as it's extracted. But however attractive it looks initially, it usually ends up looking far messier than wp's example solution.

MarkMLl
Turbo Pascal v1 on CCP/M-86, multitasking with LAN and graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.

Thaddy

  • Hero Member
  • *****
  • Posts: 9285
Re: Parser - looking for solution
« Reply #9 on: November 12, 2019, 07:11:18 pm »
> I'm not a computer science guy (and sneakily proud of that fact), but as I understand it you can't parse something like

> max(u1:u3;u7:u9)
But what is the context of max()? This is not computer science but logic... max what? If max does not mean max it would be easy to parse, a simple split, but I fear max means max and I do not know - only OP knows - what to max() and with what preconditions... :D O:-) My niece is called Max, even she can not solve this.
Daft example:
Code: Pascal
{$mode objfpc}{$H+}
type
  TU = (u1,u2,u3,u4,u5,u6,u7,u8,u9);
var
  u:Tu;
begin
  for u := u1 to u3 do write(u,';');
  for u := u7 to u9 do write(u,';');
  writeln(' garbage in, garbage out');
end.
  ;D
« Last Edit: November 12, 2019, 07:23:56 pm by Thaddy »

bigeno

  • Full Member
  • ***
  • Posts: 248
Re: Parser - looking for solution
« Reply #10 on: November 12, 2019, 07:40:40 pm »
Max is not important, it's just one of my functions. It can be max, min etc. The range argument was my point: u1:u8 and maybe u8:u1, and the number after u can be an integer from 1 to 1000... I think I'll stay with a loop over the string and check all the conditions. I can remove all spaces from the text first, so it's not such a hard solution.
I just thought there might be a package offering something like:
parser.text:='funct(u1;u2;u5:u11)';
parser.regexp:='u\d+:u\d+ ';

but now I see that this is more complex than a regexp replace... yeah


MarkMLl

  • Sr. Member
  • ****
  • Posts: 285
Re: Parser - looking for solution
« Reply #11 on: November 12, 2019, 08:14:43 pm »
Well I certainly see what you mean, even if I was uncertain to start with whether you were asking about parsing text in a program or doing a one-shot replacement in the IDE to fix source originally written for some other compiler.

Something like (assuming whitespace is discarded as you progress):

* Parse function name and ( .

* Parse a start-of-range variable name.

* If next character is : swallow it and parse end-of-range variable name. Emit range.

* If next character is ; swallow it. Emit single variable name.

* If next character is ) swallow it. Emit single variable name. Exit loop.

Obviously the "emit range" operation has you splitting the alpha prefix and numeric suffix.

Lots of scope for frills, but a bit of reading on recursive descent and EBNF might be in order. Just don't go down the linguistic rabbit hole.
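
The steps above could be sketched roughly like this. It is only an illustration under a few assumptions: the name ExpandCall and its nested helpers are invented for the example, names consist of letters and underscores followed by a number, and a reversed range such as u8:u5 is emitted counting down.

```pascal
program ExpandCallDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: rewrites 'max(u1:u3;u7)' as 'max(u1;u2;u3;u7)'
// by walking the string once from left to right.
function ExpandCall(const ASrc: String): String;
var
  p: Integer;                 // current read position in ASrc
  acc: String;                // output built so far
  prefix1, prefix2: String;
  n1, n2, j: Integer;

  procedure SkipSpaces;
  begin
    while (p <= Length(ASrc)) and (ASrc[p] = ' ') do
      Inc(p);
  end;

  function Peek: Char;        // next character, or #0 at end of input
  begin
    if p <= Length(ASrc) then
      Result := ASrc[p]
    else
      Result := #0;
  end;

  procedure Expect(c: Char);
  begin
    if Peek <> c then
      raise Exception.CreateFmt('"%s" expected at position %d', [c, p]);
    Inc(p);
  end;

  function ParseName: String; // alpha prefix, e.g. 'max' or 'u'
  begin
    Result := '';
    while Peek in ['A'..'Z', 'a'..'z', '_'] do
    begin
      Result := Result + ASrc[p];
      Inc(p);
    end;
    if Result = '' then
      raise Exception.CreateFmt('Name expected at position %d', [p]);
  end;

  function ParseNumber: Integer; // numeric suffix, e.g. 12
  var
    digits: String;
  begin
    digits := '';
    while Peek in ['0'..'9'] do
    begin
      digits := digits + ASrc[p];
      Inc(p);
    end;
    if not TryStrToInt(digits, Result) then
      raise Exception.CreateFmt('Number expected at position %d', [p]);
  end;

begin
  p := 1;
  SkipSpaces;
  acc := ParseName;           // function name
  SkipSpaces;
  Expect('(');
  acc := acc + '(';
  repeat
    SkipSpaces;
    prefix1 := ParseName;     // start-of-range variable
    n1 := ParseNumber;
    n2 := n1;                 // single variable unless ':' follows
    SkipSpaces;
    if Peek = ':' then
    begin
      Inc(p);                 // swallow ':' and parse the end of the range
      SkipSpaces;
      prefix2 := ParseName;
      n2 := ParseNumber;
      if prefix2 <> prefix1 then
        raise Exception.Create('Range ends must share a prefix');
      SkipSpaces;
    end;
    j := n1;                  // emit a single name or the whole range
    repeat
      acc := acc + prefix1 + IntToStr(j);
      if j = n2 then
        Break;
      acc := acc + ';';
      if n1 < n2 then Inc(j) else Dec(j);
    until False;
    if Peek <> ';' then
      Break;                  // anything other than ';' ends the list
    Inc(p);
    acc := acc + ';';
  until False;
  Expect(')');
  Result := acc + ')';
end;

begin
  WriteLn(ExpandCall('max(u1:u3;u7:u9)'));      // prints max(u1;u2;u3;u7;u8;u9)
  WriteLn(ExpandCall('funct(u1; u2; u8:u5)'));  // prints funct(u1;u2;u8;u7;u6;u5)
end.
```

Malformed input raises an exception with the offending position, which is about as far as error handling goes in this sketch.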

MarkMLl

Edson

  • Hero Member
  • *****
  • Posts: 1056
Re: Parser - looking for solution
« Reply #12 on: November 13, 2019, 12:13:20 am »
You could use the SynFacilSyn highlighter as a lexer too. Just define a simple syntax like:

Code: Pascal
xLex.DefTokIdentif('[$A-Za-z_]', '[A-Za-z0-9_]*');
xLex.DefTokContent('[0-9]', '[0-9.]*', xLex.tnNumber);

And then you can scan the tokens from a line:

Code: Pascal
while not xLex.GetEol do begin
  //Do something with the token
  xLex.Next;
end;
Lazarus 1.6 - FPC 3.0.0 - x86_64-win64 on  Windows 7