
Topic: Plex start states (Read 223 times)

Anonimista (New Member, Posts: 19)
Plex start states
Posted on: September 19, 2019, 09:52:27 pm
I can't figure out how start states work in Plex (the lexer generator). In flex, the rules for each start condition can be grouped in braces after the condition name, e.g.:

```c
%x IN_COMMENT
%%
<INITIAL>{
"//"              BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
\n        BEGIN(INITIAL);
[^\n]+    // eat comment
"/"       // eat the lone /
}
%%
```

Here <INITIAL> is the default state. In Plex, BEGIN is replaced with start():

```pascal
%start x y
%%
<x>a    start(y);
<y>b    start(x);
%%
begin
  start(x); if yylex=0 then ;
end.
```

The declaration of start() in lexlib.pas:

```pascal
procedure start ( state : Integer );
  (* puts the lexical analyzer in the given start state; state=0 denotes
     the default start state, other values are user-defined *)
```


Here is my lexer.l:

```pascal
%{
    uses lexlib, yacclib;
%}

%start str

digit [0-9]
letter [ a-zA-Z]

%%
{digit}+        writeln(yytext);
\"              begin writeln('string start...'); start(str); end;
<str>{letter}+  writeln('in string');
<str>\"         begin writeln('string end...'); start(0); end;
.               yyerror('Caracter desperado.');

%%

begin
    yylex();
end.
```

and the example session:

```
[jon@test test2]$ ./lexer
1
1

"
string start...

a
in string

b
in string

"
string start...

???
Caracter desperado.
Caracter desperado.
Caracter desperado.
```

I call start(0), but the lexer stays stuck in the <str> state. How do I put it back into the initial state?
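A likely explanation, assuming Plex follows the lex convention that rules without a start-state prefix are active in every start state declared with %start (the session above suggests it does, since the second " printed "string start..." rather than "string end..."): inside <str>, both the unprefixed \" rule and <str>\" match the same single character, and lex breaks such ties in favour of the rule listed first. So the lexer keeps calling start(str) and never reaches start(0). A minimal, untested sketch of one fix is to list the <str> rules first, leaving everything else unchanged:

```pascal
%{
    (* Untested sketch: the lexer above with the <str> rules moved ahead of
       the unprefixed \" rule. Assumes, as in lex, that unprefixed rules are
       active in every start state and that ties go to the earlier rule. *)
    uses lexlib, yacclib;
%}

%start str

digit [0-9]
letter [ a-zA-Z]

%%
<str>\"         begin writeln('string end...'); start(0); end;
<str>{letter}+  writeln('in string');
{digit}+        writeln(yytext);
\"              begin writeln('string start...'); start(str); end;
.               yyerror('Caracter desperado.');

%%

begin
    yylex();
end.
```

In state 0 the two <str> rules are inactive, so an opening " still reaches the unprefixed \" rule. Note that {digit}+ and . remain active inside strings; declaring a separate state for the outer rules and prefixing them, as in the reply below, avoids that.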

Anonimista
Re: Plex start states
Reply #1 on: September 20, 2019, 10:14:57 pm
In the meantime, I found the original TP Lex/Yacc archive, which contains several examples, including magic.l, which demonstrates the use of start states. Based on that, I now have this code, which seems to work:


```pascal
%{
    {$mode objfpc}{$H+}

    uses LexLib;

    var
        strval: string;

%}

%start initial str

%%

<initial>[0-9]+     begin
                        writeln('integer: ' + yytext);
                    end;

<initial>do         begin
                        writeln('keyword - do');
                    end;

<initial>loop       begin
                        writeln('keyword - loop');
                    end;

<initial>\"         begin
                        strval := '';
                        start(str);
                    end;

<str>\"             begin
                        writeln('string: ' + strval);
                        start(initial);
                    end;

<str>\\\"           begin
                        strval := strval + '\"';
                    end;

<str>[^\"]          begin
                        strval := strval + yytext;
                    end;

.                   writeln('Caracter desperado!');

%%

begin
    start(initial);
    if yylex=0 then ;
end.
```

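In the main program I start in the initial state (start(initial)). When the lexer encounters a ", it switches to the str state and keeps appending characters to strval until it reaches the closing ". The <str>\\\" rule accounts for escaped \" sequences inside the string: because it matches two characters, lex's longest-match rule prefers it over <str>[^\"], which would otherwise consume the backslash on its own. Here is an example session: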

```
[jon@test test2]$ ./lexer
123
integer: 123

123 "abc 456" 789
integer: 123
 string: abc 456
 integer: 789

string: "\t line \n line \n\r"
string: string: \t line \n line \n\r

"\""
string: \"

"\nstring\t\"line"
string: \nstring\t\"line
```

This approach should let me parse quoted strings, comments, etc.
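For comments, the flex example from the first post might translate to Plex roughly as follows. This is an untested sketch: the state name comment is hypothetical, and since I'm not sure Plex supports flex's <STATE>{ ... } grouping, every rule carries its own prefix, in the same style as the working lexer above.

```pascal
%{
    {$mode objfpc}{$H+}

    (* Untested sketch: // line comments handled with a hypothetical
       "comment" start state, mirroring the <initial>/<str> approach.
       \/\/ escapes both slashes because / is lex's trailing-context operator. *)
    uses LexLib;
%}

%start initial comment

%%

<initial>\/\/       start(comment);
<comment>\n         start(initial);
<comment>[^\n]+     ;
<initial>[0-9]+     writeln('integer: ' + yytext);
.                   writeln('Caracter desperado!');

%%

begin
    start(initial);
    if yylex=0 then ;
end.
```

The empty action ; simply discards the comment text, and the newline that ends the comment switches back to the initial state. The unprefixed . rule is also active inside a comment, but <comment>[^\n]+ is listed earlier and matches at least as much text, so it should win there.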