### Topic: Plex start states

#### Anonimista

##### Plex start states
« on: September 19, 2019, 09:52:27 pm »
I can't figure out how start states work in Plex (the lexer generator). In lex/flex, the rules for each state can be grouped under the state name, which is written in angle brackets, e.g.:

```c
<INITIAL>{
  "//"              BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
  \n      BEGIN(INITIAL);
  [^\n]+    // eat comment
  "/"       // eat the lone /
}
```

Here the `<INITIAL>` state is the default. In Plex, `BEGIN` is replaced with `start()`:

```pascal
%start x y
%%
<x>a    start(y);
<y>b    start(x);
%%
begin
  start(x); if yylex=0 then ;
end.
```

start() in lexlib.pas:

```pascal
procedure start ( state : Integer );
  (* puts the lexical analyzer in the given start state; state=0 denotes
     the default start state, other values are user-defined *)
```

Here is my lexer.l:

```pascal
%{
    uses lexlib, yacclib;
%}

%start str

digit [0-9]
letter [ a-zA-Z]

%%
{digit}+        writeln(yytext);
\"              begin writeln('string start...'); start(str); end;
<str>{letter}+  writeln('in string');
<str>\"         begin writeln('string end...'); start(0); end;

%%

begin
    yylex();
end.
```

and the example session:

```
[jon@test test2]$ ./lexer
1
1

"
string start...

a
in string

b
in string

"
string start...

???
```

I used `start(0)`, but the lexer seems to be stuck in the `<str>` state: the closing quote prints "string start..." again instead of "string end...". I'm not sure how to put it back into the initial state.

#### Anonimista

##### Re: Plex start states
« Reply #1 on: September 20, 2019, 10:14:57 pm »
In the meantime I found the original TP Lex/Yacc archive, which contains several examples, including magic.l, which demonstrates the use of start states. Based on that, I now have this code, which seems to work:

```pascal
%{
    {$mode objfpc}{$H+}

    uses LexLib;

    var
        strval: string;

%}

%start initial str

%%

<initial>[0-9]+     begin
                        writeln('integer: ' + yytext);
                    end;

<initial>do         begin
                        writeln('keyword - do');
                    end;

<initial>loop       begin
                        writeln('keyword - loop');
                    end;

<initial>\"         begin
                        strval := '';
                        start(str);
                    end;

<str>\"             begin
                        writeln('string: ' + strval);
                        start(initial);
                    end;

<str>\\\"           begin
                        strval := strval + '\"';
                    end;

<str>[^\"]          begin
                        strval := strval + yytext;
                    end;

%%

begin
    start(initial);
    if yylex=0 then ;
end.
```

The main program begins by entering the initial state (`start(initial)`). When the lexer encounters a `"`, it switches to the `str` state and keeps appending characters to `strval` until it encounters the closing `"`. The `<str>\\\"` rule accounts for escaped `"` characters within the string. Here is an example session:

```
[jon@test test2]$ ./lexer
123
integer: 123

123 "abc 456" 789
integer: 123
 string: abc 456
 integer: 789

string: "\t line \n line \n\r"
string: string: \t line \n line \n\r

"\""
string: \"

"\nstring\t\"line"
string: \nstring\t\"line
```
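The way the three `<str>` rules interact can be sketched outside Plex. The Python snippet below is only a toy model under stated assumptions, not generated scanner code: `STR_RULES` and `read_string` are hypothetical names, and the regex alternation is ordered so that the two-character `\"` rule is tried before the one-character rules, which gives the same result as lex-style longest match for these patterns.

```python
import re

# Toy re-creation of the three <str> rules (hypothetical helper, not Plex output).
# Alternatives are tried left to right, so the escaped quote (two characters)
# is checked before the closing quote and the single-character catch-all.
STR_RULES = re.compile(r'(?P<escaped>\\")|(?P<close>")|(?P<char>[^"])')

def read_string(text, pos):
    """Scan from just after an opening quote.

    Returns (strval, position just after the closing quote).
    """
    strval = ""
    while pos < len(text):
        m = STR_RULES.match(text, pos)
        if m.lastgroup == "close":
            return strval, m.end()
        # Escaped quotes are kept as the two characters \" (as in the grammar);
        # every other character is appended unchanged.
        strval += m.group()
        pos = m.end()
    raise ValueError("unterminated string")
```

For the last input of the session, `read_string(r'"\nstring\t\"line"', 1)` collects `\nstring\t\"line`, which matches what the lexer printed.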

This approach should let me handle quoted strings, comments, etc.
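A likely reason the first `lexer.l` never printed "string end..." is worth spelling out, with the caveat that it rests on an assumption about Plex rather than anything confirmed in this thread: as in classic lex, a rule without a `<state>` prefix is taken to be active in every start state, and when two rules match the same text, the one listed first wins. Under that assumption the unprefixed `\"` rule shadows `<str>\"`, so `start(0)` is never reached. The Python sketch below is a toy model of that selection logic, not Plex itself; `RULES`, `FIXED_RULES`, `pick_rule` and `run` are hypothetical names.

```python
import re

# Toy model of lex-style rule selection (an assumption about Plex's behaviour):
# a rule with states=None is active in every start state, the longest match
# wins, and ties go to the rule listed first.
INITIAL, STR = 0, 1

# (states, pattern, message, next_state), in the order of the first lexer.l
RULES = [
    (None,  r"[0-9]+",     "number",          None),
    (None,  r'"',          "string start...", STR),
    ({STR}, r"[ a-zA-Z]+", "in string",       None),
    ({STR}, r'"',          "string end...",   INITIAL),
]

# Same rules, but the first two are restricted to the initial state,
# mirroring the <initial> prefixes of the second version.
FIXED_RULES = [
    ({INITIAL}, r"[0-9]+",     "number",          None),
    ({INITIAL}, r'"',          "string start...", STR),
    ({STR},     r"[ a-zA-Z]+", "in string",       None),
    ({STR},     r'"',          "string end...",   INITIAL),
]

def pick_rule(rules, text, pos, state):
    """Return (end, message, next_state) of the winning rule, or None."""
    best = None
    for states, pattern, message, next_state in rules:
        if states is not None and state not in states:
            continue  # rule is not active in the current start state
        m = re.compile(pattern).match(text, pos)
        # Strictly longer matches replace the current best; equal lengths do not,
        # so the earlier rule keeps priority.
        if m and m.end() > pos and (best is None or m.end() > best[0]):
            best = (m.end(), message, next_state)
    return best

def run(text, rules):
    """Scan text and return the list of messages the actions would print."""
    state, pos, out = INITIAL, 0, []
    while pos < len(text):
        hit = pick_rule(rules, text, pos, state)
        if hit is None:
            pos += 1  # unmatched character: skipped here (a lex scanner would echo it)
            continue
        end, message, next_state = hit
        out.append(message)
        if next_state is not None:
            state = next_state
        pos = end
    return out
```

In this model, `run('"ab"', RULES)` gives `['string start...', 'in string', 'string start...']`, the same pattern as the first session, while `run('"ab"', FIXED_RULES)` ends with `'string end...'`.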