Recent

Author Topic: Semi-OT: Question about operator precedence parsing  (Read 1918 times)

dtoffe

  • Jr. Member
  • **
  • Posts: 55
Semi-OT: Question about operator precedence parsing
« on: July 12, 2020, 06:01:34 pm »
    Sorry for this off-topic, I could not find an answer elsewhere and I know there is people here with vast knowledge about compilers lore.
    I'm following the LLVM Kaleidoscope tutorial which uses a recursive descent parser and where operator precedence is used to parse expressions. Jumping links from the tutorial I landed in this article (by Crockford of Javascript fame):

http://crockford.com/javascript/tdop/tdop.html

    In the third paragraph he claims about top down operator precedence that "Its use in a static, procedural language would be considerably more difficult." (than in a dynamic one), but omits to explain why.
    I fail to understand why this is so, there are two things I can guess, one is the existence of alternative OP algorithms like shunting yard or climbing which could probably be better suited (I have to read about them yet), the other is that there could be difficulties in synthesizing the correct type as the parsing of the expression proceeds.
    The tutorial uses double precision floating point as the single data type, other than that, operator precedence seems to work fine and simple enough. Any hint ???

Thanks,
Daniel

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Semi-OT: Question about operator precedence parsing
« Reply #1 on: July 14, 2020, 10:40:09 pm »
Pratt's paper is at https://web.archive.org/web/20151223215421/http://hall.org.ua/halls/wizzard/pdf/Vaughan.Pratt.TDOP.pdf (via the Wp "Pratt parser" article).

It's certainly an interesting question, and I'm afraid that I can't easily answer it. I actually read that paper a few weeks ago, as part of a long-term project in which I'm trying to get to grips with the Stanford politics, which work something like this:

* Stanford U. (possibly also SLAC) were committed to ALGOL.

* John McCarthy arrived, and despite being committed to ALGOL in its early days now promoted LISP- which the mainframes would not run effectively.

* Wirth left Stanford for Europe. Knuth's position was already unassailable. Everybody else wanted to be nice to their new head of department.

I do like Crockford's turn of phrase in that para, even if I am a bit dubious about such a sweeping statement with no supporting argument.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Semi-OT: Question about operator precedence parsing
« Reply #2 on: July 16, 2020, 08:39:46 am »
Also https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/ (via https://news.ycombinator.com/item?id=2344837).

I suspect that Crockford's comment relating to "Its use in a ... language" relates to the implementation language, not the language being parsed.

MarkMLl



« Last Edit: July 16, 2020, 08:42:45 am by MarkMLl »
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Leledumbo

  • Hero Member
  • *****
  • Posts: 8746
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Semi-OT: Question about operator precedence parsing
« Reply #3 on: July 16, 2020, 12:47:41 pm »
I've heard of Pratt Parser before but never look into it. Here lies the full paper for those who don't have access to the ACM article. From a quick glance, I don't see where "Its use in a static, procedural language would be considerably more difficult." holds. The link that @MarkMLI gave shows a simple implementation in Python, which to my eyes, don't differ much from precedence climbing. Here's an example:
Code: Pascal  [Select][+][-]
  1. function ParsePrimary: PNode;
  2. begin
  3.   if Expected([tkInteger]) then begin
  4.     case Token.Kind of
  5.       tkInteger: begin
  6.         ParsePrimary := IntegerNode(Token,GetIntegerValue(Token));
  7.         NextToken;
  8.       end;
  9.     end;
  10.   end;
  11. end;
  12.  
  13. function ParseExpr(const MinPrec: Byte): PNode;
  14. var
  15.   OpTok: TToken;
  16.   Opr: TOperator;
  17.   RHS: PNode;
  18.   NextMinPrec: Byte;
  19. begin
  20.   ParseExpr := ParsePrimary;
  21.   while  OperatorMap[Token.Kind].IsBinOp
  22.     and (OperatorMap[Token.Kind].Prec >= MinPrec)
  23.   do begin
  24.     OpTok := Token;
  25.     Opr := TokenKindToOperator(OpTok.Kind);
  26.     NextMinPrec := OperatorMap[OpTok.Kind].Prec;
  27.  
  28.     NextToken;
  29.     if OperatorMap[Token.Kind].Assoc = ascLeft then Inc(NextMinPrec);
  30.  
  31.     RHS := ParseExpr(NextMinPrec);
  32.     ParseExpr := BinaryExprNode(OpTok,Opr,ParseExpr,RHS);
  33.   end;
  34. end;
  35.  
Codewise, it just delegates certain properties like the "binding power" where here I use a map while the tdop binds it in each token.

Ah, just found it from the same web: https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing. It says:
Quote
Update (2016-11-02): Andy Chu notes that precedence climbing and TDOP are pretty much the same algorithm, formulated a bit differently. I tend to agree, and also note that Shunting Yard is again the same algorithm, except that the explicit recursion is replaced by a stack.
so the argument is simply invalid, as I have proven to be easily implementible (is this even a valid english word?).

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Semi-OT: Question about operator precedence parsing
« Reply #4 on: July 16, 2020, 01:27:55 pm »
so the argument is simply invalid, as I have proven to be easily implementible (is this even a valid english word?).

If s/ible/able/ then yes, it is :-)

The way I read it is that he's effectively adding local facilities to read ahead by some arbitrary number of tokens... and he's potentially scattering those facilities around all over the place. I suppose that's OK if you're embedding the (definition of the) syntax you're processing into a fairly diffuse codebase, but from my POV it's basically /messy/.

The reason that I say it's messy from my POV is that I'm used to using an enhanced variant of BNF which embeds the items to be emitted, so the entire syntax definition is in ONE PLACE: in contrast Crockford et al. appear to be assuming that you start off with BNF and then splurge it into code when in fact that's not necessary... at least for a static language.

I accept that things might be different when considering fully extensible languages, but there aren't many of those... Perl6 comes to mind and that's about all that's in the mainstream (for some particularly sluggish mainstream :-)

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Semi-OT: Question about operator precedence parsing
« Reply #5 on: July 16, 2020, 01:30:41 pm »
Just FYI, I found most explanations about Pratt parsing not particularly easy to understand with one exception, which is, https://dev.to/jrop/pratt-parsing  the article's author walks the reader to how Pratt parsing works and, after that, it's much easier to see and understand all the details exposed in more formal treatments.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8746
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Semi-OT: Question about operator precedence parsing
« Reply #6 on: July 17, 2020, 09:28:44 am »
The way I read it is that he's effectively adding local facilities to read ahead by some arbitrary number of tokens... and he's potentially scattering those facilities around all over the place. I suppose that's OK if you're embedding the (definition of the) syntax you're processing into a fairly diffuse codebase, but from my POV it's basically /messy/.

The reason that I say it's messy from my POV is that I'm used to using an enhanced variant of BNF which embeds the items to be emitted, so the entire syntax definition is in ONE PLACE: in contrast Crockford et al. appear to be assuming that you start off with BNF and then splurge it into code when in fact that's not necessary... at least for a static language.
I see, attribute grammar as used by some parser generators? Well, if being able to see the syntax definition alongside its semantic action is a necessity, then surely it will look messy. But for me debugging this messy code is way easier than a neat looking grammar, due to its kind of "natural" parser movement.

JdeHaan

  • Full Member
  • ***
  • Posts: 115
Re: Semi-OT: Question about operator precedence parsing
« Reply #7 on: July 17, 2020, 07:21:19 pm »
I stumbled upon Pratt (or operator precedence) parsing via this (chapter 17):

http://craftinginterpreters.com/a-bytecode-virtual-machine.html

and

http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/

He explains it very well, and I managed to translate the c-code (from the 1st link) to Free Pascal. I also added a view challenges, e.g. usage of interpolated strings, such as:

var drink = "tea";
var steep = 4;
var cool = 2;
print "Yes, $(drink) will be ready in $(steep + cool) minutes.";

and really cool, added anonymous (lambda) functions, so that you can do the following:

// calculates the integral of a function over the interval a..b in n steps

fun integral(f, a, b, n) {
  var sum = 0;
  var dt = (b-a)/n;

  for (var x = 0; x<n; x=x+1) {
    sum = sum + f(a+(x+0.5)*dt);
  }

  return sum*dt;
}

// using arrow notation
print integral(fun(x)->x*x-2*x+4, 0, 1, 10000);
print integral(fun(x)->sqrt(1-x*x), -1, 1, 10000);  // 1/2 PI

Attached the pascal code + tests

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Semi-OT: Question about operator precedence parsing
« Reply #8 on: July 17, 2020, 07:55:36 pm »
I stumbled upon Pratt (or operator precedence) parsing via this (chapter 17):
Thank you for the references and the code.  He does explain Pratt expression parsing very well.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

dtoffe

  • Jr. Member
  • **
  • Posts: 55
Re: Semi-OT: Question about operator precedence parsing
« Reply #9 on: July 18, 2020, 10:38:55 pm »
    Wow, lots of wonderful references, thanks to all of you for your help. I finally found this refutation:

https://www.oilshell.org/blog/2016/11/05.html

    Besides, in the same site there is another interesting article arguing the same point made by Leledumbo about both Pratt and climbing being almost the same algorithm only with implementation differences:

https://www.oilshell.org/blog/2016/11/01.html

Thanks !!!!

Daniel

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Semi-OT: Question about operator precedence parsing
« Reply #10 on: July 19, 2020, 10:28:04 pm »
    Wow, lots of wonderful references, thanks to all of you for your help. I finally found this refutation:

https://www.oilshell.org/blog/2016/11/05.html

I think there's a fallacy in there. The statement "There's nothing about Pratt's technique that makes it more suited for a dynamic language." does not refute "Pratt's technique, applied to the compiler for a dynamic language, has advantages over pure recursive descent".

I don't think that Crockford implied- or at least he didn't mean to imply- that Pratt parsing had either advantages or disadvantages for non-dynamic languages. I think that he intended to imply that if implementing a dynamic language then Pratt parsing had advantages.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

 

TinyPortal © 2005-2018