
Author Topic: Making the semicolon useless  (Read 24813 times)

circular

  • Hero Member
  • *****
  • Posts: 4195
    • Personal webpage
Re: Making the semicolon useless
« Reply #105 on: April 14, 2020, 12:13:10 pm »
Here is a somewhat exaggerated example from the project I'm working on, but it illustrates the issue very well:
Code: Text
func leapy(y:int):bool
do
  leapy =        //last token "=" expects something after
    (            //token expects something after
      y % 4 = 0
    and          //token expects something before and after
      y % 100 <> 0
    )            //token expects something before
    or           //token expects something before and after
    (            //token expects something after
      y % 400 = 0
    );           //first token ")" expects something before
end;
In fact this expression is not a problem. It depends a bit on the rules used to continue lines, but with the rules I proposed before it would work, because every line break here sits next to a token that needs something after it or something before it. I have marked each case with a comment in the code.
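A minimal sketch of the continuation rule described above, written in Python rather than SharpBASIC. The token sets `NEEDS_RIGHT` and `NEEDS_LEFT`, the regular expression and the function names are assumptions made for illustration; they are not taken from any real compiler. A line break ends a statement only when the last token of the previous line does not expect something after it and the first token of the next line does not expect something before it.

```python
import re

# Hypothetical token classes, chosen only to cover the leapy example.
NEEDS_RIGHT = {"=", "(", "and", "or", "<>", "%", "+", "-", "*", "/", ","}  # expect something after
NEEDS_LEFT = {")", "and", "or", "=", "<>", "%", "*", "/", ","}            # expect something before

TOKEN_RE = re.compile(r"<>|[A-Za-z_]\w*|\d+|[=()%+\-*/,;]")


def tokens(line):
    """Split one physical line into tokens, dropping a trailing // comment."""
    code = line.split("//", 1)[0]
    return TOKEN_RE.findall(code)


def join_statements(lines):
    """Group physical lines into logical statements without using semicolons."""
    statements, current = [], []
    for line in lines:
        toks = tokens(line)
        if not toks:
            continue  # blank or comment-only line
        continues = current and (current[-1] in NEEDS_RIGHT or toks[0] in NEEDS_LEFT)
        if current and not continues:
            statements.append(current)  # the line break acts as a terminator
            current = []
        current.extend(toks)
    if current:
        statements.append(current)
    return statements
```

Fed the ten lines of the leapy assignment followed by a line `x = 1`, this groups the assignment into one statement and keeps `x = 1` separate. Note that `+` and `-` are deliberately left out of `NEEDS_LEFT`: a line starting with `-` could be a unary minus, which is exactly the kind of ambiguity a statement terminator avoids.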
Conscience is the debugger of the mind

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Making the semicolon useless
« Reply #106 on: April 14, 2020, 12:23:06 pm »
@Munair,

An unrelated question: I see your function uses the function name to assign its return value. Do you plan to implement something like FPC's "result" in your compiler, so that it can optionally be used instead of the function's name?

That is a good question and after yesterday's discussion I am seriously thinking about it. But it might have a somewhat different form, a 'variable' name that makes it even more obvious that it belongs to the function, maybe something like 'funcr'. A name like 'result' is still too general for my taste.

Of course, 'return' is also supported, but that would generate an extra jump instruction.
« Last Edit: April 14, 2020, 12:37:08 pm by Munair »
keep it simple

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Making the semicolon useless
« Reply #107 on: April 14, 2020, 12:28:42 pm »
What do you think of C++/M2 return <x>?

It saves on typing, but isn't a pseudo-function-like syntax (in other words, syntax that isn't used anywhere else).

p.s. keep in mind that Pascal parsers generally only have one token lookahead.
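To make the one-token-lookahead remark concrete, here is a minimal Python sketch (not FPC or SharpBASIC code) of a recursive-descent evaluator that never inspects more than the next token. The `Parser` class, the `evaluate` helper and the precedence order (`%` binds tighter than comparisons, then `and`, then `or`) are assumptions for illustration only.

```python
import re

# \s* skips spaces, tabs and line feeds, so line endings carry no meaning here.
TOKEN_RE = re.compile(r"\s*(<>|\d+|[A-Za-z_]\w*|[=%();])")


class Parser:
    def __init__(self, text, env):
        self.tokens = TOKEN_RE.findall(text) + ["EOI"]
        self.pos = 0
        self.env = env  # variable values, e.g. {"y": 2020}

    @property
    def look(self):
        """The single lookahead token; nothing beyond it is ever examined."""
        return self.tokens[self.pos]

    def take(self, expected=None):
        tok = self.look
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def or_expr(self):
        value = self.and_expr()
        while self.look == "or":
            self.take()
            rhs = self.and_expr()  # always parse the right side before combining
            value = value or rhs
        return value

    def and_expr(self):
        value = self.comparison()
        while self.look == "and":
            self.take()
            rhs = self.comparison()
            value = value and rhs
        return value

    def comparison(self):
        left = self.term()
        if self.look == "=":
            self.take()
            return left == self.term()
        if self.look == "<>":
            self.take()
            return left != self.term()
        return left

    def term(self):
        value = self.atom()
        while self.look == "%":
            self.take()
            value = value % self.atom()
        return value

    def atom(self):
        tok = self.take()
        if tok == "(":
            value = self.or_expr()
            self.take(")")
            return value
        if tok.isdigit():
            return int(tok)
        if tok in self.env:
            return self.env[tok]
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text, env):
    """Evaluate one expression that must be closed by a ';' terminator."""
    parser = Parser(text, env)
    value = parser.or_expr()
    parser.take(";")
    return value
```

With the terminator present, every decision in this sketch depends only on `look`. For example, `evaluate("(y % 4 = 0 and y % 100 <> 0) or (y % 400 = 0);", {"y": 1900})` returns `False`.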

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Making the semicolon useless
« Reply #108 on: April 14, 2020, 12:29:58 pm »
Here is a somewhat exaggerated example from the project I'm working on, but it illustrates the issue very well:
Code: Text
func leapy(y:int):bool
do
  leapy =        //last token "=" expects something after
    (            //token expects something after
      y % 4 = 0
    and          //token expects something before and after
      y % 100 <> 0
    )            //token expects something before
    or           //token expects something before and after
    (            //token expects something after
      y % 400 = 0
    );           //first token ")" expects something before
end;
In fact this expression is not a problem. It depends a bit on the rules used to continue lines, but with the rules I proposed before it would work, because every line break here sits next to a token that needs something after it or something before it. I have marked each case with a comment in the code.

I invite you to develop a compiler that supports the rules you suggest. Do you have any idea how much more complicated and *less* efficient an expression parser becomes having to consider line endings? Not to mention the lexer!

Why make it so much more complicated when a statement terminator makes the statement unambiguous and allows for much more accurate error reporting?
keep it simple

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Making the semicolon useless
« Reply #109 on: April 14, 2020, 12:32:47 pm »
What do you think of C++/M2 return <x>?

It saves on typing, but isn't a pseudo-function-like syntax (in other words, syntax that isn't used anywhere else).

p.s. keep in mind that Pascal parsers generally only have one token lookahead.
Great suggestion.

BTW, SharpBASIC also has only one look-ahead character.
keep it simple

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Making the semicolon useless
« Reply #110 on: April 14, 2020, 12:43:17 pm »
BTW, SharpBASIC also has only one look-ahead character.

But does it require line continuations? Because then the line separator acts as a semicolon in Pascal, so that would not be comparable.

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Making the semicolon useless
« Reply #111 on: April 14, 2020, 12:54:34 pm »
BTW, SharpBASIC also has only one look-ahead character.

But does it require line continuations? Because then the line separator acts as a semicolon in Pascal, so that would not be comparable.

No! Line endings are considered garbage by the lexer and line continuations such as BASIC's underscore would generate a syntax error. For example, the function assignment in leapy would be lexed as:

Code: Text
1000         BOI           CHARACTER     BOI
3006         leapy         LITERAL       IDENTIFIER
4001         =             OPERATOR      =
5006         (             SYMBOL        (
3006         y             LITERAL       IDENTIFIER
4009         %             OPERATOR      %
3002         4             LITERAL       INTEGER
4001         =             OPERATOR      =
3002         0             LITERAL       INTEGER
6001         and           KEYWORD       and
3006         y             LITERAL       IDENTIFIER
4009         %             OPERATOR      %
3002         100           LITERAL       INTEGER
4002         <>            OPERATOR      <>
3002         0             LITERAL       INTEGER
5009         )             SYMBOL        )
6022         or            KEYWORD       or
5006         (             SYMBOL        (
3006         y             LITERAL       IDENTIFIER
4009         %             OPERATOR      %
3002         400           LITERAL       INTEGER
4001         =             OPERATOR      =
3002         0             LITERAL       INTEGER
5009         )             SYMBOL        )
5010         ;             SYMBOL        ;
1001         EOI           CHARACTER     EOI

where BOI = beginning of input and EOI = end of input. Spaces, tabs, line feeds, line comments and block comments are filtered out.
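A rough Python sketch of a lexer that behaves the way the listing above suggests: whitespace, line feeds and `//` line comments are discarded before the parser ever sees them. This is not SharpBASIC's lexer. The names `TOKEN_SPEC`, `KEYWORDS` and `lex` are made up for illustration, the numeric token codes are omitted, and block comments are left out because their syntax is not shown in this thread.

```python
import re

KEYWORDS = {"and", "or"}  # only the keywords visible in the listing

# Order matters: comments and whitespace are tried first so they are skipped early.
TOKEN_SPEC = [
    ("SKIP", r"//[^\n]*|\s+"),   # line comments, spaces, tabs, line feeds
    ("INTEGER", r"\d+"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"<>|[=%]"),
    ("SYMBOL", r"[();]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source):
    """Return (lexeme, class, type) triples framed by BOI and EOI markers."""
    out = [("BOI", "CHARACTER", "BOI")]
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # line endings and comments never reach the parser
        if kind == "INTEGER":
            out.append((text, "LITERAL", "INTEGER"))
        elif kind == "WORD":
            if text in KEYWORDS:
                out.append((text, "KEYWORD", text))
            else:
                out.append((text, "LITERAL", "IDENTIFIER"))
        else:
            out.append((text, kind, text))  # OPERATOR or SYMBOL
    out.append(("EOI", "CHARACTER", "EOI"))
    return out
```

Because line feeds fall under `SKIP`, the multi-line leapy assignment and the same assignment written on a single line produce identical token streams, which is why the parser needs the `;` to know where the statement ends.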
« Last Edit: April 14, 2020, 01:12:48 pm by Munair »
keep it simple

circular

  • Hero Member
  • *****
  • Posts: 4195
    • Personal webpage
Re: Making the semicolon useless
« Reply #112 on: April 14, 2020, 06:35:37 pm »
I invite you to develop a compiler that supports the rules you suggest. Do you have any idea how much more complicated and *less* efficient an expression parser becomes having to consider line endings? Not to mention the lexer!
That might be the case, though your example doesn't work, and you're not admitting it.  :P
Conscience is the debugger of the mind

circular

  • Hero Member
  • *****
  • Posts: 4195
    • Personal webpage
Re: Making the semicolon useless
« Reply #113 on: April 25, 2020, 01:37:22 pm »
I had a look at the FPC compiler and wow, it would need a lot of refactoring.

Things that surprised me for example:
- global variables in the scanner units are used to store values that would better belong inside the scanner class.
- other units access characters (stored in the global variable c) directly instead of using tokens.
Conscience is the debugger of the mind

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Making the semicolon useless
« Reply #114 on: April 25, 2020, 02:07:59 pm »
I had a look at the FPC compiler and wow, it would need a lot of refactoring.
Aside from the things you found, removing the semicolon from the Pascal syntax would significantly increase the amount of work the compiler has to do in order to emit a halfway decent error message.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11383
  • FPC developer.
Re: Making the semicolon useless
« Reply #115 on: April 25, 2020, 02:10:42 pm »
This has been discussed many times. The answer is that these were deliberately left out when OOPing the compiler from 1.0.x to 2.x because they had a measurable impact on speed.

That doesn't mean it couldn't/shouldn't be reevaluated though. Modern processors speed up relatively tight loops more than other code, so the bottlenecks might have changed. But afaik FPK regularly plays with these kinds of things.

Actually, at a recent FPC get-together I asked about internal parallelization (threads in the compiler), for which this (global state reduction) would be important too.

But the answer was that this isn't the hard part. The real problem lies in the module system, in other words the top-level part that decides whether units should be recompiled, and the way compilation cascades from program to unit to unit etc.

This part is said to be enormously complex with many pitfalls, and has been scheduled for a rewrite for over 15 years now. (originally pushed to "after 2.0", then to "after 2.2" etc etc)
« Last Edit: April 25, 2020, 02:12:23 pm by marcov »

circular

  • Hero Member
  • *****
  • Posts: 4195
    • Personal webpage
Re: Making the semicolon useless
« Reply #116 on: April 25, 2020, 02:48:36 pm »
@marcov

For sure going full OOP would slow down the compiler. Though there are other ways to encapsulate.

In the case of the scanner, for example, many public functions are not actually used elsewhere, and just making one or two virtual functions would probably not have so much effect. Then one could instantiate a derived class that would act as a bridge to do some preprocessing.
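A small Python sketch of the idea in the previous paragraph: one overridable token-reading method on the scanner, and a derived class that sits in between and preprocesses the stream. None of this reflects FPC's actual scanner. The classes `Scanner` and `TerminatorInsertingScanner`, the method `read_token` and the `CONTINUES` set are hypothetical, and the rule is simplified to look only at the previous token.

```python
class Scanner:
    """Stand-in for an existing scanner; emits "\n" as an end-of-line token."""

    def __init__(self, text):
        self._tokens = []
        for line in text.splitlines():
            self._tokens.extend(line.split())
            self._tokens.append("\n")
        self._pos = 0

    def read_token(self):
        """The single 'virtual' entry point: next token, or None at end of input."""
        if self._pos >= len(self._tokens):
            return None
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok


class TerminatorInsertingScanner(Scanner):
    """Bridge class: turns significant line ends into ';' before the parser sees them."""

    CONTINUES = {"=", "(", "and", "or", "+"}  # hypothetical "expects something after" set

    def __init__(self, text):
        super().__init__(text)
        self._prev = None

    def read_token(self):
        tok = super().read_token()
        # Drop line ends at the start of input, after a terminator,
        # or after a token that still expects an operand.
        while tok == "\n" and (
            self._prev is None or self._prev == ";" or self._prev in self.CONTINUES
        ):
            tok = super().read_token()
        if tok == "\n":
            tok = ";"  # this line end really closes a statement
        self._prev = tok
        return tok


def all_tokens(scanner):
    """Drain a scanner into a list, for inspection."""
    out = []
    while (tok := scanner.read_token()) is not None:
        out.append(tok)
    return out
```

The parser would keep calling `read_token` as before and still see semicolons, so only the instantiation site has to change. The cost of the single overridable call is the part that would need measuring.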

The main obstacle I see is rather that direct access to characters would be possible only in a scanner, which would require some refactoring. For example, the assembler scanner is not actually defined as a scanner, but hacks into the current scanner.

Regarding parallel compilation, I imagine that encapsulating everything could be even more work and that some deep refactoring would be needed for thread safety.

@440bx

I feel your pessimism.
Conscience is the debugger of the mind

munair

  • Hero Member
  • *****
  • Posts: 798
  • compiler developer @SharpBASIC
    • SharpBASIC
Re: Making the semicolon useless
« Reply #117 on: April 26, 2020, 01:25:15 am »
I invite you to develop a compiler that supports the rules you suggest. Do you have any idea how much more complicated and *less* efficient an expression parser becomes having to consider line endings? Not to mention the lexer!
That might be the case, though your example doesn't work, and you're not admitting it.

Apparently I missed this one. I'm not sure what doesn't work or what there is to admit. My example was to illustrate (as I said literally) "the ability to arrange code in a way that makes it easier to read/understand", which has nothing to do with the question of whether a parser can swallow it without a semicolon. With enough time almost anything can be done. The question is whether it is preferable.
keep it simple

 
