Lazarus

Miscellaneous => Other => Topic started by: 440bx on December 16, 2019, 01:20:31 am

Title: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 01:20:31 am
There have recently been a number of posts related to "compiler improvements", that made me wonder what the level of interest is in the forum participants to see a tutorial on the development of a full compiler.

By "full", I mean a compiler that is reasonably full featured, does a "fair" number of optimizations and is self hosting.

By tutorial, I mean produced a step at a time and with lavishly commented code.

I know there are examples out there but, so far, what I've found are either "toy" compilers, that is, languages that lack too many features to be used in the creation of "real" programs or, as in the case of FPC, a full compiler but, not exactly written in a way to teach someone how to write a compiler. 

I first thought about asking the question by conducting a poll but, on second thought, I felt that something "free form" might be better.

All input welcome!
Title: Re: Tutorial on creating a compiler
Post by: lainz on December 16, 2019, 01:33:33 am
I think the best way will be including the binaries of that compiler and having working sources that do something.

And yes I want that kind of tutorial. I found several without compiled binaries to test each step of the language.

If this is done instead of a single PDF I think is better a full repository with intermediate builds ready to test or the sources ready to compile and debug for each stage. Not just the final thing
 But as well the final thing must be shipped.
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 02:11:25 am
I think the best way will be including the binaries of that compiler and having working sources that do something.

And yes I want that kind of tutorial. I found several without compiled binaries to test each step of the language.

If this is done instead of a single PDF I think is better a full repository with intermediate builds ready to test or the sources ready to compile and debug for each stage. Not just the final thing
 But as well the final thing must be shipped.
I agree.  Thank you for the feedback.
Title: Re: Tutorial on creating a compiler
Post by: dtoffe on December 16, 2019, 02:34:14 am
There have recently been a number of posts related to "compiler improvements", that made me wonder what the level of interest is in the forum participants to see a tutorial on the development of a full compiler.

What I have seen is a number of posts related to how "someone else" can improve the compiler by adding syntax changes of the most diverse kind. I don't recall seen posts by people asking how to better implement type checking, code generation or whatever "in its own language and compiler". I don't think one translates well into the other.

I would love to see such a tutorial. But however complete the tutorial, there are many decisions to be made, and the value in the choices not picked will be lost. Will your tutorial use Plex and Pyacc ? If I wanted to see hand coded recursive descent then I'm out of luck.

I hope you see my point.

Best,

Daniel
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 03:42:44 am
What I have seen is a number of posts related to how "someone else" can improve the compiler by adding syntax changes of the most diverse kind. I don't recall seen posts by people asking how to better implement type checking, code generation or whatever "in its own language and compiler". I don't think one translates well into the other.
What you stated above is true.  To me, it translates into the fact that, if the people asking for the things they are asking had some knowledge of grammars, how they are designed and implemented then their requests for improvements would be a bit more "on target" (can't think of a better way to put it.)

I would love to see such a tutorial. But however complete the tutorial, there are many decisions to be made, and the value in the choices not picked will be lost. Will your tutorial use Plex and Pyacc ? If I wanted to see hand coded recursive descent then I'm out of luck.
I know what you're saying.  It's actually quite an understatement to state that many decisions need to be made.

I am not really fond of Lex and Yacc, particularly for a tutorial.  lex and yacc have their place but, they greatly subtract from _exposing_ how a grammar is scanned and parsed.  I would use a hand coded recursive descent parser LL(1), not only because I believe it is educationally better but, also because it ensures the grammar is "simple" and, I strongly believe in what Einstein stated "Everything should be made as simple as possible, but no simpler".

That said, in the language design step, I would definitely use some tool to ensure the language grammar isn't flawed.

I hope you see my point.
I tried and I hope I did. ;)

I appreciate the feedback.  Thank you.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 16, 2019, 05:36:11 am
There have recently been a number of posts related to "compiler improvements", that made me wonder what the level of interest is in the forum participants to see a tutorial on the development of a full compiler.
By "full", I mean a compiler that is reasonably full featured, does a "fair" number of optimizations and is self hosting.
By tutorial, I mean produced a step at a time and with lavishly commented code.
I know there are examples out there but, so far, what I've found are either "toy" compilers, that is, languages that lack too many features to be used in the creation of "real" programs or, as in the case of FPC, a full compiler but, not exactly written in a way to teach someone how to write a compiler. 
I first thought about asking the question by conducting a poll but, on second thought, I felt that something "free form" might be better.
All input welcome!
Although a very big work, I would like to see such a tutorial.

First, you should indicate one or two modern affordable books as reference about the needed deep theory.

Then, if possible, your tutorial should show the whole process: Preprocessor, Compiler, Assembler, Linker, Loader, and Memory.

Those outrageously priced old books are always cited here because they showed the whole process from toy language source code (text) through useless working binary.

Again, if possible, your tutorial should include at least the types: Integer, Float, String and Date. And Procedures and Functions.

Beyond being imperative procedural, it would be amazing if a minimum usable imperative object-oriented paradigm (OOP) could be added.

The topics for the compiler part could be: Lexical Analyses, Syntax Analyses, Semantic Analyses, Intermediate Code Generation, Error Handling, Exception Handling, Code Optimization, and Target Code Generation.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 16, 2019, 05:42:48 am
I think the best way will be including the binaries of that compiler and having working sources that do something.
And yes I want that kind of tutorial. I found several without compiled binaries to test each step of the language.
If this is done instead of a single PDF I think is better a full repository with intermediate builds ready to test or the sources ready to compile and debug for each stage. Not just the final thing but as well the final thing must be shipped.
Distinguishing different phases in smaller source code and separated binaries would be a good idea.
Putting explanation in a PDF together with source code and binaries could also be another good idea.
Title: Re: Tutorial on creating a compiler
Post by: ASBzone on December 16, 2019, 05:49:33 am
I would be interested in seeing such a tutorial, and potentially using it, time permitting...
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 16, 2019, 05:51:34 am
There have recently been a number of posts related to "compiler improvements", that made me wonder what the level of interest is in the forum participants to see a tutorial on the development of a full compiler.
What I have seen is a number of posts related to how "someone else" can improve the compiler by adding syntax changes of the most diverse kind. I don't recall seen posts by people asking how to better implement type checking, code generation or whatever "in its own language and compiler". I don't think one translates well into the other.
I would love to see such a tutorial. But however complete the tutorial, there are many decisions to be made, and the value in the choices not picked will be lost. Will your tutorial use Plex and Pyacc ? If I wanted to see hand coded recursive descent then I'm out of luck.
I hope you see my point.
Best,
Daniel
First, I believe this tutorial could help beginners to understand how is a compiler internally and how it is build.
Then, it would improve beginners abilities to to better suggest new features or corrections, or even to provide source code to Free Pascal and Lazarus projects.
At last, but no less important, learning about algorithms, data structures and parsing is always useful.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 16, 2019, 06:00:13 am
What I have seen is a number of posts related to how "someone else" can improve the compiler by adding syntax changes of the most diverse kind. I don't recall seen posts by people asking how to better implement type checking, code generation or whatever "in its own language and compiler". I don't think one translates well into the other.
What you stated above is true.  To me, it translates into the fact that, if the people asking for the things they are asking had some knowledge of grammars, how they are designed and implemented then their requests for improvements would be a bit more "on target" (can't think of a better way to put it.)
I would love to see such a tutorial. But however complete the tutorial, there are many decisions to be made, and the value in the choices not picked will be lost. Will your tutorial use Plex and Pyacc ? If I wanted to see hand coded recursive descent then I'm out of luck.
I know what you're saying.  It's actually quite an understatement to state that many decisions need to be made.
I am not really fond of Lex and Yacc, particularly for a tutorial.  lex and yacc have their place but, they greatly subtract from _exposing_ how a grammar is scanned and parsed.
+1
Quote
I would use a hand coded recursive descent parser LL(1), not only because I believe it is educationally better but, also because it ensures the grammar is "simple" and, I strongly believe in what Einstein stated "Everything should be made as simple as possible, but no simpler".
Very interesting.
Quote
That said, in the language design step, I would definitely use some tool to ensure the language grammar isn't flawed.
Use or build?
Quote
I hope you see my point.
I tried and I hope I did. ;)
I appreciate the feedback.  Thank you.
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 06:18:22 am
Although a very big work
Yes, it is a lot of work.  IF (note the big IF) I decided to go ahead with it, I would definitely like and appreciate some help.

I would like to see such a tutorial.
Pleased to read that.  (normally, I'd say: pleased to hear it but, I read it not heard it ;) )

First, you should indicate one or two modern affordable books as reference about the needed deep theory.

Those outrageously priced old books are always cited here because they showed the whole process from toy language source code (text) through useless working binary.
Strangely, it doesn't really take much "deep theory".  A recursive descent parser is easy to understand and implement.  As far as book recommendations, the one book I really like, which I have recommended a number of times already is definitely _not_ affordable.

Then, if possible, your tutorial should show the whole process: Preprocessor, Compiler, Assembler, Linker, Loader, and Memory.
I don't have a design at this time.  At this time, I just want to find out if there is a reasonably significant amount of interest in such a tutorial.  That said, I don't see a preprocessor being there.  The reason being that, in my experience, preprocessors tend to make a program more difficult to understand because what the programmer reads isn't what the compiler sees.  IMO, while a preprocessor is desirable in some ways, it creates too many undesirable problems resulting in potentially very hard to understand and maintain code.

Again, if possible, your tutorial should include at least the types: Integer, Float, String and Date. And Procedures and Functions.
Integer, float, records, pointers, procedures, functions, strong typing and inline scopes: definitely. Exception handling: almost certainly (though maybe not in the first implementation.) String and Date: I see them more as features that belong in a library, not in the compiler.  Just the "definitely(s)" add up to a fair amount of work.

Beyond being imperative procedural, it would be amazing if a minimum usable imperative object-oriented paradigm (OOP) could be added.
I would definitely not include any OOP because that can really complicate matters.

The topics for the compiler part could be: Lexical Analyses, Syntax Analyses, Semantic Analyses, Intermediate Code Generation, Error Handling, Exception Handling, Code Optimization, and Target Code Generation.
While I don't have a design at this time, I lean towards error handling implemented the "Turbo Pascal" way, every error stops the compilation.  That implies, the compiler has to be _blazingly_ fast, which is doable.

Code optimization and generation: that's where the real "fun" starts.  I would focus on "hand written assembly like" size code that is reasonably fast.  I really like small executables, that is a _must_.

Thank you for your input. I appreciate it.

ETA:

Use or build?
Use not build.  Compiler-Compilers can be used to ensure the grammar doesn't contain ambiguities and/or conflicts.  To me, that's their primary appeal but, when it comes to build it, hand coded scanners and parsers are it.

ETA[2]

I would be interested in seeing such a tutorial, and potentially using it, time permitting...
Noted.  Thank you for the feedback.
Title: Re: Tutorial on creating a compiler
Post by: trev on December 16, 2019, 06:23:47 am
It's a lot of work, but don't let that stop you if it's an itch you want to scratch. Just don't expect it to alter the perception of those who always expect that someone else will do the work for them if they make enough noise.
Title: Re: Tutorial on creating a compiler
Post by: af0815 on December 16, 2019, 07:00:02 am
The topics for the compiler part could be: Lexical Analyses, Syntax Analyses, Semantic Analyses, Intermediate Code Generation, Error Handling, Exception Handling, Code Optimization, and Target Code Generation.
It would be nice to known where this stages are in the actual compiler (units), for reference purposes. So it is more possible to have a look at the actual sources for study without understanding the whole compiler and to compare this with information from books, how it is working in the wild.   :D
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 07:08:16 am
It would be nice to known where this stages are in the actual compiler (units), for reference purposes. So it is more possible to have a look at the actual sources for study without understanding the whole compiler and to compare this with information from books, how it is working in the wild.   :D
I'm rather pleased to see someone asking for that kind of structure.  Every program (not just a compiler) should be written in such a way that it is very intuitive (if not obvious) to determine in which file/unit a logical "process" (groups of functions and procedures that accomplish a task) is found.

Title: Re: Tutorial on creating a compiler
Post by: JdeHaan on December 16, 2019, 08:57:29 am
I'm interested and willing to learn and help.
Recently, I built an interpreter, but a complete compiler is is smth I did't dare to start.
About parsing techniques, it's worth checking out Pratt-parsing (which I used in my interpreter written in FPC and happy to share).
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 16, 2019, 09:42:26 am
This is not something that can be done by a single tutorial. It would take roughly an academic year to teach the computer science underpinnings (data structures, a suitable implementation language, and the basics of parsing), another to review target hardware and implementation approaches (many of which have turned out to be dead ends), and a third to look at actual code generation and the fundamentals of optimisation. Follow that by another year doing guided intermediate-level research, at the end of which the student might possibly have some appreciation of what he doesn't yet know hence might possibly be some use for something.

I attach one PDF which I have found overwhelmingly useful over the years: it's old, but is one of the best introductions to parsing and naive code generation. After that I'd suggest reviewing https://en.wikipedia.org/wiki/TREE-META (https://en.wikipedia.org/wiki/TREE-META) since it continues the same theme and hints at how one can start optimisation. There's some useful historical links at https://wiki.freepascal.org/Make_your_own_compiler,_interpreter,_parser,_or_expression_analyzer#Ancient_history (https://wiki.freepascal.org/Make_your_own_compiler,_interpreter,_parser,_or_expression_analyzer#Ancient_history) , broadly speaking I suggest that understanding how languages got to their current state and appreciating that there are some "nice features" that could be put in a syntax that would make it unparseable is more useful than knowing every last detail of post-Chomsky linguistic research.

But finally, I do have to emphasise that the hard work of writing a compiler is understanding the low-level stuff: you have to understand the implications of apparently-simple language features (e.g. exceptions) on the implementation and you have to have an intimate understanding of the target architecture.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 11:12:24 am
About parsing techniques, it's worth checking out Pratt-parsing (which I used in my interpreter written in FPC and happy to share).
Thank you for mentioning Pratt parsing.  I was not aware of that method of parsing expressions.  Definitely a parsing technique I'm going to look at closely.




@MarkMLl

I agree that  covering all the aspects you mentioned would result in quite a "tutorial" (if being that extensive can still be called a "tutorial".)

You may not be familiar with "Per Brinch Hansen on Pascal Compilers" but in his book he narrows down the field quite a bit.  Specifically, he identifies a problem and offers one of the large number of possible solutions.  I find that to be a good approach and I would do the same.  Of course, using that approach implies that the reader has a decent programming foundation.  IOW, not for complete beginners.

A reasonably knowledgeable programmer can read/study and understand Brinch Hansen's book in about a month.  Unfortunately, his book is rather pricey at this time.

This is not something that can be done by a single tutorial.
Agree.  Some areas would require a progression from a very simple example to what will be the target implementation.

It would take roughly an academic year to teach the computer science underpinnings (data structures, a suitable implementation language, and the basics of parsing),
I believe that a reasonably knowledgeable programmer can get the necessary knowledge in about a week.  Of course, that does _not_ include becoming familiar with all the common parsers and their implications on a grammar.  In this case, it would just be one (1): Top down recursive descent, which is fairly easy.

another to review target hardware and implementation approaches (many of which have turned out to be dead ends), and a third to look at actual code generation and the fundamentals of optimisation.
Once the compiler deals with hardware and the O/S, real complexity starts.  In this area, the tutorial would present just enough to enable the reader to read the code and understand how things are getting done.  IOW, a fair amount of personal effort from the reader would be required.

Follow that by another year doing guided intermediate-level research, at the end of which the student might possibly have some appreciation of what he doesn't yet know hence might possibly be some use for something.
You're right but, to go beyond what is presented in the tutorial, Google is the student's best friend.

I attach one PDF which I have found overwhelmingly useful over the years: it's old, but is one of the best introductions to parsing and naive code generation.
Thank you.

appreciating that there are some "nice features" that could be put in a syntax that would make it unparseable is more useful than knowing every last detail of post-Chomsky linguistic research.
Yes, there are a lot of "nice features" that someone who is not familiar with language grammars would wish for and would make the grammar either incorrect or downright unparsable.  That's another area that the "student" will have to investigate on his/her own if they are inclined to know more.

But finally, I do have to emphasise that the hard work of writing a compiler is understanding the low-level stuff: you have to understand the implications of apparently-simple language features (e.g. exceptions) on the implementation and you have to have an intimate understanding of the target architecture.
Very true.  That's probably why a lot of compiler books sidestep the issue by creating a virtual CPU and compiling to that instead of a real CPU.

I want to make sure this is clear, I am fully aware that such a tutorial is a lot of work and, to keep it of reasonable size, many topics will not even be mentioned and some will only be explained to the extent that would enable an interested party to understand it fully only after some personal effort on their part.  In some cases, such as for someone who isn't familiar with the CPU's instruction's set and assembly language, a significant personal effort.

Thank you for the feedback.
Title: Re: Tutorial on creating a compiler
Post by: Thaddy on December 16, 2019, 11:22:42 am
The current curricula refer to
Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, second edition, Pearson New International Edition, 2013, ISBN 9781292024349.

Which basically does the same theory as 40 years ago.

Then again: refer to the search options on this forum, you may be surprised.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 16, 2019, 11:30:18 am
> You may not be familiar with "Per Brinch Hansen on Pascal Compilers"

I'm not familiar with that specific book but IMO Brinch Hansen is rather an unsung hero.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: Thaddy on December 16, 2019, 11:34:57 am
I'm not familiar with that specific book but IMO Brinch Hansen is rather an unsung hero.

MarkMLl
No, he is not! forgotten...

But you and others in this discussion should warn against "language theology": A computer language is a tool, a means of communication, NOT computer science.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 16, 2019, 11:44:00 am
@Thaddy: I agree, and I think I did warn against obsessively studying "every last detail of post-Chomsky linguistic research".

Dijkstra had some scathing things to say about computer scientists. In his latter years he had scathing things to say about almost everybody...

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 11:47:43 am
But you and others in this discussion should warn against "language theology": A computer language is a tool, a means of communication, NOT computer science.
Leave theology out of this. Apparently you didn't notice that no one in this thread has made any claims about any language and, whether you believe it or not, writing a compiler is applied computer science.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 16, 2019, 12:06:34 pm
I believe that a reasonably knowledgeable programmer can get the necessary knowledge in about a week.

Unfortunately there is little reason to believe that the target audience are reasonably knowledgeable programmers.

Please don't get me wrong: they might be vastly knowledgeable and experienced in their own fields. But not necessarily this one.

As an aside, the Meta-2 PDF that I attached was originally published in 1964. When I started using it for teaching in about 1985 I thought that was ancient... and that was 35 years ago. But it's still useful.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: Fred vS on December 16, 2019, 12:58:19 pm
Hello.

You may take a look at mselang, the Pascal compiler of LLVM.

https://github.com/mse-org/mselang

It uses fpc to compile himself.

Fre;D
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 16, 2019, 03:15:00 pm
Although a very big work
Yes, it is a lot of work.  IF (note the big IF) I decided to go ahead with it, I would definitely like and appreciate some help.
If possible, I'd like to contribute.

Quote
I would like to see such a tutorial.
Pleased to read that.  (normally, I'd say: pleased to hear it but, I read it not heard it ;) )

First, you should indicate one or two modern affordable books as reference about the needed deep theory.
Those outrageously priced old books are always cited here because they showed the whole process from toy language source code (text) through useless working binary.
Strangely, it doesn't really take much "deep theory".  A recursive descent parser is easy to understand and implement.  As far as book recommendations, the one book I really like, which I have recommended a number of times already is definitely _not_ affordable.
Then, if possible, your tutorial should show the whole process: Preprocessor, Compiler, Assembler, Linker, Loader, and Memory.
I don't have a design at this time.  At this time, I just want to find out if there is a reasonably significant amount of interest in such a tutorial.  That said, I don't see a preprocessor being there.  The reason being that, in my experience, preprocessors tend to make a program more difficult to understand because what the programmer reads isn't what the compiler sees.  IMO, while a preprocessor is desirable in some ways, it creates too many undesirable problems resulting in potentially very hard to understand and maintain code.
Again, if possible, your tutorial should include at least the types: Integer, Float, String and Date. And Procedures and Functions.
Integer, float, records, pointers, procedures, functions, strong typing and inline scopes: definitely. Exception handling: almost certainly (though maybe not in the first implementation.) String and Date: I see them more as features that belong in a library, not in the compiler.  Just the "definitely(s)" add up to a fair amount of work.
Beyond being imperative procedural, it would be amazing if a minimum usable imperative object-oriented paradigm (OOP) could be added.
I would definitely not include any OOP because that can really complicate matters.
The topics for the compiler part could be: Lexical Analyses, Syntax Analyses, Semantic Analyses, Intermediate Code Generation, Error Handling, Exception Handling, Code Optimization, and Target Code Generation.
While I don't have a design at this time, I lean towards error handling implemented the "Turbo Pascal" way, every error stops the compilation.  That implies, the compiler has to be _blazingly_ fast, which is doable.
Code optimization and generation: that's where the real "fun" starts.  I would focus on "hand written assembly like" size code that is reasonably fast.  I really like small executables, that is a _must_.
Thank you for your input. I appreciate it.

ETA:
Use or build?
Use not build.  Compiler-Compilers can be used to ensure the grammar doesn't contain ambiguities and/or conflicts.  To me, that's their primary appeal but, when it comes to build it, hand coded scanners and parsers are it.

ETA[2]
I would be interested in seeing such a tutorial, and potentially using it, time permitting...
Noted.  Thank you for the feedback.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 16, 2019, 03:17:21 pm
It would be nice to known where this stages are in the actual compiler (units), for reference purposes. So it is more possible to have a look at the actual sources for study without understanding the whole compiler and to compare this with information from books, how it is working in the wild.   :D
I'm rather pleased to see someone asking for that kind of structure.  Every program (not just a compiler) should be written in such a way that it is very intuitive (if not obvious) to determine in which file/unit a logical "process" (groups of functions and procedures that accomplish a task) is found.
+1
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 16, 2019, 03:45:07 pm
I believe that a reasonably knowledgeable programmer can get the necessary knowledge in about a week.  Of course, that does _not_ include becoming familiar with all the common parsers and their implications on a grammar.  In this case, it would just be one (1): Top down recursive descent, which is fairly easy.

Noting that the uptake of recursive /descent/ (Grau, Waychoff, Schorre et al.) was comparatively slow, possibly because Wirth- acknowledged as having the knack of turning out compilers on demand in the days when this was poorly understood- favoured recursive /ascent/ (Irons et al.).

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: lucamar on December 16, 2019, 05:32:26 pm
FWIW, I'd also like to see this tutorial made. It'd would be invaluable if it were made easy to understand for programmers with little formal training, whatever the level of experience needed.

I've tried to understand quite some books on compiler design and building and while I did learn quite a lot in the end I failed mainly because most (all?) of them seem to assume you're an university student/graduate of some kind and use difficult-to-parse terminology and lots of difficult-to-follow maths. :-[
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 16, 2019, 05:50:11 pm
Download bytecode example compilers like P4..P6 ?
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 16, 2019, 06:22:37 pm
It's not just- in my opinion- that compiler texts assume that you're a graduate with good comprehension of maths. They appear to go out of their way to be obtuse, and to insist that "if you're not one of us (which you can be by studying this text) then you're not good enough to know about this stuff".

I'm not saying that the field is simple, but these days even "unusual" syntaxes like SQL can be parsed with standard tools (Lex, Yacc). I'm not so confident that FORTRAN and COBOL as understood in the 1950s are quite so tractable, and the necessity of considering those might have complicated the field quite a lot.

Provided that you avoid certain pitfalls, you can design a syntax that can be parsed by nice, simple, recursive descent. That's not to say that, for a ten billion-line unit, the runtime will be bearable, and a disproportionate amount of work has gone into improving the efficiency of such extreme cases.

Once you can parse your source, you can either emit a "pseudo-assembler" representation- see the Meta-2 document I contributed earlier- or you can build a parse tree on a statement-by-statement (or function-by-function) etc. basis.

Once you have a parse tree- see the Tree Meta link I gave earlier- you can recognise patterns that might not have been evident in the source and optimise them: for example if two constants are being added you can replace them by the sum at compilation time. From that you can start emitting code, recognising sequences amenable to peephole optimisation.

That does, of course, gloss over a lot of important details. First and foremost is the role of the symbol table: as the parser removes lexemes from the source it needs to record what's known about them, and it needs to refer to that record every time an entity is referred to- at the very least to determine representation (anywhere between a single bit and a string of arbitrary length) but also to resolve more obtuse issues ("do we actually know what type this is yet?").

As I've said, that glosses over a lot. But at this stage of the game I think it's worth trying to work out what really does need to be understood, and what is basically "smoke and mirrors" put in place by a computer science priesthood to demonstrate how smart they are and to perpetuate a closed shop.

If anybody thinks I'm being unfairly rude about computer science practitioners, I'd remind him of some of the things that Dijkstra wrote about electronic engineers :-)

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: lucamar on December 16, 2019, 07:33:03 pm
It's not just- in my opinion- that compiler texts assume that you're a graduate with good comprehension of maths. They appear to go out of their way to be obtuse, and to insist that "if you're not one of us (which you can be by studying this text) then you're not good enough to know about this stuff".

I think that the problem lies in that most texts emphasize the "theory" part, which leads them down an "abstraction" path that lets us, more "simple" folk, mired at almost the very beginning.

A more practical approach, like say Jack Crensahw's in his Let's build a compiler series, resonates better with us because it sounds as if you were just making a "small" tool step by easy-to-understand step, making the "theory" which led to it easier to grok.
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 09:19:36 pm
Unfortunately there is little reason to believe that the target audience are reasonably knowledgeable programmers.
The level of programming knowledge in the forum participants varies a great deal, no doubt about that.

As an aside, the Meta-2 PDF that I attached was originally published in 1964. When I started using it for teaching in about 1985 I thought that was ancient... and that was 35 years ago. But it's still useful.
I haven't read all of it yet but, had a quick look.  While the concepts have been "polished" and expanded over time, those presented in that documented haven't really changed that much.  Thank you for sharing!.



You may take a look at mselang, the Pascal compiler of LLVM.

https://github.com/mse-org/mselang
I'll have a look at it.  Thank you for pointing it out.



If possible, I'd like to contribute.
Definitely possible and most definitely welcome!.



Wirth <snip> favoured recursive /ascent/ (Irons et al.).
I have not used recursive ascent to any significant extent. I'll stick to recursive descent because it is very simple, very easy to explain and visualize and, I happen to know it quite well, as a result, I might be able to provide a clear and simple explanation. A very desirable characteristic in a tutorial. ;)



FWIW, I'd also like to see this tutorial made. It'd would be invaluable if it were made easy to understand for programmers with little formal training, whatever the level of experience needed.
I firmly believe that if someone _really_ understands something, they can explain it  in terms that just about anyone can understand.

One of my favorite things to ask mathematicians is "why does minus times minus equal plus?"... the explanation is _trivial_ if you understand it.   If you want to _understand_ why, watch the videos found at http://www.dimensions-math.org/Dim_tour_E.htm

If memory serves, I believe it was in chapter 5 where they made it obvious.  You'll be amazed at how much math you can learn when the person who explains it actually understands what he is talking about.   If such explanations pique your interest, look for James Tanton on youtube.  He is great.

Also very interesting are the book series, "proof without words" from Roger Nelson and some from James Tanton.  Warning: the Nelsen books are often quite overpriced.  Tanton's stuff is excellent and usually very affordable (a lot of it free on youtube.)

The saddest part is, while schools teach all kinds of stuff, they don't teach their students _how to think_.  Paul Zeitz has an excellent class, description and availability at https://www.thegreatcourses.com/courses/art-and-craft-of-mathematical-problem-solving.html

HTH.



I've tried to understand quite some books on compiler design and building and while I did learn quite a lot in the end I failed mainly because most (all?) of them seem to assume you're an university student/graduate of some kind and use difficult-to-parse terminology and lots of difficult-to-follow maths. :-[
I can relate.  Very often the language, in this case Math, used to "explain" a problem is more difficult to understand than how the solution to the problem is derived.  The minus times minus equal plus is a good example of that.



It's not just- in my opinion- that compiler texts assume that you're a graduate with good comprehension of maths. They appear to go out of their way to be obtuse, and to insist that "if you're not one of us (which you can be by studying this text) then you're not good enough to know about this stuff".
Playing devil's advocate, while I feel the same way, one of the principal reasons for the "obtuse" explanations comes from wanting to be extremely accurate and precise, i.e, leave no room for ambiguity (which is desirable, if not downright required.)  That's often not easy to accomplish and, too often collides with simplicity.

I'm not so confident that FORTRAN and COBOL as understood in the 1950s are quite so tractable, and the necessity of considering those might have complicated the field quite a lot.
The first FORTRAN compiler was a mess and, that is what triggered the thought: "there's got to be a better way of doing this", leading to the definitions of grammars, their characteristics, expressive abilities, etc.



I think that the problem lies in that most texts emphasize the "theory" part,
True.  Again, playing devil's advocate, they focus on explaining the power of a particular concept (in this case grammars) not how to use them.  It is (unfortunately) presumed that once the reader understands the power of a parsing technique and its limitations, that the reader will be able to use that knowledge to apply it correctly, effectively and when appropriate. 

Most people learn best by example and contrast.  Theory books are usually lacking in those areas.


Title: Re: Tutorial on creating a compiler
Post by: kupferstecher on December 16, 2019, 10:41:22 pm
What is your goal actually?
 Encouraging people to write their own compiler? Why?
 Encouraging people to participate in the FPC development?

I'd vote for the latter. But than a rough description how the fpc compiler works (and in which unit to find what part of the compiler) would be much better. I don't know if something like that already exists? I'd be interested.

I know there are examples out there but, so far, what I've found are either "toy" compilers, that is, languages that lack too many features to be used in the creation of "real" programs or, as in the case of FPC, a full compiler but, not exactly written in a way to teach someone how to write a compiler.
A compiler designed to be easy to understand only could be an abstraction of a real compiler, and that naturally results in a "toy" compiler. As you said, there already is literature about compiler development, so you better spend your efforts in a fpc specific description.
(Only my opinion)
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 16, 2019, 11:31:31 pm
Encouraging people to write their own compiler? Why?
No.  One goal is providing people the knowledge to write their own compiler if they so desire.  It should also be noted that, many algorithms used in implementing a compiler can be used to implement tasks that have nothing to do with compilers. Scanning and parsing probably being the most common. 

Encouraging people to participate in the FPC development?
No.  Even if it may not be obvious, just by their presence helping other forum members, people are contributing (though indirectly) to FPC development (there would be no development if there weren't people interested in FPC.)

What is your goal actually?
There isn't a single goal.  One goal, a personal one, is having a compiler I really like and, has all the features I want in a compiler (self hosting, as capable as C - that does NOT mean it would resemble C - and as capable as standard Pascal, syntactically simple, easy to learn - less than a day for an experienced Pascal programmer - lightning fast compilation yet produce fast and _extremely_ small executables that compare well to hand written assembly. 

I figured that IF (note the big IF), I decided to do it then, it could also serve as a tutorial on compiler writing and, force a specific level of simplicity (simplicity is _very_ desirable.) A benefit of not doing it all on my own is that, I may benefit from getting some help in some areas, which would definitely make my life easier.  Also, having a large number of eyeballs who _understand_ what is being done significantly lowers the probability that there may be subtle errors in the compiler.

So,  all the above, among others are "my goal".

But than a rough description how the fpc compiler works (and in which unit to find what part of the compiler) would be much better. I don't know if something like that already exists? I'd be interested.
FPC works like any compiler but, because of its size, which is considerable, it is _not_ a _succinct_ and practical way of learning how to write a compiler.  IOW, it's a forest with a lot of trees and inspecting every tree is cumbersome and impractical.

A compiler designed to be easy to understand only could be an abstraction of a real compiler, and that naturally results in a "toy" compiler.
Not the case at all.  My definition of a good program is a program that is _easy to understand_ and, a compiler is just another program.  If anything, Brinch Hansen proved that in his book "on Pascal compilers".  The main reason his implementation of Pascal could be considered a "toy" language is because he compiles for a virtual CPU, which admittedly is simpler than compiling for a real one but, the concepts are the same though the details certainly aren't.

As you said, there already is literature about compiler development, so you better spend your efforts in a fpc specific description.
(Only my opinion)
I disagree. This would be a tutorial on how to write a compiler not on how FPC is implemented.  That said, anyone who knows how to write a compiler will have a much easier time understanding FPC's implementation than someone who does not.

Title: Re: Tutorial on creating a compiler
Post by: dtoffe on December 17, 2019, 12:44:24 am
I am not really fond of Lex and Yacc, particularly for a tutorial.  lex and yacc have their place but,

It was just an example.

I strongly believe in what Einstein stated "Everything should be made as simple as possible, but no simpler".

You are probably aware that this was the principle after which Wirth got inspiration to design the Oberon language.....

Cheers,

Daniel
Title: Re: Tutorial on creating a compiler
Post by: Edson on December 17, 2019, 01:01:55 am
Hi.

I've written several articles (in spanish) about "Create your compiler from scratch". It uses Lazarus and MASM32 to create a very basic compiler, just for learning purposes. Fist part is here: http://blog.pucp.edu.pe/blog/tito/2019/01/05/crea-tu-propio-compilador-casero-parte-1/
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 17, 2019, 01:09:05 am
It was just an example.
I understand.  The reason I mentioned Lex and Yacc is because I wanted to make it clear that there would be no dependence on compiler construction tools (except initially, but only for prototyping and testing the grammar.)

You are probably aware that this was the principle after which Wirth got inspiration to design the Oberon language.....
I didn't remember that until you mentioned it now.



I've written several articles (in spanish) about "Create your compiler from scratch". It uses Lazarus and MASM32 to create a very basic compiler, just for learning purposes. First part is here: http://blog.pucp.edu.pe/blog/tito/2019/01/05/crea-tu-propio-compilador-casero-parte-1/
I've read them.  Nice work!.  What I have in mind is very much along the lines of what you've done, just taken farther.
Title: Re: Tutorial on creating a compiler
Post by: af0815 on December 17, 2019, 07:10:38 am
Hi.

I've written several articles (in spanish) about "Create your compiler from scratch". It uses Lazarus and MASM32 to create a very basic compiler, just for learning purposes. Fist part is here: http://blog.pucp.edu.pe/blog/tito/2019/01/05/crea-tu-propio-compilador-casero-parte-1/


It is very very good written :-) Thanks for the article. I read it with the google translation (i am not able to read spanish) and is complete understandable.

I a waiting hard for the next article "system function" and so on.
Title: Re: Tutorial on creating a compiler
Post by: julkas on December 17, 2019, 09:27:58 am
Hi.

I've written several articles (in spanish) about "Create your compiler from scratch". It uses Lazarus and MASM32 to create a very basic compiler, just for learning purposes. Fist part is here: http://blog.pucp.edu.pe/blog/tito/2019/01/05/crea-tu-propio-compilador-casero-parte-1/
Good done. Thanks.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 17, 2019, 09:53:57 am
As an aside, the Meta-2 PDF that I attached was originally published in 1964. When I started using it for teaching in about 1985 I thought that was ancient... and that was 35 years ago. But it's still useful.
I haven't read all of it yet but, had a quick look.  While the concepts have been "polished" and expanded over time, those presented in that documented haven't really changed that much.  Thank you for sharing!.

I've long been reluctant to post that anywhere, on account of having formally got permission to prepare a copy for teaching. However since there's at least two scanned copies floating around- linked to from the Wp article- I feel that I can probably get away with it by now, and I'm fairly confident in the accuracy of my typesetting. I've got software to go with it if anybody's interested, it often gains a few more experimental facilities for Xmas (this year, if I can spare the time, it will be Smalltalk-style keyword parsing).

Alan Kay, the author of Smalltalk and a very talented worker, had this to say: "The paper itself is a wonderful gem which includes a number of excellent examples, including the bootstrapping of Meta II in itself (all this was done on an 8K (six bit bytes) 1401 ! ) "

On a similar vein, I also have a good two-part article on PROLOG and an implementation in Pascal. It was written for something like TP3 and is very tight indeed, I've often wondered whether I could do anything useful with it (there's precedent: some of the network setup stuff in Windows was for a long time written in PROLOG). Timothy Budd in his Leda book has some thoughts on multi-paradigm languages, but apart from the desirability of a parser being able to handle stuff which isn't really ALGOL-derived I think this is outside a tutorial's scope.

Quote
Wirth <snip> favoured recursive /ascent/ (Irons et al.).
I have not used recursive ascent to any significant extent. I'll stick to recursive descent because it is very simple, very easy to explain and visualize and, I happen to know it quite well, as a result, I might be able to provide a clear and simple explanation. A very desirable characteristic in a tutorial. ;)

Again, I'd cite Huub de Beer's historical info linked to from https://wiki.freepascal.org/Make_your_own_compiler,_interpreter,_parser,_or_expression_analyzer#Ancient_history I've no longer got my Wirth books but my recollection is that he started off with ascent but in later versions presented descent as an alternative. Waychoff describes inventing recursive descent as a natural derivative of ALGOL syntax diagrams but later yields precedence to Grau. Wirth's code as found in his Euler compiler is... well, basically one hell of a mess filled with jump tables written in Burroughs ALGOL (which had a language extension for that); if he was still using that sort of technique when he (I suspect hurriedly) constructed Pascal I can sympathise with his not wanting to fix the dangling else problem.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 17, 2019, 10:15:13 am
I've got software to go with it if anybody's interested,
Definitely interested. :)

Alan Kay, the author of Smalltalk and a very talented worker, had this to say: "The paper itself is a wonderful gem which includes a number of excellent examples, including the bootstrapping of Meta II in itself (all this was done on an 8K (six bit bytes) 1401 ! ) "
Considering the source, that is definitely high praise.

On a similar vein, I also have a good two-part article on PROLOG and an implementation in Pascal.
Prolog... that brings back some memories.  I just played with it when Turbo Prolog came out.  It took some effort to "think in Prolog" but, it was worth it. That said, it's been so long, it would probably take me a few days of retraining my brain to think in Prolog again, not to mention re-learning it.



On a different note, judging by the posts in this thread, I estimate a total of 10 participants interested in a compiler writing tutorial so far.  I have to admit, I was hoping for more but, I'm still open to it.

Title: Re: Tutorial on creating a compiler
Post by: marcov on December 17, 2019, 10:18:45 am
On a similar vein, I also have a good two-part article on PROLOG and an implementation in Pascal.

Here you have two that compile with FPC:

https://stackoverflow.com/questions/29908207/how-to-compile-vtprolog-from-source-code

https://stackoverflow.com/questions/29907432/how-to-compile-picoprolog-from-source-code

Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 17, 2019, 10:52:33 am
I've got software to go with it if anybody's interested,
Definitely interested. :)

Zipfile too large for forum, but temporarily at http://www.kdginstruments.co.uk/public/meta2.zip (http://www.kdginstruments.co.uk/public/meta2.zip). This should compile without significant issues, use meta2 --help to get command line, it tries to "do the right thing". Be warned that some of this is seriously old code: it's based on an initial implementation in about '86 with organic improvements since then.

There's some highly-tentative attempts to wrap it in a GUI which didn't really get anywhere, when considering that I was influenced by Visual Parse++ (I no longer have a copy but description available via https://mafiadoc.com/parsing-with-sandstones-visual-parse_5cc7544b097c4722768b4635.html (https://mafiadoc.com/parsing-with-sandstones-visual-parse_5cc7544b097c4722768b4635.html) but not recursive descent). I've also got examples which I've not attached, but having been built up gradually from rules very similar to what's in Schorre's paper they're best supplied individually with discussion.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 17, 2019, 10:56:29 am
https://stackoverflow.com/questions/29908207/how-to-compile-vtprolog-from-source-code

That's the one, although I can't remember if I seriously tackled it with FPC (it was discussed on the ML, and there's an embedded garbage collector which was considered problematic).

PDFs temporarily at https://www.kdginstruments.co.uk/public/vt_prolog1.pdf (https://www.kdginstruments.co.uk/public/vt_prolog1.pdf) https://www.kdginstruments.co.uk/public/vt_prolog2.pdf (https://www.kdginstruments.co.uk/public/vt_prolog2.pdf) in case anybody wants them, they were done for the original authors but were never republished. Still a fairly good read.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 17, 2019, 11:05:43 am
Zipfile too large for forum, but temporarily at http://www.kdginstruments.co.uk/public/meta2.zip (http://www.kdginstruments.co.uk/public/meta2.zip). This should compile without significant issues, use meta2 --help to get command line, it tries to "do the right thing". Be warned that some of this is seriously old code: it's based on an initial implementation in about '86 with organic improvements since then.
Thank you, much appreciated. :)
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 17, 2019, 11:26:38 am
https://stackoverflow.com/questions/29908207/how-to-compile-vtprolog-from-source-code

That's the one, although I can't remember if I seriously tackled it with FPC (it was discussed on the ML, and there's an embedded garbage collector which was considered problematic).

I looked at it, and it seems that i only checked compilation and indeed. I'm not entirely sure, but it seems to mess with the internal info of the heap, something that will only work for certain implementations.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 17, 2019, 11:36:40 am
Zipfile too large for forum, but temporarily at http://www.kdginstruments.co.uk/public/meta2.zip (http://www.kdginstruments.co.uk/public/meta2.zip). This should compile without significant issues, use meta2 --help to get command line, it tries to "do the right thing". Be warned that some of this is seriously old code: it's based on an initial implementation in about '86 with organic improvements since then.
Thank you, much appreciated. :)

The "T" notation at the end of the Meta2 syntax descriptions are archaic but still useful, T(META2,M2,META2) means that it's a translator from Meta2 source (.mt2) to .m2 executable expressed in Meta2 source. I notice that David Brailsford discusses this on Youtube https://www.youtube.com/watch?v=PjeE8Bc96HY (https://www.youtube.com/watch?v=PjeE8Bc96HY)

I like to think that I'm a bit more use in later life than I was as one of Prof Brailsford's (then merely Dr. David Brailsford) students :-)

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 19, 2019, 10:28:43 am
It's not just- in my opinion- that compiler texts assume that you're a graduate with good comprehension of maths. They appear to go out of their way to be obtuse, and to insist that "if you're not one of us (which you can be by studying this text) then you're not good enough to know about this stuff".
+1
Quote
I'm not saying that the field is simple, but these days even "unusual" syntaxes like SQL can be parsed with standard tools (Lex, Yacc). I'm not so confident that FORTRAN and COBOL as understood in the 1950s are quite so tractable, and the necessity of considering those might have complicated the field quite a lot.

Provided that you avoid certain pitfalls, you can design a syntax that can be parsed by nice, simple, recursive descent. That's not to say that, for a ten billion-line unit, the runtime will be bearable, and a disproportionate amount of work has gone into improving the efficiency of such extreme cases.

Once you can parse your source, you can either emit a "pseudo-assembler" representation- see the Meta-2 document I contributed earlier- or you can build a parse tree on a statement-by-statement (or function-by-function) etc. basis.

Once you have a parse tree- see the Tree Meta link I gave earlier- you can recognise patterns that might not have been evident in the source and optimise them: for example if two constants are being added you can replace them by the sum at compilation time. From that you can start emitting code, recognising sequences amenable to peephole optimisation.

That does, of course, gloss over a lot of important details. First and foremost is the role of the symbol table: as the parser removes lexemes from the source it needs to record what's known about them, and it needs to refer to that record every time an entity is referred to- at the very least to determine representation (anywhere between a single bit and a string of arbitrary length) but also to resolve more obtuse issues ("do we actually know what type this is yet?").

As I've said, that glosses over a lot. But at this stage of the game I think it's worth trying to work out what really does need to be understood, and what is basically "smoke and mirrors" put in place by a computer science priesthood to demonstrate how smart they are and to perpetuate a closed shop.

If anybody thinks I'm being unfairly rude about computer science practitioners, I'd remind him of some of the things that Dijkstra wrote about electronic engineers :-)
Some references about that might be helpful.

Just to remind I found this project is the Oberon equivalent of Free Pascal: https://github.com/kekcleader/FreeOberon (https://github.com/kekcleader/FreeOberon) It's even listed by the FSF: https://directory.fsf.org/wiki/Free_Oberon (https://directory.fsf.org/wiki/Free_Oberon)
There's also apparently people working on the operating system aspect of Oberon.
I don't know to what extent these are praiseworthy endeavours, unless they're also actively being used to teach newcomers to the profession (which for a long time was the objective of MINIX, as one particular example). Oberon (the language) dates back to when people looked at Ada and considered it too big and complex, but by today's standards 1980s Ada is probably rather svelte and feature-deficient.
As a result for a lack of job offers, universities are restricting or even dropping practical disciplines of Compiler and Operational System building.
Changing Compiler Building for only Parsers and Minix for Oberon might be examples of that.

News books are trying to catch up this tendency:
"...book offers a one semester introduction into compiler construction..."
Introduction to Compilers and Language Design, Douglas Thain, 2019, 246pp.
https://www.amazon.com/Introduction-Compilers-Language-Design-Douglas/dp/0359142834/

And we also must consider that current students don't have all that math from the 1960's and 1970's any more...
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 19, 2019, 10:31:04 am
It's not just- in my opinion- that compiler texts assume that you're a graduate with good comprehension of maths. They appear to go out of their way to be obtuse, and to insist that "if you're not one of us (which you can be by studying this text) then you're not good enough to know about this stuff".
I think that the problem lies in that most texts emphasize the "theory" part, which leads them down an "abstraction" path that lets us, more "simple" folk, mired at almost the very beginning.
A more practical approach, like say Jack Crensahw's in his Let's build a compiler series, resonates better with us because it sounds as if you were just making a "small" tool step by easy-to-understand step, making the "theory" which led to it easier to grok.
+1
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 19, 2019, 10:41:59 am
Wirth <snip> favoured recursive /ascent/ (Irons et al.).
I have not used recursive ascent to any significant extent. I'll stick to recursive descent because it is very simple, very easy to explain and visualize and, I happen to know it quite well, as a result, I might be able to provide a clear and simple explanation. A very desirable characteristic in a tutorial. ;)
+1

Quote
I think that the problem lies in that most texts emphasize the "theory" part,
True.  Again, playing devil's advocate, they focus on explaining the power of a particular concept (in this case grammars) not how to use them.  It is (unfortunately) presumed that once the reader understands the power of a parsing technique and its limitations, that the reader will be able to use that knowledge to apply it correctly, effectively and when appropriate. 
Most people learn best by example and contrast.  Theory books are usually lacking in those areas.~
+1
Title: Re: Tutorial on creating a compiler
Post by: Thaddy on December 19, 2019, 11:00:17 am
+1
An accumulated -2
Theory is what underlies all application.
Learning by example is a good thing, I agree, but at some point you will need to understand theory first.
Especially with any serious compiler construction.
(A possible example is FPC itself, during the transition from the 0.9.x series to the - based on solid fundamentals - 1.9.+ series )
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 19, 2019, 11:10:10 am
As I've said, that glosses over a lot. But at this stage of the game I think it's worth trying to work out what really does need to be understood, and what is basically "smoke and mirrors" put in place by a computer science priesthood to demonstrate how smart they are and to perpetuate a closed shop.

Quote
If anybody thinks I'm being unfairly rude about computer science practitioners, I'd remind him of some of the things that Dijkstra wrote about electronic engineers :-)
Some references about that might be helpful.

https://www.cs.utexas.edu/users/EWD/

He was one of the pioneers of robust system design, but (in my opinion) ended up with a massive chip on his shoulder due to his conviction that IBM computers were inferior (in part because one of his former students working for them hadn't applied what he'd been taught) and that his research group wasn't getting the support and recognition it deserved.

However one has to be cautious and read things in context. He is best known to the general population for this pronouncement:

Quote
FORTRAN —"the infantile disorder"—, by now nearly 20 years old, is hopelessly inadequate for whatever computer application you have in mind today: it is now too clumsy, too risky, and too expensive to use.

PL/I —"the fatal disease"— belongs more to the problem set than to the solution set.

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.

The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence.

APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums.

But if you read the entire original http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD498.html (http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD498.html) you'll see that he was intentionally being provocative (and personally I suspect that he might have been smoking something :-)

Besides which, his comment on APL was pretty much bang on, and the "coding bums" term came from John McCarthy.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 19, 2019, 11:36:59 am
(and personally I suspect that he might have been smoking something :-)
Smoking stuff that develops an individual's sense of humor is very healthy. :) 

That's definitely better than indulging in mind-altering chemicals which cause their consumers to believe they can fly or that C is a programming language.

Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 19, 2019, 11:41:47 am
+1
An accumulated -2
Theory is what underlies all application.
Learning by example is a good thing, I agree, but at some point you will need to understand theory first.
Especially with any serious compiler construction.
(A possible example is FPC itself, during the transition from the 0.9.x series to the - based on solid fundamentals - 1.9.+ series )
As a former professor myself I understand what you mean by theory and solid base, but that is a reality for a very limited number of universities...

http://factsmaps.com/pisa-2018-worldwide-ranking-average-score-of-mathematics-science-reading/
https://qz.com/1759506/pisa-2018-results-the-best-and-worst-students-in-the-world/
"The OECD is trying to change the test to be about more than academics, in part to encourage countries to view education beyond traditional subjects. In the latest test, it assessed global competency, asking students to express how they relate to others and what they think of their lives and their future; in the next test, in 2021, it will assess creative thinking."
Title: Re: Tutorial on creating a compiler
Post by: lucamar on December 19, 2019, 03:12:24 pm
An accumulated -2
Theory is what underlies all application.
Learning by example is a good thing, I agree, but at some point you will need to understand theory first.
Especially with any serious compiler construction.

What I was talking about was of over-emphasizing the theory instead of first stablishing a base of, let's us call it, "facts" mingled with a little history of how those results were achieved. Somewhat like how physics are taught: Newton and his apple, Galileo launching iron balls down the Tower of Pisa, etc.

Granted that it's usually at university level (i.e. relatively advanced), but when you start learning "computer science" you're frequently thrown in the deepest part of a storming sea from which you can see no land ahead. If it's difficult for students, imagine then those trying on their own. I know you have a solid formal education in this field, but imagine trying to understand, say, Knuth's "bibles" without it; it's maybe not impossible ... but very close to it.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 19, 2019, 03:27:13 pm
I'd suggest that too often the practical implementation of parsing is assumed to require understanding of the theory of (computer) language design, and computer language design is assumed to require understanding of post-Chomsky linguistics.

There has to be a better way, even if at the undergraduate level rules of thumb are applied ("don't do /this/ in your syntax design since it's difficult to resolve" rather than "let's spend a year studying ambiguous syntaxes").

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 20, 2019, 07:42:08 am
+1
An accumulated -2
Theory is what underlies all application.
Learning by example is a good thing, I agree, but at some point you will need to understand theory first.
Especially with any serious compiler construction.
(A possible example is FPC itself, during the transition from the 0.9.x series to the - based on solid fundamentals - 1.9.+ series )
As a former professor myself I understand what you mean by theory and solid base, but that is a reality for a very limited number of universities...

http://factsmaps.com/pisa-2018-worldwide-ranking-average-score-of-mathematics-science-reading/
https://qz.com/1759506/pisa-2018-results-the-best-and-worst-students-in-the-world/
"The OECD is trying to change the test to be about more than academics, in part to encourage countries to view education beyond traditional subjects. In the latest test, it assessed global competency, asking students to express how they relate to others and what they think of their lives and their future; in the next test, in 2021, it will assess creative thinking."
It seems that what I said is known for a long time:
-----------------------
A Modern Approach to Teaching Computer Translation Manuel E. Bermudez, University of Florida, Gainesville, FL, USA
ACM Computing Curricula, Final Draft, December 15, 2001, www.computer.org/education/cc2001/final
https://pdfs.semanticscholar.org/699c/a65ee0f28b85e1301e11ad9a59d4adac760d.pdf

II. THE DECLINE OF THE COMPILER COURSE

No one questions that the compilers course is less central to the Computer Science curriculum today, than it was some years ago. There are many reasons for this. First, there is the maturity of the compiler discipline itself. Compiler construction techniques have evolved at a much slower pace in recent years, and Computer Science curricula tend to give higher importance to emerging technologies. The traditional compilers course was a great opportunity for exposing the student to a wide variety of data structures: trees, symbol tables, graphs, etc., but those topics are now considered sufficiently covered in basic courses such as Data Structures.
Another reason for the decline is the proliferation of languages and paradigms in recent years, prompting CS departments, curriculum experts, and textbook writers to focus their efforts on issues of design and implementation of programming languages, rather than the implementation techniques traditionally covered in a compilers course. As a result, CS curricula today are much more likely to require a Programming Languages course, than a Compilers course. Yet another reason is that CS curricula are more aware now that few CS graduates will make a living either writing or maintaining compilers, and the skill-set required is considered highly specialized. Similarly, fewer faculty make compilers their main area of research, and as new, exciting topics in Computer Science have made their appearance, such as computer networking and issues related to the WWW, the number of faculty willing to teach the compilers course has diminished.
Finally, there is the matter of trends in CS Curriculum. During discussions leading to the 2001 ACM Computer Science Curriculum, Professor Mary Shaw of Carnegie-Mellon University said, “Let’s organize our courses around ideas rather than around artifacts ... Engineering schools don’t teach boiler design – they teach thermodynamics. Yet two of the mainstay software courses – compiler construction and operating systems – are system-artifact dinosaurs” [1]. The 2001 ACM-recommended curriculum addressed this problem by relegating the compilers course to the status of “advanced/supplemental” (i.e. elective) material.
-----------------------
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 20, 2019, 09:28:03 am
Quote
Another reason for the decline is the proliferation of languages and paradigms in recent years, prompting CS departments, curriculum experts, and textbook writers to focus their efforts on issues of design and implementation of programming languages, rather than the implementation techniques traditionally covered in a compilers course.

Also, I'd suggest, that language designers now understand what type of syntax is likely to cause compilers problems, and to avoid it where possible.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 20, 2019, 11:38:07 am
+1
An accumulated -2
Theory is what underlies all application.
Learning by example is a good thing, I agree, but at some point you will need to understand theory first.
Especially with any serious compiler construction.
(A possible example is FPC itself, during the transition from the 0.9.x series to the - based on solid fundamentals - 1.9.+ series )

Your example is counter to your statement. The question is if you would have succeed if you could have skipped 10 years of 1.0 series, and gone directly to 2.0 series.

Most of the theory is way to parsing centric, and often table based. If you make a recursive descent compiler, like Free Pascal and most Wirthian language are, reams of theory is irrelevant as you are not going to work with a grammar based compiler generator.

Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 20, 2019, 12:31:14 pm
Most of the theory is way to parsing centric, and often table based.
Definitely true and, rather unfortunately, very little is said about the structure of the languages that require more sophisticated parsers.  Generally, the more sophisticated the required parser, the more likely the resulting language is hard to _humanly_ understand.

If you make a recursive descent compiler, like Free Pascal and most Wirthian language are, reams of theory is irrelevant as you are not going to work with a grammar based compiler generator.
One thing lex and yacc type tools are good for is to ensure the grammar is free of ambiguities (the tools will complain if they aren't.)  I agree that "reams of theory" are unlikely to be useful but, still, even with in a recursive descent parser, it is important to understand basic compiler construction and grammars theory.
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 20, 2019, 12:50:35 pm
One thing lex and yacc type tools are good for is to ensure the grammar is free of ambiguities (the tools will complain if they aren't.) 

Yeah, but if your grammar is not 100% sure the same as your implemented grammar, you get, euh, Delphi :-)
(the grammar in the Delphi manual was quite infamous for not being the same as the compiler behaviour)

Quote
I agree that "reams of theory" are unlikely to be useful but, still, even with in a recursive descent parser, it is important to understand basic compiler construction and grammars theory.

True. But even the useful part of gets drowned in details that only a LALR parser generator builder needs to know. Not even their user, specially since more modern tools have their own workarounds to disambiguate grammars, and don't need as many of the old refactoring tricks.

But there are no doubt books with LL(1) languages and RD as subject. I would just not advise to start with the heaviest category. You need to acquire quite a lot jargon to make heads or tails of it, and without classes and labwork to gently nudge that into you it will be hard.

Start slow and limit your scope, and after toying a while upgrade to heavier texts. Don't imagine for a second that the first thing you write is the definitive one.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 21, 2019, 08:13:30 am
Interesting articles:

Want to Write a Compiler? Just Read These Two Papers
https://prog21.dadgum.com/30.html
    https://compilers.iecc.com/crenshaw/
    http://home.iae.nl/users/mhx/crenshaw/tiny.html
    https://cs.indiana.edu/~dyb/pubs/nano-jfp.pdf
    https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools

A Simple Compiler
http://www.semware.com/html/compiler.html
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 21, 2019, 09:30:04 am
Interesting articles:
The problem with those and all the tutorials I've seen so far is that they really leave too much out.  After reading any of those tutorials the reader will not be able to write a "real" compiler.  By "real" compiler I mean a compiler that can compile itself and generates reasonably decent code in a reasonable amount of time.

IOW, the tutorials are too basic.  They don't analyze the trade offs a compiler writer has to make.  The "heavy duty" compiler construction books drown the reader in theory about grammars, parser generators, NFAs to DFAs and other things I don't even care to remember.  After reading all that stuff, even if the reader understood it all, he/she is nowhere near writing a decent compiler.

In spite of the above, I am grateful to the authors for their efforts.  They deserve and earned due credit for their work.

The result in both cases leave a lot to be desired.  For instance (from the thread "ridiculously slow"):
Quote
Empty inner loop:
                       un-optimized         optimized
FPC:                      121 s                   109 s   
Mingw-gcc:            76  s                     27  s
The above are the timing results for a nested loop that does absolutely _nothing_.  It is inexcusable that a compiler with optimizations turned on, did not eliminate what is basically a fancy noop.   Just for the record, I compiled that nested loop with VC++ and the time was _zero_ (when optimization was turned on), which is what it should be.  The un-optimized version generated code that is basically identical to FPC's, because of that, I didn't bother to time it, the time should be about the same.


Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 21, 2019, 12:19:25 pm
IOW, the tutorials are too basic.  They don't analyze the trade offs a compiler writer has to make.  The "heavy duty" compiler construction books drown the reader in theory about grammars, parser generators, NFAs to DFAs and other things I don't even care to remember.  After reading all that stuff, even if the reader understood it all, he/she is nowhere near writing a decent compiler.
...
                       un-optimized         optimized
FPC:                      121 s                   109 s   
Mingw-gcc:            76  s                     27  s

But a high-performance compiler can be built up incrementally: GCC didn't spring full-formed from Stallman's brow, and as everybody knows he started off by trying to use Pastel ("an off-color Pascal") as its foundation.

I'd suggest that it boils down to questions of adequacy and review.

Is the underlying syntactic definition adequate to avoid ambiguities?

Is the language definition adequate to meet its goals (e.g. concurrency support in Ada)?

Does the syntax need to be reviewed in order to support something that has to be in the language?

Is a particular type of parser adequate to parse the syntactic description of the language?

Does the language need to be reviewed because of problems implicit to all types of parser?

Is a particular set of intermediate data structures adequate to represent the parsed language such that target-agnostic optimisations can be applied?

Does the parser need to be reviewed because of symbol table or parse tree problems (e.g. available storage)?

Is a particular final representation adequate to allow target-specific optimisations to be applied?

Does the parse tree or symbol table need to be reviewed because of code generation requirements?

At all points, the key is not painting oneself into a corner such that subsequent steps are more convoluted than they have to be. As a specific example, Pastel supported what appears to be an early implementation of generics, but since the scope wasn't limited the parse tree for the entire program had to be kept in memory- and in those days memory was scarce and expensive.

Perhaps more relevant to the current discussion- and to the state of FPC- is that the syntax tree, as the compiler's central element, should be sufficiently well-understood that any researcher can work out how to extend it to store additional context revealed by the parser and then use that context to contribute to experimental code generation.

MarkMLl








Title: Re: Tutorial on creating a compiler
Post by: marcov on December 21, 2019, 04:19:36 pm
The above are the timing results for a nested loop that does absolutely _nothing_.  It is inexcusable that a compiler with optimizations turned on, did not eliminate what is basically a fancy noop.   Just for the record, I compiled that nested loop with VC++ and the time was _zero_ (when optimization was turned on), which is what it should be.  The un-optimized version generated code that is basically identical to FPC's, because of that, I didn't bother to time it, the time should be about the same.

C has many of such optimizations because many old language bootstrapped using a "language X" to C transpiler. These transpilers were usually very simplistic, and generated bad code and other nonsense.

Similarly, as a fairly defective language C has relied on preprocessor code that could also generate a lot of noops, that should be optimized away.

That is not an excuse, and FPC could always optimize better, but there is some history and priorities involved.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 21, 2019, 08:11:15 pm
The above are the timing results for a nested loop that does absolutely _nothing_.  It is inexcusable that a compiler with optimizations turned on, did not eliminate what is basically a fancy noop.   Just for the record, I compiled that nested loop with VC++ and the time was _zero_ (when optimization was turned on), which is what it should be.  The un-optimized version generated code that is basically identical to FPC's, because of that, I didn't bother to time it, the time should be about the same.
C has many of such optimizations because many old language bootstrapped using a "language X" to C transpiler. These transpilers were usually very simplistic, and generated bad code and other nonsense.
Similarly, as a fairly defective language C has relied on preprocessor code that could also generate a lot of noops, that should be optimized away.
That is not an excuse, and FPC could always optimize better, but there is some history and priorities involved.
What noops stands for?

DevOps Is Dead, Long Live NoOps
Learn why DevOps is not enough
Daniele Fontani
Nov 10, 2019
https://medium.com/better-programming/devop-noops-difference-504dfc4e9faa
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 21, 2019, 10:28:30 pm
FWIW, I'd also like to see this tutorial made. It'd would be invaluable if it were made easy to understand for programmers with little formal training, whatever the level of experience needed.
I firmly believe that if someone _really_ understands something, they can explain it  in terms that just about anyone can understand.

One of my favorite things to ask mathematicians is "why does minus times minus equal plus?"... the explanation is _trivial_ if you understand it.   If you want to _understand_ why, watch the videos found at http://www.dimensions-math.org/Dim_tour_E.htm

If memory serves, I believe it was in chapter 5 where they made it obvious.  You'll be amazed at how much math you can learn when the person who explains it actually understands what he is talking about.   If such explanations pique your interest, look for James Tanton on youtube.  He is great.

Also very interesting are the book series, "proof without words" from Roger Nelson and some from James Tanton.  Warning: the Nelsen books are often quite overpriced.  Tanton's stuff is excellent and usually very affordable (a lot of it free on youtube.)

The saddest part is, while schools teach all kinds of stuff, they don't teach their students _how to think_.  Paul Zeitz has an excellent class, description and availability at https://www.thegreatcourses.com/courses/art-and-craft-of-mathematical-problem-solving.html

HTH.
Hope That Helps, too.

Roger B. Nelsen
https://www.amazon.com/Roger-B-Nelsen/e/B001IOHABO/

James Tanton
https://www.amazon.com/slp/james-tanton/egpv7yxstmxv92a
http://www.jamestanton.com/

EUROPEAN JOURNAL OF PURE AND APPLIED MATHEMATICS, 2010
An Invitation to Proofs Without Words
Claudi Alsina and Roger B. Nelsen
http://www.labjor.unicamp.br/comciencia/files/matematica/ar_roger/ar_roger.pdf

Mathematically Gifted High School Students' Approaches to Developing Visual Proofs (VP) and Preliminary Ideas about VP
Isikhan Ugurel, H. Sevgi Morali, Ozge Karahan, Burcak Boz
https://files.eric.ed.gov/fulltext/EJ1102194.pdf

THE MATHEMATICAL ASSOCIATION OF AMERICA
ROGER NELSEN’S BOOKS, SO FAR
Surveyed by Tom Edgar, Pacific Lutheran Univer-sity, Tacoma, WA
https://community.plu.edu/~edgartj/nelsen.pdf

Proofs Without Words, July 1998
Alex Bogomolny
https://www.cut-the-knot.org/ctk/pww.shtml
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 21, 2019, 11:30:10 pm
@MarkMLI

... GCC didn't spring full-formed from Stallman's brow
Very true.  I'm sure I'm not the only one who wishes compiler writing was that easy :)

Every consideration you mentioned is definitely valid.  My point is, in spite of Stallman being quite knowledgeable in all aspects of compiler theory, the fact that GCC does not optimize away those nested empty loops clearly indicates a compiler design deficiency.  Null code removal is not only a fairly simple optimization, it is also a down optimization, i.e, optimizations that follow other optimizations.  It is part and parcel of an optimizer to eliminate instructions as part of the optimization process (often simply replaced by noops in one of the optimizing passes) and, as a result of that, an optimizer should check that, after a number of optimizations, the resulting optimized statement still does something and, get rid of it if it does not.

It's an example that extensive knowledge of compiler theory does not guarantee that the result will take care of even the basics.  Null code removal is basic.

Perhaps more relevant to the current discussion- and to the state of FPC- is that the syntax tree, as the compiler's central element, should be sufficiently well-understood that any researcher can work out how to extend it to store additional context revealed by the parser and then use that context to contribute to experimental code generation.
I agree and, I would add that the structure or structures used to represent the AST should lend themselves to simple algorithms to analyze the code. Complex and, in many cases, "one size fits all" structures are at the root of some optimizations being complex in their implementation.



@valdir.marcos

I'm a firm believer in "proofs without words".  I consider it an "acid" test.  If someone can represent a problem with some visual construct, it reflects that the person understands the problem (presuming the construct is correct.)  Inability to do that, more often than not, reflects that the individual has _not_ understood the problem and, obviously, without a correct understanding of the problem, a solution is not very likely to be forthcoming.


Title: Re: Tutorial on creating a compiler
Post by: PascalDragon on December 22, 2019, 12:28:35 pm
Every consideration you mentioned is definitely valid.  My point is, in spite of Stallman being quite knowledgeable in all aspects of compiler theory, the fact that GCC does not optimize away those nested empty loops clearly indicates a compiler design deficiency.  Null code removal is not only a fairly simple optimization, it is also a down optimization, i.e, optimizations that follow other optimizations.
An empty loop could also be by design. Especially on embedded platforms it might be used to implement delays.
Title: Re: Tutorial on creating a compiler
Post by: JdeHaan on December 23, 2019, 08:54:07 am
To get an idea of other programming languages, concepts, paradigms, etc, have a look at this Wikipedia page:

https://en.wikipedia.org/wiki/List_of_programming_languages#A


What will be the next step in this journey?
Title: Re: Tutorial on creating a compiler
Post by: Leledumbo on December 23, 2019, 11:11:52 am
Attached is a list of my upcoming (with no deadline) compiler tutorial. The higher the number in the start of folder name, the more complex it becomes. However, each is a standalone working compiler. Very far from complete, but should be somewhat good foundation for self hosting in the end (hence the choice of simple procedural design with as basic as possible structure). The document part is meant to be used as a lesson (for teaching) instead of tutorial, so the attached archive is expected to be the answer to questions/quizzes/exams/whatever on each chapter. I'm not confident enough with the document, so I'll keep it for private now.
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 23, 2019, 11:46:06 am
Attached is a list of my upcoming (with no deadline) compiler tutorial. The higher the number in the start of folder name, the more complex it becomes.
Thank you.  :)

I've just had a quick look.  The unit "recognizer" in "02 - recognizer" needs to have the unit "messages" added to it uses list.  It doesn't compile as it currently is.

Would it be possible for you to post the language definition ?   (I can build it mentally from the source but, it would be much nicer to get it from the language's creator ;) )
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 23, 2019, 01:04:57 pm
What noops stands for?

no-ops, lines or assembler instructions that don't do anything.
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 23, 2019, 01:16:10 pm
no-ops, lines or assembler instructions that don't do anything.
Thank you for answering that.  I thought he was asking "tongue in cheek" but after reviewing his post, I guess not.

ETA:

noop or no-op = no operation

Most CPUs have a no-op instruction that accomplishes two things: 1.) it takes space and 2.) it consumes one clock cycle.
Title: Re: Tutorial on creating a compiler
Post by: Leledumbo on December 23, 2019, 01:26:50 pm
Would it be possible for you to post the language definition ?   (I can build it mentally from the source but, it would be much nicer to get it from the language's creator ;) )
Since the document isn't complete, I honestly don't have the complete grammar. My plan was to add it gradually as the lesson progresses. What I have in mind is on the level of Oberon-0, but with self hosting capabilities.
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 23, 2019, 01:48:05 pm
no-ops, lines or assembler instructions that don't do anything.
Thank you for answering that.  I thought he was asking "tongue in cheek" but after reviewing his post, I guess not.

I googled it and I indeed got only links to some devops spinoff.

Quote
ETA:

noop or no-op = no operation

Most CPUs have a no-op instruction that accomplishes two things: 1.) it takes space and 2.) it consumes one clock cycle.

Yeah. And in this specific case it is more macros that are used in plain C (e.g. to emulate a form of OOP) and generate weird C code, leaving it for the compiler to sort out.

The ugliest I have ever seen was the old FreeBSD startup code (CSU).
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 23, 2019, 02:37:05 pm
Since the document isn't complete, I honestly don't have the complete grammar. My plan was to add it gradually as the lesson progresses. What I have in mind is on the level of Oberon-0, but with self hosting capabilities.
Fair enough... good to know that is "work in progress" ;)  thank you.


... macros that are used in plain C (e.g. to emulate a form of OOP) and generate weird C code, leaving it for the compiler to sort out.
Unfortunately, there are times when the programmer has to sort it out first. 
Title: Re: Tutorial on creating a compiler
Post by: marcov on December 23, 2019, 06:16:34 pm
... macros that are used in plain C (e.g. to emulate a form of OOP) and generate weird C code, leaving it for the compiler to sort out.
Unfortunately, there are times when the programmer has to sort it out first.

I meant it as an example why it is more worthwhile for a C compiler to optimize strange constructs.
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on December 23, 2019, 06:56:30 pm
What noops stands for?
no-ops, lines or assembler instructions that don't do anything.
no-ops, lines or assembler instructions that don't do anything.
Thank you for answering that.  I thought he was asking "tongue in cheek" but after reviewing his post, I guess not.

ETA:
noop or no-op = no operation
Most CPUs have a no-op instruction that accomplishes two things: 1.) it takes space and 2.) it consumes one clock cycle.
no-ops, lines or assembler instructions that don't do anything.
Thank you for answering that.  I thought he was asking "tongue in cheek" but after reviewing his post, I guess not.
I googled it and I indeed got only links to some devops spinoff.
Quote
ETA:
noop or no-op = no operation
Most CPUs have a no-op instruction that accomplishes two things: 1.) it takes space and 2.) it consumes one clock cycle.
Yeah. And in this specific case it is more macros that are used in plain C (e.g. to emulate a form of OOP) and generate weird C code, leaving it for the compiler to sort out.
The ugliest I have ever seen was the old FreeBSD startup code (CSU).
Thank you both for answering that.
Before asking, I always search for words or concepts that I don't know. For example, I didn't know the expression "tongue in cheek":
https://en.wikipedia.org/wiki/Tongue-in-cheek

Generally, being humoristic or ironic or even unpolite is not useful or practical on international conversation and only produce noise. So, I try to avoid it as far as I can to keep a good communication process:
https://courses.lumenlearning.com/wmopen-organizationalbehavior/chapter/key-components-of-communication/
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 23, 2019, 07:21:36 pm
Generally, being humoristic or ironic or even unpolite is not useful or practical on international conversation and only produce noise. So, I try to avoid it as far as I can to keep a good communication process:

I agree, but sometimes pointing that out to the people with whom you are having difficulty communicating can make things worse.

Poing back to the no-op point, there's actually two or perhaps even three distinct things here.

The first is when a compiler inserts NOP because it is /required/ after a jump so that the CPU can sort itself out.

The second is when a compiler inserts NOP as part of a construct expansion, but it can be safely optimised away.

The third is when a compiler inserts some other construct- perhaps a single-iteration loop- as part of a construct expansion, and it should always be optimised away.

A language such as C (but others, even FORTRAN, have been used) is very prone to the third case when it is being used as the backend of some sort of translation process.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 23, 2019, 07:29:00 pm
https://en.wikipedia.org/wiki/Tongue-in-cheek

Generally, being humoristic or ironic or even unpolite is not useful or practical on international conversation and only produce noise. So, I try to avoid it as far as I can to keep a good communication process:
https://courses.lumenlearning.com/wmopen-organizationalbehavior/chapter/key-components-of-communication/
The explanation in the Wikipedia article is historically informative but, it's really not all that good at describing the usage of "tongue in cheek" these days.

The Wiktionary, on the other hand, provides a more contemporaneous definition and usage:
Quote
(idiomatic) Not intended seriously; jocular or humorous.

He gave a tongue-in-cheek explanation of why the sky was blue, offering a theory about some primordial discount on light blue paint.

Just in case, it's mostly used in a humorous way these days.  Mostly often, when you know something but ask anyway (as a humorous way of pretending you don't know.)
Title: Re: Tutorial on creating a compiler
Post by: Leledumbo on December 23, 2019, 07:34:41 pm
Fair enough... good to know that is "work in progress" ;)  thank you.
If you want to take a peek, though, I can send in private whatever the document has now. The last available chapter is incomplete (code generation part is missing) even though the state of the implementation itself is working. An example if you need one (requires lesson 11 compiler):
Code: [Select]
[x,y]
x := 1 + 2
y := x + 3
! y
Title: Re: Tutorial on creating a compiler
Post by: julkas on December 25, 2019, 11:46:52 am
Is there any relations with FPC and - https://edgeconsult.me/lib/Software%20Development/Others/TP%20Lex%20and%20Yacc.pdf ?
Title: Re: Tutorial on creating a compiler
Post by: Leledumbo on December 25, 2019, 10:14:28 pm
Is there any relations with FPC and - https://edgeconsult.me/lib/Software%20Development/Others/TP%20Lex%20and%20Yacc.pdf ?
A little: those two are ported and available in standard FPC distribution. That manual can be reused as well, FPC incompatibilities with TP still apply, though.
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 26, 2019, 11:01:58 am
Somewhere in this thread I believe that somebody has reminded us of one school of language design:

Define a syntax.

Using that syntax, construct all possible utterances.

Define a parser.

I might be mis-quoting, and I'm sure that I'm expressing myself badly. However I'd like to ask: is the "construct all possible" stage necessary or even desirable? It might have been necessary when bottom-up parsing demanded tables covering every eventuality, but particularly if top-down parsing is being used I'm not sure that it contributes to detecting and resolving the real issue wihich is ambiguity in the syntax rules.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 26, 2019, 12:20:23 pm
Somewhere in this thread I believe that somebody has reminded us of one school of language design:

Define a syntax.

Using that syntax, construct all possible utterances.

Define a parser.

I might be mis-quoting, and I'm sure that I'm expressing myself badly. However I'd like to ask: is the "construct all possible" stage necessary or even desirable?
Personally, I don't think constructing "all possible utterances" is useful nor practical.  The parser is supposed to define the limits of valid constructions using information that is derived/obtained during the scanning process.

A humorous example from "Engineering a compiler" 2nd edition by Keith Cooper and Linda Torczon is: given a production for a simple English sentence
Quote
1 Sentence → Subject verb Object endmark
2 Subject → noun
3 Subject → Modifier noun
4 Object → noun
5 Object → Modifier noun
6 Modifier → adjective
The parser can determine that the sentence "Compilers are engineered objects" is a semantically valid sentence and, the sentence "Rocks are green vegetables", while being structurally correct is not semantically valid, i.e, rocks are not members of the vegetables "data type", therefore it is not semantically sensible.


I'm not sure that it contributes to detecting and resolving the real issue wihich is ambiguity in the syntax rules.
The problem with language design is that it is extremely easy to inadvertently introduce ambiguities in the grammar.  Tools like lex and yacc will detect them because, one way or another, they need to ensure that for a given sequence of tokens there must be one and, only one, parsing rule applicable to it.  If anything in the grammar breaks that rule, they will complain, which IMO is their nicest feature.

The "else" problem in languages like Pascal and C is an example of that.  Without a convention that stipulates the "else" always belongs to the inner if, then the "else" could associate with the inner or outer if.  Compiler-compilers detect problems like that. This example is now obvious to just about every programmer but, subtle similar problems are easy to inadvertently introduce in a grammar and scanner and parser generators are great to catch those mistakes.

NOTE: Cooper and Torczon book is reasonably easy to understand and covers a lot of theory with accessible explanations _but_ unlike with Brinch Hansen's book, you will _not_ end up with a compiler ("toy" or otherwise.)  Not quite as easy as Brinch Hansen's book "On Pascal compilers" but, definitely much more readable than the Dragon book or similar books.  And with a little shopping around it can often be had in the $25 to $30 ballpark (add reasonable shipping, make sure the shipping charge _is_ reasonable.)  One downside of that book is that, while you will have a much better understanding of the theory you will still not have much of an idea of how to build a reasonably decent compiler, IOW, you'll be eating your heart out <evil smile>.

Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 26, 2019, 12:45:38 pm
Syntax analysis
The parser can determine that the sentence "Compilers are engineered objects" is a semantically valid sentence and, the sentence "Rocks are green vegetables", while being structurally correct is not semantically valid, i.e, rocks are not members of the vegetables "data type", therefore it is not semantically sensible.

I suspect that the idea goes back to what I understand is one way of analysing somebody else's spoken language (an Amazonian tribe or whatever): string words together in all possible sequences and see which sequences they react to as meaningful. Rinse and repeat.

The "else" problem in languages like Pascal and C is an example of that.  Without a convention that stipulates the "else" always belongs to the inner if, then the "else" could associate with the inner or outer if.  Compiler-compilers detect problems like that. This example is now obvious to just about every programmer but, subtle similar problems are easy to inadvertently introduce in a grammar and scanner and parser generators are great to catch those mistakes.

I'm sure I can't find a good example in the available time (despite its being Boxing Day I've got stuff I really do have to be working on) but my recollection is that at least some early ALGOL manuals made /extremely/ heavy going of explaining nested conditional statements. From memory, they completely ignored the fact that they were nested (i.e. amenable to recursive definition and analysis), and instead tried to treat an  if ... if ... if...  sequence as a single complex boolean expression to be evaluated. Over the intervening 60 years our ability to abstract and explain language structures has improved a lot, but regrettably our ability to explain parsers without lapsing into priestly gobbledegook hasn't.

definitely much more readable than the Dragon book

I suggest that its density and the reverence in which it's held merit the word "tome" rather than "book" :-)

Somewhat later: I find myself doing better than I expected, so I've transcribed this as an example of something which is spectacularly beginner-unfriendly:

Quote
The arithmetic expressions following the delimiters THEN and ELSE may also be conditional arithmetic expressions. As a result, a conditional arithmetic expression could contain a series of IF clauses in the expression following either or both of the delimiters.

In the case of a conditional arithmetic expression following the delimter THEN, the Boolean expression(s) in the IF clause(s) are evaluated from left to right as long as they yield a logical value of TRUE. If they all yield a logical value of TRUE, the expression following the last delimiter THEN is executed, thus completing the evaluation of the whole expression. If any of the Boolean expressions yields a logical value of FALSE, the expression following the corresponding delimiter ELSE is executed.

In the case of the conditional arithmetic expression following the delimiter ELSE, the respective Boolean expressions in the IF clauses are evaluated from left to right until a logical value of TRUE is found. Then the value of the succeeding arithmetic expression is the value of the entire arithmetic expression. If no TRUE value is found, the value of the whole expression is that of the expression following the last ELSE.

In nested IF clauses, the first THEN corresponds to the last ELSE, and the innermost THEN to the following (i.e., the innermost) ELSE. The delimiters THEN and ELSE between these extremes follow the logical pattern established, i.e., the next outermost THEN corresponds to the next outermost ELSE, and so on until the innermost THEN-ELSE pair has been matched.

Appropriate positioning of parentheses may serve to establish a different order of execution of operations within an expression.

That was not accompanied by any form of example, let alone one that demonstrated useful indentation.

bitsavers.informatik.uni-stuttgart.de/pdf/burroughs/B5000_5500_5700/1028024_B5500_ExtendedAlgol_Jul67.pdf

I consider that to be a most unfortunate example of garbled explanation, and it is doubly unfortunate in that the technical author undoubtedly had a direct line of communication to Richard Waychoff who has a good claim to being at least the co-inventor of recursive descent compilation. Regrettably, the entire reference manual is not much better, with a complete lack of separation between basic syntax, types, operators, and procedures which these days would be carefully put in separate libraries.

Wirth's ALGOL-W manual is much clearer, but I don't know how much of that was written by him in person. I can't comment on the quality of comparable documentation from e.g. IBM, but there's a possibility that we owe Wirth as much for his clarity of exposition as for his skill as an early compiler writer.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: 440bx on December 26, 2019, 02:50:29 pm
Somewhat later: I find myself doing better than I expected, so I've transcribed this as an example of something which is spectacularly beginner-unfriendly:

Quote
The arithmetic expressions following the delimiters THEN and ELSE may also be conditional arithmetic expressions. As a result, a conditional arithmetic expression could contain a series of IF clauses in the expression following either or both of the delimiters.

In the case of a conditional arithmetic expression following the delimter THEN, the Boolean expression(s) in the IF clause(s) are evaluated from left to right as long as they yield a logical value of TRUE. If they all yield a logical value of TRUE, the expression following the last delimiter THEN is executed, thus completing the evaluation of the whole expression. If any of the Boolean expressions yields a logical value of FALSE, the expression following the corresponding delimiter ELSE is executed.

In the case of the conditional arithmetic expression following the delimiter ELSE, the respective Boolean expressions in the IF clauses are evaluated from left to right until a logical value of TRUE is found. Then the value of the succeeding arithmetic expression is the value of the entire arithmetic expression. If no TRUE value is found, the value of the whole expression is that of the expression following the last ELSE.

In nested IF clauses, the first THEN corresponds to the last ELSE, and the innermost THEN to the following (i.e., the innermost) ELSE. The delimiters THEN and ELSE between these extremes follow the logical pattern established, i.e., the next outermost THEN corresponds to the next outermost ELSE, and so on until the innermost THEN-ELSE pair has been matched.

Appropriate positioning of parentheses may serve to establish a different order of execution of operations within an expression.

That was not accompanied by any form of example, let alone one that demonstrated useful indentation.
I would hypothesize that expecting an example from someone who considers the above text to be an explanation is clearly an instance of having set one's expectations too high ;)

I consider that to be a most unfortunate example of garbled explanation,
You call THAT thing an explanation ?... you are very kind :)

there's a possibility that we owe Wirth as much for his clarity of exposition as for his skill as an early compiler writer.
Quite likely.  IMO, brilliance is what produces clarity/simplicity.  N. Wirth is a good example of that.

Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 26, 2019, 04:32:36 pm
I would hypothesize that expecting an example from someone who considers the above text to be an explanation is clearly an instance of having set one's expectations too high ;)

Well it's not actually wrong... but if you compare it with Waychoff's own account referring to roughly five years earlier:

Quote
The main thing that we were doing was wearing out our copies of the "Report on the Algorithmic Language ALGOL 60". We were thumbing back and forth and getting nowhere. Finally. I said "My problem is that I need to see this language laid out before me, all in one place. Let's draw a flowchart of the language." Bob (Barton) said "No it will not work. This is a recursive language and all you would have is a few boxes with lines leading back into themselves." He even drew a single box on the blackboard with a lot of lines running out of it and back it again.

Due to my great respect for Bob, I would never question anything that he said. So I just let the subject drop. So we thumbed through our reports and stared at each other for two more days. Then I brought up the same subject again. But this time I got up to the blackboard and tried to express myself with chalk. ALL of a sudden Lloyd (Turner) said "I see what you are talking about". He was as excited as I was, but he had to run home for lunch. When he came back he was really excited. He said that the only thing that I had been missing was that we needed two shapes of boxes. One meaning that the metalinguistic variable was defined here, and another meaning that it was used here. Lloyd used a rectangle and a diamond for the two shapes.

At that point Barton agreed that we really had something and he went away. Lloyd and I diagrammed the entire language on his blackboard that afternoon.' Then we called in our newly assigned documenter, who was Warren Taylor, and showed him the blackboard. Warren immediately liked the idea and got a draftsman assigned to work with him. They slaved for days trying to achieve the best possible arrangement of the boxes and pickles as Warren called them. It was Warren's idea to add the circles to indicate the basic symbols in the language. I do not know for sure, but I think that Warren decided to use the coordinate system.

Finally, the chart was complete. I have no idea who the draftsman was, but Warren would probably remember.

Automatic Programming was still in Marketing at that time, so it did not take long for our sales force (Ralph Pearson by name) to get wind of the fact that we had finally done something to make Algol understandable to the masses. Some kind of chemistry took place that I did not understand, and all of a sudden, we were flooded with syntax charts of every size, shape and composition. We had three foot long wall charts, plastic syntax charts that would fold up and fit in our pockets. and eight and a half by 11 inch charts that we could lay on cur desks. Every programmer in the section (which had grown to 15 people by then) received all forms of the chart.

Nearly everyone received the charts enthusiastically. I suspect that even the Burroughs Automatic Programmers had been as bewildered by Algol as the rest of the computing community. Ken Meyers, whose office was directly across the hall from me, showed his true technical acumen at that early age by taking Scotch tape and carefully constructing a wastebasket liner out of his wall chart. He sat it in the middle of the table in his office about 10 feet from my door for all to see.

With the advent of the syntax chart, the ice was broken and the B5000 was moving again. Recursive Descent was a small step after a few hours of gazing at the chart, and our sales force (Ralph) felt that he had a tool that would really sell the concept.

That summer (1961), Bob and I went to the University of Michigan Engineering Summer Conferences. All of the greats of the computer world were there. To list a few, there was Newell (of Newell, Simon and Shaw); Galler, Graham, and Arden; Alan Perlis; Phillipe Dreyfuss (from Bull in France); Bob Bemer (from IBM); Bob Rosen; and Alston Householder. Alston Householder was such a big name in numerical analysis that I had him ranked in my mind with Aristotle. I literally thought that Alston Householder had died somewhere around the sixteenth century.

It was somehow decided that all of these great minds should get together for a discussion one evening after the conclusion of the days classes. Barton was obviously invited. Somehow I had felt all along that he was kind of unimpressed by the syntax chart. Probably because he had told me that it was impossible. In any case, he asked me to come to this meeting with him. Even though my ego was pretty high, I was frankly scared to go into the room with those people. I think I practically held Bob's hand as we walked in.

Almost as if it were choreographed, each person along the table made some kind of opening statement. When it became Barton's turn, he started talking about the syntax chart. As I recall, I was hiding under the table and peaking over the edge. Bob said that I had brought along a copy (at his direction) in case anyone was interested. That immediately broke up the meeting. Everyone was advancing on me to see the chart. I found myself shaking hands with a 400 year old man (i.e. Householder).

Alan Perlis was the editor of "The Communications Of The Association For Computing Machinery". He was also one of the original 13 architects of Algol (which places him somewhere above the 12 Disciples Of Christ in my mind). He was more enthusiastic than anyone. -Alan immediately started referring to it as "The Algol Roadmap". He said that if I would prepare it as a paper and submit it to him that he would publish it in the "Communications Of The ACM". Warren did all of the work. It was published in September, 1961. It was undoubtedly the first centerfold for any magazine other than "Playboy". To my knowledge, they have not published anything in foldout form since then.

http://ed-thelen.org/comp-hist/B5000-AlgolRWaychoff.html#10

So there were undoubtedly people accessible to the author of that manual who did "get it", but /something/ went wrong.

Assuming, of course, that the above- written by a company man for company men- is accurate. I'd have expected the author and his colleagues to have been familar with BNF since (if https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form is accurate) it was used in the ALGOL 60 report which Waychoff et al. had in front of them, and surely translating from text to a graphical representation wasn't /that/ big a jump.

Anyway, the bottom line is that these days people know how to describe a programming language accessibly. They haven't really achieved the same lucidity for compiler operation.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on December 26, 2019, 11:09:22 pm
To list a few, there was Newell (of Newell, Simon and Shaw);

Quote
Fifty years ago, Newell and Simon (1956) invented a "thinking machine" called the Logic Theorist. The Logic Theorist was a computer program that could prove theorems in symbolic logic from Whitehead and Russell's Principia Mathematica. This was perhaps the first working program that simulated some aspects of peoples' ability to solve complex problems.

https://www.researchgate.net/publication/276216226_Newell_and_Simon's_Logic_Theorist_Historical_Background_and_Impact_on_Cognitive_Modeling

With the caveat that there's always been the temptation of assigning a "deus ex machina" role to new technology.

MarkMLl
Title: Re: Tutorial on creating a compiler
Post by: lucamar on January 09, 2020, 05:24:21 pm
For completeness sake, a rather nice book (thanks to this post by Ñuño (https://forum.lazarus.freepascal.org/index.php/topic,47844.msg345652.html#msg345652)): Robert Nystrom's Crafting Interpreters (https://craftinginterpreters.com).

This caught my atention at once:
Quote from: Crafting Interpreters
[...] this text is lighter on theory than others. As we build each piece of the system, I will introduce the history and concepts behind it. I’ll try to get you familiar with the lingo so that if you ever find yourself in a cocktail party full of PL (programming language) researchers, you’ll fit in.

But we’re mostly going to spend our brain juice getting the language up and running. This is not to say theory isn’t important. Being able to reason precisely and formally about syntax and semantics is a vital skill when working on a language. But, personally, I learn best by doing. It’s hard for me to wade through paragraphs full of abstract concepts and really absorb them. But if I’ve coded something, run it, and debugged it, then I get it.

That’s my goal for you. I want you to come away with a solid intuition of how a real language lives and breathes. My hope is that when you read other, more theoretical books later, the concepts there will firmly stick in your mind, adhered to this tangible substrate.
(emphasis mine)
Title: Re: Tutorial on creating a compiler
Post by: MarkMLl on January 09, 2020, 05:46:38 pm
Thanks for the cross-posting, which gets everything usefully in one place.

MarkMLl


Title: Re: Tutorial on creating a compiler
Post by: JdeHaan on January 09, 2020, 06:02:12 pm
Bob Nystrom explains very clearly the step by step creation of an interpreter (in Java) and compiler (in C). I used his text to create an interpreter in Free Pascal (https://github.com/jdehaan2014/GearLanguage ), and now busy to translate his C-code into FP for the compiler part.
He also wrote an article on Pratt parsers :
http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
Title: Re: Tutorial on creating a compiler
Post by: valdir.marcos on January 10, 2020, 04:42:04 am
Thanks for the cross-posting, which gets everything usefully in one place.
Adding the links below, we'll have an overview of recent useful extra information:

Best Tutorial
https://forum.lazarus.freepascal.org/index.php/topic,45078.0.html

Wiki stub: Make your own compiler, interpreter, parser, or expression analyzer
https://forum.lazarus.freepascal.org/index.php/topic,42556.0.html

Book recommendation on programming language concepts and design ?
https://forum.lazarus.freepascal.org/index.php/topic,42156.0.html

Books on building an object oriented compiler
https://forum.lazarus.freepascal.org/index.php/topic,47844.0.html

Books and Articles on building a debugger
https://forum.lazarus.freepascal.org/index.php/topic,48075.0.html

Standard Pascal ISO 7185:1990 and Extended Pascal ISO/IEC 10206:1991
https://forum.lazarus.freepascal.org/index.php/topic,47587.0.html

Modula, Oberon or Ada
https://forum.lazarus.freepascal.org/index.php/topic,47800.0.html

A suggestion for a new FPC feature
https://forum.lazarus.freepascal.org/index.php/topic,45832.0.html

P65Pas a new 6502 Pascal compiler
https://forum.lazarus.freepascal.org/index.php/topic,42408.0.html

Midnight trolling: Don't take everything serious!
https://forum.lazarus.freepascal.org/index.php/topic,48033.0.html
TinyPortal © 2005-2018