Recent

Author Topic: Tutorial on creating a compiler  (Read 21018 times)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11455
  • FPC developer.
Re: Tutorial on creating a compiler
« Reply #60 on: December 20, 2019, 12:50:35 pm »
One thing lex and yacc type tools are good for is to ensure the grammar is free of ambiguities (the tools will complain if they aren't.) 

Yeah, but if your grammar is not 100% sure the same as your implemented grammar, you get, euh, Delphi :-)
(the grammar in the Delphi manual was quite infamous for not being the same as the compiler behaviour)

Quote
I agree that "reams of theory" are unlikely to be useful but, still, even with in a recursive descent parser, it is important to understand basic compiler construction and grammars theory.

True. But even the useful part of gets drowned in details that only a LALR parser generator builder needs to know. Not even their user, specially since more modern tools have their own workarounds to disambiguate grammars, and don't need as many of the old refactoring tricks.

But there are no doubt books with LL(1) languages and RD as subject. I would just not advise to start with the heaviest category. You need to acquire quite a lot jargon to make heads or tails of it, and without classes and labwork to gently nudge that into you it will be hard.

Start slow and limit your scope, and after toying a while upgrade to heavier texts. Don't imagine for a second that the first thing you write is the definitive one.


440bx

  • Hero Member
  • *****
  • Posts: 4063
Re: Tutorial on creating a compiler
« Reply #62 on: December 21, 2019, 09:30:04 am »
Interesting articles:
The problem with those and all the tutorials I've seen so far is that they really leave too much out.  After reading any of those tutorials the reader will not be able to write a "real" compiler.  By "real" compiler I mean a compiler that can compile itself and generates reasonably decent code in a reasonable amount of time.

IOW, the tutorials are too basic.  They don't analyze the trade offs a compiler writer has to make.  The "heavy duty" compiler construction books drown the reader in theory about grammars, parser generators, NFAs to DFAs and other things I don't even care to remember.  After reading all that stuff, even if the reader understood it all, he/she is nowhere near writing a decent compiler.

In spite of the above, I am grateful to the authors for their efforts.  They deserve and earned due credit for their work.

The result in both cases leave a lot to be desired.  For instance (from the thread "ridiculously slow"):
Quote
Empty inner loop:
                       un-optimized         optimized
FPC:                      121 s                   109 s   
Mingw-gcc:            76  s                     27  s
The above are the timing results for a nested loop that does absolutely _nothing_.  It is inexcusable that a compiler with optimizations turned on, did not eliminate what is basically a fancy noop.   Just for the record, I compiled that nested loop with VC++ and the time was _zero_ (when optimization was turned on), which is what it should be.  The un-optimized version generated code that is basically identical to FPC's, because of that, I didn't bother to time it, the time should be about the same.


(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6692
Re: Tutorial on creating a compiler
« Reply #63 on: December 21, 2019, 12:19:25 pm »
IOW, the tutorials are too basic.  They don't analyze the trade offs a compiler writer has to make.  The "heavy duty" compiler construction books drown the reader in theory about grammars, parser generators, NFAs to DFAs and other things I don't even care to remember.  After reading all that stuff, even if the reader understood it all, he/she is nowhere near writing a decent compiler.
...
                       un-optimized         optimized
FPC:                      121 s                   109 s   
Mingw-gcc:            76  s                     27  s

But a high-performance compiler can be built up incrementally: GCC didn't spring full-formed from Stallman's brow, and as everybody knows he started off by trying to use Pastel ("an off-color Pascal") as its foundation.

I'd suggest that it boils down to questions of adequacy and review.

Is the underlying syntactic definition adequate to avoid ambiguities?

Is the language definition adequate to meet its goals (e.g. concurrency support in Ada)?

Does the syntax need to be reviewed in order to support something that has to be in the language?

Is a particular type of parser adequate to parse the syntactic description of the language?

Does the language need to be reviewed because of problems implicit to all types of parser?

Is a particular set of intermediate data structures adequate to represent the parsed language such that target-agnostic optimisations can be applied?

Does the parser need to be reviewed because of symbol table or parse tree problems (e.g. available storage)?

Is a particular final representation adequate to allow target-specific optimisations to be applied?

Does the parse tree or symbol table need to be reviewed because of code generation requirements?

At all points, the key is not painting oneself into a corner such that subsequent steps are more convoluted than they have to be. As a specific example, Pastel supported what appears to be an early implementation of generics, but since the scope wasn't limited the parse tree for the entire program had to be kept in memory- and in those days memory was scarce and expensive.

Perhaps more relevant to the current discussion- and to the state of FPC- is that the syntax tree, as the compiler's central element, should be sufficiently well-understood that any researcher can work out how to extend it to store additional context revealed by the parser and then use that context to contribute to experimental code generation.

MarkMLl








MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11455
  • FPC developer.
Re: Tutorial on creating a compiler
« Reply #64 on: December 21, 2019, 04:19:36 pm »
The above are the timing results for a nested loop that does absolutely _nothing_.  It is inexcusable that a compiler with optimizations turned on, did not eliminate what is basically a fancy noop.   Just for the record, I compiled that nested loop with VC++ and the time was _zero_ (when optimization was turned on), which is what it should be.  The un-optimized version generated code that is basically identical to FPC's, because of that, I didn't bother to time it, the time should be about the same.

C has many of such optimizations because many old language bootstrapped using a "language X" to C transpiler. These transpilers were usually very simplistic, and generated bad code and other nonsense.

Similarly, as a fairly defective language C has relied on preprocessor code that could also generate a lot of noops, that should be optimized away.

That is not an excuse, and FPC could always optimize better, but there is some history and priorities involved.

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Tutorial on creating a compiler
« Reply #65 on: December 21, 2019, 08:11:15 pm »
The above are the timing results for a nested loop that does absolutely _nothing_.  It is inexcusable that a compiler with optimizations turned on, did not eliminate what is basically a fancy noop.   Just for the record, I compiled that nested loop with VC++ and the time was _zero_ (when optimization was turned on), which is what it should be.  The un-optimized version generated code that is basically identical to FPC's, because of that, I didn't bother to time it, the time should be about the same.
C has many of such optimizations because many old language bootstrapped using a "language X" to C transpiler. These transpilers were usually very simplistic, and generated bad code and other nonsense.
Similarly, as a fairly defective language C has relied on preprocessor code that could also generate a lot of noops, that should be optimized away.
That is not an excuse, and FPC could always optimize better, but there is some history and priorities involved.
What noops stands for?

DevOps Is Dead, Long Live NoOps
Learn why DevOps is not enough
Daniele Fontani
Nov 10, 2019
https://medium.com/better-programming/devop-noops-difference-504dfc4e9faa

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Tutorial on creating a compiler
« Reply #66 on: December 21, 2019, 10:28:30 pm »
FWIW, I'd also like to see this tutorial made. It'd would be invaluable if it were made easy to understand for programmers with little formal training, whatever the level of experience needed.
I firmly believe that if someone _really_ understands something, they can explain it  in terms that just about anyone can understand.

One of my favorite things to ask mathematicians is "why does minus times minus equal plus?"... the explanation is _trivial_ if you understand it.   If you want to _understand_ why, watch the videos found at http://www.dimensions-math.org/Dim_tour_E.htm

If memory serves, I believe it was in chapter 5 where they made it obvious.  You'll be amazed at how much math you can learn when the person who explains it actually understands what he is talking about.   If such explanations pique your interest, look for James Tanton on youtube.  He is great.

Also very interesting are the book series, "proof without words" from Roger Nelson and some from James Tanton.  Warning: the Nelsen books are often quite overpriced.  Tanton's stuff is excellent and usually very affordable (a lot of it free on youtube.)

The saddest part is, while schools teach all kinds of stuff, they don't teach their students _how to think_.  Paul Zeitz has an excellent class, description and availability at https://www.thegreatcourses.com/courses/art-and-craft-of-mathematical-problem-solving.html

HTH.
Hope That Helps, too.

Roger B. Nelsen
https://www.amazon.com/Roger-B-Nelsen/e/B001IOHABO/

James Tanton
https://www.amazon.com/slp/james-tanton/egpv7yxstmxv92a
http://www.jamestanton.com/

EUROPEAN JOURNAL OF PURE AND APPLIED MATHEMATICS, 2010
An Invitation to Proofs Without Words
Claudi Alsina and Roger B. Nelsen
http://www.labjor.unicamp.br/comciencia/files/matematica/ar_roger/ar_roger.pdf

Mathematically Gifted High School Students' Approaches to Developing Visual Proofs (VP) and Preliminary Ideas about VP
Isikhan Ugurel, H. Sevgi Morali, Ozge Karahan, Burcak Boz
https://files.eric.ed.gov/fulltext/EJ1102194.pdf

THE MATHEMATICAL ASSOCIATION OF AMERICA
ROGER NELSEN’S BOOKS, SO FAR
Surveyed by Tom Edgar, Pacific Lutheran Univer-sity, Tacoma, WA
https://community.plu.edu/~edgartj/nelsen.pdf

Proofs Without Words, July 1998
Alex Bogomolny
https://www.cut-the-knot.org/ctk/pww.shtml

440bx

  • Hero Member
  • *****
  • Posts: 4063
Re: Tutorial on creating a compiler
« Reply #67 on: December 21, 2019, 11:30:10 pm »
@MarkMLI

... GCC didn't spring full-formed from Stallman's brow
Very true.  I'm sure I'm not the only one who wishes compiler writing was that easy :)

Every consideration you mentioned is definitely valid.  My point is, in spite of Stallman being quite knowledgeable in all aspects of compiler theory, the fact that GCC does not optimize away those nested empty loops clearly indicates a compiler design deficiency.  Null code removal is not only a fairly simple optimization, it is also a down optimization, i.e, optimizations that follow other optimizations.  It is part and parcel of an optimizer to eliminate instructions as part of the optimization process (often simply replaced by noops in one of the optimizing passes) and, as a result of that, an optimizer should check that, after a number of optimizations, the resulting optimized statement still does something and, get rid of it if it does not.

It's an example that extensive knowledge of compiler theory does not guarantee that the result will take care of even the basics.  Null code removal is basic.

Perhaps more relevant to the current discussion- and to the state of FPC- is that the syntax tree, as the compiler's central element, should be sufficiently well-understood that any researcher can work out how to extend it to store additional context revealed by the parser and then use that context to contribute to experimental code generation.
I agree and, I would add that the structure or structures used to represent the AST should lend themselves to simple algorithms to analyze the code. Complex and, in many cases, "one size fits all" structures are at the root of some optimizations being complex in their implementation.



@valdir.marcos

I'm a firm believer in "proofs without words".  I consider it an "acid" test.  If someone can represent a problem with some visual construct, it reflects that the person understands the problem (presuming the construct is correct.)  Inability to do that, more often than not, reflects that the individual has _not_ understood the problem and, obviously, without a correct understanding of the problem, a solution is not very likely to be forthcoming.


(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

PascalDragon

  • Hero Member
  • *****
  • Posts: 5486
  • Compiler Developer
Re: Tutorial on creating a compiler
« Reply #68 on: December 22, 2019, 12:28:35 pm »
Every consideration you mentioned is definitely valid.  My point is, in spite of Stallman being quite knowledgeable in all aspects of compiler theory, the fact that GCC does not optimize away those nested empty loops clearly indicates a compiler design deficiency.  Null code removal is not only a fairly simple optimization, it is also a down optimization, i.e, optimizations that follow other optimizations.
An empty loop could also be by design. Especially on embedded platforms it might be used to implement delays.

JdeHaan

  • Full Member
  • ***
  • Posts: 121
Re: Tutorial on creating a compiler
« Reply #69 on: December 23, 2019, 08:54:07 am »
To get an idea of other programming languages, concepts, paradigms, etc, have a look at this Wikipedia page:

https://en.wikipedia.org/wiki/List_of_programming_languages#A


What will be the next step in this journey?

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Tutorial on creating a compiler
« Reply #70 on: December 23, 2019, 11:11:52 am »
Attached is a list of my upcoming (with no deadline) compiler tutorial. The higher the number in the start of folder name, the more complex it becomes. However, each is a standalone working compiler. Very far from complete, but should be somewhat good foundation for self hosting in the end (hence the choice of simple procedural design with as basic as possible structure). The document part is meant to be used as a lesson (for teaching) instead of tutorial, so the attached archive is expected to be the answer to questions/quizzes/exams/whatever on each chapter. I'm not confident enough with the document, so I'll keep it for private now.

440bx

  • Hero Member
  • *****
  • Posts: 4063
Re: Tutorial on creating a compiler
« Reply #71 on: December 23, 2019, 11:46:06 am »
Attached is a list of my upcoming (with no deadline) compiler tutorial. The higher the number in the start of folder name, the more complex it becomes.
Thank you.  :)

I've just had a quick look.  The unit "recognizer" in "02 - recognizer" needs to have the unit "messages" added to it uses list.  It doesn't compile as it currently is.

Would it be possible for you to post the language definition ?   (I can build it mentally from the source but, it would be much nicer to get it from the language's creator ;) )
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11455
  • FPC developer.
Re: Tutorial on creating a compiler
« Reply #72 on: December 23, 2019, 01:04:57 pm »
What noops stands for?

no-ops, lines or assembler instructions that don't do anything.

440bx

  • Hero Member
  • *****
  • Posts: 4063
Re: Tutorial on creating a compiler
« Reply #73 on: December 23, 2019, 01:16:10 pm »
no-ops, lines or assembler instructions that don't do anything.
Thank you for answering that.  I thought he was asking "tongue in cheek" but after reviewing his post, I guess not.

ETA:

noop or no-op = no operation

Most CPUs have a no-op instruction that accomplishes two things: 1.) it takes space and 2.) it consumes one clock cycle.
« Last Edit: December 23, 2019, 01:19:26 pm by 440bx »
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8757
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Tutorial on creating a compiler
« Reply #74 on: December 23, 2019, 01:26:50 pm »
Would it be possible for you to post the language definition ?   (I can build it mentally from the source but, it would be much nicer to get it from the language's creator ;) )
Since the document isn't complete, I honestly don't have the complete grammar. My plan was to add it gradually as the lesson progresses. What I have in mind is on the level of Oberon-0, but with self hosting capabilities.

 

TinyPortal © 2005-2018