Recent

Author Topic: Tutorial on creating a compiler  (Read 20729 times)

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Tutorial on creating a compiler
« Reply #15 on: December 16, 2019, 09:42:26 am »
This is not something that can be done by a single tutorial. It would take roughly an academic year to teach the computer science underpinnings (data structures, a suitable implementation language, and the basics of parsing), another to review target hardware and implementation approaches (many of which have turned out to be dead ends), and a third to look at actual code generation and the fundamentals of optimisation. Follow that by another year doing guided intermediate-level research, at the end of which the student might possibly have some appreciation of what he doesn't yet know hence might possibly be some use for something.

I attach one PDF which I have found overwhelmingly useful over the years: it's old, but is one of the best introductions to parsing and naive code generation. After that I'd suggest reviewing https://en.wikipedia.org/wiki/TREE-META since it continues the same theme and hints at how one can start optimisation. There's some useful historical links at https://wiki.freepascal.org/Make_your_own_compiler,_interpreter,_parser,_or_expression_analyzer#Ancient_history , broadly speaking I suggest that understanding how languages got to their current state and appreciating that there are some "nice features" that could be put in a syntax that would make it unparseable is more useful than knowing every last detail of post-Chomsky linguistic research.

But finally, I do have to emphasise that the hard work of writing a compiler is understanding the low-level stuff: you have to understand the implications of apparently-simple language features (e.g. exceptions) on the implementation and you have to have an intimate understanding of the target architecture.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Tutorial on creating a compiler
« Reply #16 on: December 16, 2019, 11:12:24 am »
About parsing techniques, it's worth checking out Pratt-parsing (which I used in my interpreter written in FPC and happy to share).
Thank you for mentioning Pratt parsing.  I was not aware of that method of parsing expressions.  Definitely a parsing technique I'm going to look at closely.




@MarkMLl

I agree that  covering all the aspects you mentioned would result in quite a "tutorial" (if being that extensive can still be called a "tutorial".)

You may not be familiar with "Per Brinch Hansen on Pascal Compilers" but in his book he narrows down the field quite a bit.  Specifically, he identifies a problem and offers one of the large number of possible solutions.  I find that to be a good approach and I would do the same.  Of course, using that approach implies that the reader has a decent programming foundation.  IOW, not for complete beginners.

A reasonably knowledgeable programmer can read/study and understand Brinch Hansen's book in about a month.  Unfortunately, his book is rather pricey at this time.

This is not something that can be done by a single tutorial.
Agree.  Some areas would require a progression from a very simple example to what will be the target implementation.

It would take roughly an academic year to teach the computer science underpinnings (data structures, a suitable implementation language, and the basics of parsing),
I believe that a reasonably knowledgeable programmer can get the necessary knowledge in about a week.  Of course, that does _not_ include becoming familiar with all the common parsers and their implications on a grammar.  In this case, it would just be one (1): Top down recursive descent, which is fairly easy.

another to review target hardware and implementation approaches (many of which have turned out to be dead ends), and a third to look at actual code generation and the fundamentals of optimisation.
Once the compiler deals with hardware and the O/S, real complexity starts.  In this area, the tutorial would present just enough to enable the reader to read the code and understand how things are getting done.  IOW, a fair amount of personal effort from the reader would be required.

Follow that by another year doing guided intermediate-level research, at the end of which the student might possibly have some appreciation of what he doesn't yet know hence might possibly be some use for something.
You're right but, to go beyond what is presented in the tutorial, Google is the student's best friend.

I attach one PDF which I have found overwhelmingly useful over the years: it's old, but is one of the best introductions to parsing and naive code generation.
Thank you.

appreciating that there are some "nice features" that could be put in a syntax that would make it unparseable is more useful than knowing every last detail of post-Chomsky linguistic research.
Yes, there are a lot of "nice features" that someone who is not familiar with language grammars would wish for and would make the grammar either incorrect or downright unparsable.  That's another area that the "student" will have to investigate on his/her own if they are inclined to know more.

But finally, I do have to emphasise that the hard work of writing a compiler is understanding the low-level stuff: you have to understand the implications of apparently-simple language features (e.g. exceptions) on the implementation and you have to have an intimate understanding of the target architecture.
Very true.  That's probably why a lot of compiler books sidestep the issue by creating a virtual CPU and compiling to that instead of a real CPU.

I want to make sure this is clear, I am fully aware that such a tutorial is a lot of work and, to keep it of reasonable size, many topics will not even be mentioned and some will only be explained to the extent that would enable an interested party to understand it fully only after some personal effort on their part.  In some cases, such as for someone who isn't familiar with the CPU's instruction's set and assembly language, a significant personal effort.

Thank you for the feedback.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Thaddy

  • Hero Member
  • *****
  • Posts: 14214
  • Probably until I exterminate Putin.
Re: Tutorial on creating a compiler
« Reply #17 on: December 16, 2019, 11:22:42 am »
The current curricula refer to
Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, second edition, Pearson New International Edition, 2013, ISBN 9781292024349.

Which basically does the same theory as 40 years ago.

Then again: refer to the search options on this forum, you may be surprised.
« Last Edit: December 16, 2019, 11:25:39 am by Thaddy »
Specialize a type, not a var.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Tutorial on creating a compiler
« Reply #18 on: December 16, 2019, 11:30:18 am »
> You may not be familiar with "Per Brinch Hansen on Pascal Compilers"

I'm not familiar with that specific book but IMO Brinch Hansen is rather an unsung hero.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Thaddy

  • Hero Member
  • *****
  • Posts: 14214
  • Probably until I exterminate Putin.
Re: Tutorial on creating a compiler
« Reply #19 on: December 16, 2019, 11:34:57 am »
I'm not familiar with that specific book but IMO Brinch Hansen is rather an unsung hero.

MarkMLl
No, he is not! forgotten...

But you and others in this discussion should warn against "language theology": A computer language is a tool, a means of communication, NOT computer science.
« Last Edit: December 16, 2019, 11:37:03 am by Thaddy »
Specialize a type, not a var.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Tutorial on creating a compiler
« Reply #20 on: December 16, 2019, 11:44:00 am »
@Thaddy: I agree, and I think I did warn against obsessively studying "every last detail of post-Chomsky linguistic research".

Dijkstra had some scathing things to say about computer scientists. In his latter years he had scathing things to say about almost everybody...

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

440bx

  • Hero Member
  • *****
  • Posts: 3946
Re: Tutorial on creating a compiler
« Reply #21 on: December 16, 2019, 11:47:43 am »
But you and others in this discussion should warn against "language theology": A computer language is a tool, a means of communication, NOT computer science.
Leave theology out of this. Apparently you didn't notice that no one in this thread has made any claims about any language and, whether you believe it or not, writing a compiler is applied computer science.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Tutorial on creating a compiler
« Reply #22 on: December 16, 2019, 12:06:34 pm »
I believe that a reasonably knowledgeable programmer can get the necessary knowledge in about a week.

Unfortunately there is little reason to believe that the target audience are reasonably knowledgeable programmers.

Please don't get me wrong: they might be vastly knowledgeable and experienced in their own fields. But not necessarily this one.

As an aside, the Meta-2 PDF that I attached was originally published in 1964. When I started using it for teaching in about 1985 I thought that was ancient... and that was 35 years ago. But it's still useful.

MarkMLl
« Last Edit: December 16, 2019, 12:09:48 pm by MarkMLl »
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

Fred vS

  • Hero Member
  • *****
  • Posts: 3158
    • StrumPract is the musicians best friend
Re: Tutorial on creating a compiler
« Reply #23 on: December 16, 2019, 12:58:19 pm »
Hello.

You may take a look at mselang, the Pascal compiler of LLVM.

https://github.com/mse-org/mselang

It uses fpc to compile himself.

Fre;D
I use Lazarus 2.2.0 32/64 and FPC 3.2.2 32/64 on Debian 11 64 bit, Windows 10, Windows 7 32/64, Windows XP 32,  FreeBSD 64.
Widgetset: fpGUI, MSEgui, Win32, GTK2, Qt.

https://github.com/fredvs
https://gitlab.com/fredvs
https://codeberg.org/fredvs

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Tutorial on creating a compiler
« Reply #24 on: December 16, 2019, 03:15:00 pm »
Although a very big work
Yes, it is a lot of work.  IF (note the big IF) I decided to go ahead with it, I would definitely like and appreciate some help.
If possible, I'd like to contribute.

Quote
I would like to see such a tutorial.
Pleased to read that.  (normally, I'd say: pleased to hear it but, I read it not heard it ;) )

First, you should indicate one or two modern affordable books as reference about the needed deep theory.
Those outrageously priced old books are always cited here because they showed the whole process from toy language source code (text) through useless working binary.
Strangely, it doesn't really take much "deep theory".  A recursive descent parser is easy to understand and implement.  As far as book recommendations, the one book I really like, which I have recommended a number of times already is definitely _not_ affordable.
Then, if possible, your tutorial should show the whole process: Preprocessor, Compiler, Assembler, Linker, Loader, and Memory.
I don't have a design at this time.  At this time, I just want to find out if there is a reasonably significant amount of interest in such a tutorial.  That said, I don't see a preprocessor being there.  The reason being that, in my experience, preprocessors tend to make a program more difficult to understand because what the programmer reads isn't what the compiler sees.  IMO, while a preprocessor is desirable in some ways, it creates too many undesirable problems resulting in potentially very hard to understand and maintain code.
Again, if possible, your tutorial should include at least the types: Integer, Float, String and Date. And Procedures and Functions.
Integer, float, records, pointers, procedures, functions, strong typing and inline scopes: definitely. Exception handling: almost certainly (though maybe not in the first implementation.) String and Date: I see them more as features that belong in a library, not in the compiler.  Just the "definitely(s)" add up to a fair amount of work.
Beyond being imperative procedural, it would be amazing if a minimum usable imperative object-oriented paradigm (OOP) could be added.
I would definitely not include any OOP because that can really complicate matters.
The topics for the compiler part could be: Lexical Analyses, Syntax Analyses, Semantic Analyses, Intermediate Code Generation, Error Handling, Exception Handling, Code Optimization, and Target Code Generation.
While I don't have a design at this time, I lean towards error handling implemented the "Turbo Pascal" way, every error stops the compilation.  That implies, the compiler has to be _blazingly_ fast, which is doable.
Code optimization and generation: that's where the real "fun" starts.  I would focus on "hand written assembly like" size code that is reasonably fast.  I really like small executables, that is a _must_.
Thank you for your input. I appreciate it.

ETA:
Use or build?
Use not build.  Compiler-Compilers can be used to ensure the grammar doesn't contain ambiguities and/or conflicts.  To me, that's their primary appeal but, when it comes to build it, hand coded scanners and parsers are it.

ETA[2]
I would be interested in seeing such a tutorial, and potentially using it, time permitting...
Noted.  Thank you for the feedback.

valdir.marcos

  • Hero Member
  • *****
  • Posts: 1106
Re: Tutorial on creating a compiler
« Reply #25 on: December 16, 2019, 03:17:21 pm »
It would be nice to known where this stages are in the actual compiler (units), for reference purposes. So it is more possible to have a look at the actual sources for study without understanding the whole compiler and to compare this with information from books, how it is working in the wild.   :D
I'm rather pleased to see someone asking for that kind of structure.  Every program (not just a compiler) should be written in such a way that it is very intuitive (if not obvious) to determine in which file/unit a logical "process" (groups of functions and procedures that accomplish a task) is found.
+1

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Tutorial on creating a compiler
« Reply #26 on: December 16, 2019, 03:45:07 pm »
I believe that a reasonably knowledgeable programmer can get the necessary knowledge in about a week.  Of course, that does _not_ include becoming familiar with all the common parsers and their implications on a grammar.  In this case, it would just be one (1): Top down recursive descent, which is fairly easy.

Noting that the uptake of recursive /descent/ (Grau, Waychoff, Schorre et al.) was comparatively slow, possibly because Wirth- acknowledged as having the knack of turning out compilers on demand in the days when this was poorly understood- favoured recursive /ascent/ (Irons et al.).

MarkMLl
« Last Edit: December 16, 2019, 06:24:59 pm by MarkMLl »
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

lucamar

  • Hero Member
  • *****
  • Posts: 4219
Re: Tutorial on creating a compiler
« Reply #27 on: December 16, 2019, 05:32:26 pm »
FWIW, I'd also like to see this tutorial made. It'd would be invaluable if it were made easy to understand for programmers with little formal training, whatever the level of experience needed.

I've tried to understand quite some books on compiler design and building and while I did learn quite a lot in the end I failed mainly because most (all?) of them seem to assume you're an university student/graduate of some kind and use difficult-to-parse terminology and lots of difficult-to-follow maths. :-[
Turbo Pascal 3 CP/M - Amstrad PCW 8256 (512 KB !!!) :P
Lazarus/FPC 2.0.8/3.0.4 & 2.0.12/3.2.0 - 32/64 bits on:
(K|L|X)Ubuntu 12..18, Windows XP, 7, 10 and various DOSes.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11384
  • FPC developer.
Re: Tutorial on creating a compiler
« Reply #28 on: December 16, 2019, 05:50:11 pm »
Download bytecode example compilers like P4..P6 ?

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Tutorial on creating a compiler
« Reply #29 on: December 16, 2019, 06:22:37 pm »
It's not just- in my opinion- that compiler texts assume that you're a graduate with good comprehension of maths. They appear to go out of their way to be obtuse, and to insist that "if you're not one of us (which you can be by studying this text) then you're not good enough to know about this stuff".

I'm not saying that the field is simple, but these days even "unusual" syntaxes like SQL can be parsed with standard tools (Lex, Yacc). I'm not so confident that FORTRAN and COBOL as understood in the 1950s are quite so tractable, and the necessity of considering those might have complicated the field quite a lot.

Provided that you avoid certain pitfalls, you can design a syntax that can be parsed by nice, simple, recursive descent. That's not to say that, for a ten billion-line unit, the runtime will be bearable, and a disproportionate amount of work has gone into improving the efficiency of such extreme cases.

Once you can parse your source, you can either emit a "pseudo-assembler" representation- see the Meta-2 document I contributed earlier- or you can build a parse tree on a statement-by-statement (or function-by-function) etc. basis.

Once you have a parse tree- see the Tree Meta link I gave earlier- you can recognise patterns that might not have been evident in the source and optimise them: for example if two constants are being added you can replace them by the sum at compilation time. From that you can start emitting code, recognising sequences amenable to peephole optimisation.

That does, of course, gloss over a lot of important details. First and foremost is the role of the symbol table: as the parser removes lexemes from the source it needs to record what's known about them, and it needs to refer to that record every time an entity is referred to- at the very least to determine representation (anywhere between a single bit and a string of arbitrary length) but also to resolve more obtuse issues ("do we actually know what type this is yet?").

As I've said, that glosses over a lot. But at this stage of the game I think it's worth trying to work out what really does need to be understood, and what is basically "smoke and mirrors" put in place by a computer science priesthood to demonstrate how smart they are and to perpetuate a closed shop.

If anybody thinks I'm being unfairly rude about computer science practitioners, I'd remind him of some of the things that Dijkstra wrote about electronic engineers :-)

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

 

TinyPortal © 2005-2018