
Topic: Found this fairly complete/working Object Pascal to LLVM IR compiler on GitHub

mse

  • Sr. Member
  • ****
  • Posts: 286
Without debuginfo and with -O3, MSElang compiles 7778 lines in 26 units to LLVM bitcode in 0.064 seconds. LLVM then needs 1.146s to build the binary.
With debuginfo and no optimization, the times are 0.095s for MSElang and 0.594s for LLVM.
A make after a build takes 0.014s for MSElang and 0.598s for LLVM.
Using unit object files instead of unit bitcode files and the LLVM linker, the total make time after a build is 0.059s.
« Last Edit: October 29, 2018, 01:33:11 pm by mse »

PascalDragon

  • Full Member
  • ***
  • Posts: 203
  • Compiler Developer
If FPC would only create LLVM IR rather than assembly, it could be much simpler, would it not?

Then you could save a lot of development time. And it would be more stable, since llvm probably has more users, so it is better tested.
No, it would not save a lot of development time, because FPC supports a different set of platforms than LLVM does (e.g. FPC supports i8086, AVR and m68k); LLVM is also planning to drop powerpc-darwin support. Then there is cross compiling: it is, for example, in principle possible to cross compile from m68k-amiga to x86_64-win64, because we do not use any third-party utilities for x86_64-win64 as long as the internal assembler and linker are used (which is the default). With LLVM that would not be possible.
Not to mention that for some people (e.g. Florian) the most enjoyable part of the compiler is the code generator (and optimizer).
Also for i386 there would be the problem that LLVM does not support Borland's register calling convention.

Akira1364

  • Sr. Member
  • ****
  • Posts: 381
It's not, though. Using the "llvm-stress" tool to generate a 50,000 line random IR file, then calling "llc thefile.ll -o thefile.s" and lastly "clang thefile.s -o thefile.exe" is like a 20 second process at most.

Keep in mind we're not talking about compiling C or C++ code here.
I would like to hear what you think is slow, then. 20 seconds for 50,000 lines is an eternity.

Well, that's three steps (calling three different programs on the command line, one of which is generating the source to begin with), and a pretty loose estimate. It's much faster than that on my machine.

dtoffe

  • New member
  • *
  • Posts: 13
Slightly off-topic, but given that there are so many people here with good knowledge of compilers, I'll ask anyway.
Suppose I want to fork this project to play around a bit with modifying the language features. I'll be tweaking the language grammar, so I would want to use some tool like COCO/R or the Gold Parser and separate the LLVM IR generation code from the code generated by the parser generator.
Which compiler generator would be better in this case, and what tradeoffs would they involve? Am I missing something else? This is just for learning how this all works.

Thanks in advance.

Daniel

Leledumbo

  • Hero Member
  • *****
  • Posts: 7971
  • Programming + Glam Metal + Tae Kwon Do = Me
What compiler generator would be better in this case
Neither.
or what tradeoff would they involve?
The same tradeoff as between recursive-descent (RDP) and LALR parsers.
Am I missing something else? This is just for learning how this all works.
If you want to learn, do as the author does: do NOT use any generators. Only use them if you really don't have time but need to make something quickly; they teach you nothing about how a parser works.
The code itself is already modular enough, though. The AST is not coupled to either the lexer or the parser, and only the codegen depends on the AST, in one direction. So theoretically, provided you can generate the same TModule structure, you're good to go.
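
To make the "no generators" and "AST in the middle" points concrete, here is a minimal sketch in Python (not the project's Pascal code). Every name in it (`Num`, `BinOp`, `Parser`, `emit`) is invented for illustration, and the toy grammar only covers integer arithmetic. The parser is a hand-written recursive-descent parser with one method per grammar rule, and the emitter only ever sees the AST, so either side could be swapped out as long as the same tree comes out of the front end.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def tokenize(src: str) -> List[str]:
    # Toy lexer: integers, the four operators and parentheses only.
    return re.findall(r"\d+|[-+*/()]", src)


class Parser:
    """Hand-written recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> Node:  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = BinOp(op, node, self.term())
        return node

    def term(self) -> Node:  # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self) -> Node:  # factor := NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return Num(int(tok))


def emit(node: Node, lines: List[str]) -> str:
    """Walks the AST only; it knows nothing about the lexer or the parser."""
    if isinstance(node, Num):
        return str(node.value)
    lhs = emit(node.left, lines)
    rhs = emit(node.right, lines)
    opcode = {"+": "add", "-": "sub", "*": "mul", "/": "sdiv"}[node.op]
    tmp = f"%t{len(lines)}"  # fresh temporary name per emitted instruction
    lines.append(f"{tmp} = {opcode} i32 {lhs}, {rhs}")
    return tmp
```

Calling `Parser(tokenize("1 + 2 * 3")).expr()` and passing the result to `emit` with an empty list fills the list with IR-style instruction lines; a generated parser could replace `Parser` without touching `emit`.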

Thaddy

  • Hero Member
  • *****
  • Posts: 7149
The best place to start has always been here: https://compilers.iecc.com/crenshaw/ It contains Pascal sources.
There's also a Forth version that is even easier to turn into a real compiler, at the cost that many people (not me) find Forth-like languages hard to use.
« Last Edit: November 03, 2018, 12:00:36 pm by Thaddy »
inline variables like in D10.3 are a bit like Brexit: if you are given the wrong information it sounds like a good idea. Every kid loves candy, but it makes you fat and your teeth will disappear.

PeterBB

  • Newbie
  • Posts: 2
Hi Akira,

I was able to compile this compiler on Linux by hacking the file date routines in fileutils.pas.

However, I find that the generated IR code will not compile. The problem seems to be the IR syntax change a few years back. The GitHub repository is around three years old, and the code seems to predate that change.

....
Here's an example of some IR output from using it to compile its System unit (with JavaScript code-embedding tags just for the "C-style" syntax highlighting):

Code: Javascript
define fastcc i8* @System.TObject.ClassType(i8* %Self)
{
        %Self.addr = alloca i8*, align 4

        %Result.addr = alloca i8*, align 4

        %.1 = bitcast i8* %Self to i8**
        %.2 = load i8*, i8** %.1
        store i8* %.2, i8** %Result.addr
        br label %.quit
.quit:
        %.3 = load i8*, i8** %Result.addr
        ret i8* %.3
}


Regarding the code above, how did you arrive at the correct syntax for the load statements? For me, the compiler produces this:

Code: Javascript
define fastcc i8* @Sys.TObject.ClassType(i8* %Self)
{
        %Self.addr = alloca i8*, align 4

        %Result.addr = alloca i8*, align 4

        %.1 = bitcast i8* %Self to i8**
        %.2 = load i8** %.1
        store i8* %.2, i8** %Result.addr
        br label %.quit
.quit:
        %.3 = load i8** %Result.addr
        ret i8* %.3
}

from the relevant piece of system.pas. The syntax is incorrect and it will not compile. The load statements for %.2 & %.3 require an additional comma and an extra type. Syntax is also incorrect for getelementptr calls ...

Code: Text
LLVM-Pas$ clang-6.0 -c -o sys.obj sys.ll
sys.ll:28:55: error: expected comma after getelementptr's type
  i8* bitcast(i8** getelementptr(%Sys.TObject.$vmt.t* @Sys.TObject.$vmt, i32 0, i32 19) to i8*)
                                                      ^
1 error generated.


Is there an updated version somewhere?

Regards,
Peter

Akira1364

  • Sr. Member
  • ****
  • Posts: 381
Regarding the code above, how did you arrive at the correct syntax for the load statements?

I actually spent a few hours modifying the code emitter to output what LLVM currently accepts after I found the repository. It's written in such a way that it wasn't particularly difficult to just search and replace the necessary format string patterns. Forgot to mention that in my original comment.
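
For anyone who wants to see the kind of pattern change involved, here is a rough post-processing sketch in Python. It is not the actual patch (that was done in the Pascal emitter's format strings); the function name `upgrade_loads` and the regular expression are assumptions made up for this illustration, and it only handles simple loads through a pointer type that contains no spaces or commas. It rewrites the old `load <ptr-type> <ptr>` form into the `load <type>, <ptr-type> <ptr>` form that current LLVM expects.

```python
import re

# Old-style load, e.g. "%.2 = load i8** %.1".
# Group 1: "<result> = load ", group 2: pointer type ending in '*',
# group 3: the pointer operand and anything after it.
_OLD_LOAD = re.compile(r"^(\s*%[\w.$]+\s*=\s*load\s+)([^,\s]+\*)(\s+[%@].*)$")


def upgrade_loads(ir_text: str) -> str:
    """Insert the explicit result type into old-style load instructions."""
    out = []
    for line in ir_text.splitlines():
        m = _OLD_LOAD.match(line)
        if m:
            prefix, ptr_type, rest = m.groups()
            value_type = ptr_type[:-1]  # the loaded type is the pointer type minus one '*'
            line = f"{prefix}{value_type}, {ptr_type}{rest}"
        out.append(line)
    return "\n".join(out)
```

Lines that already carry the explicit type do not match the pattern and pass through unchanged, so running it twice is harmless. The getelementptr syntax changed in a similar way and would need its own pattern.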

I don't have that version available to me right now, but I should be able to upload it sometime tomorrow if you want.
« Last Edit: November 08, 2018, 05:21:49 am by Akira1364 »

Thaddy

  • Hero Member
  • *****
  • Posts: 7149
I would be very interested.

PeterBB

  • Newbie
  • Posts: 2
That would be most welcome.

BTW, trying to compile a small program, it crashed in TListOp.Replace in unit inst.pas. "Count" was equal to zero.  Don't know how serious that is, but as TListOp.Replace just seems to be doing a tidy up, maybe it can just exit if count = zero, at least as a temporary fix? Seems to work anyway.

ISTM that for serious work, and to stand any chance of being self-hosting, the compiler needs to be able to compile Classes & Sysutils, or at least the routines therein that it uses.

Cheers,
Peter

Akira1364

  • Sr. Member
  • ****
  • Posts: 381
My apologies for the delay! I've attached to this comment my (now somewhat heavily modified, formatted, etc.) version of (most of) the repo. I noticed a variety of places where the original author had specified alignments that would only work on 32-bit, as well as various places that assumed a pointer is always 4 bytes (whereas it is of course 8 on 64-bit systems), so I took the time to correct that as well and added a bunch of ifdefs to account for 32 vs 64 bit in various areas as best I could.
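
As a small illustration of the 4-versus-8-byte issue (in Python rather than the compiler's Pascal, and with made-up helper names `pointer_size_bytes` and `emit_pointer_alloca`), the alignment in an emitted `alloca` can be derived from the pointer size instead of being hard-coded to 4. Note that `struct.calcsize("P")` reports the pointer size of the host running the script, which only equals the target's pointer size when not cross compiling.

```python
import struct


def pointer_size_bytes() -> int:
    # Size of a native pointer on the host: 4 on 32-bit, 8 on 64-bit.
    return struct.calcsize("P")


def emit_pointer_alloca(name: str, ptr_bytes: int) -> str:
    """Emit a stack slot for an i8* with alignment matching the pointer size."""
    if ptr_bytes not in (4, 8):
        raise ValueError(f"unsupported pointer size: {ptr_bytes}")
    return f"%{name}.addr = alloca i8*, align {ptr_bytes}"
```

With `ptr_bytes=4` this reproduces the `align 4` lines seen in the original output; with `ptr_bytes=8` it produces the alignment a 64-bit target would want.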

I also added a few extra command line options to it (you can now specify -Sys specifically for the system unit, so that it won't try to load the system unit, for one.)

Note that overall I've found the code generator for this compiler is rather more advanced than the understanding its parser currently has of Pascal (which is better than the reverse of that, I'd say.) There are a number of things that should in theory compile (at the Pascal level, not the LLVM level) but don't.

As for the "LLVM level", note also that I've commented out all the code that inserts exception handling into methods, as I just didn't have the time to fully update all of that in a cross-platform way, and it would not compile in LLVM versions higher than 3.7.

I will say again, though, that the code generator is definitely advanced enough, with a solid enough foundation, to be a good platform for additional work, whether that means attaching it to a better parser (FCL-Passrc perhaps?) or simply finishing/improving the existing one.
« Last Edit: November 11, 2018, 05:20:34 am by Akira1364 »

dtoffe

  • New member
  • *
  • Posts: 13
Thanks @Leledumbo. I understand that compilers are best learned by coding from scratch, but even though my compiler classes were many years ago and a refresher is needed, I'm more interested in learning something about LLVM code generation, of which I know nothing. I've decided to pick COCO/R; I'm looking at the C# implementation to learn how it works.
Thanks @Thaddy, I'm beginning part 4 now; the book is pretty easy to read and follow.

Thanks,

Daniel

Leledumbo

  • Hero Member
  • *****
  • Posts: 7971
  • Programming + Glam Metal + Tae Kwon Do = Me
I'm more interested in learning something about LLVM code generation, of which I know nothing.
I suggest coding the LLVM IR by hand first; it's not like your usual assembly, but it's not far off either. Just familiarize yourself with the instruction set first, then make a working program using hand-coded LLVM IR that covers everything you later want your own compiler to be able to do. Only then should you move your focus to the compiler.
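
As a starting point for that exercise, here is a tiny hand-written module, wrapped in a Python script that just writes it to disk. The file name `hand.ll` and the helper `write_module` are arbitrary choices for this sketch. The IR deliberately uses only integer arithmetic and a direct call, so it avoids the pointer syntax that differs between LLVM versions.

```python
from pathlib import Path

# A complete hand-written LLVM IR module: @add sums two i32 values,
# and @main returns add(40, 2) as the process exit status.
HAND_WRITTEN_IR = """\
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

define i32 @main() {
entry:
  %r = call i32 @add(i32 40, i32 2)
  ret i32 %r
}
"""


def write_module(path: str = "hand.ll") -> Path:
    """Write the hand-written module to a .ll file and return its path."""
    target = Path(path)
    target.write_text(HAND_WRITTEN_IR)
    return target


if __name__ == "__main__":
    print(f"wrote {write_module()}")
```

The resulting file can then be fed to the LLVM tools you have installed (for example interpreted with `lli`, or compiled with `clang` the same way the thread compiles `sys.ll`), and grown step by step with the constructs your compiler will eventually need to emit.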

dtoffe

  • New member
  • *
  • Posts: 13
@Leledumbo: You mean, begin coding programs "only" in LLVM IR, just as if it were a standalone assembly program? That's a great idea, I think, and I probably would not have come up with it myself.
Thanks for your help!

Daniel

 
