Lazarus

Free Pascal => FPC development => Topic started by: Akira1364 on October 25, 2018, 06:10:09 pm

Title: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on October 25, 2018, 06:10:09 pm
Thought this might be of interest to the compiler devs. The link is here:
https://github.com/75557032/llvmpas

Note that the author seems to be a native Chinese speaker as most of the comments are written in Chinese, but the code itself is all clean "Borland style" stuff with clear English naming so it's still quite easy to get an idea of what's going on.

To be clear, it doesn't just implement a simple procedural Pascal, as you might expect at first, but full Delphi/FPC-style Object Pascal with class methods, interfaces, exceptions, etc., complete with its own System unit. Overall I'd say it's at around a just-before-Delphi-2009 syntax level.

Here's an example of some IR output from using it to compile its System unit (with JavaScript code-embedding tags just for the "C-style" syntax highlighting):

Code: Javascript
define fastcc i8* @System.TObject.ClassType(i8* %Self)
{
        %Self.addr = alloca i8*, align 4

        %Result.addr = alloca i8*, align 4

        %.1 = bitcast i8* %Self to i8**
        %.2 = load i8*, i8** %.1
        store i8* %.2, i8** %Result.addr
        br label %.quit
.quit:
        %.3 = load i8*, i8** %Result.addr
        ret i8* %.3
}

I think this shows that it is indeed very feasible to fully represent Object Pascal in LLVM, considering the project seems to be the work of just one person, and makes me wonder what exactly the issues with the FPC LLVM backend were?
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: marcov on October 25, 2018, 06:24:30 pm
Afaik versioning.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Thaddy on October 26, 2018, 09:34:16 am
It is not only Borland style stuff: it seems to be written in FPC and made compatible with Delphi.
I wonder if some of the llvm code from Jonas was used here.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on October 26, 2018, 06:24:03 pm
Quote
It is not only Borland style stuff: it seems to be written in FPC and made compatible with Delphi.
I wonder if some of the llvm code from Jonas was used here.

When I said Borland style I meant the coding style. I'm aware it's written in FPC, as I was able to compile it with FPC, and it has LPI project files.

I really don't think it re-uses any code from the FPC codebase though.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Thaddy on October 26, 2018, 06:41:57 pm
The llvm codebase is already merged in trunk. It even works a bit but not on every platform.
Note from the date I can see it is later than llvm for fpc. Hence my remark.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: BeniBela on October 26, 2018, 06:43:48 pm
If FPC would only create LLVM IR rather than assembly, it could be much simpler, would it not?

Then you could save a lot of development time. And it would be more stable, since llvm probably has more users, so it is better tested.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Thaddy on October 26, 2018, 06:45:32 pm
FPC (the llvm branch) creates llvm ir. What makes you think otherwise? AFAIK that is the goal. The llvm toolchain should do the optimizations.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: marcov on October 26, 2018, 08:40:24 pm
It happens there were some mails from Jonas on the subject last week, and he named two things (this does not include platform specific issues):

- exceptions (dwarf vs setjmp)
- LLVM assumes globals are non-volatile which would probably break existing FPC/Delphi code.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: mse on October 27, 2018, 08:36:34 am
Quote
FPC (the llvm branch) creates llvm ir.

AFAIK the Free Pascal LLVM backend produces LLVM assembler text, not LLVM bitcode; at least that was so some time ago. MSElang, on the other hand, produces LLVM bitcode directly. The advantage of using LLVM bitcode is that, contrary to LLVM assembler text, the bitcode format is backwards compatible between LLVM versions.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on October 28, 2018, 02:46:48 am
Quote
The advantage of using LLVM bitcode is that contrary to LLVM assembler text the bitcode format is backwards compatible between LLVM versions.

LLVM IR (or as you called it "assembler text") has been a stable format for five or six years now. The last time there was a breaking change was around version 3.9. LLVM is currently at version 8. In my opinion targeting the lower-level bitcode format in 2018 is just making unnecessary work for yourself.

Quote
- exceptions (dwarf vs setjmp)
- LLVM assumes globals are non-volatile which would probably break existing FPC/Delphi code.

As far as the exceptions, I think the answer nowadays is "obviously dwarf." Setjmp used the way FPC uses it is a glaring bit of legacy functionality you won't find in any other language.

Regarding the globals, while that's true by default, the idea is that you use the various builtin IR "attributes" to specify how particular sections/functions/variables in the code should be handled. Consider that Clang is fully capable of representing all of C and C++ in IR, and there's plenty of other languages that do support global variables in many cases and have no issue targeting it.

There's no way such an obvious unsolvable problem would be allowed to exist for very long in the first place.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Thaddy on October 28, 2018, 07:43:13 am
Quote
Setjmp used the way FPC uses it is a glaring bit of legacy functionality you won't find in any other language.

Except in e.g. plain C and PowerBasic. This use also has C heritage; it is not necessarily Pascal-only.
It is not legacy per se: it is used a lot in embedded systems for the same goals because it is lightweight.

Otherwise I agree with most if indeed fpc-llvm can not yet produce llvm ir.

Regarding the compiler discussed above:
Note that you have to make some changes for it to be cross-platform and remove some dependencies on the Windows unit (trivial):
1) fileutils.pas needs some changes
2) the file runner run.pas needs some changes to be cross-platform

Otherwise the code is already cross-platform
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: mse on October 28, 2018, 08:01:42 am
Quote
LLVM IR (or as you called it "assembler text") has been a stable format for five or six years now. The last time there was a breaking change was around version 3.9. LLVM is currently at version 8.

IIRC I started with 3.0.

Quote
In my opinion targeting the lower-level bitcode format in 2018 is just making unnecessary work for yourself.

And improves compiling performance. As LLVM is terribly slow that is a good thing IMHO.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on October 29, 2018, 06:51:12 am
Quote
And improves compiling performance. As LLVM is terribly slow that is a good thing IMHO.

It's not, though. Using the "llvm-stress" tool to generate a 50,000 line random IR file, then calling "llc thefile.ll -o file.s" and lastly "clang thefile.o -o thefile.exe" is like a 20 second process at most.

Keep in mind we're not talking about compiling C or C++ code here.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: mse on October 29, 2018, 08:07:46 am
I don't understand what you want to say, my English knowledge is very limited, please explain.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Laksen on October 29, 2018, 09:58:28 am
Quote
It's not, though. Using the "llvm-stress" tool to generate a 50,000 line random IR file, then calling "llc thefile.ll -o file.s" and lastly "clang thefile.o -o thefile.exe" is like a 20 second process at most.

Keep in mind we're not talking about compiling C or C++ code here.

I would like to hear what you think is slow then. 20 seconds for 50000 lines is an eternity.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: mse on October 29, 2018, 12:29:44 pm
Without debug info and with -O3, MSElang compiles 7778 lines in 26 units to LLVM bitcode in 0.064 seconds. LLVM then needs 1.146s to build the binary.
With debug info and no optimization, the times are 0.095s for MSElang and 0.594s for LLVM.
A make after a build takes 0.014s for MSElang and 0.598s for LLVM.
Using unit object files, instead of unit bitcode files and the LLVM linker, the total make time after a build is 0.059s.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: PascalDragon on October 31, 2018, 12:10:55 pm
Quote
If FPC would only create LLVM IR rather than assembly, it could be much simpler, would it not?

Then you could save a lot of development time. And it would be more stable, since llvm probably has more users, so it is better tested.

No, it would not save a lot of development time, because FPC supports a different set of platforms than LLVM (e.g. FPC supports i8086, AVR and m68k); also, LLVM is planning to drop powerpc-darwin support. Not to mention cross compiling: e.g. it is in principle possible to cross compile from m68k-amiga to x86_64-win64, because we're not using any third-party utilities for x86_64-win64 as long as the internal assembler and linker are used (which is the default). With LLVM that would not be possible.
Not to mention that for some people (e.g. Florian) the most enjoyable part of the compiler is the code generator (and optimizer).
Also for i386 there would be the problem that LLVM does not support Borland's register calling convention.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on November 01, 2018, 07:54:42 am
Quote
I would like to hear what you think is slow then. 20 seconds for 50000 lines is an eternity

Well, that's three steps (calling three different programs on the command line, one of which is generating the source to begin with), and a pretty loose estimate. It's much faster than that on my machine.
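
To measure the steps separately rather than estimate them together, a small timing harness along these lines could be used. This is only a sketch: the helper name `time_steps` is made up for illustration, and the demo step is a placeholder that just starts the Python interpreter. The real `llvm-stress`, `llc` and `clang` command lines would have to be supplied as argument lists by whoever runs it.

```python
import subprocess
import sys
import time


def time_steps(steps):
    """Run each (label, argv) step in order and return [(label, seconds), ...].

    `steps` is a list of (label, argument-list) pairs. A failing command
    raises subprocess.CalledProcessError, so a broken step is not timed
    as if it had succeeded.
    """
    timings = []
    for label, argv in steps:
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        timings.append((label, time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    # Placeholder step only; substitute the actual tool invocations here.
    demo = [("noop", [sys.executable, "-c", "pass"])]
    for label, seconds in time_steps(demo):
        print(f"{label}: {seconds:.3f}s")
```

Timing each command on its own makes it clear how much of the total is IR generation, how much is `llc`, and how much is the final link.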
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: dtoffe on November 02, 2018, 02:48:01 pm
    Slightly off-topic, but given that there are so many people here with good knowledge of compilers, I'll ask anyway.
    Suppose I want to fork this project to play a bit with modifying the language features. I'll be tweaking the language grammar, so I would want to use some tool like COCO/R or the Gold Parser and separate the LLVM IR generation code from the code produced by the parser generator.
    Which compiler generator would be better in this case, and what tradeoffs would they involve? Am I missing something else? This is just for learning how this all works.

Thanks in advance.

Daniel
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Leledumbo on November 03, 2018, 07:15:00 am
Quote
What compiler generator would be better in this case

Neither.

Quote
or what tradeoff would they involve ?

The same tradeoff as between RDP and LALR parsers.

Quote
Am I missing something else ? This is just for learning how this all works.

If you want to learn, do as the author does: do NOT use any generators. Only use them if you really don't have time but need to make something quickly; they give no lesson in understanding how a parser works.
The code itself is modular enough already, though. The AST is not coupled to either the lexer or the parser, and only codegen depends on the AST, in one direction. So theoretically, provided you can generate the same TModule structure, you're good to go.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Thaddy on November 03, 2018, 11:54:30 am
The best place to start has always been here: https://compilers.iecc.com/crenshaw/ (it contains Pascal sources).
There's also a Forth version that is even easier to turn into a real compiler, at the cost that many people (not me) find Forth-like languages hard to use.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: PeterBB on November 07, 2018, 06:03:58 pm
Hi Akira,

I was able to compile this compiler on Linux by hacking the file date routines in fileutils.pas

However, I find the generated IR code will not compile. The problem seems to be the IR syntax change a few years back. The GitHub is around three years old, and the code seems to predate that change.

....
Quote
Here's an example of some IR output from using it to compile its System unit (with JavaScript code-embedding tags just for the "C-style" syntax highlighting):

Code: Javascript
define fastcc i8* @System.TObject.ClassType(i8* %Self)
{
        %Self.addr = alloca i8*, align 4

        %Result.addr = alloca i8*, align 4

        %.1 = bitcast i8* %Self to i8**
        %.2 = load i8*, i8** %.1
        store i8* %.2, i8** %Result.addr
        br label %.quit
.quit:
        %.3 = load i8*, i8** %Result.addr
        ret i8* %.3
}


Regarding the code above, how did you arrive at the correct syntax for the load statements? For me, the compiler produces this:

Code: Javascript
define fastcc i8* @Sys.TObject.ClassType(i8* %Self)
{
        %Self.addr = alloca i8*, align 4

        %Result.addr = alloca i8*, align 4

        %.1 = bitcast i8* %Self to i8**
        %.2 = load i8** %.1
        store i8* %.2, i8** %Result.addr
        br label %.quit
.quit:
        %.3 = load i8** %Result.addr
        ret i8* %.3
}

from the relevant piece of system.pas. The syntax is incorrect and it will not compile. The load statements for %.2 & %.3 require an additional comma and an extra type. Syntax is also incorrect for getelementptr calls ...

Code: Text
LLVM-Pas$ clang-6.0 -c -o sys.obj sys.ll
sys.ll:28:55: error: expected comma after getelementptr's type
  i8* bitcast(i8** getelementptr(%Sys.TObject.$vmt.t* @Sys.TObject.$vmt, i32 0, i32 19) to i8*)
                                                      ^
1 error generated.


Is there an updated version somewhere?

Regards,
Peter
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on November 08, 2018, 05:18:52 am
Quote
Regarding the code above, how did you arrive at the correct syntax for the load statements?

I actually spent a few hours modifying the code emitter to output what LLVM currently accepts after I found the repository. It's written in such a way that it wasn't particularly difficult to just search and replace the necessary format string patterns. Forgot to mention that in my original comment.

I don't have that version available to me right now, but I should be able to upload it sometime tomorrow if you want.
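
For reference, the difference between the two listings is that the newer syntax spells out the loaded type before the pointer operand (`load i8*, i8** %.1` rather than `load i8** %.1`). The fix described here was made in the compiler's own Pascal format strings. The Python sketch below only illustrates the same rewrite applied to IR text that has already been emitted. The name `upgrade_load` and the regular expression are assumptions made for illustration, and the pattern covers only simple pointer types like the ones in the listings above, not aggregates, function pointers or `load volatile`.

```python
import re

# Old-style "load <ty> <ptr>" where <ty> is a simple pointer type such as
# i8**, i32* or %Some.Type*. Lines already in the new form have a comma
# directly after the first type, so they do not match and are left alone.
_OLD_LOAD = re.compile(r"\bload\s+(?P<ty>[%\w.$]+\*+)\s+(?P<ptr>[%@][\w.$]+)")


def upgrade_load(line: str) -> str:
    """Rewrite 'load T* %p' as 'load T, T* %p' (illustrative only)."""
    def repl(m):
        pointer_type = m["ty"]
        loaded_type = pointer_type[:-1]  # drop one '*' to get the pointee type
        return f"load {loaded_type}, {pointer_type} {m['ptr']}"
    return _OLD_LOAD.sub(repl, line)


if __name__ == "__main__":
    for old in ("%.2 = load i8** %.1", "%.3 = load i8** %Result.addr"):
        print(upgrade_load(old))
```

The `getelementptr` error shown earlier in the thread needs an analogous change (an explicit type followed by a comma), which this sketch does not attempt.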
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Thaddy on November 08, 2018, 08:24:31 am
I would be very interested.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: PeterBB on November 08, 2018, 04:04:17 pm
That would be most welcome.

BTW, trying to compile a small program, it crashed in TListOp.Replace in unit inst.pas. "Count" was equal to zero.  Don't know how serious that is, but as TListOp.Replace just seems to be doing a tidy up, maybe it can just exit if count = zero, at least as a temporary fix? Seems to work anyway.

ISTM that for serious work, and to stand any chance of being self-hosting, the compiler needs to be able to compile Classes & Sysutils, or at least the routines therein that it uses.

Cheers,
Peter
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Akira1364 on November 11, 2018, 03:44:14 am
My apologies for the delay! I've attached to this comment my (now somewhat heavily modified, formatted, etc.) version of (most of) the repo. I noticed a variety of places where the original author had specified alignments that would only work on 32-bit, as well as various places that assumed the size of a pointer was always 4 bytes (whereas it's of course 8 on 64-bit systems), so I thought I'd take the time to correct that as well and add a bunch of ifdefs to account for 32 vs 64 bit in various areas as best I could.
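
The hard-coded `align 4` on the pointer allocas in the listings earlier in the thread is an example of that kind of 32-bit assumption. As a rough illustration of the idea (not the actual Pascal code, which uses ifdefs), an emitter can take the pointer size as a parameter instead of baking it in. The function name `emit_pointer_alloca` is hypothetical.

```python
import struct


def host_pointer_bytes() -> int:
    """Pointer size of the machine running this script (4 or 8 on common systems)."""
    return struct.calcsize("P")


def emit_pointer_alloca(name: str, pointer_bytes: int) -> str:
    """Emit an alloca for an i8* local, aligned to the target's pointer size.

    `pointer_bytes` describes the *target*, which is not necessarily the host
    when cross compiling, so it is passed in rather than detected here.
    """
    if pointer_bytes not in (4, 8):
        raise ValueError("this sketch only handles 32-bit and 64-bit targets")
    return f"%{name}.addr = alloca i8*, align {pointer_bytes}"


if __name__ == "__main__":
    print(emit_pointer_alloca("Self", 4))                      # 32-bit target
    print(emit_pointer_alloca("Self", 8))                      # 64-bit target
    print(emit_pointer_alloca("Result", host_pointer_bytes())) # host-sized
```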

I also added a few extra command line options to it (you can now specify -Sys specifically for the system unit, so that it won't try to load the system unit, for one.)

Note that overall I've found the code generator for this compiler is rather more advanced than the understanding its parser currently has of Pascal (which is better than the reverse of that, I'd say.) There are a number of things that should in theory compile (at the Pascal level, not the LLVM level) but don't.

As for the "LLVM level", note also that I've commented out all the code that inserts exception handling into methods, as I just didn't have the time to fully update all of that in a cross-platform way, and it would not compile in LLVM versions higher than 3.7.

I will say again, though, that the code generator is sufficiently advanced, with a solid enough foundation, to be a good platform for additional work, whether that means attaching it to a better parser (FCL-Passrc perhaps?) or simply finishing/improving the existing one.
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: dtoffe on November 12, 2018, 04:25:01 pm
    Thanks @Leledumbo, I understand learning compilers is best done coding from scratch, but even if my compiler classes were many years ago and a refresher is needed, I'm more interested in learning something about LLVM code generation, of which I know nothing. I've decided to pick COCO/R, I'm looking at the C# implementation to learn how it works.
    Thanks @Thaddy, beginning the part 4 now, the book is pretty easy to read and follow.

Thanks,

Daniel
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: Leledumbo on November 14, 2018, 01:10:22 am
Quote
I'm more interested in learning something about LLVM code generation, of which I know nothing.

I suggest coding the LLVM IR by hand first; it's not like your usual assembly, but it's not far off either. Just familiarize yourself with the instruction set first, then make a working program using hand-coded LLVM IR that covers what you want to be able to do later with your own compiler. Only then should you move your focus to the compiler.
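
As a possible starting point for that exercise, the sketch below holds a tiny hand-written IR module and writes it to a `.ll` file. The module, the file name `hello.ll` and the helper `write_hello` are all illustrative assumptions. The IR uses the typed-pointer syntax seen elsewhere in this thread (`i8*`); newer LLVM releases use an opaque `ptr` type instead, so the text may need adjusting depending on the installed version.

```python
from pathlib import Path

# A hand-written module: declares the C library's puts, defines a constant
# string, and calls puts from main. Raw string so that \00 reaches the file
# unchanged.
HELLO_IR = r"""
; Minimal hand-written LLVM IR module (typed-pointer syntax)
declare i32 @puts(i8*)

@.msg = private unnamed_addr constant [6 x i8] c"hello\00"

define i32 @main() {
entry:
  %p = getelementptr inbounds [6 x i8], [6 x i8]* @.msg, i32 0, i32 0
  %r = call i32 @puts(i8* %p)
  ret i32 0
}
"""


def write_hello(path) -> Path:
    """Write the module text to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(HELLO_IR.lstrip())
    return target


if __name__ == "__main__":
    print(write_hello("hello.ll"))
```

The resulting file can then be handed to clang in the same way the `sys.ll` file was earlier in the thread, and grown one construct at a time (locals, branches, calls, structs) until it covers what the future compiler is meant to emit.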
Title: Re: Found this fairly complete/working Object Pascal to LLVM IR compiler on Github
Post by: dtoffe on November 14, 2018, 03:56:32 pm
    @Leledumbo: You mean, begin coding programs "only" in LLVM IR, just as if it would be a standalone assembly program ?  That's a great idea I think, and probably would not have come up with it myself.
    Thanks for your help !!

Daniel