Topic: Found this fairly complete/working Object Pascal to LLVM IR compiler on GitHub

Akira1364

  • Sr. Member
  • ****
  • Posts: 382
Thought this might be of interest to the compiler devs. The link is here:
https://github.com/75557032/llvmpas

Note that the author seems to be a native Chinese speaker as most of the comments are written in Chinese, but the code itself is all clean "Borland style" stuff with clear English naming so it's still quite easy to get an idea of what's going on.

To be clear, it doesn't just implement a simple procedural Pascal, as you might expect at first, but full Delphi/FPC-style Object Pascal with class methods, interfaces, exceptions, etc., complete with its own System unit. Overall I'd say it's at roughly a just-before-Delphi-2009 syntax level.

Here's an example of some IR output from using it to compile its System unit (with JavaScript code-embedding tags just for the "C-style" syntax highlighting):

Code: Javascript
define fastcc i8* @System.TObject.ClassType(i8* %Self)
{
        %Self.addr = alloca i8*, align 4
        %Result.addr = alloca i8*, align 4

        ; Reinterpret Self as a pointer to a pointer and load the first
        ; pointer-sized field of the instance into Result.
        %.1 = bitcast i8* %Self to i8**
        %.2 = load i8*, i8** %.1
        store i8* %.2, i8** %Result.addr
        br label %.quit
.quit:
        ; Common exit block: reload Result and return it.
        %.3 = load i8*, i8** %Result.addr
        ret i8* %.3
}
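
For readers less used to IR, the function above can be read roughly as the C below. This is only a sketch of what the IR visibly does: the names `class_type` and `FakeObject` are made up for illustration and are not part of llvmpas, and calling the first field a "class reference" is an assumption based on the function's name.

```c
/* Hypothetical stand-in for an object instance. The IR only shows that the
   first pointer-sized field of the instance is read. */
typedef struct FakeObject {
    void *class_ref;   /* assumed to hold the class reference */
    int   payload;     /* unrelated instance data */
} FakeObject;

/* Rough C equivalent of @System.TObject.ClassType: treat Self as a pointer
   to a pointer and return the pointer stored there. */
void *class_type(void *self) {
    void *result = *(void **)self;
    return result;
}
```

Calling `class_type` on a `FakeObject` returns whatever pointer was stored in its `class_ref` field, which matches the bitcast/load/store/ret sequence in the IR.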

I think this shows that it is indeed very feasible to fully represent Object Pascal in LLVM, considering the project seems to be the work of just one person, and makes me wonder what exactly the issues with the FPC LLVM backend were?
« Last Edit: November 03, 2018, 11:42:54 pm by Akira1364 »

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 6617
AFAIK, versioning.

Thaddy

  • Hero Member
  • *****
  • Posts: 7182
It is not only Borland style stuff: it seems to be written in FPC and made compatible with Delphi.
I wonder if some of the llvm code from Jonas was used here.
inline variables like in D10.3 are a bit like Brexit: if you are given the wrong information it sounds like a good idea. Every kid loves candy, but it makes you fat and your teeth will disappear.

Akira1364

  • Sr. Member
  • ****
  • Posts: 382
Quote
It is not only Borland style stuff: it seems to be written in FPC and made compatible with Delphi.
I wonder if some of the llvm code from Jonas was used here.

When I said Borland style I meant the coding style. I'm aware it's written in FPC, as I was able to compile it with FPC, and it has LPI project files.

I really don't think it re-uses any code from the FPC codebase though.

Thaddy

  • Hero Member
  • *****
  • Posts: 7182
The LLVM backend code is already merged into trunk. It even works to some extent, though not on every platform.
Note that, judging by the dates, this project is more recent than the LLVM work for FPC. Hence my remark.

BeniBela

  • Hero Member
  • *****
  • Posts: 634
    • homepage
If FPC only generated LLVM IR rather than assembly, it could be much simpler, could it not?

That would save a lot of development time. It would also be more stable, since LLVM probably has more users and is therefore better tested.

Thaddy

  • Hero Member
  • *****
  • Posts: 7182
FPC (the LLVM branch) generates LLVM IR. What makes you think otherwise? AFAIK that is the goal: the LLVM toolchain should do the optimizations.
« Last Edit: October 26, 2018, 06:48:46 pm by Thaddy »

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 6617
As it happens, there were some mails from Jonas on the subject last week, and he named two issues (not counting platform-specific ones):

- exceptions (DWARF vs. setjmp)
- LLVM assumes globals are non-volatile, which would probably break existing FPC/Delphi code.

mse

  • Sr. Member
  • ****
  • Posts: 286
Quote
FPC (the llvm branch) creates llvm ir.
AFAIK the Free Pascal LLVM backend produces LLVM assembler text, not LLVM bitcode; at least it did some time ago. MSElang, on the other hand, produces LLVM bitcode directly. The advantage of using LLVM bitcode is that, unlike LLVM assembler text, the bitcode format is backwards compatible between LLVM versions.
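
To make the text-versus-bitcode distinction concrete: textual IR is the human-readable form shown earlier in the thread, while a raw bitcode file is binary and begins with the magic bytes 'B', 'C', 0xC0, 0xDE. The sketch below tells the two apart by looking only at those first four bytes. The function name `is_raw_llvm_bitcode` is hypothetical, and wrapped bitcode containers are deliberately not handled.

```c
#include <stddef.h>

/* Returns 1 if the buffer starts with the raw LLVM bitcode magic bytes
   'B' 'C' 0xC0 0xDE, and 0 otherwise (for example for textual IR). */
int is_raw_llvm_bitcode(const unsigned char *buf, size_t len) {
    static const unsigned char magic[4] = { 'B', 'C', 0xC0, 0xDE };
    size_t i;

    if (len < sizeof magic) {
        return 0;   /* too short to contain the magic number */
    }
    for (i = 0; i < sizeof magic; i++) {
        if (buf[i] != magic[i]) {
            return 0;
        }
    }
    return 1;
}
```

A buffer holding the start of a textual `.ll` file (for instance one beginning with `define`) makes the function return 0, while a buffer beginning with the four magic bytes makes it return 1.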

Akira1364

  • Sr. Member
  • ****
  • Posts: 382
Quote
The advantage of using LLVM bitcode is that contrary to LLVM assembler text the bitcode format is backwards compatible between LLVM versions.

LLVM IR (or, as you called it, "assembler text") has been a stable format for five or six years now. The last time there was a breaking change was around version 3.9. LLVM is currently at version 8. In my opinion, targeting the lower-level bitcode format in 2018 is just making unnecessary work for yourself.

Quote
- exceptions (dwarf vs setjmp)
- LLVM assumes globals are non-volatile which would probably break existing FPC/Delphi code.

As for the exceptions, I think the answer nowadays is "obviously DWARF." Setjmp used the way FPC uses it is a glaring bit of legacy functionality you won't find in any other language.

Regarding the globals, while that's true by default, the idea is that you use the various built-in IR "attributes" to specify how particular sections/functions/variables in the code should be handled. Consider that Clang is fully capable of representing all of C and C++ in IR, and there are plenty of other languages that support global variables and have no issue targeting it.

There's no way such an obvious unsolvable problem would be allowed to exist for very long in the first place.
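
As a small illustration of the point about Clang, here is a sketch in C contrasting a `volatile` global with a plain one. The variable and function names are hypothetical. For accesses to a `volatile`-qualified object, Clang emits `load volatile` and `store volatile` instructions in the IR, so in LLVM the volatility is carried on the individual memory accesses rather than on the global itself.

```c
/* Hypothetical globals for illustration. */
volatile int device_status = 0;   /* every access must really be performed */
int cached_total = 0;             /* the optimizer may keep this in a register */

/* Reads the volatile global twice; the compiler must emit two separate loads. */
int sample_status_twice(void) {
    int first = device_status;
    int second = device_status;
    return first + second;
}

/* Plain global: the optimizer is free to fold the two reads into one. */
int sample_total_twice(void) {
    return cached_total + cached_total;
}
```

Compiling this with `clang -S -emit-llvm` and comparing the two functions in the resulting `.ll` file should show the difference. A Pascal front end wanting conservative behavior for globals could presumably emit volatile accesses in the same way, though whether that is what FPC needs is not established in this thread.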
« Last Edit: November 03, 2018, 11:45:11 pm by Akira1364 »

Thaddy

  • Hero Member
  • *****
  • Posts: 7182
Quote
Setjmp used the way FPC uses it is a glaring bit of legacy functionality you won't find in any other language.
Except in, e.g., plain C and PowerBASIC. This usage also has a C heritage; it is not necessarily Pascal-only.
It is not legacy per se: it is used a lot in embedded systems for the same purpose because it is lightweight.
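
For anyone who has not seen the technique, this is a minimal sketch of setjmp-style error handling in plain C. It is a generic illustration, not FPC's actual runtime code: all names are made up, and a real runtime would keep a stack of frames rather than the single global one used here.

```c
#include <setjmp.h>

static jmp_buf handler_frame;   /* hypothetical single "exception frame" */
static int last_error = 0;      /* error code recorded by raise_error */

/* "raise": record the error and jump back to the most recent setjmp. */
static void raise_error(int code) {
    last_error = code;
    longjmp(handler_frame, 1);
}

/* Some operation that can fail deep inside a call chain. */
static int checked_divide(int a, int b) {
    if (b == 0) {
        raise_error(42);
    }
    return a / b;
}

/* "try/except": returns a / b, or fallback if raise_error was called. */
int try_divide(int a, int b, int fallback) {
    if (setjmp(handler_frame) == 0) {
        return checked_divide(a, b);   /* try part */
    }
    return fallback;                   /* except part, reached via longjmp */
}
```

The trade-off is the one under discussion: entering the protected region costs a `setjmp` call every time, even when nothing is raised, whereas DWARF-style unwinding aims to make the non-raising path cheap at the price of unwind tables and a more complex runtime.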

Otherwise I agree with most of it, if indeed fpc-llvm cannot yet produce LLVM IR.

Regarding the compiler discussed above:
Note that you have to make some changes for it to be cross-platform and remove some dependencies on the Windows unit (trivial):
1) fileutils.pas needs some changes
2) the file runner, run.pas, needs some changes to be cross-platform

Otherwise the code is already cross-platform.
« Last Edit: October 28, 2018, 08:57:52 am by Thaddy »

mse

  • Sr. Member
  • ****
  • Posts: 286
Quote
LLVM IR (or as you called it "assembler text") has been a stable format for five or six years now. The last time there was a breaking change was around version 3.9. LLVM is currently at version 8.
IIRC I started with 3.0.
Quote
In my opinion targeting the lower-level bitcode format in 2018 is just making unnecessary work for yourself.
And it improves compile performance. As LLVM is terribly slow, that is a good thing IMHO.

Akira1364

  • Sr. Member
  • ****
  • Posts: 382
Quote
And improves compiling performance. As LLVM is terribly slow that is a good thing IMHO.

It's not, though. Using the "llvm-stress" tool to generate a 50,000-line random IR file, then calling "llc thefile.ll -o thefile.s" and lastly "clang thefile.s -o thefile.exe" is like a 20-second process at most.

Keep in mind we're not talking about compiling C or C++ code here.

mse

  • Sr. Member
  • ****
  • Posts: 286
I don't understand what you mean; my English is very limited. Please explain.

Laksen

  • Hero Member
  • *****
  • Posts: 605
    • J-Software
Quote
It's not, though. Using the "llvm-stress" tool to generate a 50,000-line random IR file, then calling "llc thefile.ll -o thefile.s" and lastly "clang thefile.s -o thefile.exe" is like a 20-second process at most.

Keep in mind we're not talking about compiling C or C++ code here.
I would like to hear what you think is slow, then. 20 seconds for 50,000 lines is an eternity.