Topic: Performance Optimization with Xeon 36-Core 2.0 GHz for Lazarus / Free Pascal Pro

dahapo8728

  • Newbie
  • Posts: 1
Hi everyone,

I’m currently experimenting with a Xeon 36-Core 2.0 GHz workstation (https://---.com/cpus-and-processors/xeon-36-core/2-0ghz) and wanted to ask if anyone here has experience optimizing Lazarus / Free Pascal builds for high-core-count CPUs.

A few specific questions:

How well does the Free Pascal compiler scale with many cores?

Are there compiler flags or build options that can improve parallel compilation or multi-threaded performance?

Has anyone benchmarked large Lazarus projects on multi-socket Xeon systems (for example, 2 × 18-core or 2 × 36-core setups)?

Are there any known bottlenecks when using thread pools or the TTask / TThread classes on systems with over 32 hardware threads?

I’m mostly building data-processing and simulation tools that could benefit from parallel execution, so I’m curious if there are best practices or pitfalls when running Lazarus on high-core Xeon hardware.

Thanks in advance for any advice or shared experiences!
« Last Edit: November 04, 2025, 03:18:03 pm by Martin_fr »

Thaddy

  • Hero Member
  • Posts: 18363
  • Here stood a man who saw the Elbe and jumped it.
FPC scales as well as, say, C++ or plain C.
So you can use the benchmarks from these.
Due to censorship, I changed this to "Nelly the Elephant". Keeps the message clear.

Martin_fr

  • Administrator
  • Hero Member
  • Posts: 11822
  • Debugger - SynEdit - and more
This could refer to either of two things:

1) How well will your multithreaded project perform when you run your app?

2) How fast can FPC compile your project (using all the cores)?

I guess both.

As for 2)
Quote
How well does the Free Pascal compiler scale with many cores?

Compiling a single unit uses only one core, so it does not get any faster with more cores.

Compiling multiple units can be done in parallel if there are units that don't depend on each other. Within Lazarus, that happens only if you group your units into packages. Packages that don't depend on each other will be built in parallel.

The project itself is then built at the very end, since it requires all the packages, and that step uses a single core.

Building something like the IDE itself uses 4 to 6 cores on average (on my PC), but may briefly use more than 10. The code could be rearranged and optimized for an even quicker build, but that would be a lot of work.
« Last Edit: November 04, 2025, 03:24:17 pm by Martin_fr »

LV

  • Sr. Member
  • Posts: 359
Quote
How well does the Free Pascal compiler scale with many cores?

I used what was at hand: a CFD benchmark, the Sod problem on a 1-million-cell grid.
Laptop with i7-12700H (2.30 GHz) processor.

1. C++ (gcc -O3 -march=native -fopenmp -ffast-math -funroll-loops).
Code: Text
  Elapsed time: 12.859 seconds
  Max threads: 20
  Count: 4388
  OK

2. FPC 3.2.2 (-O3, thread pool).
Code: Text
  Elapsed time: 13.735 seconds
  NumThreads: 20
  count: 4388
  OK

The difference is approximately 7%: (13.735 - 12.859) / 12.859 ≈ 0.068.

Thaddy

  • Hero Member
  • Posts: 18363
  • Here stood a man who saw the Elbe and jumped it.
And all that without seeing any code? I doubt that very much.
Show us the code! (You dare not, because you fear you are proven wrong???)
Also show which version of the C++ compiler you are using, the platform, etc. That alone can explain differences within a ~10% margin.
This only shows FPC is well within the limits of your "tests".
Average Pascal programmers are not very good at translating C++, and average C++ programmers are even worse at translating to Pascal,
so show the code.

That said, even without code the ~7% margin is respectable either way.
And don't compare languages, but generated code: the language itself is just a minor detail; what matters is how well things get translated into machine code.
You can also write a compiler for Brainfuck that is fast; that has nothing to do with the underlying high-level language.
Thinking otherwise is an endearing misconception by amateurs (or C++ programmers  :o ).

Note that e.g. the GNU C++ compiler is extremely well optimized, thanks to large backing. If FPC is within ~7% of it on your tested platform, that means it is very, very good.
But first, show the code.

(Dead giveaway: in string-intensive applications Free Pascal probably wins. Note that I am just as versed in C++ as in Pascal.)

A good example is the LLVM backend, which does not depend on the high-level language and generates code that is just as fast as that of all other LLVM-backed compilers.
« Last Edit: November 04, 2025, 06:51:12 pm by Thaddy »

LV

  • Sr. Member
  • Posts: 359
Quote
And all that without seeing any code? I doubt that very much.
Show us the code! (You dare not, because you fear you are proven wrong???)

I rode my bike. Now I will take a shower. Then I will send you the source code.  ;)

LV

  • Sr. Member
  • Posts: 359
The Cauchy problem for the Euler equations was solved on a unit interval. Both codes used a simple first-order approximation scheme with splitting into physical processes.

C++: gcc 14.2.0 compiler; options -O3 -march=native -fopenmp -ffast-math -funroll-loops

Pascal: fpc 3.2.2 compiler; options -O3

P.S. Windows 11
« Last Edit: November 04, 2025, 09:43:16 pm by LV »

Thaddy

  • Hero Member
  • Posts: 18363
  • Here stood a man who saw the Elbe and jumped it.
I see you have used OpenMP for the threading. That is cheating, since it is not part of standard C++. I had to install the OpenMP sources.
You also compile the Pascal code with the least efficient threading model possible: TThread instances, with TThread's extremely conservative approach.
Note that once the threads are running, it should not matter much, since there is not much context switching.
I will have to rewrite the code a bit to level the playing field.
After that, I will use fpc-llvm and clang++ 18.1.3 (both using the same LLVM infrastructure) and report back.
« Last Edit: November 05, 2025, 08:08:50 am by Thaddy »

marcov

  • Administrator
  • Hero Member
  • Posts: 12536
  • FPC developer.

Thaddy

  • Hero Member
  • Posts: 18363
  • Here stood a man who saw the Elbe and jumped it.
I know, but that never came to fruition.
Anyway, a more lightweight approach to multithreading than the behemoth TThread is not THAT difficult: it is all in the System unit.
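
For illustration only, here is a minimal sketch of threading without TThread. It assumes the System unit's BeginThread, WaitForThreadTerminate and CloseThread routines, and that on Unix the cthreads unit is listed first so a thread manager is installed. The names SumSlice, TSlice, NumWorkers and N are hypothetical, and this is not the benchmark code discussed in this thread:

```pascal
program LightThreads;
{$mode objfpc}{$H+}

{$IFDEF UNIX}
uses
  cthreads;  // installs the thread manager; must come first on Unix
{$ENDIF}

const
  NumWorkers = 4;    // hypothetical worker count; tune to the machine
  N = 1000000;       // sum the integers 1..N

type
  TSlice = record
    First, Last: PtrInt;  // inclusive range handled by one worker
    Sum: Int64;           // written only by the worker that owns the slice
  end;
  PSlice = ^TSlice;

// Signature matches System.TThreadFunc: function(parameter: Pointer): PtrInt
function SumSlice(P: Pointer): PtrInt;
var
  S: PSlice;
  I: PtrInt;
  Acc: Int64;
begin
  S := PSlice(P);
  Acc := 0;
  for I := S^.First to S^.Last do
    Acc := Acc + I;
  S^.Sum := Acc;
  Result := 0;
end;

var
  Slices: array[0..NumWorkers - 1] of TSlice;
  Handles: array[0..NumWorkers - 1] of TThreadID;
  K: Integer;
  Total: Int64;
begin
  // Give each worker its own slice so no locking is needed.
  for K := 0 to NumWorkers - 1 do
  begin
    Slices[K].First := K * (N div NumWorkers) + 1;
    Slices[K].Last := (K + 1) * (N div NumWorkers);
    Slices[K].Sum := 0;
    Handles[K] := BeginThread(@SumSlice, @Slices[K]);
  end;

  // Wait for every worker (timeout 0 means no timeout), then combine.
  Total := 0;
  for K := 0 to NumWorkers - 1 do
  begin
    WaitForThreadTerminate(Handles[K], 0);
    CloseThread(Handles[K]);
    Total := Total + Slices[K].Sum;
  end;
  WriteLn('Total: ', Total);
end.
```

Each worker writes only to its own record, so the main thread can add up the partial sums after the joins without any synchronization primitives. Whether this is measurably faster than TThread for long-running workers would have to be benchmarked.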
« Last Edit: November 05, 2025, 11:38:28 am by Thaddy »