Recent

Author Topic: Benchmarks  (Read 2780 times)

turrican

  • Full Member
  • ***
  • Posts: 140
  • Pascal is my life.
    • Homepage
Benchmarks
« on: December 04, 2024, 08:32:51 pm »
Hi!

Take a look at this wonderful project:
https://github.com/bddicken/languages which measures the execution speeds of different programming languages.
You can view the results on the following page: https://benjdd.com/languages
A couple of weeks ago, I added the Fibonacci and Loops benchmarks for FPC ported directly from C.

Here you can find the benchmarks results : https://x.com/i/status/1863977678690541570

Pascal execution times on my machine are :

- Benchmark "loops" - "./code 40" is 2.158s
- Benchmark "fibonacci" - "./code 40" is 879ms

Sadly FPC is not fast as I expected... I'm very surprised that it doesn't run faster... Delphi (Windows) has always had a better compiler and has generated much more optimal code than FPC.

I would appreciate it if someone could help optimize this code or use a more refined compiler.

turrican

  • Full Member
  • ***
  • Posts: 140
  • Pascal is my life.
    • Homepage
Re: Benchmarks
« Reply #1 on: December 04, 2024, 08:35:36 pm »
Btw this is the loops code :

fpc -O3

Code: Pascal  [Select][+][-]
  1. program LoopsPascal;
  2.  
  3. uses
  4.   SysUtils;
  5.  
  6. var
  7.   u, r, i, j: Int32;
  8.   a: array[0..9999] of Int32;
  9.  
  10. begin
  11.   if ParamCount < 1 then
  12.   begin
  13.     WriteLn('Usage: ', ParamStr(0), ' <number>');
  14.     Exit;
  15.   end;
  16.  
  17.   u := StrToInt(ParamStr(1));          // Get an input number from the command line
  18.   Randomize;
  19.   r := Random(10000);                  // Get a random integer 0 <= r < 10k
  20.  
  21.   for i := 0 to 9999 do                // 10k outer loop iterations
  22.   begin
  23.     for j := 0 to 99999 do             // 100k inner loop iterations, per outer loop iteration
  24.     begin
  25.       a[i] := a[i] + j mod u;          // Simple sum
  26.     end;
  27.     a[i] := a[i] + r;                  // Add a random value to each element in array
  28.   end;
  29.  
  30.   WriteLn(a[r]);                       // Print out a single element from the array
  31. end.

turrican

  • Full Member
  • ***
  • Posts: 140
  • Pascal is my life.
    • Homepage
Re: Benchmarks
« Reply #2 on: December 04, 2024, 08:47:07 pm »

Fibonacci

  • Hero Member
  • *****
  • Posts: 647
  • Internal Error Hunter
Re: Benchmarks
« Reply #3 on: December 04, 2024, 08:58:15 pm »
Times on my PC

Loops:
1312 ms original
594 ms - u, r, i, j: Int32 -> UInt32 + at least -O3

Fibonacci:
703 ms original
343 ms inlined

turrican

  • Full Member
  • ***
  • Posts: 140
  • Pascal is my life.
    • Homepage
Re: Benchmarks
« Reply #4 on: December 04, 2024, 09:18:04 pm »
Times on my PC

Loops:
1312 ms original
594 ms - u, r, i, j: Int32 -> UInt32 + at least -O3

Fibonacci:
703 ms original
343 ms inlined

Thanks Fibonacci!

Well, I tried to keep the types as similar to C as possible, so I decided to leave the integers signed. However, I'm surprised that it makes such a difference. On my PC, it's much slower because I'm using WSL2 (understandable). Do you think I should switch to uint32? In C, the types they use are int32_t.

Fibonacci

  • Hero Member
  • *****
  • Posts: 647
  • Internal Error Hunter
Re: Benchmarks
« Reply #5 on: December 04, 2024, 09:22:34 pm »
Do you think I should switch to uint32? In C, the types they use are int32_t.

If its allowed by the rules then sure

This cuts another 20% of time (266 ms):

Code: Pascal  [Select][+][-]
  1. function Fibonacci(n: UInt32): UInt32; inline;
  2. begin
  3.   if n <= 1 then exit(n);
  4.   //if n = 0 then
  5.   //  Fibonacci := 0
  6.   //else if n = 1 then
  7.   //  Fibonacci := 1
  8.   //else
  9.     Fibonacci := Fibonacci(n - 1) + Fibonacci(n - 2);
  10. end;

turrican

  • Full Member
  • ***
  • Posts: 140
  • Pascal is my life.
    • Homepage
Re: Benchmarks
« Reply #6 on: December 04, 2024, 09:29:52 pm »
Do you think I should switch to uint32? In C, the types they use are int32_t.

If its allowed by the rules then sure

This cuts another 20% of time (266 ms):

Code: Pascal  [Select][+][-]
  1. function Fibonacci(n: UInt32): UInt32; inline;
  2. begin
  3.   if n <= 1 then exit(n);
  4.   //if n = 0 then
  5.   //  Fibonacci := 0
  6.   //else if n = 1 then
  7.   //  Fibonacci := 1
  8.   //else
  9.     Fibonacci := Fibonacci(n - 1) + Fibonacci(n - 2);
  10. end;

Great! you are helping a lot! I'm going to update the programs.

Is this line worth?

a := 0;                         // Array of 10k elements initialized to 0


Warfley

  • Hero Member
  • *****
  • Posts: 1852
Re: Benchmarks
« Reply #7 on: December 04, 2024, 09:44:22 pm »
Benchmarks like this are quite pointless, because they don't measure anything close to what software looks like in the real world. Take a look at the github measurements:
Code: Pascal  [Select][+][-]
  1. C = 0.77
  2. Go = 2.07
  3. Node = 0.79
  4. Bun = 0.83
  5. Deno = 1.13
  6. PyPy = 1.61
  7. Java = 0.64
Node and Bun, two runtimes for an interpreted language are nearly as fast as C and Java a VM based language is faster than C? Well what the hell is going on here?

Simply put, it's JIT compilation, in interpreted languages, when the interpreter notices that the same operations are done over and over again, instead of interpreting them, it will compile them into very optimized code.
This benchmark only executes the exact same 3 lines of code in the exact same context, which is the optimal scenario fro JIT.

But any real codebase consists of more than 3 lines of code called in different contexts with different constraints.

This benchmark is just plain usless. There are no good benchmarks, because no benchmark can capture the nuonces of different programming languages, different paradigms and idioms. But this is by far the worst benchmark I have seen so far. It's literally just 3 lines of code that are benchmarked

turrican

  • Full Member
  • ***
  • Posts: 140
  • Pascal is my life.
    • Homepage
Re: Benchmarks
« Reply #8 on: December 04, 2024, 09:50:50 pm »
Benchmarks like this are quite pointless, because they don't measure anything close to what software looks like in the real world. Take a look at the github measurements:
Code: Pascal  [Select][+][-]
  1. C = 0.77
  2. Go = 2.07
  3. Node = 0.79
  4. Bun = 0.83
  5. Deno = 1.13
  6. PyPy = 1.61
  7. Java = 0.64
Node and Bun, two runtimes for an interpreted language are nearly as fast as C and Java a VM based language is faster than C? Well what the hell is going on here?

Simply put, it's JIT compilation, in interpreted languages, when the interpreter notices that the same operations are done over and over again, instead of interpreting them, it will compile them into very optimized code.
This benchmark only executes the exact same 3 lines of code in the exact same context, which is the optimal scenario fro JIT.

But any real codebase consists of more than 3 lines of code called in different contexts with different constraints.

This benchmark is just plain usless. There are no good benchmarks, because no benchmark can capture the nuonces of different programming languages, different paradigms and idioms. But this is by far the worst benchmark I have seen so far. It's literally just 3 lines of code that are benchmarked

Yes, you are right, but it's fun :) I just wanted to contribute to this repository with the Pascal implementation and asked for help because obviously there are people here who know a lot, as just demonstrated. I'm not worried at all, because I understand that future benchmarks will benefit AOT languages more and better.

alpine

  • Hero Member
  • *****
  • Posts: 1322
Re: Benchmarks
« Reply #9 on: December 04, 2024, 10:04:37 pm »
I have learned something today - a recursive function can be inlined. That's simply amazing!
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

Fibonacci

  • Hero Member
  • *****
  • Posts: 647
  • Internal Error Hunter
Re: Benchmarks
« Reply #10 on: December 04, 2024, 10:14:10 pm »
I have learned something today - a recursive function can be inlined. That's simply amazing!

Sarcasm? :D It is inlined, in the loop

Warfley

  • Hero Member
  • *****
  • Posts: 1852
Re: Benchmarks
« Reply #11 on: December 04, 2024, 10:17:29 pm »
I have learned something today - a recursive function can be inlined. That's simply amazing!
You can see the optimizations that FPC supports in the documentation: https://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58

As you can see it can optimize tail recursion, meaning a function where the result is computed by with a final function call, can be converted into non recursive code.

alpine

  • Hero Member
  • *****
  • Posts: 1322
Re: Benchmarks
« Reply #12 on: December 04, 2024, 10:23:09 pm »
I have learned something today - a recursive function can be inlined. That's simply amazing!

Sarcasm? :D It is inlined, in the loop
Not at all. I've already examined the assembly. Each inner call is inlined exactly twice, which is actually a real benefit.
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

alpine

  • Hero Member
  • *****
  • Posts: 1322
Re: Benchmarks
« Reply #13 on: December 04, 2024, 10:24:55 pm »
I have learned something today - a recursive function can be inlined. That's simply amazing!
You can see the optimizations that FPC supports in the documentation: https://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58

As you can see it can optimize tail recursion, meaning a function where the result is computed by with a final function call, can be converted into non recursive code.
Not a tail recursion.
"I'm sorry Dave, I'm afraid I can't do that."
—HAL 9000

Thaddy

  • Hero Member
  • *****
  • Posts: 16406
  • Censorship about opinions does not belong here.
Re: Benchmarks
« Reply #14 on: December 05, 2024, 06:36:01 am »
You can speed things up a little by adding {$I-} if you did not already do that.
and global variables don't need initialization because the heap is initialized.
In this case changing the memory manager to cmem also speeds things up.
All these need no code changes.

You can also replace strtoint with val which is faster and you can remove sysutils.
These do not affect the loop, but has some influence.
For the loop, the biggest gain can be achieved by using pointermath instead of array syntax.
That should be legal under any rules.

Remark: here the fibonacci code is not inlined, but it uses loopunroll from O3
Code: Bash  [Select][+][-]
  1. fib.pas(23,12) Note: Call to subroutine "function Fibonacci(n:LongInt):System.LongInt;" marked as inline is not inlined
« Last Edit: December 05, 2024, 08:11:15 am by Thaddy »
There is nothing wrong with being blunt. At a minimum it is also honest.

 

TinyPortal © 2005-2018