
Author Topic: Bounties to speed up FPC  (Read 10805 times)

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 305
  • I use FPC [main] 💪🐯💪
Re: Bounties to speed up FPC
« Reply #15 on: February 02, 2025, 08:41:13 am »
The thought that saddens me most is that you can get more or less optimal code (asm) in Pascal, but only through various hacks that complicate the code.
In other languages, in particular C and C++, where the compiler backend is stuffed with a lot of different optimizations, you almost don't need to think about how to structure the code so that the compiler understands it correctly and generates optimal code. Often the compiler itself understands it and generates ideal code.

So the point is that over time, as the FreePascal backend learns more optimizations too, no one will go back and rewrite the code that was previously written with hacks, and that code will stay harder to read than it would be without them.

I want to just write code, understandable code, and have the compiler turn it into optimal code by itself.
I may seem rude - please don't take it personally

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 305
  • I use FPC [main] 💪🐯💪
Re: Bounties to speed up FPC
« Reply #16 on: February 02, 2025, 08:43:49 am »
Although this is less a discussion about programming languages as such than about their backends. I haven't tested it, but I'm sure that LLVM would produce optimal code in the cases you mentioned.

Okoba

  • Hero Member
  • *****
  • Posts: 621
Re: Bounties to speed up FPC
« Reply #17 on: February 02, 2025, 09:44:22 am »
Most probably, but native FPC code generation is the way to go.

LV

  • Sr. Member
  • ****
  • Posts: 378
Re: Bounties to speed up FPC
« Reply #18 on: February 02, 2025, 10:36:54 am »
The results below appear unusual. When the debugger options are disabled, the program runs more slowly.

Code: Pascal
program project1;

{$mode objfpc}
{$optimization on}
{$DEFINE AVX2}
{$ASMMODE INTEL}

uses
  SysUtils;

type
  TTest = record
    V: integer;
    V2: integer;
  end;

  procedure Test1;
  var
    I: integer;
    A: TTest;
  begin
    A := Default(TTest);
    for I := 1 to 1000 * 1000 * 1000 do
      A.V += 1;
  end;

  procedure Test2;
  var
    I, Temp: integer;
    A: TTest;
  begin
    A := Default(TTest);
    Temp := A.V;
    for I := 1 to 1000 * 1000 * 1000 do
      Temp += 1;
    A.V := Temp;
  end;

  procedure Test3;
  var
    A: TTest;
  begin
    A := Default(TTest);
    asm
             MOVDQU  XMM0, A.V
             PXOR    XMM1, XMM1
             MOV     ECX, 1000000000 / 4
             @Loop:
             PADDD   XMM0, XMM1
             LOOP    @Loop
             MOVDQU  A.V, XMM0
    end;
  end;

var
  T: QWord;
  i: integer;

begin
  for i := 1 to 3 do
  begin
    T := GetTickCount64;
    Test1;
    WriteLn('Test1   ', GetTickCount64 - T);

    T := GetTickCount64;
    Test2;
    WriteLn('Test2   ', GetTickCount64 - T);

    T := GetTickCount64;
    Test3;
    WriteLn('Test3   ', GetTickCount64 - T);

    WriteLn('---------------------');

  end;

  ReadLn;
end.

AMD Ryzen 7 4700U
Windows 11
FPC 3.2.2
Lazarus 3.4

Compiler Options ->
  Config and Target -> Default
  Compilation and Linking ->
    Optimization levels -O1
  Debugging ->
    Run uses the debugger ON
    Generate info for debugger ON
    Type of debug info - Dwarf 3
    Display line numbers in run-time ON

output:
Test1   281
Test2   250
Test3   63
---------------------
Test1   250
Test2   234
Test3   63
---------------------
Test1   250
Test2   234
Test3   63
---------------------

Compiler Options ->
  Config and Target -> Default
  Compilation and Linking ->
    Optimization levels -O1
  Debugging ->
    Run uses the debugger  OFF
    Generate info for debugger OFF
    Type of debug info - Dwarf 3
    Display line numbers in run-time OFF

output:
Test1   485
Test2   484
Test3   63
---------------------
Test1   484
Test2   469
Test3   62
---------------------
Test1   485
Test2   484
Test3   63
---------------------
« Last Edit: February 02, 2025, 10:53:02 am by LV »

Okoba

  • Hero Member
  • *****
  • Posts: 621
Re: Bounties to speed up FPC
« Reply #19 on: February 02, 2025, 01:08:24 pm »
Can you please share them on the issue?

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11923
  • Debugger - SynEdit - and more
    • wiki
Re: Bounties to speed up FPC
« Reply #20 on: February 02, 2025, 01:14:42 pm »
Quote from: LV
The results below appear unusual. When the debugger options are disabled, the program runs more slowly.

Adding/removing debug info may have changed alignment in memory => The code is loaded to a different address...

Try to add
    {$CodeAlign proc=$40}

at the top => that may have an impact too ($20 is probably enough).

You can also play with the value for loop=



With that directive in place, start adding nop instructions at the start of the asm block in Test3.

For me: 1 to 3 nops => no difference
4th nop => slower

That is hardly because the nop itself takes that much time: it sits before the loop, so it is executed just once.
It is because the loop moves to a different alignment.


As I said, benchmarks like this are extremely easy to get wrong.

(There are prior discussions on this hidden somewhere in the forum, or maybe on the mailing list, IIRC including a link to a YouTube video with some explanations.)
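
One way to check the alignment explanation is to print where the procedures actually land in memory. The following is only a minimal sketch, assuming the entry address of a procedure is a usable proxy for where its loop ends up (the loop itself starts some bytes further in). `OffsetIn64` is a helper name made up for this illustration, and the empty `Test1`/`Test2` stand in for the real ones from the benchmark.

```pascal
program alignprobe;

{$mode objfpc}

// Placeholders: paste the real Test1/Test2 bodies here when trying this out.
procedure Test1;
begin
end;

procedure Test2;
begin
end;

// Offset of a code address inside its 64-byte block (0 means aligned to $40).
// Hypothetical helper, not part of the RTL.
function OffsetIn64(P: Pointer): PtrUInt;
begin
  Result := PtrUInt(P) and $3F;
end;

begin
  WriteLn('Test1 offset in 64-byte block: ', OffsetIn64(@Test1));
  WriteLn('Test2 offset in 64-byte block: ', OffsetIn64(@Test2));
end.
```

Building this once with and once without debug info, or with and without {$CodeAlign proc=$40}, and comparing the printed offsets should show whether the code moved between builds. The values depend on the compiler version and options, so no particular output is expected.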

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 305
  • I use FPC [main] 💪🐯💪
Re: Bounties to speed up FPC
« Reply #21 on: February 02, 2025, 01:23:29 pm »
Code: ASM
MOVDQU  XMM0, A.V
PXOR    XMM1, XMM1
MOV     ECX, 1000000000 / 4
@Loop:
  PADDD   XMM0, XMM1
LOOP    @Loop
MOVDQU  A.V, XMM0

Are you sure your assembler variant is the functional equivalent of the original Pascal variant?

Never mind, I see now that it is a roughly sufficient equivalent.
« Last Edit: February 02, 2025, 01:41:49 pm by ALLIGATOR »

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11923
  • Debugger - SynEdit - and more
    • wiki
Re: Bounties to speed up FPC
« Reply #22 on: February 02, 2025, 01:24:12 pm »
Btw, my times

FPC 3.3.1 (very recent)
-O3

with the codealign

Code: Text
Test1   1203
Test2   219
Test3   281
---------------------
Test1   1203
Test2   219
Test3   281
---------------------
Test1   1203
Test2   219
Test3   281
---------------------

And without the codealign
Code: Text
Test1   1219
Test2   437
Test3   282
---------------------
Test1   1203
Test2   453
Test3   266
---------------------
Test1   1203
Test2   453
Test3   281


And here is the asm FPC generated for Test2 (which is as fast as, or faster than, Test3 when aligned correctly):

Code: ASM
0000000100001780 488D6424C8               lea rsp,[rsp-$38]
C:\Users\martin\AppData\Local\Temp\project1.lpr:29  var
0000000100001785 488D4C2428               lea rcx,[rsp+$28]
000000010000178A 4531C0                   xor r8d,r8d
000000010000178D BA08000000               mov edx,$00000008
0000000100001792 E869050000               call +$00000569    # $0000000100001D00 FillChar x86_64.inc:365
C:\Users\martin\AppData\Local\Temp\project1.lpr:33  A := Default(TTest);
0000000100001797 488B442428               mov rax,[rsp+$28]
000000010000179C 4889442420               mov [rsp+$20],rax
C:\Users\martin\AppData\Local\Temp\project1.lpr:34  Temp := A.V;
00000001000017A1 8B4C2420                 mov ecx,[rsp+$20]
C:\Users\martin\AppData\Local\Temp\project1.lpr:35  for I := 1 to 1000 * 1000 * 1000 do
00000001000017A5 31C0                     xor eax,eax
00000001000017A7 90                       nop
00000001000017A8 83C001                   add eax,$01
C:\Users\martin\AppData\Local\Temp\project1.lpr:36  Temp += 1;
00000001000017AB 83C101                   add ecx,$01
C:\Users\martin\AppData\Local\Temp\project1.lpr:35  for I := 1 to 1000 * 1000 * 1000 do
00000001000017AE 3D00CA9A3B               cmp eax,$3B9ACA00
00000001000017B3 7CF3                     jl -$0D    # $00000001000017A8 Test2+40 project1.lpr:35
C:\Users\martin\AppData\Local\Temp\project1.lpr:37  A.V := Temp;
00000001000017B5 894C2420                 mov [rsp+$20],ecx
C:\Users\martin\AppData\Local\Temp\project1.lpr:38  end;
00000001000017B9 90                       nop
00000001000017BA 488D642438               lea rsp,[rsp+$38]
00000001000017BF C3                       ret

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 305
  • I use FPC [main] 💪🐯💪
Re: Bounties to speed up FPC
« Reply #23 on: February 02, 2025, 01:34:18 pm »
...
Try to add
               ... $40 ...
...
...    that may have an impact too ($20 is probably enough)

I think the topic starter was saying $200 and/or $100  ::)

ALLIGATOR

  • Sr. Member
  • ****
  • Posts: 305
  • I use FPC [main] 💪🐯💪
Re: Bounties to speed up FPC
« Reply #24 on: February 02, 2025, 01:50:20 pm »
Code: ASM
@Loop:
PADDD   XMM0, XMM1
LOOP    @Loop

try this:
Code: ASM
@Loop:
PADDD   XMM0, XMM1
dec ecx
jnz @Loop

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 11923
  • Debugger - SynEdit - and more
    • wiki
Re: Bounties to speed up FPC
« Reply #25 on: February 02, 2025, 01:53:34 pm »
Quote from: ALLIGATOR
I think the topic starter was saying $200 and/or $100  ::)

Ask him if "$200" means "USD 200" or "512 of some currency"? ;)

BrunoK

  • Hero Member
  • *****
  • Posts: 751
  • Retired programmer
Re: Bounties to speed up FPC
« Reply #26 on: February 02, 2025, 02:58:46 pm »
At the end of Test3, A.V on my system does not contain the loop count.

For the rest, my timing ratios are similar to Martin_fr's.
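
A note on why A.V does not end up holding the loop count in the Reply #18 version of Test3: `PXOR XMM1, XMM1` clears XMM1, so every `PADDD XMM0, XMM1` adds zero to each of the four 32-bit lanes. The scalar sketch below imitates that with plain Pascal arrays. `TLanes`, `AddLanes` and the small `Iterations` constant are made up for this illustration and do not come from the thread.

```pascal
program lanes;

{$mode objfpc}

type
  TLanes = array[0..3] of LongInt; // four 32-bit lanes, like one XMM register

// Scalar stand-in for PADDD: add B to A lane by lane.
procedure AddLanes(var A: TLanes; const B: TLanes);
var
  k: Integer;
begin
  for k := 0 to 3 do
    A[k] := A[k] + B[k];
end;

const
  Iterations = 1000; // the thread uses 1000000000 / 4; kept small here

var
  Acc, Zero, Ones: TLanes;
  n, k: Integer;
begin
  for k := 0 to 3 do
  begin
    Zero[k] := 0;
    Ones[k] := 1;
  end;

  // Reply #18 version: the increment register was cleared with PXOR.
  Acc := Zero;
  for n := 1 to Iterations do
    AddLanes(Acc, Zero);
  WriteLn('increment = 0: lane 0 = ', Acc[0]);

  // With an increment of 1 per lane, each lane counts the iterations.
  Acc := Zero;
  for n := 1 to Iterations do
    AddLanes(Acc, Ones);
  WriteLn('increment = 1: lane 0 = ', Acc[0]);
end.
```

The first line prints 0 and the second prints 1000. Separately, MOVDQU moves 16 bytes while TTest holds two 32-bit integers (8 bytes), so the load and store in that version also touch memory beyond the record.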

LV

  • Sr. Member
  • ****
  • Posts: 378
Re: Bounties to speed up FPC
« Reply #27 on: February 02, 2025, 05:03:10 pm »
I am not a professional programmer; this is my hobby. I learned a lot from this forum. Thank you.
@Okoba. This is an interesting topic.
@Martin_fr. Now it's clear.
@ALLIGATOR, @BrunoK. Test 3 has been tweaked.
I switched to an Intel i7-8700 and added Test4, which seems to be faster.

Code: Pascal
program project1;

{$mode objfpc}
{$optimization on}
{$ASMMODE INTEL}
{$CodeAlign proc=$40}

uses
  SysUtils;

type
  TTest = record
    V: integer;
    V2: integer;
  end;

  procedure Test1;
  var
    I: integer;
    A: TTest;
  begin
    A := Default(TTest);
    for I := 1 to 1000 * 1000 * 1000 do
      A.V += 1;
  end;

  procedure Test2;
  var
    I, Temp: integer;
    A: TTest;
  begin
    A := Default(TTest);
    Temp := A.V;
    for I := 1 to 1000 * 1000 * 1000 do
      Temp += 1;
    A.V := Temp;
  end;

  procedure Test3;
  var
    A: TTest;
  begin
    A := Default(TTest);
    asm
             MOV     EAX, 1000000000
             MOV     EDX, [A.V]
             @loop:
             ADD     EDX, 1
             DEC     EAX
             JNZ     @loop
             MOV     [A.V], EDX
    end;
    writeln('A.V = ', A.V);
  end;

  procedure Test4;
  var
    A: TTest;
    IncrementVector: array[0..7] of integer = (1, 1, 1, 1, 1, 1, 1, 1);
    TempResult: integer;
  begin
    A := Default(TTest);
    asm
             VMOVDQU YMM0, [IncrementVector]
             VMOVDQU YMM1, YMM0
             MOV     ECX, 1000000000 / 8
             @Loop:
             VPADDD  YMM0, YMM0, YMM1
             DEC     ECX
             JNZ     @Loop

             // Sum all 8 integers in YMM0
             VEXTRACTI128 XMM1, YMM0, 1
             VPADDD  XMM0, XMM0, XMM1
             VPHADDD XMM0, XMM0, XMM0
             VPHADDD XMM0, XMM0, XMM0
             VMOVD   TempResult, XMM0
             VZEROUPPER
    end;
    A.V := TempResult - 8; // Store the summed result into A.V
    writeln('A.V = ', A.V);
  end;

var
  T: QWord;
  i: integer;

begin
  for i := 1 to 3 do
  begin
    T := GetTickCount64;
    Test1;
    WriteLn('Test1   ', GetTickCount64 - T);

    T := GetTickCount64;
    Test2;
    WriteLn('Test2   ', GetTickCount64 - T);

    T := GetTickCount64;
    Test3;
    WriteLn('Test3   ', GetTickCount64 - T);


    T := GetTickCount64;
    Test4;
    WriteLn('Test4   ', GetTickCount64 - T);

    WriteLn('---------------------');

  end;

  ReadLn;
end.

Code: Text
Test1   1266
Test2   234
A.V = 1000000000
Test3   234
A.V = 1000000000
Test4   32
---------------------
Test1   1234
Test2   234
A.V = 1000000000
Test3   219
A.V = 1000000000
Test4   31
---------------------
Test1   1235
Test2   234
A.V = 1000000000
Test3   235
A.V = 1000000000
Test4   31
---------------------
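
The `- 8` in Test4 follows from the starting value: YMM0 is loaded from IncrementVector, so each of the 8 lanes begins at 1 rather than 0, and the horizontal sum comes out 8 too high. Below is a scaled-down scalar sketch of that arithmetic, assuming nothing beyond what Test4 itself does. `Total` is shrunk from 1000000000 to 80 so the numbers can be checked by hand, and the identifiers are illustrative only.

```pascal
program lanesum;

{$mode objfpc}

const
  LaneCount = 8;  // eight 32-bit lanes in one YMM register
  Total = 80;     // stand-in for 1000000000, kept small

var
  Lanes: array[0..LaneCount - 1] of LongInt;
  n, k, Sum: LongInt;
begin
  // YMM0 starts as a copy of the increment vector, so every lane is 1.
  for k := 0 to LaneCount - 1 do
    Lanes[k] := 1;

  // One VPADDD per iteration adds 1 to all eight lanes at once.
  for n := 1 to Total div LaneCount do
    for k := 0 to LaneCount - 1 do
      Lanes[k] := Lanes[k] + 1;

  // VEXTRACTI128 + VPADDD + two VPHADDD amount to a sum over all lanes.
  Sum := 0;
  for k := 0 to LaneCount - 1 do
    Sum := Sum + Lanes[k];

  WriteLn('horizontal sum = ', Sum);
  WriteLn('after - 8      = ', Sum - LaneCount);
end.
```

Each lane ends at 1 + 10 = 11, so the sum is 88 and the corrected value is 80, matching `Total`. Starting YMM0 from zero instead would make the subtraction unnecessary.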

BrunoK

  • Hero Member
  • *****
  • Posts: 751
  • Retired programmer
Re: Bounties to speed up FPC
« Reply #28 on: February 02, 2025, 05:47:22 pm »
Quote from: LV
I switched to Intel I7 8700 and added test 4, it seems to be faster.

AFAIK the objective here is to improve FPC code generation in a general way, not to optimize one specific and not very useful loop.

Okoba

  • Hero Member
  • *****
  • Posts: 621
Re: Bounties to speed up FPC
« Reply #29 on: February 02, 2025, 05:53:02 pm »
Exactly.
I am looking for people who understand the compiler well enough and would like to spend time fixing these issues.

 
