Recent

Author Topic: Optimization of for loops in the near zero range  (Read 2777 times)

lagprogramming

  • Sr. Member
  • ****
  • Posts: 404
Optimization of for loops in the near zero range
« on: March 21, 2023, 12:26:00 pm »
  for signedvariable:=value to -1 do produces code where the value of a register is compared with -1.
  for (un)signedvariable:=value downto 1 do produces code where the value of a register is compared with 1.
  In both cases the compiler might produce code that compares with 0, not with 1 or -1 because, depending on the target, it may optimize the comparisons.
  For example:
  - instead of cmp a register value with 1 and then jumping on the condition >=1, it's better to test the register and then jump on the condition >0.
  - instead of cmp a register value with -1 and then jumping on the condition <=-1, it's better to test the register and then jump on the condition <0.

Thaddy

  • Hero Member
  • *****
  • Posts: 14157
  • Probably until I exterminate Putin.
Re: Optimization of for loops in the near zero range
« Reply #1 on: March 21, 2023, 01:58:44 pm »
Did you look at -OoLOOPUNROLL optimization?
Specialize a type, not a var.

lagprogramming

  • Sr. Member
  • ****
  • Posts: 404
Re: Optimization of for loops in the near zero range
« Reply #2 on: March 22, 2023, 11:57:38 am »
Did you look at -OoLOOPUNROLL optimization?
It can't help in optimizing loops like:
"for i:=cnt downto 1 do" in sstrings.inc
"for i:=jz downto 1 do" in genmath.inc
"for i := length( s ) downto 1 do" in flt_core.inc
and the list continues. A search for the text "downto 1" in fpc's or MSEGui's directory will show more of them.
Most of these loops are tight and this optimization is useful in tight loops. It can't be done manually by the programmer. A couple of years ago I've done some tests and the loops that had a test instead of cmp ran noticeable faster. As far as I remember, the test was something like "for x:=abigvalue1 downto 0 do for y:=abigvalue2 downto 0 do..." vs. "for x:=abigvalue1 downto 1 do for y:=abigvalue2 downto 1 do...". I know that for these tight loops the gains are CPU dependent but FPC is a cross-compiler with many target CPUs.
I always try to compare numbers with 0, not 1 or -1, but regarding the "for" loops this comparison has to be done automatically by the compiler.


By the way, here is an example. Notice the existence of a cmpq instruction instead of testq at the following function.
Code: Pascal  [Select][+][-]
  1. function a(const param:sizeint):sizeint;
  2. begin
  3.   result:=param;
  4.   if result<=-1 then dec(result);
  5.   if result>=1 then inc(result);
  6.   if result<0 then dec(result);
  7.   if result>0 then inc(result);
  8. end;
  9.  
  10. # [32] begin
  11.         movq    %rdi,%rax
  12. # Var param located in register rax
  13. # Var $result located in register rax
  14. # Var param located in register rax
  15. .Ll2:
  16. # [34] if result<=-1 then dec(result);
  17.         cmpq    $-1,%rdi
  18.         setle   %cl
  19.         movzbl  %cl,%ecx
  20.         subq    %rcx,%rax
  21. .Ll3:
  22. # [35] if result>=1 then inc(result);
  23.         testq   %rax,%rax
  24.         setnle  %cl
  25.         movzbl  %cl,%ecx
  26.         addq    %rcx,%rax
  27. .Ll4:
  28. # [36] if result<0 then dec(result);
  29.         testq   %rax,%rax
  30.         setl    %cl
  31.         movzbl  %cl,%ecx
  32.         subq    %rcx,%rax
  33. .Ll5:
  34. # [37] if result>0 then inc(result);
  35.         testq   %rax,%rax
  36.         setg    %cl
  37.         movzbl  %cl,%ecx
  38.         addq    %rcx,%rax
  39. .Lc3:
  40. .Ll6:
  41. # [38] end;
  42.         ret

lagprogramming

  • Sr. Member
  • ****
  • Posts: 404
Re: Optimization of for loops in the near zero range
« Reply #3 on: May 27, 2023, 03:56:09 pm »
This is an improved example of near zero comparisons:
Code: Pascal  [Select][+][-]
  1. function foo(const param:sizeint):sizeint;
  2. begin
  3.   result:=param;
  4.   if result>-1 then dec(result);
  5.   if result<1 then inc(result);
  6.   if result<=-1 then dec(result);
  7.   if result>=1 then inc(result);
  8.   if result<0 then dec(result);
  9.   if result>0 then inc(result);
  10. end;
The code produced is
Code: Pascal  [Select][+][-]
  1. .section .text.n_unit1_$$_foo$int64$$int64,"ax"
  2.         .balign 16,0x90
  3. .globl  UNIT1_$$_FOO$INT64$$INT64
  4.         .hidden UNIT1_$$_FOO$INT64$$INT64
  5.         .type   UNIT1_$$_FOO$INT64$$INT64,@function
  6. UNIT1_$$_FOO$INT64$$INT64:
  7. .Lc2:
  8. .Ll1:
  9. # [unit1.pas]
  10. # [33] begin
  11.         movq    %rdi,%rax
  12. # Var param located in register rax
  13. # Var $result located in register rax
  14. # Var param located in register rax
  15. .Ll2:
  16. # [35] if result>-1 then dec(result);
  17.         cmpq    $-1,%rdi
  18.         setg    %cl
  19.         movzbl  %cl,%ecx
  20.         subq    %rcx,%rax
  21. .Ll3:
  22. # [36] if result<1 then inc(result);
  23.         testq   %rax,%rax
  24.         setle   %cl
  25.         movzbl  %cl,%ecx
  26.         addq    %rcx,%rax
  27. .Ll4:
  28. # [37] if result<=-1 then dec(result);
  29.         cmpq    $-1,%rax
  30.         setle   %cl
  31.         movzbl  %cl,%ecx
  32.         subq    %rcx,%rax
  33. .Ll5:
  34. # [38] if result>=1 then inc(result);
  35.         testq   %rax,%rax
  36.         setnle  %cl
  37.         movzbl  %cl,%ecx
  38.         addq    %rcx,%rax
  39. .Ll6:
  40. # [39] if result<0 then dec(result);
  41.         testq   %rax,%rax
  42.         setl    %cl
  43.         movzbl  %cl,%ecx
  44.         subq    %rcx,%rax
  45. .Ll7:
  46. # [40] if result>0 then inc(result);
  47.         testq   %rax,%rax
  48.         setg    %cl
  49.         movzbl  %cl,%ecx
  50.         addq    %rcx,%rax
  51. .Lc3:
  52. .Ll8:
  53. # [41] end;
  54.         ret
  55. .Lc1:
  56. .Lt1:
  57. .Le0:
  58.         .size   UNIT1_$$_FOO$INT64$$INT64, .Le0 - UNIT1_$$_FOO$INT64$$INT64
  59. .Ll9:

I think the compiler should replace the two cmp instructions with test ones.
Comparisons with 0 and 1 are ok so it should be trivial to modify the compiler in order to optimize these -1 comparisons. Probably the compiler developer who implemented this kind of optimization forgot about the -1 scenario.

AlexTP

  • Hero Member
  • *****
  • Posts: 2365
    • UVviewsoft
Re: Optimization of for loops in the near zero range
« Reply #4 on: July 29, 2023, 04:55:23 pm »

AlexTP

  • Hero Member
  • *****
  • Posts: 2365
    • UVviewsoft
Re: Optimization of for loops in the near zero range
« Reply #5 on: August 15, 2023, 09:29:22 am »

 

TinyPortal © 2005-2018