Forum > Suggestions

Optimization of for loops in the near zero range

(1/2) > >>

lagprogramming:
  for signedvariable:=value to -1 do produces code where the value of a register is compared with -1.
  for (un)signedvariable:=value downto 1 do produces code where the value of a register is compared with 1.
  In both cases the compiler might produce code that compares with 0, not with 1 or -1 because, depending on the target, it may optimize the comparisons.
  For example:
  - instead of cmp a register value with 1 and then jumping on the condition >=1, it's better to test the register and then jump on the condition >0.
  - instead of cmp a register value with -1 and then jumping on the condition <=-1, it's better to test the register and then jump on the condition <0.

Thaddy:
Did you look at -OoLOOPUNROLL optimization?

lagprogramming:

--- Quote from: Thaddy on March 21, 2023, 01:58:44 pm ---Did you look at -OoLOOPUNROLL optimization?

--- End quote ---
It can't help in optimizing loops like:
"for i:=cnt downto 1 do" in sstrings.inc
"for i:=jz downto 1 do" in genmath.inc
"for i := length( s ) downto 1 do" in flt_core.inc
and the list continues. A search for the text "downto 1" in fpc's or MSEGui's directory will show more of them.
Most of these loops are tight and this optimization is useful in tight loops. It can't be done manually by the programmer. A couple of years ago I've done some tests and the loops that had a test instead of cmp ran noticeable faster. As far as I remember, the test was something like "for x:=abigvalue1 downto 0 do for y:=abigvalue2 downto 0 do..." vs. "for x:=abigvalue1 downto 1 do for y:=abigvalue2 downto 1 do...". I know that for these tight loops the gains are CPU dependent but FPC is a cross-compiler with many target CPUs.
I always try to compare numbers with 0, not 1 or -1, but regarding the "for" loops this comparison has to be done automatically by the compiler.


By the way, here is an example. Notice the existence of a cmpq instruction instead of testq at the following function.

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function a(const param:sizeint):sizeint;begin  result:=param;  if result<=-1 then dec(result);  if result>=1 then inc(result);  if result<0 then dec(result);  if result>0 then inc(result);end; # [32] begin        movq    %rdi,%rax# Var param located in register rax# Var $result located in register rax# Var param located in register rax.Ll2:# [34] if result<=-1 then dec(result);        cmpq    $-1,%rdi        setle   %cl        movzbl  %cl,%ecx        subq    %rcx,%rax.Ll3:# [35] if result>=1 then inc(result);        testq   %rax,%rax        setnle  %cl        movzbl  %cl,%ecx        addq    %rcx,%rax.Ll4:# [36] if result<0 then dec(result);        testq   %rax,%rax        setl    %cl        movzbl  %cl,%ecx        subq    %rcx,%rax.Ll5:# [37] if result>0 then inc(result);        testq   %rax,%rax        setg    %cl        movzbl  %cl,%ecx        addq    %rcx,%rax.Lc3:.Ll6:# [38] end;        ret

lagprogramming:
This is an improved example of near zero comparisons:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function foo(const param:sizeint):sizeint;begin  result:=param;  if result>-1 then dec(result);  if result<1 then inc(result);  if result<=-1 then dec(result);  if result>=1 then inc(result);  if result<0 then dec(result);  if result>0 then inc(result);end;The code produced is

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---.section .text.n_unit1_$$_foo$int64$$int64,"ax"        .balign 16,0x90.globl  UNIT1_$$_FOO$INT64$$INT64        .hidden UNIT1_$$_FOO$INT64$$INT64        .type   UNIT1_$$_FOO$INT64$$INT64,@functionUNIT1_$$_FOO$INT64$$INT64:.Lc2:.Ll1:# [unit1.pas]# [33] begin        movq    %rdi,%rax# Var param located in register rax# Var $result located in register rax# Var param located in register rax.Ll2:# [35] if result>-1 then dec(result);        cmpq    $-1,%rdi        setg    %cl        movzbl  %cl,%ecx        subq    %rcx,%rax.Ll3:# [36] if result<1 then inc(result);        testq   %rax,%rax        setle   %cl        movzbl  %cl,%ecx        addq    %rcx,%rax.Ll4:# [37] if result<=-1 then dec(result);        cmpq    $-1,%rax        setle   %cl        movzbl  %cl,%ecx        subq    %rcx,%rax.Ll5:# [38] if result>=1 then inc(result);        testq   %rax,%rax        setnle  %cl        movzbl  %cl,%ecx        addq    %rcx,%rax.Ll6:# [39] if result<0 then dec(result);        testq   %rax,%rax        setl    %cl        movzbl  %cl,%ecx        subq    %rcx,%rax.Ll7:# [40] if result>0 then inc(result);        testq   %rax,%rax        setg    %cl        movzbl  %cl,%ecx        addq    %rcx,%rax.Lc3:.Ll8:# [41] end;        ret.Lc1:.Lt1:.Le0:        .size   UNIT1_$$_FOO$INT64$$INT64, .Le0 - UNIT1_$$_FOO$INT64$$INT64.Ll9:
I think the compiler should replace the two cmp instructions with test ones.
Comparisons with 0 and 1 are ok so it should be trivial to modify the compiler in order to optimize these -1 comparisons. Probably the compiler developer who implemented this kind of optimization forgot about the -1 scenario.

AlexTP:
Last post info - posted to https://gitlab.com/freepascal.org/fpc/source/-/issues/40370

Navigation

[0] Message Index

[#] Next page

Go to full version