Forum > Suggestions
Optimization of for loops in the near zero range
lagprogramming:
for signedvariable:=value to -1 do produces code where the value of a register is compared with -1.
for (un)signedvariable:=value downto 1 do produces code where the value of a register is compared with 1.
In both cases the compiler might produce code that compares with 0, not with 1 or -1 because, depending on the target, it may optimize the comparisons.
For example:
- instead of cmp a register value with 1 and then jumping on the condition >=1, it's better to test the register and then jump on the condition >0.
- instead of cmp a register value with -1 and then jumping on the condition <=-1, it's better to test the register and then jump on the condition <0.
Thaddy:
Did you look at -OoLOOPUNROLL optimization?
lagprogramming:
--- Quote from: Thaddy on March 21, 2023, 01:58:44 pm ---Did you look at -OoLOOPUNROLL optimization?
--- End quote ---
It can't help in optimizing loops like:
"for i:=cnt downto 1 do" in sstrings.inc
"for i:=jz downto 1 do" in genmath.inc
"for i := length( s ) downto 1 do" in flt_core.inc
and the list continues. A search for the text "downto 1" in fpc's or MSEGui's directory will show more of them.
Most of these loops are tight and this optimization is useful in tight loops. It can't be done manually by the programmer. A couple of years ago I've done some tests and the loops that had a test instead of cmp ran noticeable faster. As far as I remember, the test was something like "for x:=abigvalue1 downto 0 do for y:=abigvalue2 downto 0 do..." vs. "for x:=abigvalue1 downto 1 do for y:=abigvalue2 downto 1 do...". I know that for these tight loops the gains are CPU dependent but FPC is a cross-compiler with many target CPUs.
I always try to compare numbers with 0, not 1 or -1, but regarding the "for" loops this comparison has to be done automatically by the compiler.
By the way, here is an example. Notice the existence of a cmpq instruction instead of testq at the following function.
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function a(const param:sizeint):sizeint;begin result:=param; if result<=-1 then dec(result); if result>=1 then inc(result); if result<0 then dec(result); if result>0 then inc(result);end; # [32] begin movq %rdi,%rax# Var param located in register rax# Var $result located in register rax# Var param located in register rax.Ll2:# [34] if result<=-1 then dec(result); cmpq $-1,%rdi setle %cl movzbl %cl,%ecx subq %rcx,%rax.Ll3:# [35] if result>=1 then inc(result); testq %rax,%rax setnle %cl movzbl %cl,%ecx addq %rcx,%rax.Ll4:# [36] if result<0 then dec(result); testq %rax,%rax setl %cl movzbl %cl,%ecx subq %rcx,%rax.Ll5:# [37] if result>0 then inc(result); testq %rax,%rax setg %cl movzbl %cl,%ecx addq %rcx,%rax.Lc3:.Ll6:# [38] end; ret
lagprogramming:
This is an improved example of near zero comparisons:
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---function foo(const param:sizeint):sizeint;begin result:=param; if result>-1 then dec(result); if result<1 then inc(result); if result<=-1 then dec(result); if result>=1 then inc(result); if result<0 then dec(result); if result>0 then inc(result);end;The code produced is
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---.section .text.n_unit1_$$_foo$int64$$int64,"ax" .balign 16,0x90.globl UNIT1_$$_FOO$INT64$$INT64 .hidden UNIT1_$$_FOO$INT64$$INT64 .type UNIT1_$$_FOO$INT64$$INT64,@functionUNIT1_$$_FOO$INT64$$INT64:.Lc2:.Ll1:# [unit1.pas]# [33] begin movq %rdi,%rax# Var param located in register rax# Var $result located in register rax# Var param located in register rax.Ll2:# [35] if result>-1 then dec(result); cmpq $-1,%rdi setg %cl movzbl %cl,%ecx subq %rcx,%rax.Ll3:# [36] if result<1 then inc(result); testq %rax,%rax setle %cl movzbl %cl,%ecx addq %rcx,%rax.Ll4:# [37] if result<=-1 then dec(result); cmpq $-1,%rax setle %cl movzbl %cl,%ecx subq %rcx,%rax.Ll5:# [38] if result>=1 then inc(result); testq %rax,%rax setnle %cl movzbl %cl,%ecx addq %rcx,%rax.Ll6:# [39] if result<0 then dec(result); testq %rax,%rax setl %cl movzbl %cl,%ecx subq %rcx,%rax.Ll7:# [40] if result>0 then inc(result); testq %rax,%rax setg %cl movzbl %cl,%ecx addq %rcx,%rax.Lc3:.Ll8:# [41] end; ret.Lc1:.Lt1:.Le0: .size UNIT1_$$_FOO$INT64$$INT64, .Le0 - UNIT1_$$_FOO$INT64$$INT64.Ll9:
I think the compiler should replace the two cmp instructions with test ones.
Comparisons with 0 and 1 are ok so it should be trivial to modify the compiler in order to optimize these -1 comparisons. Probably the compiler developer who implemented this kind of optimization forgot about the -1 scenario.
AlexTP:
Last post info - posted to https://gitlab.com/freepascal.org/fpc/source/-/issues/40370
Navigation
[0] Message Index
[#] Next page