### Bookstore

 Computer Math and Games in Pascal (preview) Lazarus Handbook

### Author Topic: Optimization of for loops in the near zero range  (Read 3090 times)

#### lagprogramming

• Sr. Member
• Posts: 406
##### Optimization of for loops in the near zero range
« on: March 21, 2023, 12:26:00 pm »
for signedvariable:=value to -1 do produces code where the value of a register is compared with -1.
for (un)signedvariable:=value downto 1 do produces code where the value of a register is compared with 1.
In both cases the compiler might produce code that compares with 0, not with 1 or -1 because, depending on the target, it may optimize the comparisons.
For example:
- instead of cmp a register value with 1 and then jumping on the condition >=1, it's better to test the register and then jump on the condition >0.
- instead of cmp a register value with -1 and then jumping on the condition <=-1, it's better to test the register and then jump on the condition <0.

• Hero Member
• Posts: 15553
• Censorship about opinions does not belong here.
##### Re: Optimization of for loops in the near zero range
« Reply #1 on: March 21, 2023, 01:58:44 pm »
Did you look at -OoLOOPUNROLL optimization?
If I smell bad code it usually is bad code and that includes my own code.

#### lagprogramming

• Sr. Member
• Posts: 406
##### Re: Optimization of for loops in the near zero range
« Reply #2 on: March 22, 2023, 11:57:38 am »
Did you look at -OoLOOPUNROLL optimization?
It can't help in optimizing loops like:
"for i:=cnt downto 1 do" in sstrings.inc
"for i:=jz downto 1 do" in genmath.inc
"for i := length( s ) downto 1 do" in flt_core.inc
and the list continues. A search for the text "downto 1" in fpc's or MSEGui's directory will show more of them.
Most of these loops are tight and this optimization is useful in tight loops. It can't be done manually by the programmer. A couple of years ago I've done some tests and the loops that had a test instead of cmp ran noticeable faster. As far as I remember, the test was something like "for x:=abigvalue1 downto 0 do for y:=abigvalue2 downto 0 do..." vs. "for x:=abigvalue1 downto 1 do for y:=abigvalue2 downto 1 do...". I know that for these tight loops the gains are CPU dependent but FPC is a cross-compiler with many target CPUs.
I always try to compare numbers with 0, not 1 or -1, but regarding the "for" loops this comparison has to be done automatically by the compiler.

By the way, here is an example. Notice the existence of a cmpq instruction instead of testq at the following function.
Code: Pascal  [Select][+][-]
1. function a(const param:sizeint):sizeint;
2. begin
3.   result:=param;
4.   if result<=-1 then dec(result);
5.   if result>=1 then inc(result);
6.   if result<0 then dec(result);
7.   if result>0 then inc(result);
8. end;
9.
10. # [32] begin
11.         movq    %rdi,%rax
12. # Var param located in register rax
13. # Var \$result located in register rax
14. # Var param located in register rax
15. .Ll2:
16. # [34] if result<=-1 then dec(result);
17.         cmpq    \$-1,%rdi
18.         setle   %cl
19.         movzbl  %cl,%ecx
20.         subq    %rcx,%rax
21. .Ll3:
22. # [35] if result>=1 then inc(result);
23.         testq   %rax,%rax
24.         setnle  %cl
25.         movzbl  %cl,%ecx
27. .Ll4:
28. # [36] if result<0 then dec(result);
29.         testq   %rax,%rax
30.         setl    %cl
31.         movzbl  %cl,%ecx
32.         subq    %rcx,%rax
33. .Ll5:
34. # [37] if result>0 then inc(result);
35.         testq   %rax,%rax
36.         setg    %cl
37.         movzbl  %cl,%ecx
39. .Lc3:
40. .Ll6:
41. # [38] end;
42.         ret

#### lagprogramming

• Sr. Member
• Posts: 406
##### Re: Optimization of for loops in the near zero range
« Reply #3 on: May 27, 2023, 03:56:09 pm »
This is an improved example of near zero comparisons:
Code: Pascal  [Select][+][-]
1. function foo(const param:sizeint):sizeint;
2. begin
3.   result:=param;
4.   if result>-1 then dec(result);
5.   if result<1 then inc(result);
6.   if result<=-1 then dec(result);
7.   if result>=1 then inc(result);
8.   if result<0 then dec(result);
9.   if result>0 then inc(result);
10. end;
The code produced is
Code: Pascal  [Select][+][-]
1. .section .text.n_unit1_\$\$_foo\$int64\$\$int64,"ax"
2.         .balign 16,0x90
3. .globl  UNIT1_\$\$_FOO\$INT64\$\$INT64
4.         .hidden UNIT1_\$\$_FOO\$INT64\$\$INT64
5.         .type   UNIT1_\$\$_FOO\$INT64\$\$INT64,@function
6. UNIT1_\$\$_FOO\$INT64\$\$INT64:
7. .Lc2:
8. .Ll1:
9. # [unit1.pas]
10. # [33] begin
11.         movq    %rdi,%rax
12. # Var param located in register rax
13. # Var \$result located in register rax
14. # Var param located in register rax
15. .Ll2:
16. # [35] if result>-1 then dec(result);
17.         cmpq    \$-1,%rdi
18.         setg    %cl
19.         movzbl  %cl,%ecx
20.         subq    %rcx,%rax
21. .Ll3:
22. # [36] if result<1 then inc(result);
23.         testq   %rax,%rax
24.         setle   %cl
25.         movzbl  %cl,%ecx
27. .Ll4:
28. # [37] if result<=-1 then dec(result);
29.         cmpq    \$-1,%rax
30.         setle   %cl
31.         movzbl  %cl,%ecx
32.         subq    %rcx,%rax
33. .Ll5:
34. # [38] if result>=1 then inc(result);
35.         testq   %rax,%rax
36.         setnle  %cl
37.         movzbl  %cl,%ecx
39. .Ll6:
40. # [39] if result<0 then dec(result);
41.         testq   %rax,%rax
42.         setl    %cl
43.         movzbl  %cl,%ecx
44.         subq    %rcx,%rax
45. .Ll7:
46. # [40] if result>0 then inc(result);
47.         testq   %rax,%rax
48.         setg    %cl
49.         movzbl  %cl,%ecx
51. .Lc3:
52. .Ll8:
53. # [41] end;
54.         ret
55. .Lc1:
56. .Lt1:
57. .Le0:
58.         .size   UNIT1_\$\$_FOO\$INT64\$\$INT64, .Le0 - UNIT1_\$\$_FOO\$INT64\$\$INT64
59. .Ll9:

I think the compiler should replace the two cmp instructions with test ones.
Comparisons with 0 and 1 are ok so it should be trivial to modify the compiler in order to optimize these -1 comparisons. Probably the compiler developer who implemented this kind of optimization forgot about the -1 scenario.

#### AlexTP

• Hero Member
• Posts: 2457
##### Re: Optimization of for loops in the near zero range
« Reply #4 on: July 29, 2023, 04:55:23 pm »

#### AlexTP

• Hero Member
• Posts: 2457
##### Re: Optimization of for loops in the near zero range
« Reply #5 on: August 15, 2023, 09:29:22 am »