Have you looked at the -OoLOOPUNROLL optimization?
It can't help with loops like:
"for i:=cnt downto 1 do" in sstrings.inc
"for i:=jz downto 1 do" in genmath.inc
"for i := length( s ) downto 1 do" in flt_core.inc
and the list goes on. Searching for the text "downto 1" in FPC's or MSEGui's source directories will turn up many more.
Most of these loops are tight, and tight loops are exactly where comparing the counter with 0 instead of 1 pays off. In a "for" loop, the programmer can't make that change by hand. A couple of years ago I ran some tests, and the loops that compiled to test instead of cmp ran noticeably faster. As far as I remember, I compared "for x:=abigvalue1 downto 0 do for y:=abigvalue2 downto 0 do..." against "for x:=abigvalue1 downto 1 do for y:=abigvalue2 downto 1 do...". I know the gains in such tight loops are CPU dependent, but FPC is a cross-compiler with many target CPUs.
I always try to compare numbers with 0 rather than 1 or -1, but in "for" loops that comparison has to be chosen automatically by the compiler.
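As a minimal sketch (the program and function names are hypothetical, not taken from the FPC sources), the following Free Pascal program checks over a range of values that, for signed integers, "x >= 1" is equivalent to "x > 0" and "x <= -1" to "x < 0", and that a countdown loop runs the same number of iterations whether its exit test is written against 1 or against 0. That equivalence is what would allow a compiler to emit a test against zero instead of a cmp against 1; the program itself says nothing about which instruction FPC actually generates.

```pascal
program ZeroCompareSketch;

{$mode objfpc}
{$assertions on}

uses
  SysUtils; { makes a failed Assert raise EAssertionFailed }

{ Hypothetical helper: counts the iterations of a countdown loop whose
  exit test compares the counter against the constant 1 ("i >= 1"). }
function CountCmpOne(cnt: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  i := cnt;
  while i >= 1 do
  begin
    Inc(Result);
    Dec(i);
  end;
end;

{ Hypothetical helper: the same loop with the exit test written against
  zero ("i > 0"), which on x86 can be encoded with test instead of cmp. }
function CountTestZero(cnt: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  i := cnt;
  while i > 0 do
  begin
    Inc(Result);
    Dec(i);
  end;
end;

var
  x: SizeInt;
begin
  for x := -1000 to 1000 do
  begin
    { For integers, ">= 1" and "<= -1" can be rewritten as comparisons with 0. }
    Assert((x >= 1) = (x > 0));
    Assert((x <= -1) = (x < 0));
    { Both loop forms run the same number of iterations. }
    Assert(CountCmpOne(x) = CountTestZero(x));
  end;
  WriteLn('all checks passed');
end.
```

Compiling it with fpc and keeping the assembler output (e.g. with -al) is one way to see which comparison the compiler picks for each of the two loops on a given target.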
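By the way, here is an example. In the following function and its x86-64 assembly listing, notice the cmpq instruction generated for "result<=-1" where a testq would do; the compiler already uses testq for "result>=1".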
function a(const param:sizeint):sizeint;
begin
result:=param;
if result<=-1 then dec(result);
if result>=1 then inc(result);
if result<0 then dec(result);
if result>0 then inc(result);
end;
# [32] begin
movq %rdi,%rax
# Var param located in register rax
# Var $result located in register rax
# Var param located in register rax
.Ll2:
# [34] if result<=-1 then dec(result);
cmpq $-1,%rdi
setle %cl
movzbl %cl,%ecx
subq %rcx,%rax
.Ll3:
# [35] if result>=1 then inc(result);
testq %rax,%rax
setnle %cl
movzbl %cl,%ecx
addq %rcx,%rax
.Ll4:
# [36] if result<0 then dec(result);
testq %rax,%rax
setl %cl
movzbl %cl,%ecx
subq %rcx,%rax
.Ll5:
# [37] if result>0 then inc(result);
testq %rax,%rax
setg %cl
movzbl %cl,%ecx
addq %rcx,%rax
.Lc3:
.Ll6:
# [38] end;
ret