@Thaddy
Please read that post in its entirety.
After playing the entire morning with the issue from the original post, here are my findings:
The minimal code:
var
  mas: array[1..200, 1..2] of Boolean;
  i, j, k: Integer;
begin
  k := 1; j := 1;
  for i := 1 to 200 do
    mas[i, j] := (k = 1);
  WriteLn(mas[1, 1]);
end.
(The array dimensions are greater than 1 because, at higher optimization levels, the compiler otherwise reduces the loop to just one assignment.)
All compiler targets are x86_64, i.e. 64-bit.
Free Pascal Compiler version 3.2.2-rrelease_3_2_2-0-g0d122c4953 [2022/10/28] for x86_64, optimization level: -O1
Here the result is FALSE, which is incorrect. The disassembly:
project1.lpr:6 for i := 1 to 200 do
00000000004010B6 c705900e030000000000 movl $0x0,0x30e90(%rip) # 0x431f50 <U_$P$PROGRAM_$$_I>
00000000004010C0 8b058a0e0300 mov 0x30e8a(%rip),%eax # 0x431f50 <U_$P$PROGRAM_$$_I>
00000000004010C6 83c001 add $0x1,%eax
00000000004010C9 8905810e0300 mov %eax,0x30e81(%rip) # 0x431f50 <U_$P$PROGRAM_$$_I>
project1.lpr:7 mas[i, j] := (k = 1);
00000000004010CF 8b0d8b0e0300 mov 0x30e8b(%rip),%ecx # 0x431f60 <U_$P$PROGRAM_$$_J>
00000000004010D5 48d1e0 shl %rax
00000000004010D8 833d910e030001 cmpl $0x1,0x30e91(%rip) # 0x431f70 <U_$P$PROGRAM_$$_K>
00000000004010DF 488d15da0c0300 lea 0x30cda(%rip),%rdx # 0x431dc0 <U_$P$PROGRAM_$$_MAS>
00000000004010E6 4801d0 add %rdx,%rax
00000000004010E9 0f944408fd sete -0x3(%rax,%rcx,1)
project1.lpr:6 for i := 1 to 200 do
00000000004010EE 813d580e0300c8000000 cmpl $0xc8,0x30e58(%rip) # 0x431f50 <U_$P$PROGRAM_$$_I>
00000000004010F8 7cc6 jl 0x4010c0 <main+48>
Here the problem is line #11 of the listing (address 4010E6), where the add instruction destroys the flags previously set by the cmpl at line #9 (address 4010D8).
Free Pascal Compiler version 3.3.1-15761-gbf970b29f4-dirty [2024/05/30] for x86_64 trunk, optimization level: -O1
Here the result is TRUE, which is correct. The disassembly:
project1.lpr:6 for i := 1 to 200 do
00000000004010B3 488d0586760300 lea 0x37686(%rip),%rax # 0x438740 <U_$P$PROGRAM_$$_I>
00000000004010BA c70001000000 movl $0x1,(%rax)
project1.lpr:7 mas[i, j] := (k = 1);
00000000004010C0 488d0579760300 lea 0x37679(%rip),%rax # 0x438740 <U_$P$PROGRAM_$$_I>
00000000004010C7 8b10 mov (%rax),%edx
00000000004010C9 488d0580760300 lea 0x37680(%rip),%rax # 0x438750 <U_$P$PROGRAM_$$_J>
00000000004010D0 8b00 mov (%rax),%eax
00000000004010D2 48d1e2 shl %rdx
00000000004010D5 488d0d84760300 lea 0x37684(%rip),%rcx # 0x438760 <U_$P$PROGRAM_$$_K>
00000000004010DC 833901 cmpl $0x1,(%rcx)
00000000004010DF 488d0dca740300 lea 0x374ca(%rip),%rcx # 0x4385b0 <U_$P$PROGRAM_$$_MAS>
00000000004010E6 488d140a lea (%rdx,%rcx,1),%rdx
00000000004010EA 0f944402fd sete -0x3(%rdx,%rax,1)
project1.lpr:6 for i := 1 to 200 do
00000000004010EF 488d054a760300 lea 0x3764a(%rip),%rax # 0x438740 <U_$P$PROGRAM_$$_I>
00000000004010F6 8b00 mov (%rax),%eax
00000000004010F8 678d5001 lea 0x1(%eax),%edx
00000000004010FC 488d053d760300 lea 0x3763d(%rip),%rax # 0x438740 <U_$P$PROGRAM_$$_I>
0000000000401103 8910 mov %edx,(%rax)
0000000000401105 488d0534760300 lea 0x37634(%rip),%rax # 0x438740 <U_$P$PROGRAM_$$_I>
000000000040110C 8138c8000000 cmpl $0xc8,(%rax)
0000000000401112 7eac jle 0x4010c0 <main+48>
Here, between the cmpl at line #11 (address 4010DC) and the sete at line #14 (address 4010EA), there are just two lea instructions, and lea does not affect the flags.
That confirms rvk's assumption from his reply #8.
I have tested the above code with FPC 3.2.2 and FPC 3.3.1 (trunk) at all optimization levels: the FPC 3.2.2 output is incorrect at every level, while the FPC 3.3.1 output is correct at every level.
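The minimal code prints only mas[1, 1]. To see whether every assigned element is affected, the same loop can be followed by a count of the elements that did not end up TRUE. This is only a sketch: the program name CheckMas and the variable wrong are made up for illustration, and the extra code may change what the compiler generates for the first loop, so the disassembly should be checked again.

```pascal
program CheckMas;
{$mode objfpc}
var
  mas: array[1..200, 1..2] of Boolean;
  i, j, k, wrong: Integer;
begin
  k := 1; j := 1;

  // Same assignment loop as in the minimal code.
  for i := 1 to 200 do
    mas[i, j] := (k = 1);

  // Count the elements of column j that are not TRUE.
  // Since k = 1, every one of them is expected to be TRUE.
  wrong := 0;
  for i := 1 to 200 do
    if not mas[i, j] then
      Inc(wrong);

  WriteLn('wrong elements: ', wrong, ' of 200');
end.
```

A correctly compiled program reports 0 wrong elements; any other count means the assignment loop was miscompiled.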
Furthermore, if we introduce an inner loop as in the original post:
begin
  k := 1; j := 1;
  for i := 1 to 200 do
    for j := 1 to 2 do
      mas[i, j] := (k = 1);
...
Then FPC 3.2.2 with -O1 or -O2 still gives a wrong result because of the same subtle effective address (EA) calculation:
00000000004010E6 833d830e030001 cmpl $0x1,0x30e83(%rip) # 0x431f70 <U_$P$PROGRAM_$$_K>
00000000004010ED 488d15cc0c0300 lea 0x30ccc(%rip),%rdx # 0x431dc0 <U_$P$PROGRAM_$$_MAS>
00000000004010F4 4801d0 add %rdx,%rax
00000000004010F7 0f944408fd sete -0x3(%rax,%rcx,1)
while at -O3 it gives the correct answer, because the inner loop is unrolled and replaced with two assignments at different constant displacements:
project1.lpr:8 mas[i, j] := (k = 1);
00000000004010C7 89c2 mov %eax,%edx
00000000004010C9 833da00e030001 cmpl $0x1,0x30ea0(%rip) # 0x431f70 <U_$P$PROGRAM_$$_K>
00000000004010D0 488d05e90c0300 lea 0x30ce9(%rip),%rax # 0x431dc0 <U_$P$PROGRAM_$$_MAS>
00000000004010D7 0f944450fe sete -0x2(%rax,%rdx,2)
00000000004010DC 8b056e0e0300 mov 0x30e6e(%rip),%eax # 0x431f50 <U_$P$PROGRAM_$$_I>
00000000004010E2 833d870e030001 cmpl $0x1,0x30e87(%rip) # 0x431f70 <U_$P$PROGRAM_$$_K>
00000000004010E9 488d15d00c0300 lea 0x30cd0(%rip),%rdx # 0x431dc0 <U_$P$PROGRAM_$$_MAS>
00000000004010F0 0f944442ff sete -0x1(%rdx,%rax,2)
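Read back at source level, the unrolled -O3 code above corresponds roughly to the following. This is my reading of the disassembly, not the compiler's internal representation, and the program name UnrolledSketch is made up. I have not tested whether writing the loop this way by hand avoids the issue on FPC 3.2.2 at lower optimization levels.

```pascal
program UnrolledSketch;
{$mode objfpc}
var
  mas: array[1..200, 1..2] of Boolean;
  i, k: Integer;
begin
  k := 1;
  for i := 1 to 200 do
  begin
    // j = 1: corresponds to the sete with displacement -0x2
    mas[i, 1] := (k = 1);
    // j = 2: corresponds to the sete with displacement -0x1
    mas[i, 2] := (k = 1);
  end;
  WriteLn(mas[1, 1], ' ', mas[200, 2]);
end.
```

With j folded into a constant displacement, no add is needed between the cmpl and the sete, so the flags survive.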
That is contrary to Thaddy's statement that a higher optimization level will make things worse. Also, I don't see any boolean 'expansion': everything is performed at byte level (sete/setz).
Another thing that stands out is that code generation for loops has changed in trunk: in 3.2.2 the loop control variable is assigned (start-1) and pre-incremented, whereas in trunk it is assigned (start) and post-incremented. Also, the effective-address calculation for an element of a multi-dimensional array is done with lea instead of add, which is the right way, since lea leaves the flags untouched.
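For reference, the -0x3 displacement in the sete operands follows from the array layout: with one byte per Boolean and rows stored one after another, mas[i, j] sits at byte offset (i-1)*2 + (j-1) = 2*i + j - 3 from the start of the array. That is exactly base + 2*i (the shl plus add/lea) plus j minus 3. A small sketch that checks this arithmetic, assuming SizeOf(Boolean) = 1 and a row-major layout (the program name EaCheck and the variable ok are made up for illustration):

```pascal
program EaCheck;
{$mode objfpc}
var
  mas: array[1..200, 1..2] of Boolean;
  i, j: Integer;
  ok: Boolean;
begin
  ok := True;
  for i := 1 to 200 do
    for j := 1 to 2 do
      // Compare the real byte offset of mas[i, j] from the start of
      // the array with the formula 2*i + j - 3.
      if (PtrUInt(@mas[i, j]) - PtrUInt(@mas)) <> PtrUInt(2 * i + j - 3) then
        ok := False;
  WriteLn(ok);
end.
```

The same formula explains the -O3 listing: for j = 1 the constant part is -2, and for j = 2 it is -1.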
So, to recap: FPC 3.2.2 has a subtle issue with assigning a simple boolean expression to an element of a multi-dimensional array.
The question is whether this is worth reporting when it no longer appears in the trunk?