I think I found out something about the difference.
It's a mixture of a bad optimization and Intel.
First, the optimization.
The difference between x86 and x64 for UInt32 mostly comes from a second multiplication instruction.
To compute edx*3, x86 uses [ lea (%edx,%edx,2),%edx ], but x64 uses [ imul $0x3,%edx,%edx ].
According to the instruction tables there should be no speed difference, but there seems to be a small one.
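The mul/shr sequence in both UInt32 dumps is the usual division-by-a-constant trick: 0xAAAAAAAB is (2^33 + 1) / 3, so the high 32 bits of the product (edx after mul) shifted right by 1 give N div 3, and the remainder is N - 3 * (N div 3), which x86 computes with lea and x64 with imul. A minimal Free Pascal sketch of the same arithmetic, checked against the compiler's own mod (the name UMod3 is hypothetical):

```pascal
program UnsignedMod3;
{$mode objfpc}

const
  // 0xAAAAAAAB = (2^33 + 1) / 3, the constant loaded in the UInt32 dumps
  cMagic = UInt64($AAAAAAAB);

// Mirrors: mov $0xaaaaaaab,%eax / mul / shr %edx / lea (or imul) / sub
function UMod3(N: UInt32): UInt32;
var
  Q: UInt32;
begin
  // High 32 bits of the 64-bit product (edx), then shr 1: 33 bits in total
  Q := UInt32((UInt64(N) * cMagic) shr 33);
  // N - 3*Q; lea (%edx,%edx,2) forms 3*Q as Q + 2*Q
  Result := N - Q * 3;
end;

var
  N: UInt32;
begin
  // Spot-check a range plus the largest value against the built-in mod
  for N := 0 to 1000000 do
    if UMod3(N) <> N mod 3 then
      Halt(1);
  if UMod3(High(UInt32)) <> High(UInt32) mod 3 then
    Halt(1);
  WriteLn('UMod3 agrees with mod 3');
end.
```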
The difference between x86 UInt32 and x86 Int32 comes from the IDIV instruction that FPC emits for the signed type.
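Why the signed type cannot simply reuse the unsigned mul/shr sequence: Pascal's mod truncates toward zero, so for a negative Int32 the remainder carries the sign of the dividend, while the same bit pattern read as UInt32 has a different remainder. A small sketch (-7 is just an example value):

```pascal
program SignedModSemantics;
{$mode objfpc}

var
  N: Int32;
begin
  N := -7;
  // Signed mod truncates toward zero: the result has the sign of N
  WriteLn(N mod 3);          // prints -1
  // Same bits as UInt32: 4294967289 = 3 * 1431655763, so the remainder is 0
  WriteLn(UInt32(N) mod 3);  // prints 0
end.
```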
Now we come to the most interesting point: a combination of bad optimization and Intel.
The two belong together, because the problem only shows up on Intel.
Under x64, FPC generates code that uses 64-bit registers (%rax and %rcx) although that is not necessary: it sign-extends the 32-bit value (movslq, cqto) and divides with a 64-bit idiv. => Bad optimization (for Intel)
According to the instruction tables, on Intel a 64-bit IDIV needs a lot more time and power than a 32-bit one.
On AMD it makes no difference which registers we use; every register size takes the same time.
So it is a combination of bad optimization and Intel.
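One possible workaround, a hypothetical sketch that is not taken from FPC or its RTL: compute the signed remainder from the magnitude using UInt32 arithmetic, so the division by 3 can use the 32-bit mul/shr sequence FPC already emits for UInt32 (see the x64 UInt32 dump) instead of the 64-bit idiv. The helper name SMod3 is made up; whether FPC really generates the 32-bit sequence for it, and whether the extra sign branch eats the gain, should be checked with the disassembly and the test program.

```pascal
program SignedMod3ViaUnsigned;
{$mode objfpc}

// Hypothetical helper: N mod 3 with Pascal semantics (sign of N),
// using only an unsigned 32-bit "mod 3" for the actual division
function SMod3(N: Int32): Int32;
var
  A: UInt32;
begin
  // |N| as UInt32; going through Int64 keeps Low(Int32) correct
  if N < 0 then
    A := UInt32(-Int64(N))
  else
    A := UInt32(N);
  Result := Int32(A mod 3);  // unsigned remainder: 0, 1 or 2
  if N < 0 then
    Result := -Result;       // restore the sign of the dividend
end;

var
  N: Int32;
begin
  // Compare with the built-in signed mod, including both extremes
  for N := -1000000 to 1000000 do
    if SMod3(N) <> N mod 3 then
      Halt(1);
  if (SMod3(Low(Int32)) <> Low(Int32) mod 3) or
     (SMod3(High(Int32)) <> High(Int32) mod 3) then
    Halt(1);
  WriteLn('SMod3 agrees with mod 3');
end.
```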
CPU: Intel i7-3610QM
FPC: 3.1.1@38232
Options: -O3 -OpCOREAVX -CpCOREAVX
Times (three runs each):
UInt32 x86: 4.897 s, 4.934 s, 4.873 s
Int32 x86 : 7.686 s, 7.675 s, 7.668 s
UInt32 x64: 5.150 s, 5.180 s, 5.228 s
Int32 x64 : 23.515 s, 23.409 s, 23.424 s
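Averaging the three runs of each configuration makes the gaps easier to compare (plain arithmetic on the times listed above, nothing newly measured):

```latex
\begin{aligned}
\bar t_{\mathrm{UInt32,x86}} &= \tfrac{4.897 + 4.934 + 4.873}{3} \approx 4.901\ \mathrm{s}, &
\bar t_{\mathrm{UInt32,x64}} &= \tfrac{5.150 + 5.180 + 5.228}{3} \approx 5.186\ \mathrm{s}, &
\tfrac{5.186}{4.901} &\approx 1.06 \\
\bar t_{\mathrm{Int32,x86}} &= \tfrac{7.686 + 7.675 + 7.668}{3} \approx 7.676\ \mathrm{s}, &
\bar t_{\mathrm{Int32,x64}} &= \tfrac{23.515 + 23.409 + 23.424}{3} \approx 23.449\ \mathrm{s}, &
\tfrac{23.449}{7.676} &\approx 3.05
\end{aligned}
```

So x64 is about 6% slower for UInt32, but about three times slower for Int32.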
Instruction tables:
http://www.agner.org/optimize/instruction_tables.pdf
The code I used:
program project1;
{$mode objfpc}
uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  SysUtils;

type
  TTestType = UInt32;   // type under test
  //TTestType = Int32;  // uncomment for the signed runs

const
  cMax = High(Int32);   // same loop bound for both types
  //cMax = 50000000;
  //cMax = 5;

procedure ModTest(const N: TTestType);
var
  Q: TTestType;
begin
  Q := N mod 3;  // result is discarded; only the cost of "mod 3" is measured
end;

var
  Tm: Double;
  I: TTestType;
begin
  Writeln('Running to ', cMax);
  Tm := Now;  // Now returns a TDateTime, measured in days
  for I := 1 to cMax do
  begin
    ModTest(I);
  end;
  Tm := Now - Tm;
  WriteLn(Format('ModTest() took %.3f seconds', [Tm * 24 * 60 * 60]));  // days -> seconds
end.
x86 UInt32:
project1.lpr:22 Q := N mod 3;
08048112 b8abaaaaaa mov $0xaaaaaaab,%eax
08048117 f7e1 mul %ecx
08048119 d1ea shr %edx
0804811B 8d1452 lea (%edx,%edx,2),%edx
0804811E 29d1 sub %edx,%ecx
x86 Int32:
project1.lpr:22 Q := N mod 3;
08048110 99 cltd
08048111 b903000000 mov $0x3,%ecx
08048116 f7f9 idiv %ecx
x64 UInt32:
project1.lpr:22 Q := N mod 3;
0000000000400200 b8abaaaaaa mov $0xaaaaaaab,%eax
0000000000400205 f7e7 mul %edi
0000000000400207 d1ea shr %edx
0000000000400209 6bd203 imul $0x3,%edx,%edx
000000000040020C 29d7 sub %edx,%edi
x64 Int32:
project1.lpr:22 Q := N mod 3;
0000000000400202 4863c0 movslq %eax,%rax
0000000000400205 4899 cqto
0000000000400207 b903000000 mov $0x3,%ecx
000000000040020C 48f7f9 idiv %rcx