So I installed FPC again, for yet another try, as I'm getting a bit tired of hand-coding MMX/SSE/AVX instructions as DB bytes in Virtual Pascal.
Very, very, very obviously, my version of LIFT (in lift32bit.rar), which is nearly fully recoded in inline assembler, does not compile. But after changing eight variable definitions in hhcommon.pas from longint to word (for the DOS GetTime function), the "Pure Pascal" version compiles without problems.
However, it ABENDs (z/OS parlance for crash) in "readfile" with
RunTime Error 2130567168
Error address $00000000
Which is less than useful...
So let's just look a bit at the generated code...
My benchmark is a routine that does a Shellsort on an array of pointers, comparing two variables in the list items the pointers point to. The code generated for this, with all optimizations enabled in the IDE, makes me very sad, as it's hardly better than what Turbo Pascal used to generate more than three decades ago! Obviously I don't expect the code to come anywhere close to manually optimized code, but this generated code doesn't even try:
; [256] sort_wptr := wait^[_i];
mov eax,dword ptr [ebp-544]
mov edx,dword ptr [eax]
mov eax,dword ptr [ebp-524]
mov eax,dword ptr [edx+eax*4-4]
mov dword ptr [ebp-540],eax
; [258] sort_trip := wait^[_i]^.trip;
mov eax,dword ptr [ebp-544]
mov edx,dword ptr [eax]
mov eax,dword ptr [ebp-524]
mov edx,dword ptr [edx+eax*4-4]
mov eax,dword ptr [edx+8]
mov dword ptr [ebp-532],eax
; [259] sort_cnty := wait^[_i]^.s_cnty;
mov eax,dword ptr [ebp-544]
mov ecx,dword ptr [eax]
mov edx,dword ptr [ebp-524]
mov eax,dword ptr [ecx+edx*4-4]
mov eax,dword ptr [eax+64]
mov dword ptr [ebp-4],eax
; [260] sort_year := wait^[_i]^.date.dyear;
mov eax,dword ptr [ebp-544]
mov edx,dword ptr [eax]
mov eax,dword ptr [ebp-524]
mov edx,dword ptr [edx+eax*4-4]
mov eax,dword ptr [edx+104]
mov dword ptr [ebp-536],eax
; [261] sort_wtime:= wait^[_i]^.wtime;
mov eax,dword ptr [ebp-544]
mov edx,dword ptr [eax]
mov eax,dword ptr [ebp-524]
mov edx,dword ptr [edx+eax*4-4]
mov eax,dword ptr [edx+68]
mov dword ptr [ebp-528],eax
My equivalent hand-optimized code:
mov edx, [ebx * 4 + esi]
mov sort_wptr, edx
mov eax, [edx + offset lift_list.trip]
mov sort_trip, eax
mov eax, [edx + offset lift_list.s_cnty]
mov sort_cnty, eax
mov eax, [edx + offset lift_list.date.dyear]
mov sort_year, eax
mov eax, [edx + offset lift_list.wtime]
mov sort_wtime, eax
Common sub-expression elimination? No...
Register variables? No...
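For what it's worth, the repeated evaluation of wait^[_i] can be removed by hand at source level, by loading the pointer once and reading every field through it. Below is a minimal sketch of that idea, not the actual LIFT code: TItem, TDate, the four-element array and the test values are made up for illustration, and only the field and variable names come from the listing above. Whether FPC then keeps sort_wptr in a register is something I have not verified.

```pascal
program cse_sketch;
{$mode objfpc}

type
  { Hypothetical, simplified stand-ins for the real list item record }
  TDate = record
    dyear: longint;
  end;
  PItem = ^TItem;
  TItem = record
    trip  : longint;
    s_cnty: longint;
    wtime : longint;
    date  : TDate;
  end;
  TItemArray = array[1..4] of PItem;
  PItemArray = ^TItemArray;

var
  items     : TItemArray;
  wait      : PItemArray;
  sort_wptr : PItem;
  sort_trip, sort_cnty, sort_year, sort_wtime: longint;
  _i        : longint;

begin
  { Fill the array with made-up test data }
  for _i := 1 to 4 do
  begin
    New(items[_i]);
    items[_i]^.trip       := _i * 10;
    items[_i]^.s_cnty     := _i * 100;
    items[_i]^.wtime      := _i * 1000;
    items[_i]^.date.dyear := 2000 + _i;
  end;
  wait := @items;
  _i   := 3;

  { Manual common sub-expression elimination:
    index the array once, then reuse the loaded pointer }
  sort_wptr  := wait^[_i];
  sort_trip  := sort_wptr^.trip;
  sort_cnty  := sort_wptr^.s_cnty;
  sort_year  := sort_wptr^.date.dyear;
  sort_wtime := sort_wptr^.wtime;

  writeln(sort_trip, ' ', sort_cnty, ' ', sort_year, ' ', sort_wtime);

  for _i := 1 to 4 do
    Dispose(items[_i]);
end.
```

This is the same shape as the hand-optimized assembler: one indexed load, then four loads at fixed offsets from that pointer. Of course, having to do this by hand is exactly what an optimizer should make unnecessary.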
Other missed optimizations?
Using FISTTP for truncation? No...
Using FWAIT in AD 2017, on a CPU that supports AVX? Ouch...
And code like this
; [715] inc(_minmax[0].max.km, ltd_ptr^.dtv.km);
mov eax,dword ptr [dword ptr TC_$HHCOMMON_$$_LTD_PTR]
mov eax,dword ptr [eax+20]
add dword ptr [dword ptr U_$HHCOMMON_$$__MINMAX+8],eax
; [716] inc(_minmax[0].max.time, ltd_ptr^.dtv.time);
mov eax,dword ptr [dword ptr TC_$HHCOMMON_$$_LTD_PTR]
mov eax,dword ptr [eax+24]
add dword ptr [dword ptr U_$HHCOMMON_$$__MINMAX+12],eax
is screaming out for MMX/XMM conversion.
Sigh... or in other words, unless I'm doing something very wrong, I'm pretty disappointed in FPC's optimizing capabilities...