So I've started to write a load of assembler routines, mostly to test myself but also to squeeze some speed out of vector manipulation in games. One of the routines I wrote was a reciprocal (inverse) square root (Intel mode):
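function _RSqrtPrecise(const Value: Single): Single; assembler;
asm
  { This produces the exact same assembler code as "Result := 1 / Sqrt(Value)" }
  FLD Value      { ST(0) = Value }
  FSQRT          { ST(0) = Sqrt(Value) }
  FLD1           { ST(0) = 1, ST(1) = Sqrt(Value) }
  FDIVRP         { Together with FLD1, this calculates 1 divided by what's on the stack }
  FSTP Result    { Store to Result and pop the floating-point stack }
end;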
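For comparison, here is a minimal sketch of the plain Pascal counterpart that the comment refers to. Only the expression "Result := 1 / Sqrt(Value)" comes from the routine above; the function name RSqrtPascal and the small program wrapped around it are placeholders made up for illustration.

```pascal
program RSqrtDemo;
{$mode objfpc}

{ Hypothetical name; the body is the expression quoted in the assembler comment }
function RSqrtPascal(const Value: Single): Single;
begin
  Result := 1 / Sqrt(Value);
end;

begin
  { Sqrt(4) = 2, so this prints 0.5000 }
  WriteLn(RSqrtPascal(4.0):0:4);
end.
```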
As the comment implies, the assembler routine coincidentally compiles to the same machine code as "Result := 1 / Sqrt(Value);" does in an equivalent Pascal function (it tells me that I'm learning well!). However, within the disassembly, I noticed something interesting (it appears in both the assembler and Pascal versions):
push %ebp
mov %esp,%ebp
lea -0x4(%esp),%esp
; ----------------------
flds 0x8(%ebp)
fsqrt
fld1
fdivp %st,%st(1)
fstps -0x4(%ebp)
; ----------------------
flds -0x4(%ebp)
leave
ret $0x4
Everything between the dashed lines is the body of the routine, while the lines outside them are inserted by the compiler to handle entry and exit. Nevertheless, the line "flds -0x4(%ebp)" caught my eye. I gather that this is the "register" calling convention returning the floating-point result in ST(0), loaded from the copy of Result held on the stack, but it comes right after the line "fstps -0x4(%ebp)", which has just stored and popped that same value.
So my question, from an optimisation standpoint... what's to stop the lines...
fstps -0x4(%ebp)
flds -0x4(%ebp)
...being changed to the following?
fsts -0x4(%ebp)
i.e. store the value without popping the floating-point stack, since it is immediately loaded back on anyway, and all the relevant opcodes work with ST(0).
(Of course, for an assembler routine I can imagine this optimisation being a bit difficult to spot, but would the same be true in Pascal? Admittedly I'm not sure how the compiler works internally at a deep level.)
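To make the proposed rewrite concrete, here is a toy sketch of it as a text-level pattern match over the listing. It is not how FPC's optimiser is actually structured (I don't know that), and OperandOf and Listing are names invented for the example. It only shows the pattern: an "fstps" immediately followed by an "flds" of the same operand gets collapsed into a single "fsts". It says nothing about whether the two sequences are equivalent in every respect.

```pascal
program PeepholeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: if Line starts with Mnemonic plus a space,
  return True and hand back the operand text. }
function OperandOf(const Line, Mnemonic: string; out Operand: string): Boolean;
var
  Trimmed: string;
begin
  Trimmed := Trim(Line);
  Result := Copy(Trimmed, 1, Length(Mnemonic) + 1) = Mnemonic + ' ';
  if Result then
    Operand := Trim(Copy(Trimmed, Length(Mnemonic) + 2, MaxInt))
  else
    Operand := '';
end;

const
  { The tail end of the disassembly from the post }
  Listing: array[0..2] of string = (
    'fdivp %st,%st(1)',
    'fstps -0x4(%ebp)',
    'flds -0x4(%ebp)'
  );

var
  I: Integer;
  StoreOp, LoadOp: string;
begin
  I := Low(Listing);
  while I <= High(Listing) do
  begin
    { Store-and-pop followed by a load from the same address? }
    if (I < High(Listing))
       and OperandOf(Listing[I], 'fstps', StoreOp)
       and OperandOf(Listing[I + 1], 'flds', LoadOp)
       and (StoreOp = LoadOp) then
    begin
      WriteLn('fsts ', StoreOp); { store without popping }
      Inc(I, 2);                 { skip both original lines }
    end
    else
    begin
      WriteLn(Listing[I]);       { anything else passes through unchanged }
      Inc(I);
    end;
  end;
end.
```

Run on the three lines above, this prints "fdivp %st,%st(1)" followed by "fsts -0x4(%ebp)".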
ADDENDUM: This is using FPC 3.0.0 under Windows x86.