Forum > FPC development

Minor optimisation potential


CuriousKit:
So I've started writing a load of assembler routines, mostly to test myself but also to squeeze some speed out of vector manipulation in games. One of the routines I wrote was a reciprocal (inverse) square root (Intel mode):


--- Code: Pascal ---
{$asmmode intel}
function _RSqrtPrecise(const Value: Single): Single; assembler;
asm
  { This produces the exact same assembler code as "Result := 1 / Sqrt(Value)" }
  FLD    Value    { push Value onto the FPU stack: ST(0) = Value }
  FSQRT           { ST(0) = Sqrt(Value) }
  FLD1            { push 1.0: ST(0) = 1.0, ST(1) = Sqrt(Value) }
  FDIVRP          { with FLD1, calculates 1 divided by what's on the stack: ST(1) := ST(0) / ST(1), then pop }
  FSTP   Result   { store 1 / Sqrt(Value) in Result and pop }
end;
--- End code ---
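For comparison, this is the pure Pascal form mentioned in the routine's comment, wrapped in a small test program. The names RSqrtPascal and RSqrtDemo are made up for this sketch. The claim that the body compiles to the same FLD/FSQRT/FLD1/FDIVRP/FSTP sequence comes from the post and was observed with FPC 3.0.0 on Win32; other compiler versions, targets or FPU settings may well generate different code.

```pascal
program RSqrtDemo;

{$mode objfpc}

{ Hypothetical pure Pascal counterpart of _RSqrtPrecise. According to the
  post, FPC 3.0.0 on Win32 compiles this body to the same x87 instruction
  sequence as the hand-written assembler version. }
function RSqrtPascal(const Value: Single): Single;
begin
  Result := 1 / Sqrt(Value);
end;

begin
  WriteLn(RSqrtPascal(4.0):0:6);   { prints 0.500000 }
  WriteLn(RSqrtPascal(0.25):0:6);  { prints 2.000000 }
end.
```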
As the comment says, this happens to be exactly the same machine code that is produced when "Result := 1 / Sqrt(Value);" is compiled in an equivalent Pascal function (which tells me I'm learning well!). However, I noticed something interesting in the disassembly, and it appears in both the assembler and the Pascal versions:


--- Code: ---
push   %ebp
mov    %esp,%ebp
lea    -0x4(%esp),%esp
; ----------------------
flds   0x8(%ebp)
fsqrt 
fld1   
fdivp  %st,%st(1)
fstps  -0x4(%ebp)
; ----------------------
flds   -0x4(%ebp)
leave 
ret    $0x4

--- End code ---

Everything between the dashed lines is the body of the function itself; the lines outside them are inserted by the compiler to handle entry and exit. Anyway, the line "flds -0x4(%ebp)" caught my eye. I gather this is the "register" convention returning the floating-point result in ST(0) as well as on the stack, but it comes right after "fstps -0x4(%ebp)", which has just stored that same value to -0x4(%ebp) and popped it off the floating-point stack.

So my question, from an optimisation standpoint... what's to stop the lines...


--- Code: ---
fstps  -0x4(%ebp)
flds   -0x4(%ebp)
--- End code ---

...being changed to the following?


--- Code: ---
fsts   -0x4(%ebp)
--- End code ---

i.e. store the value without popping the floating-point stack, since it is immediately loaded back anyway and all the relevant opcodes work on ST(0).

(Of course, I can imagine this optimisation being hard to spot in an assembler routine, but would the same be true for Pascal code? Admittedly, I don't know in depth how the compiler works internally.)
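Spotting the pattern is essentially a peephole optimisation: scan the instruction list for a store-and-pop to some location that is immediately followed by a load from the same location, and fuse the two. The program below is a hypothetical, much simplified sketch; the TInstr record and the MakeInstr, SampleCode and FoldStoreLoad routines are invented for illustration and are not FPC's internal API. One thing a real pass would have to account for: "fstps" writes a single-precision (rounded) copy of ST(0) and "flds" reloads that rounded copy, whereas "fsts" leaves the original, possibly more precise, value in ST(0). The rewrite is therefore only exact when ST(0) already holds a value representable as a Single.

```pascal
program PeepholeSketch;

{$mode objfpc}{$H+}

type
  { Hypothetical, heavily simplified instruction: a mnemonic plus its operand
    text (AT&T syntax, as in the disassembly). Not FPC's internal representation. }
  TInstr = record
    Op: string;
    Operand: string;
  end;
  TInstrList = array of TInstr;

function MakeInstr(const AOp, AOperand: string): TInstr;
begin
  Result.Op := AOp;
  Result.Operand := AOperand;
end;

{ The function body from between the dashed lines of the disassembly, plus the
  compiler-inserted reload of the result that follows it. }
function SampleCode: TInstrList;
begin
  Result := nil;
  SetLength(Result, 6);
  Result[0] := MakeInstr('flds', '0x8(%ebp)');
  Result[1] := MakeInstr('fsqrt', '');
  Result[2] := MakeInstr('fld1', '');
  Result[3] := MakeInstr('fdivp', '%st,%st(1)');
  Result[4] := MakeInstr('fstps', '-0x4(%ebp)');
  Result[5] := MakeInstr('flds', '-0x4(%ebp)');
end;

{ Fold "fstps X" immediately followed by "flds X" into "fsts X".
  Caveat: only exact if ST(0) already holds a value representable as a Single;
  otherwise fsts leaves a more precise value on the stack than the original
  store/reload pair would. }
function FoldStoreLoad(const Code: TInstrList): TInstrList;
var
  I, N: Integer;
begin
  Result := nil;
  SetLength(Result, Length(Code));  { the output is never longer than the input }
  N := 0;
  I := 0;
  while I < Length(Code) do
  begin
    if (I + 1 < Length(Code)) and (Code[I].Op = 'fstps') and
       (Code[I + 1].Op = 'flds') and (Code[I].Operand = Code[I + 1].Operand) then
    begin
      Result[N] := MakeInstr('fsts', Code[I].Operand);
      Inc(I, 2);  { consume both instructions of the pair }
    end
    else
    begin
      Result[N] := Code[I];
      Inc(I);
    end;
    Inc(N);
  end;
  SetLength(Result, N);  { trim to the number of instructions emitted }
end;

var
  Optimised: TInstrList;
  I: Integer;
begin
  Optimised := FoldStoreLoad(SampleCode);
  for I := 0 to High(Optimised) do
    WriteLn(Optimised[I].Op, ' ', Optimised[I].Operand);
end.
```

Run against the sample sequence, the program prints five instructions, the last being "fsts -0x4(%ebp)". The operand match here is purely textual; a real optimiser would compare decoded memory references and would also need to know the value's precision before applying the rewrite.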

ADDENDUM: This is using FPC 3.0.0 under Windows x86.

CuriousKit:
Come to think of it, why are floating-point results returned on the call stack AND in the floating-point stack? If Pascal is meant to be fast by optimising the use of CPU registers for parameters and function results, surely this is a bit of a slowdown.

Of course, returning Single and Double results on the floating-point stack only would break backwards compatibility with pure assembler functions. The only solution I can propose is to keep the old method for functions with the "assembler;" directive, unless "nostackframe;" is also specified. Admittedly I'm not sure what the best optimisation is here... I'm just going by logic.

(And to complicate things further, there are times when one may not want to use the floating-point stack at all and would rather use the XMM0 register, but that is very specific to the Intel architecture.)

Thaddy:

--- Quote from: CuriousKit on March 27, 2016, 11:26:06 am ---
Come to think of it, why are floating-point results returned on the call stack AND in the floating-point stack?

--- End quote ---

Because the Win64 ABI says so...? That doesn't mean it should be the case on all platforms, but it is a requirement of the Win64 ABI.

CuriousKit:
Well, in my case I'm using Win32. Besides, the Win64 ABI requires XMM0 for floating-point return values and RAX for integer (and pointer) return values, with no mention of the stack, so it's not quite that.
