Topic: Minor optimisation potential

CuriousKit

  • Jr. Member
  • **
  • Posts: 78
Minor optimisation potential
« on: March 17, 2016, 03:59:35 pm »
So I've started to write a load of assembler routines, mostly to test myself but also to squeeze some speed out of vector manipulation in games.  One of the routines I wrote was a reciprocal (inverse) square root (Intel syntax):

Code: Pascal
function _RSqrtPrecise(const Value: Single): Single; assembler;
asm
  { This produces the exact same assembler code as "Result := 1 / Sqrt(Value)" }
  FLD Value
  FSQRT
  FLD1
  FDIVRP      { This and FLD1 calculate 1 divided by what's on the stack }
  FSTP Result
end;

As the comment implies, this happens to be the same machine code that is produced when "Result := 1 / Sqrt(Value);" is compiled in an equivalent Pascal procedure (which tells me that I'm learning well!).  However, within the disassembly, I noticed something interesting (it appears in both the assembler and Pascal procedures):

Code:
push   %ebp
mov    %esp,%ebp
lea    -0x4(%esp),%esp
; ----------------------
flds   0x8(%ebp)
fsqrt 
fld1   
fdivp  %st,%st(1)
fstps  -0x4(%ebp)
; ----------------------
flds   -0x4(%ebp)
leave 
ret    $0x4

Everything between the dashed lines is the procedure itself, while everything outside them is code inserted by the compiler to handle procedure entry and exit.  Nevertheless, the line "flds -0x4(%ebp)" caught my eye.  I gather that this is the "register" convention returning a floating-point result in ST(0) as well as on the stack, but it comes right after the line "fstps  -0x4(%ebp)".
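
For anyone wanting to reproduce the disassembly from the Pascal side, the equivalent routine mentioned above might look roughly like this. It is only a minimal sketch: the name RSqrtPascal and the sample inputs are made up for illustration, and {$mode objfpc} is assumed so that Result is available.

```pascal
program RSqrtDemo;

{$mode objfpc}

{ Hypothetical name: plain Pascal counterpart of the assembler routine. }
function RSqrtPascal(const Value: Single): Single;
begin
  Result := 1 / Sqrt(Value);
end;

begin
  { 4.0 and 0.25 are chosen because their reciprocal square roots
    (0.5 and 2.0) are exactly representable in a Single. }
  WriteLn(RSqrtPascal(4.0):0:4);
  WriteLn(RSqrtPascal(0.25):0:4);
end.
```

Compiling this and viewing the generated assembler should allow a side-by-side comparison with the listing above, although the exact output may vary with compiler version and optimisation settings.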

So my question, from an optimisation standpoint... what's to stop the lines...

Code:
fstps  -0x4(%ebp)
flds   -0x4(%ebp)

...being changed to the following?

Code:
fsts   -0x4(%ebp)
i.e. store the value without popping the floating-point stack, since it is immediately loaded back anyway, and all the relevant opcodes work with ST(0).

(Of course, for an assembler routine I can imagine that spotting this optimisation would be a bit difficult, but would the same be true in Pascal? Admittedly, I'm not sure how the compiler works internally at a deep level.)
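
One thing that may be relevant to the fstps/flds pair (this is an assumption, not something confirmed in this thread): storing to a 32-bit slot rounds the value to Single precision, and the reload brings back that rounded value, whereas a non-popping store would leave the wider register contents in ST(0), assuming the x87 precision control is set wider than single. So the two sequences are not guaranteed to leave the same bits in ST(0). The sketch below only illustrates the rounding effect in portable Pascal, using Double as the "wide" type; the helper name RoundTripThroughSingle is hypothetical.

```pascal
program StoreReloadDemo;

{$mode objfpc}

{ Hypothetical helper: narrows to Single and widens again, mimicking in
  portable Pascal what a store-and-pop followed by a reload does. }
function RoundTripThroughSingle(const Wide: Double): Double;
var
  Narrow: Single;
begin
  Narrow := Wide;    { rounds to Single precision }
  Result := Narrow;  { widening is exact, but the dropped bits stay lost }
end;

var
  Wide: Double;
begin
  Wide := 1 / Sqrt(2.0);
  if RoundTripThroughSingle(Wide) <> Wide then
    WriteLn('rounded copy differs from the wider value')
  else
    WriteLn('values are identical');
end.
```

For 1 / Sqrt(2.0) the rounded copy differs from the wider value, so the first branch is taken. Whether that difference matters for a function declared to return Single is a separate question.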

ADDENDUM: This is using FPC 3.0.0 under Windows x86.
« Last Edit: March 17, 2016, 04:02:03 pm by CuriousKit »

CuriousKit

Re: Minor optimisation potential
« Reply #1 on: March 27, 2016, 11:26:06 am »
Come to think of it, why are floating-point results returned on the call stack AND in the floating-point stack? If Pascal is meant to be fast by optimising the use of CPU registers for parameters and function results, surely this is a bit of a slowdown.

Of course, changing it so that Single and Double returns are on the floating-point stack only would break backwards compatibility with pure assembler functions.  The only solution I can propose for this is to use the old method for functions with the "assembler;" directive, unless "nostackframe;" also follows.  Admittedly, I'm not sure what the best optimisation is here... I'm just going by logic.

(And to make things more complicated, there are times where one may not wish to use the floating-point stack at all and use the XMM0 register instead, but that is very specific to Intel architecture)

Thaddy

  • Hero Member
  • *****
  • Posts: 14373
Re: Minor optimisation potential
« Reply #2 on: March 27, 2016, 12:14:02 pm »
Quote from CuriousKit: "Come to think of it, why are floating-point results returned on the call stack AND in the floating-point stack?"

Because the win64 ABI says so...? Doesn't mean that should be the case on all platforms, but it is a requirement for the win64 ABI.

CuriousKit

Re: Minor optimisation potential
« Reply #3 on: March 27, 2016, 10:57:49 pm »
Well, in my case I'm using Win32. Besides, the Win64 ABI demands the use of XMM0 for floating-point returns and RAX for integer (and pointer) returns, with no mention of the stack, so it's not quite that.

 
