Minor optimisation potential

CuriousKit

Jr. Member
Posts: 78

So I've started to write a load of assembler routines, mostly to test myself but also to squeeze some speed out of vector manipulation in games. One of the routines I wrote was a reciprocal (inverse) square root (Intel syntax):

Code: Pascal

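function _RSqrtPrecise(const Value: Single): Single; assembler;
asm
  { This produces the exact same assembler code as "Result := 1 / Sqrt(Value)" }
  FLD Value   { ST(0) := Value }
  FSQRT       { ST(0) := Sqrt(Value) }
  FLD1        { ST(0) := 1, ST(1) := Sqrt(Value) }
  FDIVRP      { This and FLD1 calculate 1 divided by what's on the stack }
  FSTP Result { Store ST(0) to Result and pop the floating-point stack }
end;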

As the comment implies, this coincidentally is the same machine code produced when "Result := 1 / Sqrt(Value);" is compiled in an equivalent Pascal procedure (it tells me that I'm learning well!). However, within the disassembly, I noticed something interesting (appears in both the assembler and Pascal procedures):

Code:

push   %ebp
mov    %esp,%ebp
lea    -0x4(%esp),%esp
; ----------------------
flds   0x8(%ebp)
fsqrt  
fld1   
fdivp  %st,%st(1)
fstps  -0x4(%ebp)
; ----------------------
flds   -0x4(%ebp)
leave  
ret    $0x4

Everything between the dashed lines is the procedure itself, while the lines outside them are inserted by the compiler to handle procedure entry and exit. Nevertheless, the line "flds -0x4(%ebp)" caught my eye. I gather that this is the "register" convention returning a floating-point result in ST(0) as well as on the stack, but it comes right after the line "fstps -0x4(%ebp)".

So my question, from an optimisation standpoint... what's to stop the lines...

Code:

fstps  -0x4(%ebp)
flds   -0x4(%ebp)

...being changed to the following?

Code:

fsts   -0x4(%ebp)

i.e. store the value without popping the floating-point stack, since it is immediately loaded back on anyway, and all the relevant opcodes work with ST(0).

(Of course, for an assembler routine I can imagine spotting this optimisation being a bit difficult, but would the same be true in Pascal? Admittedly, I'm not sure how the compiler works internally at a deep level.)

ADDENDUM: This is using FPC 3.0.0 under Windows x86.
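
For comparison, here is a minimal sketch of the "equivalent Pascal procedure" described above. The names `RSqrtDemo` and `RSqrtPascal` are made up for illustration and do not come from the original post; the body is just the `1 / Sqrt(Value)` expression quoted in the assembler comment.

```pascal
program RSqrtDemo;

{$mode objfpc}

{ Hypothetical name: a plain Pascal reciprocal square root, using the
  expression quoted in the comment of _RSqrtPrecise. }
function RSqrtPascal(const Value: Single): Single;
begin
  Result := 1 / Sqrt(Value);
end;

begin
  { Sqrt(4) = 2 and 1 / 2 = 0.5 are both exact, so this prints 0.500000 }
  WriteLn(RSqrtPascal(4.0):0:6);
end.
```

Compiling with `fpc -al` keeps the generated assembler file with the source lines as comments, which is one way to inspect the entry and exit code the compiler emits around the expression. The exact output is likely to vary with the compiler version, target and optimisation settings.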

« Last Edit: March 17, 2016, 04:02:03 pm by CuriousKit »


CuriousKit


Re: Minor optimisation potential

« Reply #1 on: March 27, 2016, 11:26:06 am »

Come to think of it, why are floating-point results returned on the call stack AND in the floating-point stack? If Pascal is meant to be fast by optimising the use of CPU registers for parameters and function results, surely this is a bit of a slowdown.

Of course, changing it so that Single and Double-type results are returned on the floating-point stack only would break backwards-compatibility with pure assembler functions. The only solution I can propose for this is that it uses the old method for functions with the "assembler;" directive, unless "nostackframe;" also follows. Admittedly, I'm not sure what the best optimisation is here... just going by logic.

(And to make things more complicated, there are times where one may not wish to use the floating-point stack at all and use the XMM0 register instead, but that is very specific to Intel architecture)


Thaddy

Hero Member
Posts: 14387
Sensorship about opinions does not belong here.

Re: Minor optimisation potential

« Reply #2 on: March 27, 2016, 12:14:02 pm »

Quote from: CuriousKit on March 27, 2016, 11:26:06 am

Come to think of it, why are floating-point results returned on the call stack AND in the floating-point stack?

Because the win64 ABI says so...? Doesn't mean that should be the case on all platforms, but it is a requirement for the win64 ABI.



CuriousKit


Re: Minor optimisation potential

« Reply #3 on: March 27, 2016, 10:57:49 pm »

Well in my case, I'm using Win32, although the Win64 ABI demands the use of XMM0 for floating-point returns and RAX for integer (and pointer) returns, with no mention of the stack, so it's not quite that.
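
To illustrate the XMM0 return described above, here is a rough sketch for 64-bit x86 targets. It assumes the first Single argument arrives in XMM0 and the result is handed back in XMM0, as the Win64 convention specifies; `XmmReturnSketch` and `RSqrtXmm` are hypothetical names, and the non-x86_64 branch is only a plain Pascal fallback so the program still compiles elsewhere. It has not been checked against what FPC 3.0.0 itself generates.

```pascal
program XmmReturnSketch;

{$mode objfpc}
{$asmmode intel}

{$ifdef CPUX86_64}
{ Hypothetical name. Assumes Value is passed in XMM0 and the result is
  returned in XMM0; nothing touches the x87 stack or the call stack. }
function RSqrtXmm(Value: Single): Single; assembler; nostackframe;
asm
  SQRTSS XMM1, XMM0      { XMM1 := Sqrt(Value) }
  MOV    EAX, $3F800000  { bit pattern of 1.0 as a Single }
  MOVD   XMM0, EAX       { XMM0 := 1.0 }
  DIVSS  XMM0, XMM1      { XMM0 := 1.0 / Sqrt(Value), the return value }
end;
{$else}
{ Fallback for other targets: let the compiler pick the return mechanism. }
function RSqrtXmm(Value: Single): Single;
begin
  Result := 1 / Sqrt(Value);
end;
{$endif}

begin
  WriteLn(RSqrtXmm(4.0):0:6);
end.
```

The constant 1.0 is built in EAX rather than loaded from memory purely to keep the sketch free of symbol references; a real routine might load it from a typed constant instead.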

