Those errors are boring. So perhaps building an external object library with MASM or NASM/YASM would be better than using the internal asm?
You would still have to conform to the Pascal calling conventions, so there is not much gain in doing so; you would probably spend more time trying to get your params to your lib correctly.
I am writing some test cases. Mark what is bad, carry on coding, and I'll try to sort out the 'annoying' errors.
I've found that the issue in 32-bit is that the result is not aligned: I had "movaps [RESULT], xmm0", and with "movups [RESULT], xmm0" instead it's working.
Under 64-bit there is no problem, RESULT is aligned. But there is always a problem with the clamp, lerp, combine and combine 2/3 functions. All the others are OK in 32-bit.
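One way to confirm the alignment theory before swapping instructions is to print whether the address in question sits on a 16-byte boundary, since movaps faults on anything else while movups accepts any address. The following is only a minimal sketch: TVec4 and IsAligned16 are hypothetical names that are not part of the unit, and the result depends on the target and on where the compiler places the variable.

```pascal
program AlignCheck;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for TGLZVector4f: four packed singles.
  TVec4 = record
    X, Y, Z, W: Single;
  end;

// True when P sits on a 16-byte boundary, which is what movaps requires.
function IsAligned16(P: Pointer): Boolean;
begin
  Result := (PtrUInt(P) and 15) = 0;
end;

var
  V: TVec4;
begin
  // The answer can differ between 32-bit and 64-bit builds.
  WriteLn('V aligned to 16 bytes: ', IsAligned16(@V));
end.
```

The same check applied to the address of Result inside a failing function would show whether movaps is safe there.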
As for this, I have got the following working on unix64; from previous testing I think it should work for win64 too.
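class operator TGLZVector4f.+(constref A: TGLZVector4f; constref B:Single): TGLZVector4f; assembler; nostackframe; register;
asm
  movaps  xmm0,[A]          // load A (needs a 16-byte aligned address)
  movss   xmm1,[B]          // load the scalar B into the low lane
  shufps  xmm1, xmm1, $00   // broadcast B to all four lanes
  addps   xmm0,xmm1         // xmm0 = A + B in every lane
  movhlps xmm1,xmm0         // copy the high two lanes of xmm0 into the low half of xmm1
end;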
Huh, do you get the right Result with this? movhlps moves the WZ values into the XY position, doesn't it? If I understood correctly, under Linux64 the result is split: the low half is in xmm0 and the high half is in xmm1. Am I correct?
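To make the question concrete, here is a small pure Pascal model of what movhlps does to the two registers. It is a sketch only: TXmm and MovHLPS are hypothetical helpers written for illustration, and it assumes (as the post suggests) that on unix64 the 16-byte record comes back with X, Y in the low half of xmm0 and Z, W in the low half of xmm1.

```pascal
program SplitResult;
{$mode objfpc}{$H+}

type
  TXmm = array[0..3] of Single;  // hypothetical model of one XMM register

// Models "movhlps Dst, Src": the high two lanes of Src are copied into
// the low two lanes of Dst; the high lanes of Dst are left unchanged.
procedure MovHLPS(var Dst: TXmm; const Src: TXmm);
begin
  Dst[0] := Src[2];
  Dst[1] := Src[3];
end;

var
  Xmm0: TXmm = (1.5, 2.5, 3.5, 4.5);  // X, Y, Z, W after addps
  Xmm1: TXmm = (0.5, 0.5, 0.5, 0.5);  // the broadcast scalar B
begin
  MovHLPS(Xmm1, Xmm0);
  // Low half of the result: X, Y stay in the low lanes of xmm0.
  WriteLn('xmm0 low: ', Xmm0[0]:0:1, ' ', Xmm0[1]:0:1);
  // High half of the result: Z, W are now in the low lanes of xmm1.
  WriteLn('xmm1 low: ', Xmm1[0]:0:1, ' ', Xmm1[1]:0:1);
end.
```

Under that assumption xmm0 itself is not modified by the last instruction, so X and Y are still where the caller expects them.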
Re comparison operators: as I read the pure Pascal code, every element must pass the comparison test. That was not happening in the asm when only one element failed. It passed my tests with the following, which also avoids branching. Comments please before I change a lot of code.
cmpps xmm0, xmm1, cSSE_OPERATOR_LESS_OR_EQUAL // each lane becomes all ones where the test passes
movmskps eax, xmm0 // copies the 4-bit sign mask to eax
xor eax, $f // zero only if the mask was 1111, i.e. all four compares passed
setz al // al = 1 (true) if the xor result was zero
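For anyone who wants to check the logic of that sequence without running the asm, it can be modelled in plain Pascal. This is a sketch under assumptions: TVec4, CompareMaskLE and AllLE are hypothetical names used for illustration, not functions from the unit, and NaN handling is not considered.

```pascal
program AllLessOrEqual;
{$mode objfpc}{$H+}

type
  TVec4 = array[0..3] of Single;  // hypothetical stand-in for TGLZVector4f

// Models cmpps (less-or-equal) followed by movmskps:
// bit i of the result is set when A[i] <= B[i].
function CompareMaskLE(const A, B: TVec4): Cardinal;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to 3 do
    if A[i] <= B[i] then
      Result := Result or (Cardinal(1) shl i);
end;

// Models "xor eax, $f" followed by "setz al":
// true only when all four bits of the mask are set.
function AllLE(const A, B: TVec4): Boolean;
begin
  Result := (CompareMaskLE(A, B) xor $F) = 0;
end;

const
  V1: TVec4 = (1.0, 2.0, 3.0, 4.0);
  V2: TVec4 = (1.0, 2.0, 3.0, 5.0);
  V3: TVec4 = (0.0, 9.0, 3.0, 4.0);
begin
  WriteLn(AllLE(V1, V2));  // every lane passes
  WriteLn(AllLE(V1, V3));  // lane 0 fails (1.0 > 0.0), so the whole test fails
end.
```

A single failing lane clears one bit of the mask, the xor leaves a non-zero value, and the comparison as a whole is false, which matches the "every element must pass" reading of the Pascal code.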
Edit 1: Negate fails the tests. The mask is doing a multiply by -1, not setting all items negative as the Pascal code does. Though I suspect the Pascal code is wrong: I never had a use for setting all items negative, whereas *-1 reverses the vector.
I've tested it; it runs, but the result is wrong:
if v1 = v2 then Cells[1,25] := 'TRUE' else Cells[1,25] := 'FALSE';
The zero flag is not set under 64-bit, so it always returns TRUE, but in 32-bit your function is OK and returns the right result.
For negate you are right: under 64-bit the result is wrong; normally, in our sample, the sign of the Y value should change. Under 32-bit the function returns the correct result.
X * -1 is equal to 0 - X, so I've chosen the latter; Sub is normally faster than Mul.
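The two behaviours being discussed for negate are easy to confuse, so here is a small pure Pascal sketch that puts them side by side. FlipSign and ForceNegative are hypothetical names chosen for illustration; which of the two the unit's Negate is meant to implement is exactly the open question in this thread.

```pascal
program NegateVariants;
{$mode objfpc}{$H+}

type
  TVec4 = array[0..3] of Single;  // hypothetical stand-in for TGLZVector4f

// "Vector reverse": 0 - X per lane, which changes the sign of every lane.
function FlipSign(const V: TVec4): TVec4;
var
  i: Integer;
begin
  for i := 0 to 3 do
    Result[i] := 0.0 - V[i];
end;

// "Set all items negative": every lane ends up <= 0 whatever its sign was.
function ForceNegative(const V: TVec4): TVec4;
var
  i: Integer;
begin
  for i := 0 to 3 do
    Result[i] := -Abs(V[i]);
end;

procedure Print(const Name: string; const V: TVec4);
begin
  WriteLn(Name, ': ', V[0]:0:1, ' ', V[1]:0:1, ' ', V[2]:0:1, ' ', V[3]:0:1);
end;

const
  V: TVec4 = (1.0, -2.0, 3.0, -4.0);
begin
  Print('FlipSign     ', FlipSign(V));       // lanes become -1, 2, -3, 4
  Print('ForceNegative', ForceNegative(V));  // lanes become -1, -2, -3, -4
end.
```

One small difference worth knowing: 0 - X gives +0 for X = +0, while X * -1 gives -0. The two compare as equal, so an equality-based test will not notice.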
I'm having some difficulty compiling the latest version of the unit from BeanzMaster - the GLZTypes unit has an awkward dependency on GLZVectorMath and others, since TGLZVector and TGLZVector2i are not defined. It's easy enough to fix, but it means that GLZTypes is not self-contained.
Ouch, sorry, I forgot to delete the TGLZVectorX from GLZTypes. That unit is only used by GLZMath. This happened because I added MinXYZComponent and MaxXYZComponent, and these two use the Min3s and Max3s functions from the GLZMath unit. So you can just copy/paste those two functions into the GLZVectorMath unit and delete the dependency on the GLZMath unit, or simply comment out the MinXYZ/MaxXYZComponent functions.
This comes from my own project. Sorry.