AVX and SSE support question

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #165 on: December 24, 2017, 09:22:37 am »

Jerome,

Here are the helpers for *nix 64 and 32 bit, along with the main sources which include the three new methods in the base class. The avx helpers are just stubs at the moment.

AverageNorm4 is now complete; it needs 1e-7 as the epsilon in the test.
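
For anyone following along, an epsilon check of this kind could look roughly like the sketch below. NearlyEqual and the sample values are made up for illustration and are not taken from the attached sources; only the 1e-7 tolerance comes from the test mentioned above.

```pascal
program EpsilonCompareDemo;
{$mode objfpc}

// Hypothetical helper: True when A and B differ by at most Epsilon.
function NearlyEqual(const A, B, Epsilon: Single): Boolean;
begin
  Result := Abs(A - B) <= Epsilon;
end;

var
  Native, Simd: Single;
begin
  Native := 0.5;          // stand-in for the native Pascal result
  Simd := 0.5 + 5.0e-8;   // stand-in for an SSE result with rounding noise
  if NearlyEqual(Native, Simd, 1.0e-7) then
    WriteLn('within epsilon')
  else
    WriteLn('outside epsilon');
end.
```

An absolute tolerance like this is only reasonable when the expected values are of roughly unit magnitude; for much larger or smaller results a relative tolerance would probably be the better choice.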

Also in a folder is the example of this single function to show a 'submission' request for a 'new' method using the template.

Bit painful this as you put your helpers in the main code file so I had to sort out quite a bit before I could even compile and start work. Lots of minor problems with case etc, but no point listing all these when I can sort them out later.

Ready for more functions to work on now

Merry Christmas

HelpersUnix64_32.zip (47.81 kB)

Lazarus 1.8rc5 Win64 / Linux gtk2 64 / FreeBSD qt4

BeanzMaster

Sr. Member
Posts: 268

Re: AVX and SSE support question

« Reply #166 on: December 26, 2017, 08:00:39 pm »

Hi Peter

I just created a repository for our 'SIMD Vector Math Unit Tester' on Github https://github.com/jdelauney/SIMD-VectorMath-UnitTest

I also added the type TGLZVector2f and implemented some functions in SSE (Win64), along with tests.

See you soon

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #167 on: December 26, 2017, 09:31:38 pm »

Hi Jerome,

My GitHub handle is the same as here. I will do a pull and get some minor fixes diffed up.

Peter

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #168 on: December 27, 2017, 02:36:46 am »

Ok I have the Unix 64 bit 2d vector working in 7 local commits along with the two native targets. Just awaiting the ability to push now.

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #169 on: December 28, 2017, 09:34:18 am »

My first checkin of getting unix working along with the start of vector2f and vector4f structural changes. Will continue with all other configs so they compile by setting up stubs for work that needs doing.

Added a .gitignore in the project dir so git status does not fill the screen with crap.

Plane seems to be broken in win64 now, got normalize working by adding some var initialisation to overridden setup. (plane functions never worked in unix.)

Removed lps files.

I have a local mod here, not checked in, to change the utils XML handling from WideString to UTF8 (more Lazarus friendly in my opinion). I will hold this until you agree. Otherwise I have other work (ifdefs) to do to make the XML work in unix.
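
As a rough idea of what the WideString to UTF8 change involves, here is a minimal sketch using the RTL's UTF8Encode and UTF8Decode. It is not the actual utils code; the sample text is invented for illustration.

```pascal
program WideToUtf8Demo;
{$mode objfpc}{$H+}

var
  W: WideString;
  U: UTF8String;
begin
  W := 'caf';
  W := W + WideChar($00E9);   // append U+00E9 so the text is not plain ASCII
  U := UTF8Encode(W);         // UTF-16 -> UTF-8
  WriteLn('UTF-16 code units: ', Length(W));
  WriteLn('UTF-8 bytes:       ', Length(U));
  if UTF8Decode(U) = W then   // UTF-8 -> UTF-16 gives the original text back
    WriteLn('round trip ok');
end.
```

The byte count and the character count differ as soon as non-ASCII text is involved, which is the main thing to watch for when XML code indexes into strings.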

BeanzMaster

Sr. Member
Posts: 268

Re: AVX and SSE support question

« Reply #170 on: December 28, 2017, 09:57:45 am »

No problem for me with changing the XML handling from WideString to UTF8.

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #171 on: December 28, 2017, 12:42:43 pm »

32 bit unix added for sse.

For Single results in 32 bit, the ABI wants the return value in st0.

We already have the result in xmm0, but xmm0 is a separate register from st0 (only the MMX registers mm0-mm7 alias the x87 stack).

So I cannot use nostackframe and have to copy the result to the stack; the compiler then copies this value from the stack back to st0.

Does anyone have any ideas on how to avoid this stack copy and just leave the value in xmm0?

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #172 on: December 28, 2017, 05:39:14 pm »

Jerome,

Everything that is not win64 is created, stubbed and runs. I am not saying it works, just that you can run any test in unix64, unix32 and win32 without the compiler complaining or the runtime generating a seg fault.

I am sure this will not last as you get some more routines started, but I will try to keep it at least in this state, so you can concentrate on just win64 and one codebase.

SonnyBoyXXl

Jr. Member
Posts: 57

Re: AVX and SSE support question

« Reply #173 on: January 04, 2018, 10:56:33 am »

Hi all,
I've finished the translation of the DirectX Math headers and am now testing the functions. I have a problem with this one:

Code: Pascal

function XMVectorSetBinaryConstant(constref C0: UINT32; constref C1: UINT32; constref C2: UINT32; constref C3: UINT32): TXMVECTOR;{ assembler;}
const
    g_vMask1: TXMVECTORU32 = (u: (1, 1, 1, 1));
asm
           // Move the parms to a vector
           // __m128i vTemp = _mm_set_epi32(C3,C2,C1,C0);
           MOVUPS        XMM0,TXMVECTOR([c3])
           MOVUPS        XMM1,TXMVECTOR([c2])
           MOVUPS        XMM2,TXMVECTOR([c1])
           MOVUPS        XMM3,TXMVECTOR([c0])
           PUNPCKLDQ   XMM3,XMM1
           PUNPCKLDQ   XMM2,XMM0
           PUNPCKLDQ   XMM3,XMM2  // XMM3 = vTemp
           // Mask off the low bits
           PAND    XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
           // 0xFFFFFFFF on true bits
           PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
           // 0xFFFFFFFF -> 1.0f, 0x00000000 -> 0.0f
           PAND    XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
           MOVUPS  TXMVECTOR([result]), XMM3// return _mm_castsi128_ps(vTemp);
end;   

The result in XMM3 is correct, as I can see in the debugger, but the function doesn't return it.
Debugging the value of result shows strange behavior: the result is available at a breakpoint on MOVUPS but not at end. See the attached pictures.
What is wrong here? I use

Code: Pascal

{$ASMMODE intel}
{$Z4}
{$CODEALIGN CONSTMIN=16}
{$A4}  

and compiler flag -CfSSE.

The cast of the constants C0, C1, C2, C3 to TXMVECTOR is there to avoid a compiler hint that MOVUPS needs an M128 address. If no cast is done, the result is the same.
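
A plain Pascal version of the same logic can be handy as a reference when checking what the assembler routine should return. This is only a sketch: TVec4f and SetBinaryConstantRef are made-up names standing in for TXMVECTOR and the real function, and it assumes g_XMOne holds 1.0 in all four lanes, as the comments in the routine suggest.

```pascal
program BinaryConstantRef;
{$mode objfpc}

type
  // Hypothetical stand-in for TXMVECTOR: four packed singles.
  TVec4f = array[0..3] of Single;

// Each lane becomes 1.0 when bit 0 of the matching input is set, else 0.0,
// which is what the PAND / PCMPEQD / PAND sequence computes.
function SetBinaryConstantRef(const C0, C1, C2, C3: LongWord): TVec4f;

  function Lane(const C: LongWord): Single;
  begin
    if (C and 1) = 1 then
      Result := 1.0
    else
      Result := 0.0;
  end;

begin
  Result[0] := Lane(C0);
  Result[1] := Lane(C1);
  Result[2] := Lane(C2);
  Result[3] := Lane(C3);
end;

var
  V: TVec4f;
begin
  V := SetBinaryConstantRef(1, 0, 2, 3);
  WriteLn(V[0]:0:1, ' ', V[1]:0:1, ' ', V[2]:0:1, ' ', V[3]:0:1);
end.
```

Comparing the four lanes of the SSE result against this lane by lane separates "the SIMD logic is wrong" from "the result never reaches the caller".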

Debug2.PNG (24.01 kB, 904x250)

Debug1.PNG (21.26 kB, 657x321)

CuriousKit

Jr. Member
Posts: 78

Re: AVX and SSE support question

« Reply #174 on: January 04, 2018, 11:50:25 pm »

Try looking at the disassembly of the program to see what it's doing in the function epilogue, and to also see what Result actually represents (likely a pre-reserved block of memory).

BeanzMaster

Sr. Member
Posts: 268

Re: AVX and SSE support question

« Reply #175 on: January 05, 2018, 12:33:49 am »

Hi
1st, instead of the cast and movups, use "movq".
2nd, for const access, if you are on a 64 bit system, use "RIP": movups xmm0, [RIP+MyConst]
3rd, don't cast Result; movups [Result], xmm0 is enough.

And like CuriousKit says, take a look at the .s file (see the compiler -a options).

SonnyBoyXXl

Jr. Member
Posts: 57

Re: AVX and SSE support question

« Reply #176 on: January 05, 2018, 01:25:25 am »

I found some time today to work on that problem.
First I changed the ASM code. I checked how MS VS 2017 handles the _mm_set_epi32 intrinsic. This is the new routine:

Code: Pascal

function XMVectorSetBinaryConstant(constref C0: UINT32; constref C1: UINT32; constref C2: UINT32; constref C3: UINT32): TXMVECTOR;
     assembler;
const
    g_vMask1: TXMVECTORU32 = (u: (1, 1, 1, 1));
asm
           // Move the parms to a vector
           // __m128i vTemp = _mm_set_epi32(C3,C2,C1,C0);
           movd        xmm0,dword ptr [C3]
           movd        xmm1,dword ptr[C2]
           movd        xmm2,dword ptr[C1]
           movd        xmm3,dword ptr[C0]
           punpckldq   xmm3,xmm1
           punpckldq   xmm2,xmm0
           punpckldq   xmm3,xmm2 // XMM3 = vTemp
           // Mask off the low bits
           PAND    XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
           // 0xFFFFFFFF on true bits
           PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
           // 0xFFFFFFFF -> 1.0f, 0x00000000 -> 0.0f
           PAND    XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
           MOVUPS  TXMVECTOR([result]), XMM3// return _mm_castsi128_ps(vTemp);
end;       

When I now make a breakpoint at the "movd xmm1,dword ptr[C2]" line. I see in the debugger that the value of XMM0 is not what it should be.
Now I looked at the .s file.

Quote

DIRECTX.MATH_$$_XMVECTORSETBINARYCONSTANT$LONGWORD$LONGWORD$LONGWORD$LONGWORD$$TXMVECTOR:
.Lc128:
.Ll314:
# [2903] g_vMask1: TXMVECTORU32 = (u: (1, 1, 1, 1));
   pushl   %ebp
.Lc130:
.Lc131:
   movl   %esp,%ebp
.Lc132:
# Var C0 located in register eax
# Var C1 located in register edx
# Var C2 located in register ecx
# Var C3 located at ebp+12, size=OS_32
# Var $result located at ebp+8, size=OS_32
.Ll315:
# [2907] movd xmm0,dword ptr [C3]
   movd   12(%ebp),%xmm0
.Ll316:
# [2908] movd xmm1,dword ptr[C2]
   movd   (%ecx),%xmm1
.Ll317:
# [2909] movd xmm2,dword ptr[C1]
   movd   (%edx),%xmm2
.Ll318:
# [2910] movd xmm3,dword ptr[C0]
   movd   (%eax),%xmm3
.Ll319:
# [2911] punpckldq xmm3,xmm1
   punpckldq   %xmm1,%xmm3
.Ll320:
# [2912] punpckldq xmm2,xmm0
   punpckldq   %xmm0,%xmm2
.Ll321:
# [2913] punpckldq xmm3,xmm2 // XMM3 = vTemp
   punpckldq   %xmm2,%xmm3
.Ll322:
# [2915] PAND XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
   pand   TC_$DIRECTX.MATH$_$XMVECTORSETBINARYCONSTANT$crcD1D7FBA5_$$_G_VMASK1,%xmm3
.Ll323:
# [2917] PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
   pcmpeqd   TC_$DIRECTX.MATH$_$XMVECTORSETBINARYCONSTANT$crcD1D7FBA5_$$_G_VMASK1,%xmm3
.Ll324:
# [2919] PAND XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
   pand   TC_$DIRECTX.MATH_$$_G_XMONE,%xmm3
.Ll325:
# [2920] MOVUPS TXMVECTOR([result]), XMM3// return _mm_castsi128_ps(vTemp);
   movups   %xmm3,8(%ebp)
.Ll326:
# [2921] end;
   leave
   ret   $8
.Lc129:
.Lt14:
.Ll327:

C3 is located on the stack, so I changed the function to

Code: Pascal

function XMVectorSetBinaryConstant(constref C0: UINT32; constref C1: UINT32; constref C2: UINT32; const C3: UINT32): TXMVECTOR;
     assembler;  

The .s output is

Quote

DIRECTX.MATH_$$_XMVECTORSETBINARYCONSTANT$LONGWORD$LONGWORD$LONGWORD$LONGWORD$$TXMVECTOR:
.Lc128:
.Ll314:
# [2903] g_vMask1: TXMVECTORU32 = (u: (1, 1, 1, 1));
   pushl   %ebp
.Lc130:
.Lc131:
   movl   %esp,%ebp
.Lc132:
# Var C0 located in register eax
# Var C1 located in register edx
# Var C2 located in register ecx
# Var C3 located at ebp+12, size=OS_32
# Var $result located at ebp+8, size=OS_32
.Ll315:
# [2907] movd xmm0,dword ptr [C3]
   movd   12(%ebp),%xmm0
.Ll316:
# [2908] movd xmm1,dword ptr[C2]
   movd   (%ecx),%xmm1
.Ll317:
# [2909] movd xmm2,dword ptr[C1]
   movd   (%edx),%xmm2
.Ll318:
# [2910] movd xmm3,dword ptr[C0]
   movd   (%eax),%xmm3
.Ll319:
# [2911] punpckldq xmm3,xmm1
   punpckldq   %xmm1,%xmm3
.Ll320:
# [2912] punpckldq xmm2,xmm0
   punpckldq   %xmm0,%xmm2
.Ll321:
# [2913] punpckldq xmm3,xmm2 // XMM3 = vTemp
   punpckldq   %xmm2,%xmm3
.Ll322:
# [2915] PAND XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
   pand   TC_$DIRECTX.MATH$_$XMVECTORSETBINARYCONSTANT$crcD1D7FBA5_$$_G_VMASK1,%xmm3
.Ll323:
# [2917] PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
   pcmpeqd   TC_$DIRECTX.MATH$_$XMVECTORSETBINARYCONSTANT$crcD1D7FBA5_$$_G_VMASK1,%xmm3
.Ll324:
# [2919] PAND XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
   pand   TC_$DIRECTX.MATH_$$_G_XMONE,%xmm3
.Ll325:
# [2920] MOVUPS TXMVECTOR([result]), XMM3// return _mm_castsi128_ps(vTemp);
   movups   %xmm3,8(%ebp)
.Ll326:
# [2921] end;
   leave
   ret   $8
.Lc129:
.Lt14:
.Ll327:

As you see, the output is the same. But most of all, the value in XMM0 is now valid.

The only remaining problem is that the result is still not valid.
If I change the routine so that the result is also in a register and not on the stack, everything works perfectly (this means I pass a TXMVector as input instead of the four UINT32, so both the in-var and the out-var are in registers).
It seems this is a problem when the result lies on the stack?
I also found this post: https://forum.lazarus.freepascal.org/index.php?topic=29097.0
This is the bug tracker entry: https://bugs.freepascal.org/view.php?id=32710#c104254

So I think the problem is the same on Windows?

« Last Edit: January 05, 2018, 01:32:35 am by SonnyBoyXXl »

SonnyBoyXXl

Jr. Member
Posts: 57

Re: AVX and SSE support question

« Reply #177 on: January 06, 2018, 05:54:01 pm »

I got the function running now with these modifications:

Code: Pascal

function XMVectorSetBinaryConstant(const C0: UINT32; const C1: UINT32; const C2: UINT32; const C3: UINT32): PXMVECTOR;
const
    g_vMask1: TXMVECTORU32 = (u: (1, 1, 1, 1));
var
    x: TXMVECTOR;
begin
    asm
               // Move the parms to a vector
               // __m128i vTemp = _mm_set_epi32(C3,C2,C1,C0);
               MOVD        XMM0, [C3]
               MOVD        XMM1, [C2]
               MOVD        XMM2, [C1]
               MOVD        XMM3, [C0]
               PUNPCKLDQ   XMM3,XMM1
               PUNPCKLDQ   XMM2,XMM0
               PUNPCKLDQ   XMM3,XMM2 // XMM3 = vTemp
               // Mask off the low bits
               PAND    XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
               // 0xFFFFFFFF on true bits
               PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
               // 0xFFFFFFFF -> 1.0f, 0x00000000 -> 0.0f
               PAND    XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
               MOVUPS  [x], XMM3// return _mm_castsi128_ps(vTemp);
    end;
    Result := @x;
end; 

This is the .s output:

Quote

DIRECTX.MATH_$$_XMVECTORSETBINARYCONSTANT$LONGWORD$LONGWORD$LONGWORD$LONGWORD$$PXMVECTOR:
.Lc128:
.Ll314:
# [2925] begin
   pushl   %ebp
.Lc130:
.Lc131:
   movl   %esp,%ebp
.Lc132:
   leal   -80(%esp),%esp
# Var C0 located at ebp-16, size=OS_32
# Var C1 located at ebp-32, size=OS_32
# Var C2 located at ebp-48, size=OS_32
# Var C3 located at ebp+8, size=OS_32
# Var $result located at ebp-64, size=OS_32
# Var x located at ebp-80, size=OS_NO
   movl   %eax,-16(%ebp)
   movl   %edx,-32(%ebp)
   movl   %ecx,-48(%ebp)
# CPU PENTIUM
.Ll315:
# [2929] movd xmm0, [C3]
   movd   8(%ebp),%xmm0
.Ll316:
# [2930] movd xmm1, [C2]
   movd   -48(%ebp),%xmm1
.Ll317:
# [2931] movd xmm2, [C1]
   movd   -32(%ebp),%xmm2
.Ll318:
# [2932] movd xmm3, [C0]
   movd   -16(%ebp),%xmm3
.Ll319:
# [2933] punpckldq xmm3,xmm1
   punpckldq   %xmm1,%xmm3
.Ll320:
# [2934] punpckldq xmm2,xmm0
   punpckldq   %xmm0,%xmm2
.Ll321:
# [2935] punpckldq xmm3,xmm2 // XMM3 = vTemp
   punpckldq   %xmm2,%xmm3
.Ll322:
# [2937] PAND XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
   pand   TC_$DIRECTX.MATH$_$XMVECTORSETBINARYCONSTANT$LONGWORD$LONGWORD$LONGWORD$LONGWORD$$PXMVECTOR_$$_G_VMASK1,%xmm3
.Ll323:
# [2939] PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
   pcmpeqd   TC_$DIRECTX.MATH$_$XMVECTORSETBINARYCONSTANT$LONGWORD$LONGWORD$LONGWORD$LONGWORD$$PXMVECTOR_$$_G_VMASK1,%xmm3
.Ll324:
# [2941] PAND XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
   pand   TC_$DIRECTX.MATH_$$_G_XMONE,%xmm3
.Ll325:
# [2942] MOVUPS [x], XMM3// return _mm_castsi128_ps(vTemp);
   movups   %xmm3,-80(%ebp)
# CPU PENTIUM
.Ll326:
# [2944] result:=@x;
   leal   -80(%ebp),%eax
   movl   %eax,-64(%ebp)
.Ll327:
# [2945] end;
   movl   %ebp,%esp
   popl   %ebp
   ret   $4
.Lc129:
.Lt14:
.Ll328:

Why does this work?

dicepd

Full Member
Posts: 163

Re: AVX and SSE support question

« Reply #178 on: January 06, 2018, 09:55:05 pm »

When you have parameters or return values on the stack, you have to look at the size to work out whether the slot holds a value or a pointer.

If the return is a pointer, then you can use something like this:

Code: Pascal

  mov    ebx,  [Result]
  vmovups [ebx], xmm0                 
 

For parameter pointers which are on the stack, you will need something like this:

Code: Pascal

  mov    ebx,  [right]
  movups xmm5, [ebx]       

32 bit usually puts pointers for most things on the stack.

Looking at your case, you declared a local variable which was allocated space on the stack, which is why you have the following:

Code: Pascal

movups   %xmm3,-80(%ebp)

This is a value on the stack, not a pointer.
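
The size check can also be done from the Pascal side by comparing the declared type with the slot size the compiler reports in the .s comments (for example size=OS_32). The sketch below assumes a hypothetical TVec4f in place of TXMVECTOR; it only prints the two sizes and does not reproduce any real calling convention.

```pascal
program SlotSizeDemo;
{$mode objfpc}

type
  // Hypothetical stand-in for TXMVECTOR: four packed singles, 16 bytes.
  TVec4f = array[0..3] of Single;

begin
  WriteLn('SizeOf(TVec4f)  = ', SizeOf(TVec4f));
  WriteLn('SizeOf(Pointer) = ', SizeOf(Pointer));
  // A stack slot reported as OS_32 holds 4 bytes. That is too small for the
  // 16-byte vector itself, so the slot can only carry its address.
  if SizeOf(TVec4f) > SizeOf(Pointer) then
    WriteLn('a pointer-sized slot must hold an address, not the vector');
end.
```

For a UINT32 or Single parameter the declared type and the slot are the same size, so there the slot normally holds the value itself unless the parameter was declared constref or var.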

« Last Edit: January 06, 2018, 10:03:39 pm by dicepd »

SonnyBoyXXl

Jr. Member
Posts: 57

Re: AVX and SSE support question

« Reply #179 on: January 07, 2018, 12:36:29 am »

Yes, it is really confusing.

I have continued testing and now have another function:

Code: Pascal

function XMVectorSet(const x, y, z, w: single): TXMVECTOR; assembler;
asm
               MOVD        XMM0, [w]
               MOVD        XMM1, [z]
               MOVD        XMM2, [y]
               MOVD        XMM3, [x]
               PUNPCKLDQ   XMM3,XMM1
               PUNPCKLDQ   XMM2,XMM0
               PUNPCKLDQ   XMM3,XMM2
               MOVUPS  [result], XMM3 // _mm_set_ps( w, z, y, x );
end;  

As you see, this is the same assembler code as the first part of XMVectorSetBinaryConstant. The difference is that the input parameters are of type single.
Therefore the .s output is

Quote

DIRECTX.MATH_$$_XMVECTORSET$SINGLE$SINGLE$SINGLE$SINGLE$$TXMVECTOR:
.Lc261:
.Ll822:
# [5426] asm
   pushl   %ebp
.Lc263:
.Lc264:
   movl   %esp,%ebp
.Lc265:
# Var $result located in register eax
# Var x located at ebp+20, size=OS_F32
# Var y located at ebp+16, size=OS_F32
# Var z located at ebp+12, size=OS_F32
# Var w located at ebp+8, size=OS_F32
.Ll823:
# [5427] MOVD XMM0, [w]
   movd   8(%ebp),%xmm0
.Ll824:
# [5428] MOVD XMM1, [z]
   movd   12(%ebp),%xmm1
.Ll825:
# [5429] MOVD XMM2, [y]
   movd   16(%ebp),%xmm2
.Ll826:
# [5430] MOVD XMM3, [x]
   movd   20(%ebp),%xmm3
.Ll827:
# [5431] PUNPCKLDQ XMM3,XMM1
   punpckldq   %xmm1,%xmm3
.Ll828:
# [5432] PUNPCKLDQ XMM2,XMM0
   punpckldq   %xmm0,%xmm2
.Ll829:
# [5433] PUNPCKLDQ XMM3,XMM2
   punpckldq   %xmm2,%xmm3
.Ll830:
# [5434] MOVUPS [result], XMM3 // _mm_set_ps( w, z, y, x );
   movups   %xmm3,(%eax)
.Ll831:
# [5435] end;
   leave
   ret   $16

The difference is that here the result is in a register.

So what comes out is:

Same routine, input params as UINT32:

Quote

# Var C0 located in register eax
# Var C1 located in register edx
# Var C2 located in register ecx
# Var C3 located at ebp+12, size=OS_32
# Var $result located at ebp+8, size=OS_32

--> not working directly: the address of result is on the stack and must be loaded into a register

input params as SINGLE:

Quote

# Var $result located in register eax
# Var C0 located at ebp+20, size=OS_F32
# Var C1 located at ebp+16, size=OS_F32
# Var C2 located at ebp+12, size=OS_F32
# Var C3 located at ebp+8, size=OS_F32

--> working, because the address of result is located in a register

I've applied your comment about the stack parameter to the routine, and it is working now.

Code: Pascal

function XMVectorSetBinaryConstant(constref C0, C1, C2: UINT32; const c3: UINT32): TXMVECTOR; assembler;
const
    g_vMask1: TXMVECTOR = (u32: (1, 1, 1, 1));
asm
           // Move the parms to a vector
           // __m128i vTemp = _mm_set_epi32(C3,C2,C1,C0);
           MOVD        XMM0, [C3]
           MOVD        XMM1, [C2]
           MOVD        XMM2, [C1]
           MOVD        XMM3, [C0]
           PUNPCKLDQ   XMM3,XMM1
           PUNPCKLDQ   XMM2,XMM0
           PUNPCKLDQ   XMM3,XMM2 // XMM3 = vTemp
           // Mask off the low bits
           PAND    XMM3, [g_vMask1] // vTemp = _mm_and_si128(vTemp,g_vMask1);
           // 0xFFFFFFFF on true bits
           PCMPEQD XMM3, [g_vMask1] // vTemp = _mm_cmpeq_epi32(vTemp,g_vMask1);
           // 0xFFFFFFFF -> 1.0f, 0x00000000 -> 0.0f
           PAND    XMM3, [g_XMOne] // vTemp = _mm_and_si128(vTemp,g_XMOne);
           PUSH    EBX
           MOV     EBX, [result]
           MOVUPS  [EBX], XMM3 // return _mm_castsi128_ps(vTemp);
           POP     EBX
end;    

Thanks!

« Last Edit: January 07, 2018, 01:09:10 am by SonnyBoyXXl »
