
Author Topic: AVX and SSE support question  (Read 89728 times)

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #45 on: November 23, 2017, 07:23:28 pm »
OK, I finally got an almost-working system on FreeBSD. I still have to get gdb working, but I did manage to test the app. The numbers are fine, but I get

GLZVectorMath.pas(478,38) Warning: Exported/global symbols should be accessed via the GOT

from these lines:
Code: Pascal
  movups xmm0,[RIP+cNullSSEVector4f]
  vmovups xmm0,[RIP+cNullAVXVector4f]

whatever that warning means.
Lazarus 1.8rc5 Win64 / Linux gtk2 64 / FreeBSD qt4

schuler

  • Full Member
  • ***
  • Posts: 223
Re: AVX and SSE support question
« Reply #46 on: November 23, 2017, 07:52:43 pm »
 :) Hello Pascal Lovers  :)

Decided to share a piece of FPC source code that I find clever in the hope it's helpful:
Code: Pascal
  function CompareByte(Const buf1,buf2;len:SizeInt):SizeInt; assembler; nostackframe;
  { win64: rcx buf, rdx buf, r8 len
    linux: rdi buf, rsi buf, rdx len }
  asm
  {$ifndef win64}
      mov    %rdx, %r8
      mov    %rsi, %rdx
      mov    %rdi, %rcx
  {$endif win64}

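The code above deals with the different calling conventions: on non-Win64 targets it copies the arguments from the Linux registers into the ones Win64 uses, so the rest of the routine can be written once.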

I saw the code about "negate" somewhere. Instead of using a 4-element constant array, a BROADCAST instruction can be used. This example is copied from the uvolume.pas unit:
Code: Pascal
  mov rdx, FillOpPtr
  VBROADCASTSS ymm0, [rdx]

In the example above, all 8 elements of ymm0 are filled with the single value pointed to by FillOpPtr.

 :) Wish everyone happy coding :)


dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #47 on: November 23, 2017, 08:09:09 pm »
Looking at the number of Self registers and combinations, the ifdefs could get quite messy very quickly. I just tested one alternative that works, using macros at the top of the file.

Code: Pascal
  {$MACRO ON}
  {$ifdef UNIX}
    {$ifdef CPU64}
      {$define ASM_VMOVUPS_SELF:=asm vmovups xmm0,[RDI]}
      {$define ASM_VMOVAPS_SELF:=asm vmovaps xmm0,[RDI]}
      {$define ASM_MOVUPS_SELF:=asm movups xmm0,[RDI]}
      {$define ASM_MOVAPS_SELF:=asm movaps xmm0,[RDI]}
    {$else}
      {$define ASM_VMOVUPS_SELF:=asm vmovups xmm0,[EDI]}
      {$define ASM_VMOVAPS_SELF:=asm vmovaps xmm0,[EDI]}
      {$define ASM_MOVUPS_SELF:=asm movups xmm0,[EDI]}
      {$define ASM_MOVAPS_SELF:=asm movaps xmm0,[EDI]}
    {$endif}
  {$else}
    {$ifdef CPU64}
      {$define ASM_VMOVUPS_SELF:=asm vmovups xmm0,[RCX]}
      {$define ASM_VMOVAPS_SELF:=asm vmovaps xmm0,[RCX]}
      {$define ASM_MOVUPS_SELF:=asm movups xmm0,[RCX]}
      {$define ASM_MOVAPS_SELF:=asm movaps xmm0,[RCX]}
    {$else}
      {$define ASM_VMOVUPS_SELF:=asm vmovups xmm0,[ECX]}
      {$define ASM_VMOVAPS_SELF:=asm vmovaps xmm0,[ECX]}
      {$define ASM_MOVUPS_SELF:=asm movups xmm0,[ECX]}
      {$define ASM_MOVAPS_SELF:=asm movaps xmm0,[ECX]}
    {$endif}
  {$endif}

Then the routines would look something like this:

Code: Pascal
  function TGLZAVXVector4f.DotProduct(constref A: TGLZAVXVector4f):Single;assembler;
    ASM_VMOVUPS_SELF
    vmovups xmm1, [A]
    vdpps xmm0, xmm0, xmm1, 01110001b // = $71: X, Y, Z only; $F1 would include W
    movlps [Result], xmm0
  end;

Advantage: the ifdefs are removed from the bulk of the code.
Disadvantage: asm syntax colouring does not work.

Tested as working here, but it is just a suggestion. By the way, macros are not expanded inside an asm section, which is why each macro has to include the asm keyword itself.

Peter
« Last Edit: November 23, 2017, 08:30:18 pm by dicepd »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #48 on: November 23, 2017, 10:02:05 pm »
@schuler: thanks, it will be useful for procedures.

@Peter: the macro is a good solution too, I think, but it does not work for me  >:(  In any case I prefer the "old style", because of the need to add the asm keyword in the macro  :-[

About the problem with:
Code: Pascal
  movups xmm0,[RIP+cNullSSEVector4f]
  vmovups xmm0,[RIP+cNullAVXVector4f]

This is what we discussed some messages above with Akira and Marcov, so try simply removing the RIP register, e.g.:
Code: Pascal
  movups xmm0,[cNullSSEVector4f]
 
« Last Edit: November 23, 2017, 11:36:16 pm by BeanzMaster »

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #49 on: November 23, 2017, 11:16:01 pm »
As far as I can see, the only way to get rid of the warning and make the routines safe for BSD and OS X is to use routine-local consts. From a bit of reading: global consts and position-independent code, which these OSes can load at randomised addresses, do not sit well together, and could cost extra cycles working out where the data segment actually is in memory in order to retrieve the global.

Peter.

 

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #50 on: November 23, 2017, 11:32:03 pm »

Quote from BeanzMaster: @Peter: the macro is a good solution too, I think, but it does not work for me  >:(


Try this...

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #51 on: November 25, 2017, 01:18:32 am »
Hi, I am trying to implement some other operators like =, <=, <, etc.

To begin with, I tried =:

Code: Pascal
  class operator TGLZSSEVector4f.= (constref A, B: TGLZSSEVector4f): boolean; assembler;
  asm
    movups xmm0,[A]
    movups xmm1,[B]
    {$IFDEF USE_ASM_SSE_4}
    cmpeqps xmm0,xmm1
    ptest    xmm0, xmm1
    jz @no_differences
    mov [RESULT],FALSE
    jmp @END_SSE
    {$ELSE}
    cmpeqps  xmm0, xmm1    // 0:A and B are ordered and equal.  -1:not ieee_equal.
    //andnps    xmm0, xmm1
    movmskps  eax, xmm0
    test      eax, eax
    //or eax, eax
    jz @no_differences
    mov [RESULT],FALSE
    jmp @END_SSE
    {$ENDIF}
    @no_differences:
    mov [RESULT],TRUE
    @END_SSE:
  end;

But this doesn't work (for both SSE and SSE4): it always returns TRUE, for example with both V1 = V1 and V1 = V2 (V1 and V2 are two different vectors, of course).

Must I add something (or just a PUSH/POP)? Help is welcome; perhaps I have misunderstood something  :-[  about MOVMSKPS or PTEST.
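
For reference, CMPEQPS sets each lane to all ones where the two inputs are equal and to zero where they differ, and MOVMSKPS packs the four lane sign bits into the low bits of a general register, so two vectors are equal in every lane exactly when that mask is $F. A minimal x86-64 sketch of that test follows. TVec4f and Vec4Equal are hypothetical stand-ins, not the GLZVectorMath declarations, and the sketch assumes FPC keeps both constref pointers in registers for a nostackframe routine.

```pascal
program VecEqualSketch;
{$mode objfpc}
{$asmmode intel}

type
  // Hypothetical stand-in for the 4 x Single vector record used in the thread.
  TVec4f = record
    X, Y, Z, W: Single;
  end;

// Hypothetical helper (x86-64 only): True when all four lanes of A and B are equal.
function Vec4Equal(constref A, B: TVec4f): Boolean; assembler; nostackframe;
asm
  movups   xmm0, [A]
  movups   xmm1, [B]
  cmpeqps  xmm0, xmm1   // equal lanes become all ones, different lanes zero
  movmskps eax, xmm0    // sign bit of each lane goes to bits 0..3 of eax
  cmp      eax, $0F     // were all four lanes equal?
  sete     al           // a Boolean result is returned in AL
end;

var
  V1, V2: TVec4f;
begin
  V1.X := 1; V1.Y := 2; V1.Z := 3; V1.W := 4;
  V2 := V1;
  V2.Z := 30;                      // differs from V1 in one lane only
  WriteLn('V1 = V1: ', Vec4Equal(V1, V1));
  WriteLn('V1 = V2: ', Vec4Equal(V1, V2));
end.
```

The result is left in AL rather than written through [RESULT]. Lanes holding NaN never compare equal with this predicate.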

Thanks
« Last Edit: November 25, 2017, 01:20:16 am by BeanzMaster »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #52 on: November 26, 2017, 04:01:48 pm »
Hello, so I found a solution for comparing vectors: use CMPPS with a flag instead of the cmpXXps instructions.

I've also optimized and corrected the SSE/SSE2 functions.
I've added SSE3/SSE4 support for some functions and synchronized them with AVX.
I've added a bunch of functions like min, max, clamp, negate, lerp, anglecosine, reflect, etc.
I've added some procedures for chained computations (not tested yet).

Now I have a very strange bug with the MOVSS instruction.

Take a look:
with SSE I have these 2 functions. They raise warnings (see the comments in the code):

Code: Pascal
  function TGLZSSEVector4f.Combine2(constref V2: TGLZSSEVector4f; constref F1:Single;Constref F2: Single): TGLZSSEVector4f;assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       movups xmm0,[RDI]
    {$else}
       movups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       movups xmm0,[RCX]
    {$else}
       movups xmm0,[ECX]
    {$endif}
  {$endif}
    movups xmm1, [V2]
    movss xmm2, [F1]
    movss xmm3, [F2]   //---> WARNING GLZVectorMath.pas(1869,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]"

    shufps xmm2, xmm2, $00 // replicate
    shufps xmm3, xmm3, $00 // replicate

    mulps xmm0, xmm2  // Self * F1
    mulps xmm1, xmm3  // V2 * F2

    addps xmm0, xmm1  // (Self * F1) + (V2 * F2)

    andps xmm0, [RIP+cSSE_MASK_NO_W]
    movups [RESULT], xmm0
  end;

Code: Pascal
  function TGLZSSEVector4f.Combine3(constref V2, V3: TGLZSSEVector4f; constref F1, F2, F3: Single): TGLZSSEVector4f;  assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       movups xmm0,[RDI]
    {$else}
       movups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       movups xmm0,[RCX]
    {$else}
       movups xmm0,[ECX]
    {$endif}
  {$endif}

    movups xmm1, [V2]
    movups xmm4, [V3]

    movss xmm2, [F1] //---> WARNING GLZVectorMath.pas(1902,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]"
    movss xmm3, [F2] //---> WARNING GLZVectorMath.pas(1903,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]"
    movss xmm5, [F3] //---> WARNING GLZVectorMath.pas(1904,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]"

    shufps xmm2, xmm2, $00 // replicate
    shufps xmm3, xmm3, $00 // replicate
    shufps xmm5, xmm5, $00 // replicate

    mulps xmm0, xmm2 // Self * F1
    mulps xmm1, xmm3 // V2 * F2
    mulps xmm4, xmm5 // V3 * F3

    addps xmm0, xmm1 // (Self * F1) + (V2 * F2)
    addps xmm0, xmm4 // ((Self * F1) + (V2 * F2)) + (V3 * F3)

    andps xmm0, [RIP+cSSE_MASK_NO_W]
    movups [RESULT], xmm0
  end;

And now the AVX version, which raises an ERROR (same for Combine3):

Code: Pascal
  function TGLZAVXVector4f.Combine2(constref V2: TGLZAVXVector4f; Constref F1, F2: Single): TGLZAVXVector4f;assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       vmovups xmm0,[RDI]
    {$else}
       vmovups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       vmovups xmm0,[RCX]
    {$else}
       vmovups xmm0,[ECX]
    {$endif}
  {$endif}
    vmovss xmm2, [F1]
    vmovss xmm3, [F2]  //--> ERROR : GLZVectorMath.pas(3465,3) Error: Invalid register used in memory reference expression: "xmm3"

    vmovups xmm1, [V2]

    vshufps xmm2, xmm2, xmm2, $00 // replicate
    vshufps xmm3, xmm3, xmm3, $00 // replicate

    vmulps xmm0, xmm0, xmm2  // Self * F1
    vmulps xmm1, xmm1, xmm3  // V2 * F2

    vaddps xmm0, xmm0, xmm1  // (Self * F1) + (V2 * F2)

    vandps xmm0, xmm0, [RIP+cSSE_MASK_NO_W]
    vmovups [RESULT], xmm0
  end;

And these two functions, in SSE and AVX, give NO WARNING / NO ERROR:

Code: Pascal
  function TGLZSSEVector4f.Combine(constref V2: TGLZSSEVector4f; constref F1: Single): TGLZSSEVector4f;assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       movups xmm0,[RDI]
    {$else}
       movups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       movups xmm0,[RCX]
    {$else}
       movups xmm0,[ECX]
    {$endif}
  {$endif}
    movups xmm1, [V2]
    movss xmm2, [F1]
    shufps xmm2, xmm2, $00 // replicate

    mulps xmm1, xmm2 //V2*F1
    addps xmm0, xmm1 // Self + (V2*F1)

    andps xmm0, [RIP+cSSE_MASK_NO_W]
    movups [RESULT], xmm0
  end;

  function TGLZAVXVector4f.Combine(constref V2: TGLZAVXVector4f; constref F1: Single): TGLZAVXVector4f;assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       vmovups xmm0,[RDI]
    {$else}
       vmovups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       vmovups xmm0,[RCX]
    {$else}
       vmovups xmm0,[ECX]
    {$endif}
  {$endif}
    vmovups xmm1, [V2]
    vmovss xmm2, [F1]
    vshufps xmm2, xmm2, xmm2, $00 // replicate

    vmulps xmm1, xmm1, xmm2 //V2*F1
    vaddps xmm0, xmm0, xmm1 // Self + (V2*F1)

    vandps xmm0, xmm0, [RIP+cSSE_MASK_NO_W]
    vmovups [RESULT], xmm0
  end;

So it seems there is a problem with the compiler. It's very strange, because with these 2 other functions there are no problems:

Code: Pascal
  function TGLZSSEVector4f.Clamp(constref AMin, AMax: Single): TGLZSSEVector4f; assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       movups xmm0,[RDI]
    {$else}
       movups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       movups xmm0,[RCX]
    {$else}
       movups xmm0,[ECX]
    {$endif}
  {$endif}
    movss xmm2, [AMin]
    movss xmm3, [AMax]
    shufps xmm2, xmm2, $00 // Replicate AMin
    shufps xmm3, xmm3, $00 // Replicate AMax
    maxps  xmm0, xmm2
    minps  xmm0, xmm3
    movups [Result], xmm0
  end;

  function TGLZAVXVector4f.Clamp(constref AMin, AMax: Single): TGLZAVXVector4f; assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       vmovups xmm0,[RDI]
    {$else}
       vmovups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       vmovups xmm0,[RCX]
    {$else}
       vmovups xmm0,[ECX]
    {$endif}
  {$endif}
    vmovss xmm2, [AMin]
    vmovss xmm3, [AMax]
    vshufps xmm2, xmm2, xmm2, $00
    vshufps xmm3, xmm3, xmm3, $00
    vmaxps  xmm0, xmm0, xmm2
    vminps  xmm0, xmm0, xmm3
    vmovups [Result], xmm0
  end;

You can try it with the updated sample project attached here.

Note that I've commented out the functions Combine2 and Combine3, so uncomment them for testing.

By default it uses the SSE/SSE2 code. See the directives at the top of the unit to change this.

Thanks in advance for your tests and help.



« Last Edit: November 26, 2017, 04:05:00 pm by BeanzMaster »

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #53 on: November 29, 2017, 02:10:20 am »
Hi Jerome,

Just got round to testing this in Linux. I have just moved my main dev box over to Linux for good now, so that took a couple of days. BTW this box, with decent drivers, solves all the control issues I was having. GLScene now works just fine for me.

Anyway, back to the testing. As tested, it just kept crashing on

Code: Pascal
  andps xmm0, [RIP+cSSE_MASK_NO_W]
and similar lines. My first solution was to read the generated native code, which seemed to do a move first, as in

Code: Pascal
  movups xmm3, [RIP+cSSE_MASK_NO_W]
  andps xmm0, xmm3

I tried this, and while it worked I was not happy having to add another instruction, as this is meant to be an optimised lib.

A bit more googling and reading, and a better solution came to light. It would seem that on your system the consts are getting aligned correctly, while in Linux they were not, and that was generating the error. So I rolled back the first set of changes and added

Code: Pascal
  {$CODEALIGN CONSTMIN=16}

This worked fine.
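
The alignment explanation fits how these instructions behave: a non-VEX andps with a memory operand requires that operand to be 16-byte aligned and faults otherwise, whereas movups accepts any address, which is why the extra move also made the crash go away. As a quick check of what the directive does on a given target, a sketch like the following prints whether a 16-byte typed constant actually lands on a 16-byte boundary. cMaskNoW is a hypothetical stand-in for cSSE_MASK_NO_W, whose real declaration is not shown in this thread, and IsAligned16 is a helper invented for the illustration.

```pascal
program ConstAlignCheck;
{$mode objfpc}
{$CODEALIGN CONSTMIN=16}

const
  // Hypothetical stand-in for cSSE_MASK_NO_W: keeps X, Y, Z and clears W.
  cMaskNoW: array[0..3] of LongWord =
    ($FFFFFFFF, $FFFFFFFF, $FFFFFFFF, $00000000);

// Hypothetical helper: True when the address P is a multiple of 16.
function IsAligned16(P: Pointer): Boolean;
begin
  Result := (PtrUInt(P) and 15) = 0;
end;

begin
  // Expected to report TRUE when the directive has taken effect.
  WriteLn('cMaskNoW is 16-byte aligned: ', IsAligned16(@cMaskNoW));
end.
```

Removing the directive and running the check again on the same platform shows whether the default alignment there is already sufficient.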

Peter

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #54 on: November 29, 2017, 04:02:34 am »
I have tested Combine2 and Combine3, and I am afraid I can't offer any help there, as they all just work for me with no warnings or errors.

One other thing I looked at again was the "Exported/global symbols should be accessed via the GOT" warning; there are now so many of them that they are annoying.

One solution is to move the consts to the implementation section, as they are not going to be required by the end user. I see you have added the TGLZVector type definition, and any exported consts can be declared using this type.

And the last thing before I go back to my own problems: where a routine returns a Single, you are trashing the stack by storing the result with a [v]mov[lua]ps instruction, which writes more than 4 bytes. A movss is all that is needed to return a Single.
 
Peter
« Last Edit: November 29, 2017, 05:02:40 am by dicepd »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #55 on: November 29, 2017, 04:28:14 pm »
Hi Peter, thanks for testing.

For Combine2 and Combine3, are the results correct for you?

It's very strange. I tried changing the order of the arguments, like this:

  function TGLZSSEVector4f.Combine2(constref F1, F2: Single;constref V2: TGLZSSEVector4f): TGLZSSEVector4f;
  function TGLZAVXVector4f.Combine3(constref F1, F2, F3: Single;constref V2, V3: TGLZAVXVector4f ): TGLZAVXVector4f;

Now the compiler only reports problems in the Combine3 function:  >:(
GLZVectorMath_NEW.pas(2167,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]" on this line:  movss xmm5, [F3]  with SSE
GLZVectorMath_NEW.pas(3731,3) Error: Asm: [vmovss xmmreg,mem128] invalid combination of opcode and operands  on this line:  vmovss xmm5, [F3]  with AVX

There is no warning anymore in Combine2, but the results are wrong compared with the native function  :'(

I've also changed the order of the instructions in Combine3 AVX to

Code: Pascal
  function TGLZAVXVector4f.Combine3(constref F1, F2, F3: Single;constref V2, V3: TGLZAVXVector4f ): TGLZAVXVector4f;  assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       vmovups xmm0,[RDI]
    {$else}
       vmovups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       vmovups xmm0,[RCX]
    {$else}
       vmovups xmm0,[ECX]
    {$endif}
  {$endif}
    vmovss xmm2, [F1]
    vshufps xmm2, xmm2, xmm2, $00 // replicate
    vmulps xmm0, xmm0, xmm2 // Self * F1

    vmovups xmm1, [V2]
    vmovss xmm3, [F2]
    vshufps xmm3, xmm3, xmm3, $00 // replicate
    vmulps xmm1, xmm1, xmm3 // V2 * F2

    vaddps xmm0, xmm0, xmm1 // (Self * F1) + (V2 * F2)

    vmovups xmm4, [V3]
    movss xmm5, [F3]
    vshufps xmm5, xmm5, xmm5, $00 // replicate
    vmulps xmm4, xmm4, xmm5 // V3 * F3

    vaddps xmm0, xmm0, xmm4 // ((Self * F1) + (V2 * F2)) + (V3 * F3)

    vandps xmm0, xmm0, [RIP+cSSE_MASK_NO_W]
    vmovups [RESULT], xmm0
  end;

and now I get GLZVectorMath_NEW.pas(3741,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]" on this line:  vmovss xmm5,[F3]  instead of the error message. If a guru comes by, perhaps he can give an explanation, because here I'm totally lost.  :'(

I've also tried this (no error, no warning), but the result is the same as Combine2 and not correct compared with the native function  >:D  It's as if the third operation (V3*F3) is not computed or is set to ZERO  >:(

Code: Pascal
  function TGLZAVXVector4f.Combine3(constref F1, F2, F3: TGLZAVXVector4f;constref V2, V3: TGLZAVXVector4f ): TGLZAVXVector4f;  assembler;
  asm
  {$ifdef UNIX}
    {$ifdef CPU64}
       vmovups xmm0,[RDI]
    {$else}
       vmovups xmm0,[EDI]
    {$endif}
  {$else}
    {$ifdef CPU64}
       vmovups xmm0,[RCX]
    {$else}
       vmovups xmm0,[ECX]
    {$endif}
  {$endif}
    vmovups xmm2, [F1]
    vmulps xmm0, xmm0, xmm2 // Self * F1

    vmovups xmm1, [V2]
    vmovups xmm3, [F2]
    vmulps xmm1, xmm1, xmm3 // V2 * F2

    vmovups xmm4, [V3]
    vmovups xmm5, [F3]
    vmulps xmm4, xmm4, xmm5 // V3 * F3

    vaddps xmm0, xmm0, xmm1 // (Self * F1) + (V2 * F2)
    vaddps xmm0, xmm0, xmm4 // ((Self * F1) + (V2 * F2)) + (V3 * F3)

    vandps xmm0, xmm0, [RIP+cSSE_MASK_NO_W]
    vmovups [RESULT], xmm0
  end;
So I think it is a compiler bug under Windows 64-bit (not tested in 32-bit), but why just here?

Peter, for

Code: Pascal
  andps xmm0, [RIP+cSSE_MASK_NO_W]

have you tried without the RIP?


dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #56 on: November 29, 2017, 04:50:58 pm »
First off, if I remove RIP the test crashes.

Next, here is the screenshot of my results. I think this may be down to alignment, as with the consts, although my recent reading states that the stack in a 64-bit OS is already 16-byte aligned. As I am getting the correct result, the code as of the last zip is probably correct; it is just not getting the right numbers from the stack.

I will try to play with the code I currently have in Windows and see what happens there.

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #57 on: November 29, 2017, 07:11:45 pm »
OK, I got a VM of Win7 64-bit up and am stepping through Combine3, comparing the registers between Linux and Win7.

The vectors load in fine in both OSes.



Load F1 into xmm2
 1.5 in Linux
{v4_float = {1.5, 0, 0, 0}, v2_double = {5.2842668622670356e-315, 0}, v16_int8 = {0, 0, -64, 63, 0 <repeats 12 times>}, v8_int16 = {0, 16320, 0, 0, 0, 0, 0, 0}, v4_int32 = {1069547520, 0, 0, 0}, v2_int64 = {1069547520, 0}, uint128 = 1069547520}
in windows
{v4_float = {2.40490394e-038, 0, 0, 0}, v2_double = {8.3840924311424506e-317, 0}, v16_int8 = {120, -17, 2, 1, 0 <repeats 12 times>}, v8_int16 = {-4232, 258, 0, 0, 0, 0, 0, 0}, v4_int32 = {16969592, 0, 0, 0}, v2_int64 = {16969592, 0}, uint128 = 16969592}

Load F2 into xmm3
Linux
{v4_float = {5.5, 0, 0, 0}, v2_double = {5.3619766690650802e-315, 0}, v16_int8 = {0, 0, -80, 64, 0 <repeats 12 times>}, v8_int16 = {0, 16560, 0, 0, 0, 0, 0, 0}, v4_int32 = {1085276160, 0, 0, 0}, v2_int64 = {1085276160, 0}, uint128 = 1085276160}
Windows
{v4_float = {2.4049017e-038, 0, 0, 0}, v2_double = {8.3840884786172839e-317, 0}, v16_int8 = {112, -17, 2, 1, 0 <repeats 12 times>}, v8_int16 = {-4240, 258, 0, 0, 0, 0, 0, 0}, v4_int32 = {16969584, 0, 0, 0}, v2_int64 = {16969584, 0}, uint128 = 16969584}

Load F3 into xmm5
Linux
{v4_float = {6.5999999, 0, 0, 0}, v2_double = {5.3733741064073288e-315, 0}, v16_int8 = {51, 51, -45, 64, 0 <repeats 12 times>}, v8_int16 = {13107, 16595, 0, 0, 0, 0, 0, 0}, v4_int32 = {1087583027, 0, 0, 0}, v2_int64 = {1087583027, 0}, uint128 = 1087583027}
Windows
{v4_float = {2.40489946e-038, 0, 0, 0}, v2_double = {8.3840845260921172e-317, 0}, v16_int8 = {104, -17, 2, 1, 0 <repeats 12 times>}, v8_int16 = {-4248, 258, 0, 0, 0, 0, 0, 0}, v4_int32 = {16969576, 0, 0, 0}, v2_int64 = {16969576, 0}, uint128 = 16969576}

So no wonder you are getting wrong answers. At the moment I haven't got a clue what is going on, but I will think about it and play some more.
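
The Windows values in the dumps are suggestive. Read as 32-bit integers they are 16969592, 16969584 and 16969576, exactly 8 bytes apart, which is what the low halves of three consecutive 64-bit addresses would look like, while the Linux value for F1 is the bit pattern of 1.5. A small plain-Pascal sketch that reinterprets those integers as Singles reproduces the float values gdb shows. BitsToSingle is a hypothetical helper written for this illustration; the integer constants are copied from the dumps above.

```pascal
program DecodeXmmDump;
{$mode objfpc}

// Hypothetical helper: reinterpret the bits of a 32-bit integer as a Single.
function BitsToSingle(Bits: LongWord): Single;
begin
  Move(Bits, Result, SizeOf(Result));
end;

const
  // First v4_int32 element of each Windows dump (F1, F2, F3).
  WinF1 = 16969592;
  WinF2 = 16969584;
  WinF3 = 16969576;
  // First v4_int32 element of the Linux dump for F1.
  LinuxF1 = 1069547520;

begin
  // The Linux bit pattern decodes to the expected argument value.
  WriteLn('Linux F1 as Single:   ', BitsToSingle(LinuxF1):0:4);
  // The Windows bit pattern decodes to a tiny value of the order of 1e-38.
  WriteLn('Windows F1 as Single: ', BitsToSingle(WinF1));
  // Equal 8-byte gaps are what pointer-sized slots would produce.
  WriteLn('F1 - F2 = ', WinF1 - WinF2, ', F2 - F3 = ', WinF2 - WinF3);
end.
```

This does not prove the registers hold addresses, but it is consistent with the loads picking up something pointer-like rather than the Single values themselves.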

Peter


Update: after removing constref from the Singles and passing them by value on the stack, it works. That is probably why the compiler gave the message about 64 bits rather than 32 bits. Singles are probably better off passed by value anyway, as 32 bits is less than a 64-bit pointer.
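
One possible reading of the 64-bit warning, offered as an assumption rather than a confirmed diagnosis: a constref Single is passed as a pointer, and when that pointer ends up on the stack instead of in a register, [F3] addresses the 64-bit slot holding the pointer and not the Single it points to. If constref has to stay, a minimal x86-64 sketch of one way to cover both cases is to fetch the pointer into a scratch register first and dereference that, as in the VBROADCASTSS example earlier in the thread. SumSeven is a hypothetical routine, not part of GLZVectorMath; seven parameters are used only so that at least one of them is passed on the stack under both the Win64 and the Linux calling conventions.

```pascal
program ConstRefOnStack;
{$mode objfpc}
{$asmmode intel}

// Hypothetical routine (x86-64 only): adds seven Singles passed by reference.
// "mov rax, Fn" fetches the pointer whether it lives in a register or in a
// stack slot; the Single is then read through rax.
function SumSeven(constref F1, F2, F3, F4, F5, F6, F7: Single): Single; assembler;
asm
  mov   rax, F1
  movss xmm0, dword ptr [rax]
  mov   rax, F2
  addss xmm0, dword ptr [rax]
  mov   rax, F3
  addss xmm0, dword ptr [rax]
  mov   rax, F4
  addss xmm0, dword ptr [rax]
  mov   rax, F5
  addss xmm0, dword ptr [rax]
  mov   rax, F6
  addss xmm0, dword ptr [rax]
  mov   rax, F7
  addss xmm0, dword ptr [rax]
  // On x86-64 a Single result is returned in xmm0, so nothing is stored.
end;

var
  A: array[1..7] of Single;
  I: Integer;
begin
  for I := 1 to 7 do
    A[I] := I;            // 1 + 2 + ... + 7 = 28
  WriteLn('Sum: ', SumSeven(A[1], A[2], A[3], A[4], A[5], A[6], A[7]):0:1);
end.
```

Passing the Singles by value, as in the update above, avoids the extra load altogether; the pointer fetch only matters when the parameters must remain constref.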
« Last Edit: November 29, 2017, 07:21:16 pm by dicepd »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #58 on: November 29, 2017, 11:05:49 pm »
Thanks Peter, you put me on the right track. After re-reading your message about the consts I found it, and now I have correct results.
First, I added this:
Code: Pascal
  {$CODEALIGN RECORDMIN=16}

EDIT: I hadn't noticed that this directive now breaks many other functions (see the attached screenshots, the first without and the second with the directive; look at Distance, Length, Norm, Normalize).

But I still get this warning in Combine3 with SSE (no longer in Combine2):
GLZVectorMath_NEW.pas(2167,18) Warning: Check size of memory operand "movss: memory-operand-size is 64 bits, but expected [128 bits]" on this line:  movss xmm5, [F3]
so I changed movss to movlps: no more warning, but the result is still incorrect with Combine3.

But now for AVX, in Combine2 for the second VMOVSS and in Combine3 for all 3 VMOVSS, I still get this error:
GLZVectorMath_NEW.pas(3804,3) Error: Asm: [vmovss xmmreg,mem128] invalid combination of opcode and operands
By changing VMOVSS to MOVLPS there is no more error and the results are correct (NOTE: VMOVLPS gives the same error as above  >:( ), except that the result is still incorrect for Combine3, as with the SSE version.
I don't understand this behaviour, it's crazy  :'(
« Last Edit: November 29, 2017, 11:26:54 pm by BeanzMaster »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #59 on: November 29, 2017, 11:24:05 pm »
And the second screenshot

 
