
Author Topic: AVX and SSE support question  (Read 89737 times)

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: AVX and SSE support question
« Reply #30 on: November 20, 2017, 11:31:07 pm »
Yes, the inversion was the problem, but your trick is not the real solution (it doesn't work with the second overloaded operator, (V:TheVector;F:Single)).
I did some research on the instructions and ran some tests, and the correct form is:
VSUBPS XMM0,XMM0, XMM1 where the first operand is the result, not the third as I thought.
Now the results are OK.
Thanks

Yeah, that was just a quick suggestion and certainly wouldn't work if the second parameter was a single floating-point value instead of another vector. Inverting it the way you're doing it above is definitely a better all-around solution.
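
To spell out the operand order discussed above, here is a minimal scalar sketch of what VSUBPS does per lane: the first operand is the destination and receives the second operand minus the third. The names ModelVSubPS and TVec4 are made up for illustration and are not part of anyone's unit in this thread.

```pascal
program VSubPSModel;
{$mode objfpc}

type
  // Hypothetical stand-in for one XMM register holding four singles.
  TVec4 = array[0..3] of Single;

// Scalar model of "VSUBPS Dest, Src1, Src2":
// every lane of Dest receives Src1 - Src2 (Dest is the FIRST operand).
procedure ModelVSubPS(out Dest: TVec4; const Src1, Src2: TVec4);
var
  i: Integer;
begin
  for i := 0 to 3 do
    Dest[i] := Src1[i] - Src2[i];
end;

const
  A: TVec4 = (5, 6, 7, 8);
  B: TVec4 = (1, 2, 3, 4);
var
  R: TVec4;
begin
  // Corresponds to VSUBPS R, A, B, i.e. R := A - B lane by lane.
  ModelVSubPS(R, A, B);
  WriteLn(R[0]:0:1, ' ', R[1]:0:1, ' ', R[2]:0:1, ' ', R[3]:0:1);
end.
```

Swapping the last two arguments flips the sign of every lane, which is the kind of inversion described in the quoted post.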

Also, @Nitorami: your code works fine for me, both in and outside of the loop.
« Last Edit: November 21, 2017, 02:08:47 am by Akira1364 »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #31 on: November 21, 2017, 12:12:03 am »
Continuing my research, I've added this code for SSE:

Code: Pascal
  Const
    NullVector4f : TGLZVector4f =(v:(0,0,0,0)); // or =(x:0;y:0;z:0;w:0);

  function TGLZVector4f.Negate :TGLZVector4f;assembler;
  asm
      movups xmm1,[RCX]    // RCX = Self
      movups xmm0,[NullVector4f]
      subps xmm0,xmm1
      movups [Result],xmm0 //RDX = Result
  End;


Sometimes it compiles but gives a wrong result, and only very rarely is the result correct. Most of the time it just stops with SIGSEGV on the
line: movups xmm0,[NullVector4f]  >:D

I also sometimes get this message:
project1.lpr(22,0) Warning: Object file "unit1.o" contains 32-bit absolute relocation to symbol ".data.n_tc_$unit1_$$_nullvector4f".
I don't understand it.

For reference, I'm using Lazarus 1.8rc4 64-bit on Windows 10.
« Last Edit: November 21, 2017, 12:27:34 am by BeanzMaster »

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: AVX and SSE support question
« Reply #32 on: November 21, 2017, 01:54:23 am »
Negation is also an overloadable operator that uses the same symbol as subtraction, by the way.
 
You don't have to worry about it conflicting with the subtraction overload either, as the compiler recognizes that they're not the same thing since they have different numbers of parameters. Here are the SSE and AVX versions of it:

Code: Pascal
  class operator TGLZVector4F.-(constref A: TGLZVector4F): TGLZVector4F; assembler; //SSE
  asm
    MOVAPS XMM1,[A]
    MOVAPS XMM0,[NullVector4F]
    SUBPS XMM0,XMM1
    MOVAPS [Result],XMM0
  end;

  class operator TGLZVector4F.-(constref A: TGLZVector4F): TGLZVector4F; assembler; //AVX
  asm
    VMOVAPS XMM1,[A]
    VMOVAPS XMM0,[NullVector4F]
    VSUBPS XMM0,XMM0,XMM1
    VMOVAPS [Result],XMM0
  end;

  //So to use these you would obviously just do B := -A, or V2 := -V1 or whatever

Both of those work fine for me, with no compiler warnings or any crashes/invalid output after running them a whole bunch of times. Again, I'm using the aligned versions of the "MOV" instructions as opposed to the unaligned ones.
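
For anyone who wants to check the assembler output against something known to be correct, here is a plain-Pascal sketch of the same unary minus overload. TRefVector4f is a made-up stand-in record for this check, not the real TGLZVector4F, and the component-wise body is only a reference, not a replacement for the SSE/AVX versions.

```pascal
program NegateReference;
{$mode objfpc}
{$modeswitch advancedrecords}

type
  // Hypothetical reference record, used only to validate SIMD results.
  TRefVector4f = record
    X, Y, Z, W: Single;
    // One parameter, so this is unary negation rather than subtraction.
    class operator -(constref A: TRefVector4f): TRefVector4f;
  end;

// Plain Pascal negation: flip the sign of each component.
class operator TRefVector4f.-(constref A: TRefVector4f): TRefVector4f;
begin
  Result.X := -A.X;
  Result.Y := -A.Y;
  Result.Z := -A.Z;
  Result.W := -A.W;
end;

var
  V1, V2: TRefVector4f;
begin
  V1.X := 1;
  V1.Y := -2;
  V1.Z := 3.5;
  V1.W := 1;
  V2 := -V1;
  WriteLn(V2.X:0:1, ' ', V2.Y:0:1, ' ', V2.Z:0:1, ' ', V2.W:0:1);
end.
```

Running the assembler operator and this one on the same inputs and comparing the components is a quick way to catch an operand-order or addressing mistake.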
« Last Edit: November 21, 2017, 02:16:40 am by Akira1364 »

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #33 on: November 21, 2017, 02:26:41 pm »
Hi, thanks Akira. I hadn't thought of that, but it doesn't resolve the problem on my PC: I always get a SIGSEGV and this message:
project1.lpr(22,0) Warning: Object file "unit1.o" contains 32-bit absolute relocation to symbol ".data.n_tc_$unit1_$$_nullvector4f".

Something is wrong with my configuration.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11352
  • FPC developer.
Re: AVX and SSE support question
« Reply #34 on: November 21, 2017, 03:16:07 pm »
Try

Code: Pascal
  MOVAPS XMM0,[RIP+NullVector4F]

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #35 on: November 21, 2017, 03:44:07 pm »

Thanks Marcov, the RIP-relative addressing solves the problem. But why does it seem to work for Akira without it?

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: AVX and SSE support question
« Reply #36 on: November 21, 2017, 08:19:20 pm »
Not sure why they both consistently work here (or why I don't get any warnings... I enabled all the verbosity settings and still nothing.) Could be any number of things (CPU differences, compiler version differences, etc.)

Also the "rip-relative" addressing style didn't cross my mind before, but it definitely is a safer way overall so I'd probably just stick with that for stuff like the negation operator.
« Last Edit: November 21, 2017, 08:34:43 pm by Akira1364 »

Nitorami

  • Sr. Member
  • ****
  • Posts: 481
Re: AVX and SSE support question
« Reply #37 on: November 21, 2017, 08:56:19 pm »
In my environments (Win7 and Win10, 32-bit) I sometimes (but then consistently) get access violations with these routines. It seems to depend, probably amongst other things, on the settings for optimisation level 2 and regvar. While O2 is known for some bugs, I have had no problems with regvar so far. Therefore I think that something serious is wrong with these assembler routines: saving registers, calling convention, stack issues, alignment, whatever.
I experienced similar problems years ago, when I thought I could optimise my code using self-made assembler routines... which suddenly stopped working when I changed code at entirely different places in the program. My lesson was: if you do not know exactly what you are doing, don't use assembler.
« Last Edit: November 21, 2017, 08:57:54 pm by Nitorami »

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: AVX and SSE support question
« Reply #38 on: November 21, 2017, 09:39:10 pm »
99% of the time I'd agree with you, as in most cases assembly hand-written by people is unlikely to be better or even close to as good as what FPC produces at high optimization levels.

However this is sort of an edge case where it's specifically known that the compiler isn't currently capable of generating working "vectorized" ASM from Pascal at any setting. They're also extremely simple 4-line methods that are pretty much in line with what GCC would generate at Ofast or MSVC would generate with the vectorcall extension turned on, so there's not a whole lot that can (or should) go wrong.
« Last Edit: November 22, 2017, 12:39:36 am by Akira1364 »

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: AVX and SSE support question
« Reply #39 on: November 21, 2017, 10:06:56 pm »
I guess that a problem is seen sometimes on 32bit code, and not seen on 64bit code.

Akira, are you testing 64bit code?

Nitorami, 32bit?

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: AVX and SSE support question
« Reply #40 on: November 22, 2017, 12:37:37 am »
Yeah, 64 bit compiled with trunk FPC. CPU is an i7-4790k.
Nitorami actually already said they were using 32-bit in their last post, by the way.

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11352
  • FPC developer.
Re: AVX and SSE support question
« Reply #41 on: November 22, 2017, 10:00:00 am »
I use FPC sse/avx assembler in (64-bit) production applications. Never found a problem that I couldn't explain by looking at the generated code.

-Sv is different and buggy, but assembler is usually straightforward.

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #42 on: November 23, 2017, 12:54:40 am »
Hi all, thanks for the explanations.

I've made a little test app with 3 clones of the same record: one for pure Pascal, one for SSE, and the third for AVX.
I've included the basic vector functions (Length, Distance, DotProduct, CrossProduct, Normalize, ...).
I've added some comments.
I suggest you look especially at the SSE DotProduct function; in the comments you'll find 3 other versions (SSE1, SSE2, SSE3 and SSE4 tests).
This is just a test, so some functions are not optimized yet  ;D
The app compiles without any exceptions or compiler warnings  8-)

So that we can compare our PCs and configurations, here is mine.

My PC
- CPU: AMD A10-7870K Radeon R7, 12 Compute Cores 4C+8G
- Supported instructions: MMX (+), SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE4A, AMD 64, NX, VMX, AES, AVX, FMA3, FMA4
- OS: Windows 10 64-bit
- DEV: Lazarus 1.8rc4 / FPC 3.0.2

All suggestions are welcome.

Note: the unit is for 64-bit Windows; for other platforms see the comment at the top of the unit.

« Last Edit: November 23, 2017, 12:57:03 am by BeanzMaster »

dicepd

  • Full Member
  • ***
  • Posts: 163
Re: AVX and SSE support question
« Reply #43 on: November 23, 2017, 10:07:49 am »
Hi Jerome,

Nice work with the Vector lib, that's the cleanest Pascal code I have come across for vector math, and it demonstrates what advanced records can really do.

Tested on Win7 64 on both my AMD and Intel desktops with no problems. Plugging the number ranges I use into some of the vectors, I thankfully see no loss of precision using rsqrtps in Normalize.

Loaded it into a Linux VM and I have made your nice neat code all messy with some Unix defines (I selected Unix for now as I am about to upgrade my FreeBSD boxen to test there). This is still for 64-bit Linux; not tested in 32-bit as I have no 32-bit OSes any more.

Peter
Lazarus 1.8rc5 Win64 / Linux gtk2 64 / FreeBSD qt4

BeanzMaster

  • Sr. Member
  • ****
  • Posts: 268
Re: AVX and SSE support question
« Reply #44 on: November 23, 2017, 02:31:51 pm »
Hi Peter,

Thanks. It's cool that you've added and tested Unix support  8)
Now I can add more functions like min, max, clamp, refract, reflect... and begin working with arrays.
I can also add this to my project and improve GLScene for the next big update  ;)
If someone could test under 32-bit Linux and Mac too, it would be very helpful.

cheers

 
