Hi all,
so I do not know how Jerome is getting good returns in Win10?
I can't answer that, I'm just coding, so...
Anyway, I've split all the asm code into 6 include files (one for each case: Linux/Windows, 32/64-bit, SSE/AVX), which makes it easier to read and to debug.
I've also added 2 of my other units (I've begun some other small tests).
I've corrected some small spelling bugs and added a few comments.
I've tested 32-bit with Lazarus 1.8rc3, but some errors occurred:
1st, the clamp functions work but raise a SIGSEGV just afterwards.
2nd, the functions with a Single result: the result is returned in the ST register. I tried to set it with the FSTP instruction, but without success.
I've also added some conditional directives for alignment and replaced MOVUPS with MOVAPS, and it works. I've also added 2 or 3 other small functions, and added AngleBetween in asm, but it's not tested yet.
The performance varies and depends on the compiler options and on how the record is declared (packed or not).
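Since MOVAPS faults when its memory operand is not on a 16-byte boundary (MOVUPS does not), it may be worth checking the addresses before relying on the aligned variant. Below is only a minimal sketch: `TVec4`, `PVec4` and `IsAligned16` are names made up for the illustration, not identifiers from the posted units.

```pascal
program AlignCheck;
{$mode objfpc}{$H+}

type
  // Hypothetical 4-float vector, standing in for the record used by the asm code.
  TVec4 = record
    X, Y, Z, W: Single;
  end;
  PVec4 = ^TVec4;

// True when P sits on a 16-byte boundary, which is what MOVAPS requires.
function IsAligned16(P: Pointer): Boolean;
begin
  Result := (PtrUInt(P) and 15) = 0;
end;

var
  V: TVec4;
  Raw: Pointer;
  Aligned: PVec4;
begin
  // Whether a plain global is aligned depends on compiler options and record layout.
  WriteLn('global V aligned to 16 bytes: ', IsAligned16(@V));

  // Over-allocate by 15 bytes and round the address up to the next 16-byte boundary.
  GetMem(Raw, SizeOf(TVec4) + 15);
  Aligned := PVec4((PtrUInt(Raw) + 15) and not PtrUInt(15));
  WriteLn('rounded-up heap block aligned: ', IsAligned16(Aligned));

  // Always free the original pointer, not the rounded one.
  FreeMem(Raw);
end.
```

A check like this could be dropped into the benchmark as a debug-only assertion, so a misaligned record shows up as a clear message rather than a crash inside the asm.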
The best results I've got are with SSE4/SSE3, not with AVX, so I think AVX will do better with matrix manipulation.
Peter, I haven't included your change for Unix; I can't test it and don't know exactly where it goes.
I've also tested your sample; it works in 32-bit with Laz1.8rc3 but not in Laz1.8rc4 64-bit. The best result I had was with
{$define USE_RECORD_V}
Now I have a headache! Next I'll begin some tests with arrays, matrices and quaternions.
Request to BeanzMaster - as well as timing checks, can you also implement some verification in your benchmark program? I have a feeling that some functions return incorrect results. Failing that, I can possibly design something a little more in-depth once I've finished my current task.
Yes, later. One of the first things needed is a check for division by 0. Otherwise, compared to the native code, the results are good.
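The kind of verification discussed above could look roughly like this: run the native routine and the routine under test on the same inputs, including zero divisors, and compare component by component with a tolerance. This is only a sketch under assumptions. `TVec4`, `DivNative`, `DivUnderTest` and `NearlyEqual` are hypothetical names, and `DivUnderTest` simply calls the native code here as a stand-in for the real asm function. FPU exceptions are masked with `SetExceptionMask` from the `Math` unit so that a division by zero yields Inf/NaN instead of raising.

```pascal
program VerifySketch;
{$mode objfpc}{$H+}

uses
  Math;

type
  // Hypothetical 4-float vector, standing in for the benchmark's record type.
  TVec4 = record
    X, Y, Z, W: Single;
  end;

// Reference implementation in plain Pascal.
function DivNative(const A, B: TVec4): TVec4;
begin
  Result.X := A.X / B.X;
  Result.Y := A.Y / B.Y;
  Result.Z := A.Z / B.Z;
  Result.W := A.W / B.W;
end;

// Placeholder for the asm routine being verified (assumption: same signature).
function DivUnderTest(const A, B: TVec4): TVec4;
begin
  Result := DivNative(A, B);
end;

// Relative comparison that also treats NaN = NaN and Inf = Inf as a match.
function NearlyEqual(A, B: Single): Boolean;
const
  Eps = 1e-5;
var
  Scale: Single;
begin
  if IsNan(A) or IsNan(B) then
    Exit(IsNan(A) and IsNan(B));
  if IsInfinite(A) or IsInfinite(B) then
    Exit(A = B);
  Scale := Abs(A);
  if Abs(B) > Scale then Scale := Abs(B);
  if Scale < 1.0 then Scale := 1.0;
  Result := Abs(A - B) <= Eps * Scale;
end;

function SameVec(const A, B: TVec4): Boolean;
begin
  Result := NearlyEqual(A.X, B.X) and NearlyEqual(A.Y, B.Y) and
            NearlyEqual(A.Z, B.Z) and NearlyEqual(A.W, B.W);
end;

var
  A, B: TVec4;
begin
  // Mask FPU exceptions so x/0 produces Inf or NaN instead of an exception.
  SetExceptionMask([exInvalidOp, exDenormalized, exZeroDivide,
                    exOverflow, exUnderflow, exPrecision]);

  // Test vector covering +x/0, -x/0, 0/0 and an ordinary division.
  A.X := 1.0; A.Y := -1.0; A.Z := 0.0; A.W := 2.0;
  B.X := 0.0; B.Y := 0.0;  B.Z := 0.0; B.W := 4.0;

  if SameVec(DivNative(A, B), DivUnderTest(A, B)) then
    WriteLn('Div: OK')
  else
    WriteLn('Div: MISMATCH');
end.
```

The same pattern should extend to the other functions by swapping the pair of routines and adding a few edge-case inputs per function. The tolerance of 1e-5 is an arbitrary choice for the sketch and would probably need loosening for approximate instructions such as RCPPS or RSQRTPS.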