Topic: ASM calling convention differences between FPC 3.0.4 and 3.3.1

BeanzMaster

  • Full Member
  • ***
  • Posts: 182
Hi all, I want to update my vector math library (https://github.com/jdelauney/SIMD-VectorMath-UnitTest) from FPC 3.0.4 to FPC 3.1.x and up.
I currently have Lazarus 2.1 with FPC 3.3.1 (SVN rev 61197) installed.

Currently I declare my record for the 2D Integer vector as follows (the 2D Single, 2D Double, 4D Single, ... vectors are declared the same way):

Code: Pascal
  Type
    TBZVector2iType = packed array[0..1] of Integer; //< Aligned array for 2D Integer vectors
    { TBZVector2i : 2D Integer vector }
    TBZVector2i = record

      { Initializes the X and Y values }
      procedure Create(aX, aY:Integer); overload;

      { Returns a formatted string representing the vector : '(x, y)' }
      function ToString : String;

      { Adds two TBZVector2i vectors
        Example (V1, V2 and Result are variables of type TBZVector2i) :
        Result := V1 + V2;}
      class operator +(constref A, B: TBZVector2i): TBZVector2i; overload;

      { Adds an Integer variable to a TBZVector2i vector
        Example (V1 and Result are variables of type TBZVector2i, I is of type Integer) :
        Result := V1 + I ;}
      class operator +(constref A: TBZVector2i; constref B:Integer): TBZVector2i; overload;

      { Adds a Single variable to a TBZVector2i vector
        Example (V1 and Result are variables of type TBZVector2i, I is of type Single) :
        Result := V1 + I ;}
      class operator +(constref A: TBZVector2i; constref B:Single): TBZVector2i; overload;

  .....

      function Abs:TBZVector2i;overload;

      case Byte of
        0: (V: TBZVector2iType);
        1: (X, Y : Integer);
        2: (Width, Height : Integer);
    end;

With FPC 3.0.4, I currently have the following implementation:

Code: Pascal
  class operator TBZVector2i.+(constref A, B: TBZVector2i): TBZVector2i;assembler; nostackframe;register;
  asm
    movq  xmm0, [A]
    movq  xmm1, [B]
    paddd xmm0, xmm1
    movq  RAX, {%H-}xmm0
  end;

  class operator TBZVector2i.+(constref A: TBZVector2i; constref B:Single): TBZVector2i;  assembler; nostackframe; register;
  asm
    movq   xmm0, [A]
    movq   xmm1, [B]
    pshufd xmm1, xmm1, $00
    paddd xmm0, xmm1
    movq   RAX, {%H-}xmm0
  end;

  class operator TBZVector2i.+(constref A: TBZVector2i; constref B:Integer): TBZVector2i; assembler; nostackframe; register;
  asm
    movq   xmm0, [A]
    movq   xmm1, [B]
    pshufd xmm1, xmm1, $00
    paddd xmm0, xmm1
    movq   RAX, {%H-}xmm0
  end;

With FPC 3.3.1 I would like to use vectorcall, so I did this:

Code: Pascal
  {$IF fpc_fullversion >= 030100}
     {$DEFINE USE_VECTORCALL}
  {$ENDIF}

  class operator TBZVector2i.+(constref A, B: TBZVector2i): TBZVector2i; {$ifdef USE_VECTORCALL} vectorcall {$else} register; {$endif} assembler; nostackframe;
  asm
    {$ifndef USE_VECTORCALL}
    movq  xmm0, [A]
    movq  xmm1, [B]
    {$endif}
    paddd xmm0, xmm1
    movq  RAX, {%H-}xmm0
  end;

  class operator TBZVector2i.+(constref A: TBZVector2i; constref B:Single): TBZVector2i; {$ifdef USE_VECTORCALL} vectorcall {$else} register; {$endif} assembler; nostackframe;
  asm
    {$ifndef USE_VECTORCALL}
    movq   xmm0, [A]
    movq   xmm1, [B]
    {$endif}
    pshufd xmm1, xmm1, $00
    paddd xmm0, xmm1
    {.$ifndef USE_VECTORCALL}
    movq   RAX, {%H-}xmm0
    {.$endif}
  end;

  class operator TBZVector2i.+(constref A: TBZVector2i; constref B:Integer): TBZVector2i; {$ifdef USE_VECTORCALL} vectorcall {$else} register; {$endif} assembler; nostackframe;
  asm
    {$ifndef USE_VECTORCALL}
    movq   xmm0, [A]
    movq   xmm1, [B]
    {$endif}
    pshufd xmm1, xmm1, $00
    paddd xmm0, xmm1
    movq   RAX, {%H-}xmm0
  end;

FPC says:

Quote
vectormath_vector2i_win64_sse_imp.inc(3,28) Error: Calling convention doesn't match forward
BZVectorMath.pas(233,20) Error: Found declaration: operator +(constref TBZVector2i;constref TBZVector2i):<record type>; Static;
BZVectorMath.pas(263,20) Error: Found declaration: operator +(constref TBZVector2i;constref LongInt):<record type>; Static;
BZVectorMath.pas(269,20) Error: Found declaration: operator +(constref TBZVector2i;constref Single):<record type>; Static;

vectormath_vector2i_win64_sse_imp.inc(49,28) Error: Calling convention doesn't match forward
vectormath_vector2i_win64_sse_imp.inc(3,28) Error: Found declaration: operator +(constref TBZVector2i;constref TBZVector2i):<record type>; Register; Static;
BZVectorMath.pas(263,20) Error: Found declaration: operator +(constref TBZVector2i;constref LongInt):<record type>; Static;
BZVectorMath.pas(269,20) Error: Found declaration: operator +(constref TBZVector2i;constref Single):<record type>; Static;
vectormath_vector2i_win64_sse_imp.inc(62,28) Error: Calling convention doesn't match forward
vectormath_vector2i_win64_sse_imp.inc(3,28) Error: Found declaration: operator +(constref TBZVector2i;constref TBZVector2i):<record type>; Register; Static;
BZVectorMath.pas(263,20) Error: Found declaration: operator +(constref TBZVector2i;constref LongInt):<record type>; Static;
vectormath_vector2i_win64_sse_imp.inc(49,28) Error: Found declaration: operator +(constref TBZVector2i;constref Single):<record type>; Register; Static;

So what is the correct declaration in the interface?

For the 2D Double vector, I also get an error with this code:

Code: Pascal
  class operator TBZVector2d.*(constref A: TBZVector2d; constref B : TBZVector2i): TBZVector2d; assembler; nostackframe; register;
  asm
    movapd   xmm0, [A]
    cvtdq2pd xmm2, [B]
    mulpd    xmm0, xmm2
    movapd   [Result], xmm0
  end;

With FPC 3.3.1, the line "cvtdq2pd xmm2, [B]" fails. FPC returns "vectormath_vector2d_win64_sse_imp.inc(30,3) Error: Asm: [cvtdq2pd reg??,mem128] invalid combination of opcode and operands" (it works well with FPC 3.0.4).

Thanks in advance

Best regards

julkas

  • Sr. Member
  • ****
  • Posts: 385
  • KISS principle / Lazarus 2.0.0 / FPC 3.0.4
« Reply #1 on: July 06, 2019, 06:16:33 pm »
Please give information about CPU and OS.
procedure mulu64(a, b: QWORD; out clo, chi: QWORD); assembler;
asm
  mov rax, a
  mov rdx, b
  mul rdx
  mov [clo], rax
  mov [chi], rdx
end;

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7459
« Reply #2 on: July 06, 2019, 06:20:38 pm »
Vectorcall afaik puts vector parameters in vector registers. But that is for value types, not ref types.

Try removing the constref.

BeanzMaster

« Reply #3 on: July 06, 2019, 11:34:26 pm »
Please give information about CPU and OS.
Currently the library runs under Windows and Linux, 32- and 64-bit, and it should also work on macOS with an Intel CPU.

Vectorcall afaik puts vector parameters in vector registers. But that is for value types, not ref types.

Try removing the constref.

Thanks Marcov. I've found how to do it. ConstRef seems to be accepted.

For now, vectorcall must be declared in the interface too:

Code: Pascal
  class operator +(Constref A, B: TBZVector2i): TBZVector2i; {$ifdef USE_VECTORCALL} vectorcall; {$endif}overload;

and in the implementation:

Code: Pascal
  class operator TBZVector2i.+(Constref A, B: TBZVector2i): TBZVector2i; {$ifdef USE_VECTORCALL} vectorcall; {$else} register; {$endif} assembler; nostackframe;
  asm
    {$ifndef USE_VECTORCALL}
    movq  xmm0, [A]
    movq  xmm1, [B]
    {$endif}
    paddd xmm0, xmm1
    movq  RAX, {%H-}xmm0
  end;
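
As a minimal, self-contained sketch of the point above (the calling-convention directive on the declaration has to agree with the one on the implementation), the following program uses a hypothetical TVec2i record as a stand-in for TBZVector2i and a plain Pascal body instead of assembler. Restricting USE_VECTORCALL to Win64 is an assumption made for this sketch only, not something the library is known to do.

```pascal
program VectorCallDecl;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

// Assumption for this sketch: only enable vectorcall on Win64 with a
// compiler version that knows the directive (same version test as above).
{$IF DEFINED(WIN64) AND (FPC_FULLVERSION >= 30100)}
  {$DEFINE USE_VECTORCALL}
{$ENDIF}

type
  // Hypothetical stand-in for TBZVector2i
  TVec2i = record
    X, Y: Integer;
    // The directive written here...
    class operator +(constref A, B: TVec2i): TVec2i; {$ifdef USE_VECTORCALL} vectorcall; {$endif}
  end;

// ...is repeated unchanged on the implementation, so both sides agree.
class operator TVec2i.+(constref A, B: TVec2i): TVec2i; {$ifdef USE_VECTORCALL} vectorcall; {$endif}
begin
  Result.X := A.X + B.X;
  Result.Y := A.Y + B.Y;
end;

var
  V1, V2, R: TVec2i;
begin
  V1.X := 1;  V1.Y := 2;
  V2.X := 10; V2.Y := 20;
  R := V1 + V2;
  WriteLn(R.X, ' ', R.Y);
end.
```

Wrapping the directive in the same {$ifdef} on both sides keeps the two declarations in step whichever way the macro is set.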

I'll run my unit tests later to check the results.

But at this stage I still get this error with:

Code: Pascal
  class operator TBZVector2d.*(constref A: TBZVector2d; constref B : TBZVector2i): TBZVector2d; assembler; nostackframe; register;
  asm
    movapd   xmm0, [A]
    cvtdq2pd xmm2, [B] // ERROR HERE
    mulpd    xmm0, xmm2
    movapd   [Result], xmm0
  end;

Quote
With FPC 3.3.1, the line "cvtdq2pd xmm2, [B]" fails. FPC returns "vectormath_vector2d_win64_sse_imp.inc(30,3) Error: Asm: [cvtdq2pd reg??,mem128] invalid combination of opcode and operands" (it works well with FPC 3.0.4).

I'll check the SIMD instruction manual later. Perhaps I should load with a movq before doing the conversion (I don't remember why I used XMM2 instead of XMM1), and I must also try with vectorcall enabled.
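
A sketch of the movq idea, assuming an x86-64 target and Intel assembler mode. According to the instruction reference, the memory form of cvtdq2pd reads only 64 bits, so loading the two integers with movq and converting register to register avoids giving the assembler a memory operand whose size it has to guess. TVec2i, TVec2d and MulVec are hypothetical stand-ins, the product is returned through an out parameter to stay clear of ABI-specific return registers, and none of this has been checked against FPC 3.3.1.

```pascal
program CvtDq2PdSketch;
{$mode objfpc}
{$asmmode intel}

type
  // Hypothetical stand-ins for TBZVector2i and TBZVector2d
  TVec2i = record X, Y: Integer; end;
  TVec2d = record X, Y: Double; end;

// Computes R := A * B, with B converted from two 32-bit integers to two doubles.
// All three parameters are passed by reference, so [A], [B] and [R] are
// plain memory operands on both Win64 and Linux.
procedure MulVec(constref A: TVec2d; constref B: TVec2i; out R: TVec2d); assembler; nostackframe;
asm
  movupd   xmm0, [A]     // load both doubles (unaligned load, no alignment assumed)
  movq     xmm2, [B]     // load the two 32-bit integers (64 bits in total)
  cvtdq2pd xmm2, xmm2    // register form: convert the low two int32 to doubles
  mulpd    xmm0, xmm2    // multiply component-wise
  movupd   [R], xmm0     // store both products
end;

var
  A, R: TVec2d;
  B: TVec2i;
begin
  A.X := 1.5; A.Y := 2.0;
  B.X := 2;   B.Y := 3;
  MulVec(A, B, R);
  WriteLn(R.X:0:1, ' ', R.Y:0:1);
end.
```

Another option that may be worth trying is keeping the memory operand but giving it an explicit size ("qword ptr [B]"); whether that satisfies the 3.3.1 assembler is an assumption, not something verified here.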

Thanks
               
« Last Edit: July 06, 2019, 11:36:48 pm by BeanzMaster »

marcov

« Reply #4 on: July 06, 2019, 11:44:56 pm »
That looks like AT&T style, and there the last register is usually the destination. Is that what you want? It seems to me that you use AT&T mnemonics (with q and d suffixes), but Intel argument order.

BeanzMaster

« Reply #5 on: July 07, 2019, 08:55:13 am »
That looks like AT&T style, and there the last register is usually the destination. Is that what you want? It seems to me that you use AT&T mnemonics (with q and d suffixes), but Intel argument order.

As for why I used xmm0 and xmm2: it is due to the Linux ABI. See the code for Linux:

Code: Pascal
  class operator TBZVector2d.*(constref A: TBZVector2d; constref B : TBZVector2i): TBZVector2d; assembler; nostackframe; register;
  asm
    movapd   xmm0, [A]
    cvtdq2pd xmm2, [B]
    mulpd    xmm0, xmm2
    movhlps xmm1, xmm0
  end;

For cvtdq2pd, the mnemonic and the order of the arguments are right (I'm in Intel mode). See this doc: https://www.felixcloutier.com/x86/cvtdq2pd

So it is probably a bug in FPC 3.3.1, or I must do something else somewhere. (I have a lot of code to convert, so it is possible that the compiler is just reporting a knock-on error from another part of the code. I'll also try Const instead of ConstRef, but I'm not convinced that this is the problem.)

PS: I wrote a thorough unit test suite (included on GitHub), so I'm sure that the results and the code are right with FPC 3.0.4 on both Linux and Windows 64-bit.
« Last Edit: July 07, 2019, 08:59:10 am by BeanzMaster »