Lazarus

Free Pascal => General => Topic started by: beria on September 29, 2022, 01:09:10 pm

Title: assembler; + inline; How do I make friends? :-)
Post by: beria on September 29, 2022, 01:09:10 pm
For example, I have one:
Code: Pascal
procedure StrSwap(var S1, S2: WideString); assembler; stdcall; nostackframe; inline;
asm
         MOV     RAX,qword ptr [S1]
         MOV     R8,qword ptr [S2]
         MOV     qword ptr [S1],R8
         MOV     qword ptr [S2],RAX
end;

It is analogous, only 250 times faster than:

Code: Pascal
procedure StrSwap(var S1, S2: WideString); inline;
var
  S3: WideString;
begin
  S3 := S2;
  S2 := S1;
  S1 := S3;
end;


But how do I make it always "inline"? The compiler reports that "assembler" and "inline" are incompatible, but I don't understand why, given that such combinations are used all the time in the FPC sources.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: avk on September 29, 2022, 02:05:15 pm
What about this version without any assembler?
Code: Pascal
procedure StrSwap(var S1, S2: widestring); inline;
var
  S3: Pointer;
begin
  S3 := Pointer(S2);
  Pointer(S2) := Pointer(S1);
  Pointer(S1) := S3;
end;
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on September 29, 2022, 02:12:02 pm
Quote from: avk
What about this version without any assembler?

It's not about this specific procedure; it's only an example, and I already have a working version. The question is how to do it in general. On the one hand it is possible and there are examples, but on the other hand the compiler does not allow it.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: PascalDragon on September 29, 2022, 02:15:03 pm
Quote from: beria
But how do I make it always "inline"? The compiler reports that "assembler" and "inline" are incompatible, but I don't understand why, given that such combinations are used all the time in the FPC sources.

Assembler routines cannot be inlined, because the compiler has no real knowledge of what you're doing in that assembly code and what side effects an instruction might have. This would cause problems when the compiler assumes a certain state. A function call boundary protects against this in most cases, and thus assembly code either needs to be in an asm ... end block (preferably with a list of modified registers) or it needs to be in a separate function, which simply will not be inlined.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: 440bx on September 29, 2022, 02:25:40 pm
Quote from: PascalDragon
Assembler routines cannot be inlined, because the compiler has no real knowledge of what you're doing in that assembly code and what side effects an instruction might have. This would cause problems when the compiler assumes a certain state. A function call boundary protects against this in most cases, and thus assembly code either needs to be in an asm ... end block (preferably with a list of modified registers) or it needs to be in a separate function, which simply will not be inlined.

Understandable and very reasonable. However, it would be nice if there were a way to tell the compiler "inline it anyway, I take full responsibility".

I commonly use code like this:
Code: Pascal
if <somecondition> then
begin
  { <somecondition> cannot happen unless there is a bug in this program }

  if IsDebuggerPresent() then asm int 3 end;

  exit;
end;
That "int 3" statement won't break any assumptions the compiler might make about the rest of the Pascal code, but because of that single "asm int 3" the compiler won't inline the routine, even though in this particular case the inline assembler is totally innocuous.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: Thaddy on September 29, 2022, 03:32:29 pm
int 3 is an instruction that should always break. What's the fuss about? It is by design.
Since https://www.cpu-world.com/CPUs/8080/  or the Intel docs.


That's almost before I was born and I am 64...... ;D
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: 440bx on September 29, 2022, 03:50:31 pm
Quote from: Thaddy
int 3 is an instruction that should always break. What's the fuss about? It is by design.
Since https://www.cpu-world.com/CPUs/8080/  or the Intel docs.
That's almost before I was born and I am 64...... ;D

Apparently, in 64 years you haven't learned that one thing int 3 won't break is any assumption a compiler may make about register usage. Therefore, it would be safe for the compiler to inline any function/procedure that only uses int 3.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: abouchez on September 29, 2022, 05:24:28 pm
Today, I use asm only for very specific tasks that need specific opcodes, like SSE2, AES-NI, AVX or SSE4.2.
... and you will find thousands of lines of such manually tuned asm in mORMot, e.g. just for x86_64 https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.base.asmx64.inc and https://github.com/synopse/mORMot2/blob/master/src/crypt/mormot.crypt.core.asmx64.inc or even https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.fpcx64mm.pas

Otherwise, especially if I want the FPC compiler to inline, I write pascal code.
With some tricks like pointer arithmetic for some core routines.

And the latest versions of FPC tend to generate very good code.
The pointer() trick is the one to use for this example. It will be properly inlined, it will be truly cross-platform and cross-CPU, and it will be faster than manual non-inlined asm.

To be fair, inlining asm would need more than just inlining to be efficient.
You would need proper register allocation by the compiler, so you would need something closer to C/C++ intrinsics.
In real projects, inlined asm is used in very few places. Intrinsics are the way to go. Or write the asm with a high-level language like Perl or another DSL.

So don't optimize prematurely.
It is the root of all evil. ;)
And trust the RTL and some tuned libraries to be fast enough for most purposes.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: Thaddy on September 29, 2022, 06:07:45 pm
Quote from: 440bx
Apparently, in 64 years you haven't learned that one thing int 3 won't break is any assumption a compiler may make about register usage. Therefore, it would be safe for the compiler to inline any function/procedure that only uses int 3.

Well, before you make any further comments, my pension is payed. Three months to go ... Good luck with lizzy.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: 440bx on September 29, 2022, 06:19:01 pm
Quote from: Thaddy
my pension is payed.

Your pension is payed?... I see you haven't learned the meaning of "payed" yet, in spite of the fact that this is the second time you have misused it and had it pointed out to you. Good for you that your pension won't leak.


Title: Re: assembler; + inline; How do I make friends? :-)
Post by: avk on September 29, 2022, 06:22:35 pm
Quote from: avk
What about this version without any assembler?

Quote from: beria
It's not about this specific procedure; it's only an example, and I already have a working version. The question is how to do it in general. On the one hand it is possible and there are examples, but on the other hand the compiler does not allow it.

I had hoped that my message was clear: if you can do without assembler, you should do without it. If a pure Pascal subroutine seemed to be many times slower than a similar assembler one, it might be worth taking a closer look at its code.
And your example demonstrates this very well.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: Thaddy on September 29, 2022, 06:25:31 pm
Quote from: Thaddy
my pension is payed.

Quote from: 440bx
Your pension is payed?... I see you haven't learned the meaning of "payed" yet, in spite of the fact that this is the second time you have misused it and had it pointed out to you. Good for you that your pension won't leak.

20% programming, of which 5% is Pascal; 75% management. The left-over is sheer luck.

When in France...
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: KodeZwerg on September 29, 2022, 10:04:51 pm
Quote from: beria
It is analogous, only 250 times faster than:

I wonder how you measured that.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on September 29, 2022, 11:45:13 pm
Quote from: avk
I had hoped that my message was clear: if you can do without assembler, you should do without it. If a pure Pascal subroutine seemed to be many times slower than a similar assembler one, it might be worth taking a closer look at its code.
And your example demonstrates this very well.

This was just an example. I can rewrite the same thing with MMX, SSE, or AVX/AVX2/AVX-512, and no Pascal code will come close.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on September 29, 2022, 11:49:07 pm

Quote from: KodeZwerg
I wonder how you measured that.

https://www.freepascal.org/docs-html/rtl/sysutils/gettickcount64.html
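
A minimal sketch of how such a measurement could be set up with GetTickCount64 from SysUtils. The program name, the procedure names SwapByAssign and SwapByPointer, and the iteration count are assumptions chosen for illustration; this is not the benchmark that was actually run for this thread, and the numbers will vary with platform and compiler settings.

Code: Pascal
```pascal
program SwapTiming;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  Iterations = 1000000; // hypothetical count, even so A and B end up unchanged

// Swap through a temporary string: every assignment goes through the
// string manager (allocation/copy or reference counting, depending on platform).
procedure SwapByAssign(var S1, S2: WideString);
var
  Tmp: WideString;
begin
  Tmp := S2;
  S2 := S1;
  S1 := Tmp;
end;

// Swap only the internal pointers, as in the pointer-cast version above.
procedure SwapByPointer(var S1, S2: WideString); inline;
var
  Tmp: Pointer;
begin
  Tmp := Pointer(S2);
  Pointer(S2) := Pointer(S1);
  Pointer(S1) := Tmp;
end;

var
  A, B: WideString;
  I: Integer;
  Start: QWord;
begin
  A := 'first';
  B := 'second';

  Start := GetTickCount64;
  for I := 1 to Iterations do
    SwapByAssign(A, B);
  WriteLn('assignment swap: ', GetTickCount64 - Start, ' ms');

  Start := GetTickCount64;
  for I := 1 to Iterations do
    SwapByPointer(A, B);
  WriteLn('pointer swap:    ', GetTickCount64 - Start, ' ms');
end.
```

GetTickCount64 has millisecond resolution, so the loop has to run long enough for the difference to be meaningful.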
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: simsee on September 29, 2022, 11:51:48 pm
Quote from: avk
What about this version without any assembler?

Very interesting example. One aspect that is not clear to me: why does casting a string to a pointer return the pointer to the string itself? Thank you.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on September 30, 2022, 12:04:22 am

Quote from: abouchez
And trust the RTL and some tuned libraries to be fast enough for most of the purposes.

I already know of some completely irrational, almost religious things in the RTL that have burned me before and forced me to implement the functionality myself.
By the way, the trick with the pointer is also not a standard approach.
Anyway, I still don't understand why there is such an artificial restriction for assembler, and I couldn't find it described in the manual. I remember that practically in my childhood I used to insert "inline code", generally as raw processor opcodes, into Turbo Pascal on the old IBM PC-XT. And it all worked.
Anyone who uses it already knows the risks and knows what he is doing, and if you ban everything dangerous, it would be simpler to ban all pointers too, as some very high-level scripting languages do.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on September 30, 2022, 12:06:52 am
Quote from: simsee
Very interesting example. One aspect that is not clear to me: why does casting a string to a pointer return the pointer to the string itself? Thank you.

It's very simple. A WideString is just an ordinary pointer to a binary structure, which is described in the manual.
It is very similar to PChar: it is also only a pointer, but the data it points to also carries its own length, not just a null terminator at the end like PChar.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: jamie on September 30, 2022, 02:23:17 am
InterlockedExchangePointer?
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: abouchez on September 30, 2022, 01:53:06 pm
Quote from: simsee
Very interesting example. One aspect that is not clear to me: why does casting a string to a pointer return the pointer to the string itself? Thank you.

If you assign one WideString variable to another on Windows, it will use OleStr/BSTR assignment, which allocates a new buffer from the slow Windows global heap and then copies all the UTF-16 code units.
So it is a very slow process, and it is not needed to swap variables. You end up allocating 3 BSTR instances!
(Note that on Linux/POSIX, WideString does not use BSTR but is an alias for UnicodeString in FPC, so it uses reference counting and will be MUCH faster.)

If you write pointer() then it will just copy the pointer. This is what FPC (and Delphi) does for AnsiString, UnicodeString, WideString, interfaces, and dynamic arrays, which are all stored as pointers in variables.
That is as fast as assembly. Perfect for swapping two variables.
This is exactly what the initial asm does.
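
To make the pointer() cast concrete, here is a small self-contained sketch built around the StrSwap version posted earlier in the thread. The program name and the variables A and B are only illustrative; the point is that nothing but the two pointer-sized values changes hands, so no string data is allocated or copied by the swap itself.

Code: Pascal
```pascal
program PointerSwapDemo;
{$mode objfpc}{$H+}

// Exchanges the two string variables by swapping only their internal
// pointers; the character data itself is never touched.
procedure StrSwap(var S1, S2: WideString); inline;
var
  Tmp: Pointer;
begin
  Tmp := Pointer(S2);
  Pointer(S2) := Pointer(S1);
  Pointer(S1) := Tmp;
end;

var
  A, B: WideString;
begin
  A := 'first';
  B := 'second';
  StrSwap(A, B);
  WriteLn(A); // A now holds what B held before the swap
  WriteLn(B); // and vice versa
end.
```

Because each variable still owns exactly one string afterwards, the usual cleanup of A and B at program end stays balanced.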
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: abouchez on September 30, 2022, 02:06:02 pm
Quote from: beria
I already know of some completely irrational, almost religious things in the RTL that have burned me before and forced me to implement the functionality myself.

Fair enough. This is why we recoded most of the RTL within mORMot. ;)
(also for a seamless switch between FPC and Delphi, because their RTLs are not always compatible)

Quote from: beria
By the way, the trick with the pointer is also not a standard approach.

It is a perfectly valid and standard approach, well documented and used in several places in the RTL. It is also common to FPC and Delphi for AnsiString, UnicodeString, WideString, dynamic arrays and interfaces.

Quote from: beria
Anyway, I still don't understand why there is such an artificial restriction for assembler, and I couldn't find it described in the manual. I remember that practically in my childhood I used to insert "inline code", generally as raw processor opcodes, into Turbo Pascal on the old IBM PC-XT. And it all worked.

I see several reasons:
1) On the 8086/8087 it made sense because asm was much more necessary, e.g. to call the OS or for better performance, since the TP compiler was fast but did not optimize much.
2) Delphi followed an even worse pattern: asm ... end blocks are not allowed inside begin ... end blocks on Win64, and asm is not available on ARM/AARCH64.
3) Register allocation is hard work for the compiler, and pure Pascal code is easier to optimize when inlining than opaque assembly; one obvious example is constant propagation.
4) In practice, an algorithm with loops or complex opcodes (e.g. AVX2) is better off in its own sub-function than inlined. There you can verify the AVX2 register allocation and the mandatory vzeroupper opcodes.
5) Making asm work on several OSes can be a PITA. You won't be able to write anything more complex than your exchange sample; otherwise, you are likely to get stuck on the ABI differences (calling convention, volatile registers...).
6) I already wrote about the right way to use complex opcodes from the compiler's point of view: intrinsics, not manual asm. No one writes huge amounts of code in VC/GCC using inlined asm; they use intrinsics.
7) Just look at the asm I wrote in mORMot, and you will see it is not so easy to write such assembly. And to be honest, I never needed inlined asm for real programming. But intrinsics might have helped.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: PascalDragon on September 30, 2022, 03:47:47 pm
Quote from: 440bx
Understandable and very reasonable. However, it would be nice if there were a way to tell the compiler "inline it anyway, I take full responsibility".

We won't add something like that, because users will abuse it and then complain if something goes wrong.

Quote from: beria
Anyway, I still don't understand why there is such an artificial restriction for assembler, and I couldn't find it described in the manual...

The inline directive is merely a hint for the compiler. It is in no way required to inline a function and if for whatever reason it decides that it won't or can't do it then it won't. And currently one of the reasons not to inline a function is if it's an assembly function or contains an assembly block.

Quote from: beria
I remember that practically in my childhood I used to insert "inline code", generally as raw processor opcodes, into Turbo Pascal on the old IBM PC-XT. And it all worked.

FPC also allows you to write inline assembly for all targets (unlike Delphi, as abouchez wrote). However, if you don't add a register list to an assembly block, the generated code might perform worse than expected due to potentially unnecessary spilling of registers to the stack.
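
As an illustration of the register list mentioned above, here is a hedged sketch of an asm block embedded in an ordinary Pascal function. The function AddOne and the local Tmp are made up for this example, and the asm part is written for x86_64 in Intel syntax only; on other targets the sketch falls back to plain Pascal. The list after "end" tells the compiler that RAX is the only register the block modifies.

Code: Pascal
```pascal
program AsmRegListDemo;
{$mode objfpc}{$H+}
{$ifdef CPUX86_64}
  {$asmmode intel}
{$endif}

// Hypothetical helper: returns Value + 1.
function AddOne(Value: Int64): Int64;
{$ifdef CPUX86_64}
var
  Tmp: Int64;
begin
  asm
    mov rax, Value   // load the parameter
    inc rax          // do the actual work
    mov Tmp, rax     // hand the result back to Pascal code
  end ['RAX'];       // register list: only RAX is modified by this block
  Result := Tmp;
end;
{$else}
begin
  // Portable fallback for targets where the asm above does not apply.
  Result := Value + 1;
end;
{$endif}

begin
  WriteLn(AddOne(41));
end.
```

A function written this way is still a normal Pascal routine with a begin ... end body; as explained above, the presence of the asm block is currently one of the reasons the compiler will not inline it.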
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on October 01, 2022, 12:35:55 am
Quote from: beria
I already know of some completely irrational, almost religious things in the RTL that have burned me before and forced me to implement the functionality myself.

Quote from: abouchez
7) Just look at the asm I wrote in mORMot, and you will see it is not so easy to write such assembly. And to be honest, I never needed to have inlined asm for real programming. But intrinsics may have helped.

I do about the same thing myself. For example, to exchange values between data buffers (I am currently working on a telemetry server for industrial equipment), I use a very simple procedure. It is not even the best one, because it does not rely on data alignment.


Code: Pascal
procedure Swap32(P1, P2: Pointer);
asm
         VMOVDQU ymm0, YMMWORD PTR [P1]
         VMOVDQU ymm1, YMMWORD PTR [P2]
         VMOVDQU YMMWORD PTR [P2], ymm0
         VMOVDQU YMMWORD PTR [P1], ymm1
end;
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on October 01, 2022, 12:50:59 am


Quote from: PascalDragon
We won't add something like that, because users will abuse it and then complain if something goes wrong.

This is certainly not my business, but assembler in code is not at all widespread. It is usually written, first, only by someone who actually knows the assembler of the specific processors (which, by the way, I don't particularly claim), and second, only when the standard means cannot do something at all or the standard implementation, even if universal, is extremely slow. Another concrete example in FPC, which I ran into much earlier, is the complex arithmetic library, whose performance is exceptionally poor because it does not use AVX. Rewriting just a few of the needed functions changes everything instantly, because the performance difference is almost 100 times...
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: PascalDragon on October 01, 2022, 07:34:05 pm
Quote from: beria
Another concrete example in FPC, which I ran into much earlier, is the complex arithmetic library, whose performance is exceptionally poor because it does not use AVX. Rewriting just a few of the needed functions changes everything instantly, because the performance difference is almost 100 times...

FPC also targets x86 systems that do not provide AVX. If you have improvements that also work with older processors or that provide dynamic detection, then you can report an issue together with patches or submit a merge request.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on October 03, 2022, 05:15:44 pm
Quote from: PascalDragon
FPC also targets x86 systems that do not provide AVX. If you have improvements that also work with older processors or that provide dynamic detection, then you can report an issue together with patches or submit a merge request.

This is done very simply with conditional compilation flags.
It's enough to have not only the win64/win32/linux/macos flags but also AVX, AVX2 and AVX512 flags, globally, at the compiler level. Then any procedure, including a system one, can be implemented in several variants. In fact, I have to do this right now because of performance problems. Moreover, by setting the flags you could even compile the base code only for processors with AVX, which is needed too. That is the first point.
Second, as in C++, you could have an "inline" library of elementary vector operations that can be used to program AVX without dropping to assembler. An example is immintrin.h, and having the whole set is not even important: in my experience, less than 5% of the instructions are used heavily. But the ability to implement this (inline + assembler) is, in general, artificially turned off in FPC for reasons I can't understand, although implementations of it exist deep inside the compiler...
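
A rough sketch of the conditional-compilation idea described above. The symbol USE_AVX is hypothetical: it is not a flag FPC defines by itself, but something the user would pass on the command line (for example with -dUSE_AVX). The function ImplementationName is likewise made up and only reports which variant was compiled in.

Code: Pascal
```pascal
program CondCompileDemo;
{$mode objfpc}{$H+}

// USE_AVX is a hypothetical user-defined symbol (e.g. fpc -dUSE_AVX ...).
// Exactly one of the two bodies below ends up in the compiled program.
function ImplementationName: string;
begin
{$ifdef USE_AVX}
  Result := 'AVX variant';       // an AVX-specific routine would go here
{$else}
  Result := 'portable variant';  // baseline code that runs on any CPU
{$endif}
end;

begin
  WriteLn('Compiled with the ', ImplementationName);
end.
```

The choice is fixed at compile time, so a binary built with the symbol defined cannot fall back to the portable variant at run time.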


Title: Re: assembler; + inline; How do I make friends? :-)
Post by: Thaddy on October 03, 2022, 05:59:46 pm
The latter is nonsense for Intel types, since you would have to work around the cpuid instruction, which not every supported Xxxx Intel family supports.
Title: Re: assembler; + inline; How do I make friends? :-)
Post by: beria on October 04, 2022, 10:48:56 am
Quote from: Thaddy
The latter is nonsense for Intel types, since you would have to work around the cpuid instruction, which not every supported Xxxx Intel family supports.

???? As far as I know, for any Intel processor family you can always get information about the supported instruction set. That is, the compiler, depending on the processor type it is targeting, always knows this... And then it's up to the programmer whether or not to use conditional compilation... What's more, the compiler already has...

Code: Pascal
   fputypestr : array[tfputype] of string[7] = (
     'NONE',
//   'SOFT',
     'SSE64',
     'SSE3',
     'SSSE3',
     'SSE41',
     'SSE42',
     'AVX',
     'AVX2',
     'AVX512F'
   );

Title: Re: assembler; + inline; How do I make friends? :-)
Post by: PascalDragon on October 04, 2022, 01:25:13 pm
Quote from: beria
This is done very simply with conditional compilation flags.

That's why I wrote “or that provides a dynamic detection”. But if there isn't a solution that covers that from the beginning then we will not even think about integrating it.

Quote from: beria
But the ability to implement this (inline + assembler) is, in general, artificially turned off in FPC for reasons I can't understand, although implementations of it exist deep inside the compiler...

We have explained our reasons and we stand by those reasons and will not make exceptions for them. And I don't know what you mean with “implementations of this deep inside the compiler”.

Quote from: Thaddy
The latter is nonsense for Intel types, since you would have to work around the cpuid instruction, which not every supported Xxxx Intel family supports.

The CPUID instruction was introduced with the 486 processors, and every Intel-compatible CPU since then supports it. So in the context of this discussion (namely accelerated functions), you can simply fall back to the non-accelerated code path if no CPUID instruction is available, because none of the acceleration features in question will be available either.

Quote from: beria
That is, the compiler, depending on the processor type it is targeting, always knows this... And then it's up to the programmer whether or not to use conditional compilation... What's more, the compiler already has...

That is for compile time. However the default RTL we distribute needs to be able to run also on systems that don't support that (and not everyone will recompile the RTL with their preferred setting). And thus it either needs to be compiled with the lowest setting for the oldest supported processor or it needs to have runtime detection.
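
To show what runtime detection could look like in practice, here is a minimal sketch that picks an implementation once at startup through a procedural variable. All names here (TSumFunc, SumPlain, SumAccelerated, CpuHasFastPath) are hypothetical and not taken from the RTL; the feature probe is a stub that always returns False, standing in for a real CPUID-based check, and the "accelerated" routine simply reuses the portable one.

Code: Pascal
```pascal
program DispatchDemo;
{$mode objfpc}{$H+}

type
  TSumFunc = function(const Data: array of Double): Double;

// Portable baseline that runs on every supported processor.
function SumPlain(const Data: array of Double): Double;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(Data) do
    Result := Result + Data[I];
end;

// Hypothetical stand-in for an accelerated (e.g. AVX) implementation.
// In this sketch it just forwards to the baseline.
function SumAccelerated(const Data: array of Double): Double;
begin
  Result := SumPlain(Data);
end;

// Hypothetical feature probe; a real one would query the CPU at run time.
function CpuHasFastPath: Boolean;
begin
  Result := False;
end;

var
  Sum: TSumFunc;
begin
  // Decide once at startup which implementation to use.
  if CpuHasFastPath then
    Sum := @SumAccelerated
  else
    Sum := @SumPlain;

  WriteLn(Sum([1.0, 2.0, 3.5]):0:1);
end.
```

With this layout the same binary runs on old and new processors, at the cost of an indirect call for each use of Sum.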