"What about this version without any assembler?"
It's not about this specific procedure, which is only an example (I already have a working one), but about how to do it in general. On the one hand it is possible and there are examples of it, but on the other hand the compiler does not allow it...
"Assembler routines cannot be inlined, because the compiler has no real knowledge about what you're doing in that assembly code and what side effects some instruction might have. This would cause problems when the compiler assumes a certain state. A function call boundary protects here in most cases, and thus assembly code either needs to be in an asm … end block (preferably with a list of modified registers) or it needs to be in a separate function which simply will not be inlined."
Understandable and very reasonable. However, it would be nice if there was a way to tell the compiler "inline it anyway, I take full responsibility".
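To make the "list of modified registers" concrete, here is a minimal sketch of an asm … end block inside an ordinary Pascal function. The function name AddOne and the program itself are made up for illustration. The assembler branch assumes an x86-64 target with Intel syntax, and other targets take the plain Pascal branch. It only shows the syntax and makes no claim about what the compiler will or will not inline.

```pascal
program AsmBlockDemo;
{$mode objfpc}
{$ifdef CPUX86_64}
  {$asmmode intel}
{$endif}

// AddOne is a hypothetical helper, used only to show the block syntax.
function AddOne(Value: Int64): Int64;
var
  Tmp: Int64;
begin
  Tmp := Value;
{$ifdef CPUX86_64}
  asm
    mov rax, Tmp
    add rax, 1
    mov Tmp, rax
  end ['RAX'];   // tell the compiler that this block modifies RAX
{$else}
  Tmp := Tmp + 1; // plain Pascal fallback for other targets
{$endif}
  Result := Tmp;
end;

begin
  // Self-check: exit with a non-zero code if the result is wrong.
  if AddOne(41) <> 42 then
    Halt(1);
  WriteLn(AddOne(41));
end.
```

The register list after end is what lets the compiler know which registers it may no longer rely on after the block.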
"int 3 is an instruction that should always break. What's the fuss about? It is by design."
Apparently, in 64 years you haven't learned that one thing int 3 won't break is any assumption a compiler may make about register usage. Therefore, it would be safe for the compiler to inline any function/procedure which only uses int 3.
Since https://www.cpu-world.com/CPUs/8080/ or the Intel docs.
That's almost before I was born and I am 64... ;D
Well, before you make any further comments, my pension is payed. Three months to go ... Good luck with lizzy.
"my pension is payed."
Your pension is payed?... I see you haven't learned the meaning of "payed" yet, in spite of the fact that this is the second time you have misused it and that it has been pointed out to you. Good for you that your pension won't leak.
For example, I have one:

procedure StrSwap(var S1, S2: WideString); assembler; stdcall; nostackframe; inline;
asm
  MOV RAX, qword ptr [S1]
  MOV R8, qword ptr [S2]
  MOV qword ptr [S1], R8
  MOV qword ptr [S2], RAX
end;

It is analogous, only 250 times faster than:

procedure StrSwap(var S1, S2: WideString); inline;
var
  S3: WideString;
begin
  S3 := S2;
  S2 := S1;
  S1 := S3;
end;
But how do I make it always "inline"? The compiler reports that "assembler" and "inline" are incompatible, but I don't understand why, when the FPC sources use such combinations all the time.
"It is analogous, only 250 times faster than:"
I wonder how you measured that.
I had hoped that my message was clear: if you can do without assembler, you should do without it. If a pure Pascal subroutine seemed to be many times slower than a similar assembler one, it might be worth taking a closer look at its code.
And your example demonstrates this very well.
I wonder how you measured that.
What about this version without any assembler?
procedure StrSwap(var S1, S2: widestring); inline;
var
  S3: Pointer;
begin
  S3 := Pointer(S2);
  Pointer(S2) := Pointer(S1);
  Pointer(S1) := S3;
end;
I already know of some completely irrational, almost religious things in the RTL, which have burned me before and forced me to implement the functionality myself...
And trust the RTL and some tuned libraries to be fast enough for most purposes.
It's very simple. WideString is just an ordinary pointer to some binary structure, which is described in the manual...
Very interesting example. One aspect that is not clear to me: why does casting a string to a pointer return the pointer to the string itself? Thank you.
If you assign one WideString variable to another on Windows, it will use OleStr/BStr assignment, which allocates a new buffer from the slow Windows global heap and then copies all UTF-16 code units.
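A small sketch of the pointer-based swap in use may help here. The cast to Pointer makes the compiler treat the two variables as plain pointers, so the swap exchanges the two pointer values instead of going through WideString assignment. The program name and the test values are made up for illustration, and it assumes FPC in objfpc mode.

```pascal
program StrSwapDemo;
{$mode objfpc}{$H+}

// Swap two WideString variables by exchanging their underlying pointers.
procedure StrSwap(var S1, S2: WideString); inline;
var
  Tmp: Pointer;
begin
  Tmp := Pointer(S1);
  Pointer(S1) := Pointer(S2);
  Pointer(S2) := Tmp;
end;

var
  A, B: WideString;
begin
  A := 'first';
  B := 'second';
  StrSwap(A, B);
  WriteLn(A, ' ', B);
  // Self-check: exit with a non-zero code if the swap did not happen.
  if (A <> 'second') or (B <> 'first') then
    Halt(1);
end.
```

Each variable still owns exactly one string after the swap, so nothing is allocated, copied or freed along the way.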
"I already know of some completely irrational, almost religious things in the RTL, which have burned me before and forced me to implement the functionality myself..."
Fair enough. This is why we recoded most of the RTL within mORMot. ;)
"By the way, the trick with the Pointer is also not a standard approach."
It is a perfectly valid and standard approach, perfectly documented and used in several places in the RTL. It is also common to FPC and Delphi for AnsiString, UnicodeString, WideString, dynamic arrays and interfaces.
"Anyway, I still didn't understand why there is such an artificial restriction for assembler, and I didn't even find it described in the manual... I remember that practically in my childhood I used to insert "Inline code", as raw processor opcodes, into Turbo Pascal on the old IBM PC-XT. And it all worked."
I see several reasons:
7) Just look at the asm I wrote in mORMot, and you will see it is not so easy to write such assembly. And to be honest, I never needed to have inlined asm for real programming. But intrinsics may have helped.
We won't add something like that, because users will abuse it and then complain if something goes wrong.
Another concrete example in FPC, which I ran into much earlier, is the complex arithmetic library, whose performance is exceptionally poor because it does not use AVX. Rewriting just a few of the needed functions changes everything instantly, because the performance difference is almost 100 times...
This is done very simply with conditional compilation flags.
FPC also targets x86 systems that do not provide AVX. If you have improvements that also work with older processors or that provides a dynamic detection then you can report an issue together with patches or provide a merge request.
The latter is nonsense for Intel types, since you would have to work around the cpuid instruction, which not all supported Xxxx Intel family members support.
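For the compile-time route, a minimal sketch of selecting an implementation with conditional compilation could look like this. USE_AVX is a hypothetical user-defined symbol (passed with -dUSE_AVX, for instance), and TComplex, ComplexMul and ComplexMulPortable are made-up names, not the RTL's complex arithmetic unit. The "avx" branch is only a placeholder that forwards to the portable code; a real hand-tuned routine would go there.

```pascal
program CondCompileDemo;
{$mode objfpc}

type
  // Hypothetical record type for this sketch only.
  TComplex = record
    Re, Im: Double;
  end;

// Portable version that works on every target.
function ComplexMulPortable(const A, B: TComplex): TComplex;
begin
  Result.Re := A.Re * B.Re - A.Im * B.Im;
  Result.Im := A.Re * B.Im + A.Im * B.Re;
end;

{$if defined(CPUX86_64) and defined(USE_AVX)}
const
  Backend = 'avx';

// Placeholder: a tuned routine would replace this body.
function ComplexMul(const A, B: TComplex): TComplex;
begin
  Result := ComplexMulPortable(A, B);
end;
{$else}
const
  Backend = 'portable';

function ComplexMul(const A, B: TComplex): TComplex; inline;
begin
  Result := ComplexMulPortable(A, B);
end;
{$endif}

var
  X, Y, Z: TComplex;
begin
  X.Re := 1; X.Im := 2;
  Y.Re := 3; Y.Im := 4;
  Z := ComplexMul(X, Y);   // (1+2i)*(3+4i) = -5+10i
  WriteLn(Backend, ': ', Z.Re:0:1, ' ', Z.Im:0:1);
  if (Z.Re <> -5) or (Z.Im <> 10) then
    Halt(1);
end.
```

The choice is fixed when the program is compiled, so a binary built with the symbol defined would still need a CPU that supports the instructions used in that branch.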
But the possibility to implement this (inline+assembler) in general is artificially turned off in FPC for reasons I can't understand, although there are implementations of this deep inside the compiler...
That is, the compiler, depending on its revision for a particular type of processor, always knows this... And then it's up to the programmer whether or not to use conditional compilation... What's more, the compiler already has...