I already know of some completely irrational, almost religious things in the RTL, which have burned me before and forced me to implement the functionality myself...
Fair enough. This is why we rewrote most of the RTL within mORMot.
(Also to switch seamlessly between FPC and Delphi, because their RTLs are not always compatible.)
By the way, the pointer trick is not a standard approach either.
It is a perfectly valid and standard approach, documented and used in several places in the RTL. It also works the same way in FPC and Delphi for AnsiString, UnicodeString, WideString, dynamic arrays and interfaces.
Anyway, I still don't understand the reason for such an artificial restriction on assembler, and I couldn't even find it described in the manual... I remember that, practically in my childhood, I used to insert "inline" code, even raw processor opcodes, into Turbo Pascal on the old IBM PC XT. And it all worked.
I see several reasons:
1) On the 8086/8087, it made sense because asm was needed much more often, e.g. to call the OS, or for better performance, since the TP compiler was fast but did not optimize much.
2) Delphi went even further: on Win64, asm..end blocks are not allowed inside begin..end blocks (only whole asm functions are), and asm is not available at all on ARM/AArch64.
3) Register allocation is hard work for the compiler, and pure Pascal code is easier to optimize when inlined than opaque assembly. One obvious example is constant propagation.
4) In practice, an algorithm with some loops or complex opcodes (e.g. AVX2) is better placed in its own sub-function than inlined. There you can verify the AVX2 register allocation and the mandatory vzeroupper opcodes.
5) Making asm work on several OSes can be a PITA. You won't be able to write anything much more complex than your exchange sample without getting stuck on ABI differences (calling conventions, volatile registers...).
6) As I already wrote, from the compiler's point of view the right way to use complex opcodes is through intrinsics, not manual asm. No one writes huge amounts of inline asm in VC/GCC: they use intrinsics.
7) Just look at the asm I wrote in mORMot and you will see that such assembly is not easy to write. To be honest, I have never needed inline asm for real-world programming. But intrinsics might have helped.