Well, and that goes for both you and me: our original 8/16/32-bit 8086/8087/6502/6510/Z80, maybe 80386 or, God forbid, 80286 assembler knowledge is soooooo outdated that we would have to use one of the more esoteric FreePascal (cross-)compilers to show off our knowledge.
Even if you or I still wrote assembler from scratch, we wouldn't know about cache stalls, locked cores, etc., so our code would end up slower instead of faster on modern multi-core CPUs.
Many people have that pretension, but it is seldom appropriate to write assembler, except for the likes of Marcov, who knows what he is doing and knows about the limitations. The world and the compilers have moved on; as they say in Indonesian, "tempo dulu": that time has gone by.
(The latter is just to ensure our friend Handoko reads the thread.)
I don't want to discourage beginners who still think: "Hey, look, this assembler MUST be faster than the compiler-generated code!" Just let them dream... and pursue the deeper knowledge.
Remember, beginners do not know about profilers...
We both started like that, I guess...