
Author Topic: Self in ASM  (Read 4055 times)

creaothceann

  • Full Member
  • ***
  • Posts: 117
Self in ASM
« on: September 09, 2017, 12:13:56 am »
I have an assembler block in a method. In this assembler block I want to access a variable of the class, preferably without loading it into a local variable.

The MOV instruction takes an offset and a register (and optionally an index with its scale factor, but I don't use that here), and the documentation says that "__SELF represents the object method pointer in methods", but I can't seem to use __SELF in a MOV instruction...

Example code:

Code: Pascal
{$ASMMODE ATT}

procedure TMyClass.MyMethod;
begin
//...
ASM
        {$IFDEF CPU32}
        //...
        MOVl    TMyClass.MyVariable(__SELF), %EAX
        //...
        {$ELSE}
        //...
        MOVq    __SELF, %RAX
        MOVq    TMyClass.MyVariable(%RAX), %RAX
        //...
        {$ENDIF}
END;
//...
end;

Right now I'm compiling for x64, so the upper code path isn't active. In the lower code path you can see that I have to explicitly load __SELF into a register first before using that register in the MOV instruction (which might end up as a useless "MOV RAX, RAX" in some cases). Is this a bug in the compiler?

EDIT: From the documentation I was assuming that __SELF represents the current register holding the object pointer, maybe that's not the case?
« Last Edit: September 09, 2017, 12:54:33 am by creaothceann »

ASerge

  • Hero Member
  • *****
  • Posts: 2242
Re: Self in ASM
« Reply #1 on: September 09, 2017, 02:16:22 am »
Code: Pascal
procedure TForm1.FormCreate(Sender: TObject);
begin
  FField := 1;
  TestIncAsm;
  TestIncBlock;
  Caption := IntToStr(FField);
end;

{$ASMMODE INTEL}

// Pure assembler method: Self is used directly as the base of the field.
procedure TForm1.TestIncAsm; assembler;
asm
  mov rdx, Self.FField
  inc rdx
  mov Self.FField, rdx
end;

// asm block inside a regular method: Self is loaded into a register first.
procedure TForm1.TestIncBlock;
begin
  asm
    mov rax, Self
    mov rdx, TForm1(rax).FField
    inc rdx
    mov TForm1(rax).FField, rdx
  end;
end;

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: Self in ASM
« Reply #2 on: September 09, 2017, 04:37:31 am »
I would say: Just don't use ASM. There is essentially a zero percent chance that it will actually be faster than what FPC will produce from normal Pascal code when the proper optimization flags are set. And especially don't use it for such basic purposes as what you seem to be using it for (as in shuffling the values of variables around. Really? Why?)

Let's even give you the benefit of the doubt and say you somehow managed to implement the most highly optimized hand-written ASM implementation of a certain method the world has ever seen (using SSE2 instructions, we'll say, because for some unknown reason people still seem to think the 16-year-old SSE2 instruction set is remotely close to what high-end CPUs are designed for in 2017.) It would still be blown out of the water by a well-written Pascal implementation, for anyone with an SSE3+, AVX, AVX2, etc. capable CPU who bothered to take the two seconds required to set the required optimization flags before compiling it.
« Last Edit: September 09, 2017, 06:00:23 am by Akira1364 »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11452
  • FPC developer.
Re: Self in ASM
« Reply #3 on: September 09, 2017, 07:06:09 am »
Not even close. FPC does not autovectorize, so the multiple-values-in-one-go feature of SSE goes unused with Pascal code. (And even then it is not hard to make hand-optimized code better than autovectorized or intrinsics code.)

Akira1364

  • Hero Member
  • *****
  • Posts: 561
Re: Self in ASM
« Reply #4 on: September 09, 2017, 07:39:55 am »
-Sv seems to do the trick in my experience. I'm not saying it isn't possible, I'm just saying I've never seen a handwritten ASM implementation that was better than or came anywhere close to the result of the Pascal version compiled with -CfAVX2 -CpCOREAVX2 -O4 -OpCOREAVX2 -Sv.

creaothceann

  • Full Member
  • ***
  • Posts: 117
Re: Self in ASM
« Reply #5 on: September 09, 2017, 09:41:53 am »
@ASerge:
Thanks. So I have to do the extra step in non-ASM methods? Alright.

@Akira1364:
Sometimes the programmer has more information available than the compiler, which can make all the difference. Especially when the compiler's usual approach isn't really designed for the problem.

I'm writing an emulator (several million opcodes per second) with an interpreter core. The naive approach reads a byte (opcode) from virtual memory in an endless loop and uses a case-of construct to run the appropriate opcode handler, which defeats the host CPU's branch predictor. So a more refined approach uses an array of pointers to labels and jumps between them via computed goto at the end of every opcode handler. The "problem" is that in my case this burns 2*256*8 bytes = 4 KB of the host CPU's L1 data cache (x64 host CPU, 2 guest CPU modes), which is usually 32 KB per core. That's 12.5% of high-speed cache occupied which could maybe, possibly be used for other data.
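A minimal sketch of the naive case-of loop described above, assuming a made-up three-opcode guest instruction set (the opcode values and their meanings are hypothetical and not from the actual emulator). The last line just reproduces the 2*256*8 arithmetic for the jump table on a 64-bit target.

```pascal
program NaiveDispatch;
{$mode objfpc}

var
  // Hypothetical guest memory holding a tiny program: INC, INC, DOUBLE, HALT.
  Memory: array[0..3] of Byte = ($01, $01, $02, $00);
  PC: Integer = 0;
  Acc: Integer = 0;
  Running: Boolean = True;

begin
  // One central loop: fetch an opcode byte, then branch via case-of.
  // Every handler is reached through the same indirect branch.
  while Running do
  begin
    case Memory[PC] of
      $00: Running := False;  // hypothetical HALT
      $01: Inc(Acc);          // hypothetical INC
      $02: Acc := Acc * 2;    // hypothetical DOUBLE
    end;
    Inc(PC);
  end;
  WriteLn('Acc = ', Acc);

  // Size of a pointer table with 2 guest modes and 256 opcodes each.
  WriteLn('Jump table size: ', 2 * 256 * SizeOf(Pointer), ' bytes');
end.
```

With 8-byte pointers the second line reports 4096 bytes, which is the 4 KB mentioned above.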

So my idea was to take the opcode byte and transform it (without further memory accesses) into an opcode handler's memory address. Which is already working, using a fixed virtual memory layout (yes, highly platform specific but that's ok) where I copy each handler's program code to its own strategic position. (Which means no global variables thanks to x64 RIP-relative addressing, but no problem.) The problem right now is to safely return from there when it's time to run the rest of the program.
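One way such a transformation could look, as a sketch only: the post does not say what the real layout is, so the base address, the 256-byte slot size, and the function name HandlerAddress are all assumptions made for illustration.

```pascal
program OpcodeToAddress;
{$mode objfpc}

const
  // Hypothetical values; the real memory layout is not given in the post.
  HandlerBase  = QWord($0000000100000000);  // start of the handler region
  HandlerShift = 8;                         // 256 bytes reserved per handler

// Maps (guest CPU mode, opcode byte) to a handler address using only
// shifts, an OR and an add, so no pointer table has to be read from memory.
function HandlerAddress(Mode, Opcode: Byte): QWord;
begin
  Result := HandlerBase + (((QWord(Mode) shl 8) or Opcode) shl HandlerShift);
end;

begin
  WriteLn(HexStr(HandlerAddress(0, $00), 16));
  WriteLn(HexStr(HandlerAddress(1, $A9), 16));
end.
```

The trade-off in this sketch is that every handler has to fit into its fixed-size slot, which costs address space and instruction-cache locality instead of L1 data cache.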

(This is a "for fun" project, and "just write a JITter" / "write the whole program in ASM" wouldn't be fun.)

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11452
  • FPC developer.
Re: Self in ASM
« Reply #6 on: September 09, 2017, 02:05:48 pm »
Quote
-Sv seems to do the trick in my experience.

In my experience it doesn't do that much at all. And it is buggy. https://bugs.freepascal.org/view.php?id=31612

Quote
I'm not saying it isn't possible, I'm just saying I've never seen a handwritten ASM implementation that was better than or came anywhere close to the result of the Pascal version compiled with -CfAVX2 -CpCOREAVX2 -O4 -OpCOREAVX2 -Sv.

I work in image processing, and a factor 3-4 is standard. That said, one should always have both implementations and compare.

 
