Topic: Limited relocation support in inline assembler

MathMan

« on: September 24, 2014, 10:21:43 pm »
Hi all,

I am currently developing a multi-precision arithmetic package for FPC, using LCL 1.2.4 and FPC 2.6.4 on Windows 7 (64-bit) with an Intel processor.

For speed reasons I would like to do the following:

Quote
    // prepare addition with loop-unrolling 8
    mov     R8, RCX;
    and     R8, 7;
    shr     RCX, 3;
    inc     RCX;
    clc;
    jmp     [@Table+8*R8];

    align   8;
    @Table:
    dq      @lAddZero;
    dq      @lAddOne;
    dq      @lAddTwo;
    dq      @lAddThree;
    dq      @lAddFour;
    dq      @lAddFive;
    dq      @lAddSix;
    dq      @lAddSeven;

    // add in chunks of 8 limbs
  @AddLongLoop:

    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddSeven:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddSix:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddFive:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddFour:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddThree:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddTwo:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddOne:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddZero:
    loop    @AddLongLoop;

I know that this is a bit extreme, but it is valid and not even self-modifying code ;)

The documentation unfortunately states that "offset" is not supported for Intel-style assembler. Consequently the compiler generates only 32-bit relocation info for the entries in @Table, so the code crashes unless it runs in the low 4 GByte of system memory. This is a bit unfortunate and I would like to know:

 a - is there any intention to lift this limitation in the near future?
 b - is there maybe an alternative way to do this? E.g. I thought about moving @Table to the general variable space, but the labels are local to the procedure scope, so no luck :(

Regards,
MathMan

marcov


MathMan

« Reply #2 on: September 24, 2014, 10:51:22 pm »

Quote from: marcov
See http://bugs.freepascal.org/view.php?id=26555

Hi Marcov,

This does not seem to be the same issue I am talking about. The bug report is about accessing global variables from inline assembler, which I don't do. What I am doing is accessing a local variable (via its offset). It might be, though, that both boil down to the same issue within the compiler ...

Besides, I think it was also me who triggered that bug report with my post "Linker issue" :)

Regards,
Jens

Jonas Maebe

« Reply #3 on: September 25, 2014, 09:49:28 am »
Your code is invalid: you cannot use 64-bit displacements in arbitrary instructions (see e.g. http://www.nasm.us/doc/nasmdo11.html). You will have to use RIP-relative addressing, which in turn does not support indexed accesses.

MathMan

« Reply #4 on: September 25, 2014, 12:00:48 pm »
Quote from: Jonas Maebe
Your code is invalid: you cannot use 64-bit displacements in arbitrary instructions (see e.g. http://www.nasm.us/doc/nasmdo11.html). You will have to use RIP-relative addressing, which in turn does not support indexed accesses.

OK, I see your point. I scanned the Intel documentation too quickly and missed the fact that displacements are pretty much limited to 32 bits in 64-bit mode as well :(. However, shouldn't

Quote
lea     RAX, @Table;
jmp     [RAX+8*R8];

do the trick then? The Intel documentation states that lea can calculate a 64-bit displacement (with operand size and address size both set to 64 bits, only the REX.W prefix is required), but the compiler still generates a 32-bit offset. Am I missing something here too? I used lea because the obvious choice

Quote
mov     RAX, offset @Table;
jmp     [RAX+8*R8];

is not supported by the inline assembler, and the programmer's reference states that lea should be used instead.

Kind regards,
MathMan

Jonas Maebe

« Reply #5 on: September 25, 2014, 11:13:28 pm »
There are several answers to this:
a) the x86-64 Intel assembler reader is barely used or tested, see e.g. http://bugs.freepascal.org/view.php?id=26613
b) you should never use absolute addressing on x86-64; use relative (RIP-based) addressing instead. While absolute addressing is supported in theory, in practice it is strongly discouraged for various reasons (and I doubt there is any support for it in FPC, because unless you are writing OS kernels or the like you should never use it).

I would strongly recommend finding a tutorial on writing x86-64 assembly, or at least studying the code generated by the compiler. You cannot simply carry over your assumptions from 32-bit x86.
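
As a rough illustration of the relative-addressing advice above (not FPC code, and not taken from this thread): a jump table does not have to store absolute 64-bit addresses. It can store small offsets relative to a base label, and the target is computed as base plus offset at run time. The sketch below shows that idea in C using the GNU "labels as values" extension, which GCC and Clang accept but which is not ISO C. The function name run_from and the labels step0..step3 are hypothetical, chosen only for this example.

```c
/* GNU C extension: &&label yields the address of a label,
 * and "goto *ptr" jumps to a computed address. Not ISO C. */

/* Enters a 4-step chain at `entry` (0..3) and returns how many steps ran.
 * noinline keeps a single copy of the labels (assumption: GCC or Clang). */
__attribute__((noinline))
static int run_from(unsigned entry)
{
    /* Each entry is the distance from step0 to the target label.
     * These are small constants, so the table itself holds no
     * absolute addresses. */
    static const int offsets[] = {
        &&step0 - &&step0,
        &&step1 - &&step0,
        &&step2 - &&step0,
        &&step3 - &&step0,
    };
    int steps = 0;

    /* target = base + offset; in position-independent builds the base
     * is typically formed relative to the instruction pointer. */
    goto *(&&step0 + offsets[entry & 3]);

step3: steps++;
step2: steps++;
step1: steps++;
step0: return steps;
}
```

Calling run_from(3) enters at step3 and falls through all three increments, while run_from(0) jumps straight to the return. In hand-written x86-64 assembly the same shape would presumably be a RIP-relative lea for the base, followed by an add of the table entry and an indirect jmp through a register; whether the FPC 2.6.4 Intel-syntax reader accepts that form is not established in this thread.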

MathMan

« Reply #6 on: September 26, 2014, 01:21:10 am »
Quote from: Jonas Maebe
...
b) you should never use absolute addressing on x86-64; use relative (RIP-based) addressing instead. While absolute addressing is supported in theory, in practice it is strongly discouraged for various reasons (and I doubt there is any support for it in FPC, because unless you are writing OS kernels or the like you should never use it).
...

OK, if it is bad programming practice and could bring me into conflict with the FPC environment, I'll let it rest and look for other, better-behaved possibilities.
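
One better-behaved possibility, sketched here under assumptions rather than proposed as the package's actual solution: let a compiler build the dispatch. In C, a switch that jumps into the middle of an unrolled loop (the classic "Duff's device") gives the same "enter the 8-way unrolled chain at the remainder" behaviour as the @Table jump, and the compiler takes care of how the jump is encoded. The names add_limbs and ADD_LIMB are hypothetical, and the carry is tracked in a variable instead of the CPU carry flag, so this does not claim the speed of the adc chain.

```c
#include <stddef.h>
#include <stdint.h>

/* One limb: *dst = *a + *b + carry, then advance all three pointers. */
#define ADD_LIMB()                                              \
    do {                                                        \
        uint64_t s = *a++ + carry;                              \
        unsigned c1 = s < carry;      /* wrapped adding carry */ \
        uint64_t t = s + *b++;                                  \
        carry = c1 | (t < s);         /* wrapped adding *b */    \
        *dst++ = t;                                             \
    } while (0)

/* Adds two n-limb numbers (least significant limb first) into dst
 * and returns the final carry (0 or 1). */
static unsigned add_limbs(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    size_t passes = (n + 7) / 8;   /* how often the loop body is entered */

    if (n == 0)
        return 0;

    switch (n & 7) {               /* plays the role of the table jump */
    case 0: do { ADD_LIMB();
    case 7:      ADD_LIMB();
    case 6:      ADD_LIMB();
    case 5:      ADD_LIMB();
    case 4:      ADD_LIMB();
    case 3:      ADD_LIMB();
    case 2:      ADD_LIMB();
    case 1:      ADD_LIMB();
            } while (--passes != 0);
    }
    return carry;
}
```

For n = 11 the switch enters at case 3, handles 3 limbs, then runs one full pass of 8. This differs slightly from the assembly in the first post, where a remainder of 0 jumps directly to the loop instruction and the pass count is (n shr 3) + 1; here a remainder of 0 enters at the top of the chain and n = 0 is handled separately.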