Forum > FPC development

Limited relocation support in inline assembler


MathMan:
Hi all,

I am currently developing a multi-precision arithmetic package for FPC using LCL 1.2.4 and FPC 2.6.4 on Windows 7 64-bit with an Intel processor.

For speed reasons I would like to do the following:


--- Quote ---
    // prepare addition with loop-unrolling 8
    mov     R8, RCX;
    and     R8, 7;
    shr     RCX, 3;
    inc     RCX;
    clc;
    jmp     [@Table+8*R8];

    align   8;
    @Table:
    dq      @lAddZero;
    dq      @lAddOne;
    dq      @lAddTwo;
    dq      @lAddThree;
    dq      @lAddFour;
    dq      @lAddFive;
    dq      @lAddSix;
    dq      @lAddSeven;

    // add in chunks of 8 limb
  @AddLongLoop:

    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddSeven:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddSix:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddFive:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddFour:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddThree:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddTwo:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddOne:
    lodsq;
    adc     RAX, qword ptr [RBX+RDI];
    stosq;
  @lAddZero:
    loop    @AddLongLoop;

--- End quote ---

I know that this is a bit extreme, but it is valid and not even self-modifying code ;)
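For readers more at home in C: the dispatch above is the assembly form of Duff's device. The remainder n mod 8 selects an entry point into an 8-way unrolled loop, and the round counter is (n >> 3) + 1 because every entry path passes the closing `loop` once. The following is a minimal portable C sketch of the same control flow, assuming 64-bit limbs stored least-significant first. The names `mp_add_n` and `add_limb` are hypothetical, and the carry is kept in a variable instead of the CPU carry flag, so this shows the structure, not the performance, of the assembly version.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 64-bit limbs plus an incoming carry (0 or 1); updates *carry. */
static inline uint64_t add_limb(uint64_t x, uint64_t y, uint64_t *carry)
{
    uint64_t s = x + *carry;
    uint64_t c1 = s < *carry;   /* overflow from adding the carry */
    uint64_t t = s + y;
    uint64_t c2 = t < s;        /* overflow from adding y */
    *carry = c1 | c2;           /* at most one of c1, c2 can be set */
    return t;
}

/* Hypothetical helper: r[0..n) = a[0..n) + b[0..n); returns the final carry.
   The switch plays the role of "jmp [@Table+8*R8]", "--rounds" that of LOOP,
   and "*p++" that of lodsq/stosq. */
static uint64_t mp_add_n(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;              /* clc */
    size_t rounds = (n >> 3) + 1;    /* shr RCX, 3 ; inc RCX */

    switch (n & 7) {                 /* and R8, 7 ; computed jump */
        do {
                    *r++ = add_limb(*a++, *b++, &carry);  /* @AddLongLoop */
            case 7: *r++ = add_limb(*a++, *b++, &carry);
            case 6: *r++ = add_limb(*a++, *b++, &carry);
            case 5: *r++ = add_limb(*a++, *b++, &carry);
            case 4: *r++ = add_limb(*a++, *b++, &carry);
            case 3: *r++ = add_limb(*a++, *b++, &carry);
            case 2: *r++ = add_limb(*a++, *b++, &carry);
            case 1: *r++ = add_limb(*a++, *b++, &carry);
            case 0: ;                                      /* @lAddZero */
        } while (--rounds != 0);                           /* loop @AddLongLoop */
    }
    return carry;
}
```

Entering at `case 0` for n = 8 first decrements the counter from 2 to 1 and then runs one full round of eight additions, exactly as the assembly does when it jumps straight to `@lAddZero`.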

The documentation unfortunately states that "offset" is not supported for Intel-style assembler. Consequently, the compiler only generates 32-bit relocation info for the entries in @Table, which makes the code crash if it is not run in the low 4 GB of system memory. This is a bit unfortunate, and I would like to know:

 a - is there any intention to lift this limitation in the near future?
 b - is there an alternative way to do this? E.g. I thought about moving @Table to the global variable space, but the labels are local to the procedure scope, so no luck :(
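To make the "low 4 GByte" limit concrete: a 32-bit absolute relocation can only hold an address whose upper 32 bits are zero, and a 32-bit displacement in an x86-64 memory operand such as `jmp [@Table+8*R8]` is additionally sign-extended, so it only reaches the lowest 2 GB (or the topmost 2 GB) of the address space. A common way around absolute table entries, used for example by GCC for position-independent switch tables, is to store each entry as a signed 32-bit offset relative to the table and add it to the table's runtime address (a `movsxd` followed by an `add`). This is a swapped-in technique, not something proposed in the thread; the C sketch below only models the address arithmetic, and its function names and sample addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fits a 32-bit field that is zero-extended (e.g. a 32-bit absolute table entry). */
static bool fits_zext32(uint64_t addr)
{
    return addr <= UINT32_MAX;
}

/* Fits a disp32 that x86-64 sign-extends to 64 bits ([disp32 + index*scale]). */
static bool fits_sext32(uint64_t addr)
{
    return addr <= UINT64_C(0x7FFFFFFF) || addr >= UINT64_C(0xFFFFFFFF80000000);
}

/* Resolve entry idx of a table holding signed 32-bit offsets relative to the
   table's own runtime address; correct at any load address. */
static uint64_t resolve_rel_entry(uint64_t table_addr, const int32_t *rel, unsigned idx)
{
    /* sign-extend the offset, then add modulo 2^64 (negative offsets subtract) */
    return table_addr + (uint64_t)(int64_t)rel[idx];
}
```

An address such as 0x00007FF600001000 fails both range checks, while the relative table still resolves correctly there because only the distance from the table to each label is stored.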

Regards,
MathMan

marcov:

See http://bugs.freepascal.org/view.php?id=26555

MathMan:

--- Quote from: marcov on September 24, 2014, 10:38:00 pm ---
See http://bugs.freepascal.org/view.php?id=26555

--- End quote ---
Hi Marcov,

This does not seem to be the same issue I am talking about. The bug report is about accessing global variables from inline assembler; I don't do that. What I am doing is accessing a local variable (via its offset). It might be, though, that both boil down to the same issue within the compiler ...

Besides, I think it was also me who triggered that bug report with my post "Linker issue" :)

Regards,
Jens

Jonas Maebe:
Your code is invalid: you cannot use 64-bit displacements in arbitrary instructions. See e.g. http://www.nasm.us/doc/nasmdo11.html . You will have to use RIP-relative addressing, which in turn does not support indexed accesses.
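The RIP-relative form still uses a 32-bit displacement, but it is added to the address of the next instruction rather than to zero, so the target only has to lie within about ±2 GB of the code, wherever the image is loaded. Because it cannot be combined with an index register, the usual pattern is to load the table address into a register first and index through that register. The C sketch below computes the displacement and encodes `lea rax, [rip+disp32]` as the bytes 48 8D 05 followed by the little-endian disp32. The function name is hypothetical, and the sketch only illustrates the encoding arithmetic; it says nothing about how FPC's inline assembler encodes a particular operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode "lea rax, [rip + disp32]" located at insn_addr so that RAX = target.
   Bytes: REX.W (48), opcode 8D, ModRM 05 (mod=00, reg=RAX, r/m=101 -> RIP+disp32).
   Returns false if target is outside the +-2 GB reach of a disp32. */
static bool encode_lea_rax_rip(uint8_t out[7], uint64_t insn_addr, uint64_t target)
{
    uint64_t next_ip = insn_addr + 7;          /* RIP points past this 7-byte instruction */
    /* wraparound conversion to signed, as GCC/Clang define it on x86-64 */
    int64_t disp = (int64_t)(target - next_ip);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;
    uint32_t d = (uint32_t)disp;               /* two's-complement disp32 */
    out[0] = 0x48;
    out[1] = 0x8D;
    out[2] = 0x05;
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(d >> (8 * i));  /* little-endian */
    return true;
}
```

Only the distance between instruction and target is encoded, so the same seven bytes remain valid when code and table are moved together to a different base address.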

MathMan:

--- Quote from: Jonas Maebe on September 25, 2014, 09:49:28 am ---
Your code is invalid: you cannot use 64-bit displacements in arbitrary instructions. See e.g. http://www.nasm.us/doc/nasmdo11.html . You will have to use RIP-relative addressing, which in turn does not support indexed accesses.

--- End quote ---

OK, I see your point. I skimmed the Intel documentation too quickly and missed that displacements are limited to 32 bits in 64-bit mode too :(. However, shouldn't


--- Quote ---
lea     RAX, @Table;
jmp     [RAX+8*R8];

--- End quote ---

do the trick then? The Intel documentation states that lea can calculate a 64-bit displacement (with operand size and address size both set to 64 bits, only the REX.W prefix is required), but the compiler still generates a 32-bit offset. Am I missing something here too? I used lea because the obvious choice


--- Quote ---
mov     RAX, offset @Table;
jmp     [RAX+8*R8];

--- End quote ---

is not supported by the inline assembler, and the programmer's reference states that lea should be used instead.
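For comparison, the "obvious choice" corresponds on x86-64 to `mov r64, imm64` (REX.W, then B8+r, then an 8-byte immediate: 10 bytes in total), the usual instruction form that embeds a full 64-bit absolute address. Such an operand, like each `dq` entry in the table, needs a 64-bit absolute relocation so the loader can patch it when the image is rebased. A `lea`, by contrast, has at most a 32-bit displacement field in every addressing form, so a 32-bit field in its encoding is expected; whether it is absolute or RIP-relative is what decides where it works. The C sketch below only encodes the bytes; the function name is hypothetical, and nothing here states which form FPC emits.

```c
#include <stdint.h>

/* Encode "mov rax, imm64" (REX.W + B8+r with r = 0 for RAX) carrying a full
   64-bit immediate. Returns the instruction length, which is always 10. */
static int encode_mov_rax_imm64(uint8_t out[10], uint64_t imm)
{
    out[0] = 0x48;                                /* REX.W: 64-bit operand size */
    out[1] = 0xB8;                                /* B8+r, r = 0 selects RAX */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));   /* little-endian immediate */
    return 10;
}
```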

Kind regards,
MathMan
