Limited relocation support in inline assembler
MathMan:
Hi all,
I am currently developing a multi-precision arithmetic package for FPC, using LCL 1.2.4 and FPC 2.6.4 on Win 7 (64 bit) with an Intel processor.
For speed reasons I would like to do the following:
--- Quote ---
// prepare addition with loop-unrolling 8
mov R8, RCX;
and R8, 7;
shr RCX, 3;
inc RCX;
clc;
jmp [@Table+8*R8];
align 8;
@Table:
dq @lAddZero;
dq @lAddOne;
dq @lAddTwo;
dq @lAddThree;
dq @lAddFour;
dq @lAddFive;
dq @lAddSix;
dq @lAddSeven;
// add in chunks of 8 limbs
@AddLongLoop:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddSeven:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddSix:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddFive:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddFour:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddThree:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddTwo:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddOne:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddZero:
loop @AddLongLoop;
--- End quote ---
I know that this is a bit extreme - but it is valid and not even self-modifying code ;)
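For readers who do not think in assembler: the computed jump into the middle of the unrolled loop is the same idea as Duff's device in C. The sketch below only models the control flow and the carry chain of the quoted code; it is not the FPC implementation, and the names `adc_limb` and `mp_add` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One limb of add-with-carry: the C stand-in for
   lodsq / adc RAX, qword ptr [RBX+RDI] / stosq. */
static uint64_t adc_limb(uint64_t x, uint64_t y, uint64_t *carry)
{
    uint64_t partial = x + y;          /* may wrap around */
    uint64_t sum = partial + *carry;   /* may wrap around a second time */
    /* At most one of the two additions can wrap, so OR is enough. */
    *carry = (uint64_t)((partial < x) | (sum < partial));
    return sum;
}

/* dst[i] = a[i] + b[i] for i < n; returns the final carry (0 or 1). */
uint64_t mp_add(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;                /* clc */
    size_t rounds = (n >> 3) + 1;      /* shr RCX, 3 / inc RCX */

    switch (n & 7) {                   /* jmp [@Table+8*R8] */
        do {                           /* @AddLongLoop */
            *dst++ = adc_limb(*a++, *b++, &carry);
    case 7: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 6: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 5: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 4: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 3: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 2: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 1: *dst++ = adc_limb(*a++, *b++, &carry);  /* fall through */
    case 0: ;                          /* @lAddZero */
        } while (--rounds != 0);       /* loop @AddLongLoop */
    }
    return carry;
}
```

In C the compiler builds the jump table for the `switch` itself, so the question of how the table entries get relocated never reaches the programmer; in the inline assembler version that is exactly the part that goes wrong.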
The documentation unfortunately states that "offset" is not supported for the Intel-style assembler. Consequently the compiler generates only 32-bit relocation info for the entries in @Table, which makes the code crash if it does not run in the low 4 GByte of system memory. This is a bit unfortunate and I would like to know:
a - is there any intention to lift this limitation in the near future?
b - is there maybe an alternative way to do this? E.g. I thought about moving @Table to the general variable space, but the labels are local to the procedure scope, so no luck :(
Regards,
MathMan
marcov:
See http://bugs.freepascal.org/view.php?id=26555
MathMan:
--- Quote from: marcov on September 24, 2014, 10:38:00 pm ---
See http://bugs.freepascal.org/view.php?id=26555
--- End quote ---
Hi Marcov,
This does not seem to be the same thing I am talking about. The bug report is about accessing global variables from inline assembler - I don't do that. What I am doing is accessing a local variable (via its offset). It might be, though, that both boil down to the same issue within the compiler ...
Besides, I think it was also me who triggered that bug report with my post "Linker issue" :)
Regards,
Jens
Jonas Maebe:
Your code is invalid, you cannot use 64 bit displacements in arbitrary instructions. See e.g. http://www.nasm.us/doc/nasmdo11.html . You will have to use RIP-relative addressing, which in turn does not support indexed accesses.
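To make the displacement limit concrete, here is a small C model of how an ordinary memory operand such as `[disp32 + 8*index]` is resolved in 64-bit mode: the displacement field holds 32 bits and is sign-extended before the addition. This is a sketch of the address arithmetic only; the function names and the sample addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the value can be stored in a signed 32-bit displacement field. */
bool fits_disp32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Effective address of [disp32 + scale*index]: the 32-bit field is
   sign-extended to 64 bits, then the scaled index is added. */
uint64_t ea_disp32_index(uint32_t disp_field, uint64_t index, uint64_t scale)
{
    int64_t disp = (disp_field & 0x80000000u)
                       ? (int64_t)disp_field - 0x100000000LL  /* negative */
                       : (int64_t)disp_field;
    return (uint64_t)disp + index * scale;
}
```

With a hypothetical table address of `0x140001000`, only the low 32 bits (`0x40001000`) fit into the field, so `ea_disp32_index(0x40001000, 3, 8)` yields `0x40001018` instead of the intended `0x140001018`. A table that happens to sit at a low address such as `0x00401000` resolves correctly, which matches the observation that the code only works in low memory.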
MathMan:
--- Quote from: Jonas Maebe on September 25, 2014, 09:49:28 am ---
Your code is invalid, you cannot use 64 bit displacements in arbitrary instructions. See e.g. http://www.nasm.us/doc/nasmdo11.html . You will have to use RIP-relative addressing, which in turn does not support indexed accesses.
--- End quote ---
Ok, I see your point - I scanned the Intel documentation too quickly and missed the fact that displacements are pretty much limited to 32 bit in 64-bit mode too :(. However, shouldn't
--- Quote ---
lea RAX, @Table;
jmp [RAX+8*R8];
--- End quote ---
do the trick then? The Intel doc states that lea can calculate a 64-bit displacement (with operand size and address size set to 64 bit, meaning only the REX.W prefix is required) - but the compiler still generates a 32-bit offset. Am I missing something here too? I used lea because the obvious choice
--- Quote ---
mov RAX, offset @Table;
jmp [RAX+8*R8];
--- End quote ---
is not supported by the inline assembler, and the programmer's reference states that lea should be used instead.
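The two-step idea (first get the table address into a register, then index from that register) can be modelled in C as plain address arithmetic. The sketch assumes the lea is encoded RIP-relative, as suggested earlier in the thread; whether the FPC inline assembler actually emits that encoding for `lea RAX, @Table` is the open question here, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes the rel32 a RIP-relative lea would carry: the distance from
   the end of the instruction (next_rip) to the target. Returns false if
   the target is more than roughly 2 GiB away. */
bool rip_rel32(uint64_t next_rip, uint64_t target, int32_t *rel_out)
{
    int64_t delta = (int64_t)(target - next_rip);
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *rel_out = (int32_t)delta;
    return true;
}

/* Step 1 models lea RAX, [RIP+rel32]; step 2 models the address of the
   table slot read by jmp [RAX+8*R8]. */
uint64_t table_slot(uint64_t next_rip, int32_t rel, uint64_t index)
{
    uint64_t base = next_rip + (uint64_t)(int64_t)rel;  /* step 1 */
    return base + 8 * index;                            /* step 2 */
}
```

Because the offset is relative to the instruction, it stays small for a table inside the same procedure no matter where the code is loaded, and the indexed access happens through the register rather than through a displacement. The table entries themselves would still need to be full 64-bit addresses for the jump to land correctly.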
Kind regards,
MathMan