Hi all,
I am currently developing a multi-precision arithmetic package for FPC, using LCL 1.2.4 and FPC 2.6.4 on Win 7 (64-bit) with an Intel processor.
For speed reasons I would like to do the following:
// prepare addition with loop-unrolling 8
mov R8, RCX;
and R8, 7;
shr RCX, 3;
inc RCX;
clc;
jmp [@Table+8*R8];
align 8;
@Table:
dq @lAddZero;
dq @lAddOne;
dq @lAddTwo;
dq @lAddThree;
dq @lAddFour;
dq @lAddFive;
dq @lAddSix;
dq @lAddSeven;
// add in chunks of 8 limbs
@AddLongLoop:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddSeven:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddSix:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddFive:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddFour:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddThree:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddTwo:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddOne:
lodsq;
adc RAX, qword ptr [RBX+RDI];
stosq;
@lAddZero:
loop @AddLongLoop;
I know that this is a bit extreme, but it is valid and not even self-modifying code.
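For anyone who does not read assembler fluently, here is a rough C sketch of the control flow I am after: the limb count modulo 8 selects an entry point in the middle of an 8-way unrolled add-with-carry loop (Duff's device style). It is only an illustration under my own assumptions. The names adc64 and add_limbs are made up for this post, the sketch uses three separate pointers instead of the RBX offset trick, and it says nothing about the relocation problem itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one 64-bit add with carry in and carry out,
   i.e. what a single ADC instruction does. */
static inline uint64_t adc64(uint64_t a, uint64_t b, unsigned *carry)
{
    uint64_t s = a + b;
    unsigned c1 = s < a;          /* carry out of a + b */
    uint64_t r = s + *carry;
    unsigned c2 = r < s;          /* carry out of adding the incoming carry */
    *carry = c1 | c2;             /* at most one of the two can be set */
    return r;
}

/* Hypothetical reference routine: dst = a + b over n limbs,
   returns the final carry (0 or 1). */
unsigned add_limbs(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned carry = 0;             /* clc */
    size_t rounds = (n >> 3) + 1;   /* shr RCX, 3 / inc RCX */

    switch (n & 7) {                /* jmp [@Table+8*R8] */
        do {
                *dst++ = adc64(*a++, *b++, &carry);  /* @AddLongLoop */
    case 7:     *dst++ = adc64(*a++, *b++, &carry);
    case 6:     *dst++ = adc64(*a++, *b++, &carry);
    case 5:     *dst++ = adc64(*a++, *b++, &carry);
    case 4:     *dst++ = adc64(*a++, *b++, &carry);
    case 3:     *dst++ = adc64(*a++, *b++, &carry);
    case 2:     *dst++ = adc64(*a++, *b++, &carry);
    case 1:     *dst++ = adc64(*a++, *b++, &carry);
    case 0:     ;                                    /* @lAddZero */
        } while (--rounds != 0);    /* loop @AddLongLoop */
    }
    return carry;
}
```

In this sketch the first pass handles the n mod 8 leftover limbs and every further pass handles a full block of 8, which is the same order in which the assembler version walks through its labels.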
Unfortunately, the documentation states that "offset" is not supported for Intel-style assembler. Consequently the compiler generates only 32-bit relocation info for the entries in @Table, so the code crashes unless it is loaded in the low 4 GB of the address space. This is a bit unfortunate, and I would like to know:
a - Is there any intention to lift this limitation in the near future?
b - Is there maybe an alternative way to do this? E.g. I thought about moving @Table to the general variable space, but the labels are local to the procedure scope, so no luck.
Regards,
MathMan