Turbo Pascal 6.0, 80386 assembly language, Free Pascal, 64-bit CPU
Rick314:
Thank you for your helpful replies. I am looking into FPC assembly support and updating all the 80286 assembly code in my original program.
Rick314:
I am having a hard time getting started with Free Pascal and 80386 assembly language, even after reading what I can find about both subjects. By way of an example for future readers like me, could someone please convert the 7-line for loop in the attached test program to assembly? I need to see an example of accessing variables declared in Free Pascal and of using pointers.
Instructions: Change the name of the attached program from .txt to .pas. Compile it and run the program to verify you get the output shown in the comments. Change "{ $DEFINE ASSEMBLY }" to {$DEFINE ASSEMBLY}, put assembly code at "ASSEMBLY GOES HERE" to do the same as the for loop, and post the new file. Thank you!
engkin:
If you pass -a to the compiler, it keeps the assembly file generated for your Pascal source. Changing it to -al instructs the compiler to include the source code lines as well. To get the Intel dialect, I selected the Microsoft assembler with -Amasm. These options are documented among the compiler's command-line options.
The compiler calls fpc_mod_qword and fpc_div_qword from unit system. To be able to call them, I reintroduced them at the top with corrected param order.
To deal with 64-bit variables such as Div64 : UInt64 in 32-bit assembly, I added its two halves as two variables:
--- Code: ---
var
  ..
  Div64L : UInt32;
  Div64H : UInt32;
  Div64 : UInt64 absolute Div64L;
--- End code ---
Notice how Div64 covers the same address space as its two halves by using absolute.
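For readers who want to try the overlay idea in isolation, here is a minimal sketch. It is a variation rather than the declaration above: it overlays a two-element array on a single UInt64, so it does not depend on where the compiler places two separate variables. The names Value64 and Halves are hypothetical, and the low/high order shown assumes a little-endian target such as x86.

```pascal
program AbsoluteOverlayDemo;
{$MODE OBJFPC}

var
  Value64 : UInt64;
  // Hypothetical name: the array occupies the same 8 bytes as Value64.
  Halves  : array[0..1] of UInt32 absolute Value64;

begin
  Value64 := $1122334455667788;

  // Assumption: little-endian layout (x86), so index 0 is the low half.
  WriteLn('low  half = $', HexStr(QWord(Halves[0]), 8));
  WriteLn('high half = $', HexStr(QWord(Halves[1]), 8));

  // Writing through the overlay changes the 64-bit variable as well.
  Halves[1] := 0;
  WriteLn('after clearing the high half = $', HexStr(Value64, 16));

  // Self-check for the little-endian assumption stated above.
  if (Halves[0] <> $55667788) or (Value64 <> $55667788) then
    Halt(1);
end.
```

In 32-bit assembly the two halves can then be loaded separately into 32-bit registers, while the Pascal code keeps using the 64-bit name.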
marcov:
The core reason to do it in assembler is that a 64-bit / 32-bit division with a 32-bit result and remainder is faster. Note that the result MUST fit in 32 bits, though. Just pasting the compiler output doesn't exploit that fact. (The same applies in TP, btw, but with 32-bit / 16-bit and a 16-bit result and remainder.)
An intrinsic for that option (or, slightly less optimal, a math.divmod with 64/32/32 options) might reduce the need for using assembler.
So I did a quick conversion. I'm by no means an assembler wizard. The whole function is optimized for the current calling convention (so it is written wholly in assembler, not just as a block that loads from local variables).
I used pretty much most of the techniques Engkin describes btw.
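As a rough illustration of the 64-bit / 32-bit division discussed in this post, here is a minimal sketch. It is not the attached conversion: it uses an asm block that loads from local variables instead of a wholly assembler function, and the name DivMod64By32 and its parameters are hypothetical. It assumes an i386 target for the asm path and falls back to plain Pascal elsewhere. The caller must make sure the divisor is non-zero and that the high half of the dividend is smaller than the divisor, otherwise the quotient does not fit in 32 bits and the DIV instruction raises a divide error.

```pascal
program DivMod64By32Demo;
{$MODE OBJFPC}
{$IFDEF CPUI386}
  {$ASMMODE INTEL}
{$ENDIF}

// Hypothetical helper, not taken from the thread's attachments.
// Precondition: Divisor <> 0 and (Dividend shr 32) < Divisor.
function DivMod64By32(Dividend: UInt64; Divisor: UInt32;
                      out Remainder: UInt32): UInt32;
var
  LoPart, HiPart, Quot, Rest: UInt32;
begin
  LoPart := UInt32(Dividend and $FFFFFFFF);
  HiPart := UInt32(Dividend shr 32);
{$IFDEF CPUI386}
  asm
    mov eax, LoPart      // low half of the dividend
    mov edx, HiPart      // high half of the dividend
    mov ecx, Divisor
    div ecx              // EDX:EAX / ECX -> quotient in EAX, remainder in EDX
    mov Quot, eax
    mov Rest, edx
  end ['EAX', 'ECX', 'EDX'];
{$ELSE}
  // Portable fallback so the sketch also builds on other targets.
  Quot := UInt32(Dividend div Divisor);
  Rest := UInt32(Dividend mod Divisor);
{$ENDIF}
  Remainder := Rest;
  Result := Quot;
end;

var
  Q, R: UInt32;
begin
  Q := DivMod64By32(10000000000, 7, R);
  WriteLn('quotient = ', Q, ', remainder = ', R);
  // Cross-check against the known result of 10000000000 / 7.
  if (Q <> 1428571428) or (R <> 4) then
    Halt(1);
end.
```

A wholly assembler version that takes its arguments straight from the calling convention's registers avoids the extra loads and stores, at the cost of being tied to that calling convention.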
marcov:
Note that xor eax,eax // result of shl-32
can be deleted.
I attached a variant that uses stosd/lodsd for comparison with the old version.
IIRC those instructions generally aren't faster without rep anymore though.