Turbo Pascal 6.0, 80386 assembly language, Free Pascal, 64-bit CPU

Thank you for your helpful replies.  I am looking into FPC assembly support and updating all the 80286 assembly code in my original program.

I am having a hard time getting started with Free Pascal and 80386 assembly language, even after reading what I can find about both subjects.  By way of an example for future readers like me, could someone please convert the 7-line for loop in the attached test program to assembly?  I need to see an example of accessing variables declared in Free Pascal and of using pointers.

Instructions: Change the name of the attached program from .txt to .pas.  Compile it and run the program to verify you get the output shown in the comments.  Change "{ $DEFINE ASSEMBLY }" to {$DEFINE ASSEMBLY}, put assembly code at "ASSEMBLY GOES HERE" to do the same as the for loop, and post the new file.  Thank you!

If you pass -a to the compiler, you get the assembly file for your Pascal source. Changing it to -al instructs the compiler to include the source code lines as well. To get the Intel dialect, I requested Microsoft Assembler output with -Amasm. These options are mentioned here.

The compiler calls fpc_mod_qword and fpc_div_qword from unit system. To be able to call them from the assembly code, I redeclared them at the top with the corrected parameter order.

To deal with 64-bit variables like Div64 : UInt64 in 32-bit assembly, I added its two halves as two variables:

--- Code: ---var
   Div64L : UInt32;
   Div64H : UInt32;
   Div64  : UInt64 absolute Div64L;

--- End code ---
Notice how, by using absolute, Div64 covers the same address space as its two halves.
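
For future readers, here is a minimal sketch of the same overlay idea. It is not taken from the attached program. The declaration above relies on the two UInt32 globals being placed next to each other in memory, which I am assuming rather than quoting from the documentation, so this sketch uses a variant record to make the layout explicit. The names TSplit64, Lo32, Hi32 and Whole are made up for illustration.

```pascal
program SplitDemo;
{$mode objfpc}

type
  // Hypothetical helper type: one 64-bit value viewed as two 32-bit halves.
  // Both variants start at the same address, so they share storage.
  TSplit64 = packed record
    case Boolean of
      False: (Lo32, Hi32: UInt32);  // Lo32 comes first in memory
      True:  (Whole: UInt64);
  end;

var
  D: TSplit64;
begin
  D.Whole := $0000000100000002;
  // On little-endian x86 the low half sits at the lower address.
  WriteLn(D.Lo32);  // 2
  WriteLn(D.Hi32);  // 1
end.
```

On a big-endian target the two halves would come out swapped, so this only illustrates the x86 case discussed in this thread.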

The core reason to do it in assembler is that a 64-bit / 32-bit division with a 32-bit result and remainder is faster than a full 64-bit division. Note that the result MUST fit in 32 bits, though. Just pasting the compiler's output doesn't exploit that fact. (The same applied in TP, by the way, but with a 32-bit / 16-bit division giving a 16-bit result and remainder.)
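
To make that concrete, here is a rough sketch of such a division as a Free Pascal function with an inline asm block. It is not the code from the attachment: DivMod64By32 and its parameter names are made up, and I am assuming the compiler keeps the parameters addressable by name inside the asm block. On targets other than i386 it falls back to plain Pascal.

```pascal
program DivDemo;
{$mode objfpc}
{$IFDEF CPUI386}
  {$asmmode intel}
{$ENDIF}

// Hypothetical helper: divides the 64-bit value NumHi:NumLo by Divisor.
// Precondition: NumHi < Divisor. Otherwise the quotient does not fit in
// 32 bits and the i386 DIV instruction raises a divide error.
function DivMod64By32(NumLo, NumHi, Divisor: UInt32;
                      out Remainder: UInt32): UInt32;
var
  Q, R: UInt32;
{$IFNDEF CPUI386}
  N: UInt64;
{$ENDIF}
begin
{$IFDEF CPUI386}
  asm
    mov  eax, NumLo            // low half of the dividend
    mov  edx, NumHi            // high half of the dividend
    div  dword ptr [Divisor]   // EDX:EAX / Divisor -> EAX quotient, EDX remainder
    mov  Q, eax
    mov  R, edx
  end ['EAX', 'EDX'];          // tell the compiler which registers were changed
{$ELSE}
  // Portable fallback using the compiler's own 64-bit arithmetic.
  N := (UInt64(NumHi) shl 32) or NumLo;
  Q := UInt32(N div Divisor);
  R := UInt32(N mod Divisor);
{$ENDIF}
  Remainder := R;
  Result := Q;
end;

var
  Quot, Remain: UInt32;
begin
  // 2^32 divided by 3
  Quot := DivMod64By32(0, 1, 3, Remain);
  WriteLn(Quot, ' ', Remain);  // 1431655765 1
end.
```

Going through the locals Q and R costs two extra stores, but it avoids having to dereference the out parameter by hand inside the asm block.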

An intrinsic for that operation (or, slightly less optimal, a math.divmod with 64/32/32 options) might reduce the need for assembler.

So I did a quick conversion. I'm by no means an assembler wizard. The whole function is optimized for the current calling convention (so it is written wholly in assembler, not just as a block that loads from local variables).

I used most of the techniques Engkin describes, by the way.

Note that the line "xor eax,eax  // result of shl-32" can be deleted.

I attached a variant that uses stosd/lodsd for comparison with the old version.

IIRC those instructions generally aren't faster without rep anymore though.

