Turbo Pascal 6.0, 80386 assembly language, Free Pascal, 64-bit CPU
Rick314:
Thank you for your help -- I have just what I want and realize I could have done better with the problem statement.
I forgot to mention I modified my fpc.cfg file with "-Mtp" for Turbo Pascal compatibility and you had to discover that. Also sorry I didn't make it clear I wanted the "lodsd, div ebx, stosd" fast loop at the heart of the procedure. This was shown in TEST1.TXT in my first post of the thread (modified with "db $66" opcodes as required in Turbo Pascal 6.0), and is in marcov's test_div2.txt. That speed gain is the main motivation for going to assembly. In TEST1.TXT I added comments justifying coding it 16 times inline to reduce the impact of the branch instruction, gaining another 20% speed improvement. This procedure is part of an "approximate e (2.718...) to 1,000,000 decimal places" program with most time spent doing those 3 assembly language instructions. I should have said all that, and thanks again.
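As a rough illustration of what that three-instruction loop computes, here is a minimal sketch in C. It is not the code from TEST1.TXT or test_div2.txt, which are not reproduced in this thread. The function name `div_words` and the most-significant-word-first layout are assumptions made for the example. Each pass mirrors one "lodsd, div ebx, stosd" step: load a 32-bit word, divide the 64-bit value formed from the previous remainder and that word, then store the quotient back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divide a multi-word unsigned number in place by a 32-bit divisor.
   Assumed layout: words[0] is the most significant 32-bit word.
   Returns the remainder left after the last word. */
uint32_t div_words(uint32_t *words, size_t count, uint32_t divisor)
{
    assert(divisor != 0);
    uint64_t rem = 0;                            /* plays the role of EDX */
    for (size_t i = 0; i < count; i++) {
        /* lodsd: fetch the next word into the low half (EAX) */
        uint64_t cur = (rem << 32) | words[i];
        /* div ebx: the quotient fits in 32 bits because rem < divisor */
        words[i] = (uint32_t)(cur / divisor);    /* stosd: store quotient */
        rem = cur % divisor;                     /* remainder carries on  */
    }
    return (uint32_t)rem;
}
```

For example, dividing the two-word value {1, 0} (that is, 2^32) by 2 leaves {0, 0x80000000} with remainder 0.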
marcov:
--- Quote from: Rick314 on February 28, 2015, 07:35:05 pm ---I forgot to mention I modified my fpc.cfg file with "-Mtp" for Turbo Pascal compatibility and you had to discover that.
--- End quote ---
Don't worry. You learn about such things after almost 17 years of FPC :-)
--- Quote ---Also sorry I didn't make it clear I wanted the "lodsd, div ebx, stosd" fast loop at the heart of the procedure.
--- End quote ---
Be careful. As said, lodsd/stosd USED to be the fast way. However, starting with Pentium-I, the optimization rules changed. Be sure to benchmark my original vs my -2 to make sure that it is actually faster.
Rick314:
--- Quote from: marcov on February 28, 2015, 03:33:59 pm ---I attached a variant that uses stosd/lodsd for comparison with the old version.
IIRC those instructions generally aren't faster without rep anymore though.
--- End quote ---
--- Quote from: marcov on February 28, 2015, 08:09:01 pm ---Be careful. As said, lodsd/stosd USED to be the fast way. However, starting with Pentium-I, the optimization rules changed. Be sure to benchmark my original vs my -2 to make sure that it is actually faster.
--- End quote ---
Please clarify what you meant by "...generally aren't faster without rep anymore though." (rep?)
I will benchmark both versions. Thanks again for your help.
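One possible shape for such a comparison, sketched in C rather than in the thread's Pascal and assembly: time two variants of the same division loop on identical input and confirm they agree before trusting the numbers. Everything here is an assumption for illustration: the names `div_plain`, `div_unrolled4` and `time_variant` are hypothetical, the 4x unrolling merely stands in for the 16x inline copy mentioned earlier, and neither function is one of the attached assembly files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*div_fn)(uint32_t *, size_t, uint32_t);

/* Straightforward loop: one word per iteration, most significant first. */
uint32_t div_plain(uint32_t *w, size_t n, uint32_t d)
{
    uint64_t rem = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t cur = (rem << 32) | w[i];
        w[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    return (uint32_t)rem;
}

/* Same arithmetic, four words per loop iteration to cut branch overhead. */
uint32_t div_unrolled4(uint32_t *w, size_t n, uint32_t d)
{
    uint64_t rem = 0, cur;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        cur = (rem << 32) | w[i];     w[i]     = (uint32_t)(cur / d); rem = cur % d;
        cur = (rem << 32) | w[i + 1]; w[i + 1] = (uint32_t)(cur / d); rem = cur % d;
        cur = (rem << 32) | w[i + 2]; w[i + 2] = (uint32_t)(cur / d); rem = cur % d;
        cur = (rem << 32) | w[i + 3]; w[i + 3] = (uint32_t)(cur / d); rem = cur % d;
    }
    for (; i < n; i++) {              /* leftover words when n is not a multiple of 4 */
        cur = (rem << 32) | w[i];
        w[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    return (uint32_t)rem;
}

/* CPU time in seconds for `passes` in-place divisions of w by d. */
double time_variant(div_fn fn, uint32_t *w, size_t n, uint32_t d, int passes)
{
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        fn(w, n, d);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_variant` once per variant on equal-sized buffers, with enough passes to run for a second or more, gives two figures to compare. The result depends on the CPU and compiler settings, so it says nothing in advance about which version wins.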
marcov:
--- Quote from: Rick314 on February 28, 2015, 09:44:58 pm --- Please clarify what you meant by "...generally aren't faster without rep anymore though." (rep?)
--- End quote ---
I mean that, AFAIK, using lods<x> and stos<x> without the "rep" prefix is generally discouraged for performance reasons on processors after the 486.
There are (rare) exceptions though, especially on some AMD processors, where some combinations with short opcodes avoid a decoding bottleneck.