@engkin: Lol. Yes, that assembly version of the function DoIt definitely sped it up! It isn't really an equivalent program, but it was fun to try out.
@Jonas Maebe: You were right: targeting the CPU with "-Cppentium4" made little or no difference. The "-Cfsse2" compiler command-line option is equivalent to the in-code compiler directive "{$FPUTYPE SSE2}", isn't it? For testing, I like to create separate source versions so I can keep track of which executable is which, so I used the directive instead of the command-line option.
Using that directive did not improve performance: 15.552s with it versus 15.583s without it. I think that may be due to what http://www.freepascal.org/docs-html/prog/progsu92.html#x100-990001.3.9 says about the $E switch: "Under linux and most unix’es, the kernel takes care of the coprocessor support, so this switch is not necessary on those platforms." Wouldn't that also mean that targeting the coprocessor is not necessary either?
The biggest performance gain came from changing the divide by 2 into a multiply by 0.5: from 21.306s down to 15.583s. (I had my browser running during my initial tests.) I also made that change in the Gambas program (where it made no appreciable difference), so the two programs are still equivalent.
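For anyone who wants to try the same change, here is a minimal sketch of it, assuming Double values. The function names are hypothetical and not taken from the DoIt code in this thread. The program also checks that both forms return identical results over a range of inputs, which holds because 0.5 is exactly representable in binary floating point.

```pascal
program HalveDemo;

{ Hypothetical helpers: halving a Double by division vs. by multiplication. }
function HalveByDivision(x: Double): Double;
begin
  HalveByDivision := x / 2;
end;

function HalveByMultiplication(x: Double): Double;
begin
  HalveByMultiplication := x * 0.5;
end;

var
  i: Integer;
  x: Double;
  allEqual: Boolean;
begin
  allEqual := True;
  x := 1.0;
  { Compare both forms over a spread of input values. }
  for i := 1 to 1000 do
  begin
    if HalveByDivision(x) <> HalveByMultiplication(x) then
      allEqual := False;
    x := x * 1.01 + 0.1;
  end;
  WriteLn('Identical results: ', allEqual);
end.
```

The substitution is exact only because 2 is a power of two. Replacing, say, x / 3 with x * (1/3) can change the last bit of the result, so programs rewritten that way would no longer be strictly equivalent.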
That brings the ratio to 1.37:1, compared with 1.29:1 for the other Gambas user. I think that's pretty darn close, and the difference may be amplified on my system because it is relatively slow overall.
Thank you everyone. I have learned a few things about code optimization and compiler options.
I still have a couple of questions, though.
1. Why is floating-point division slower than floating-point multiplication? Is it somehow related to the direction of bit-shifting in the registers?
2. In the optimized code that DelphiFreak posted, the variable "result" was not declared in the "var" section before being used in the main body of function DoIt, yet FPC did not complain and the program ran without errors. Why is that? I thought all variables had to be declared before use.
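For reference, here is a minimal sketch of the pattern in question 2, assuming the code is compiled in {$mode objfpc} or {$mode delphi}. In those modes FPC gives every function an implicit local variable named Result that holds its return value. SumTo is a hypothetical example, not DelphiFreak's DoIt.

```pascal
program ResultDemo;

{$mode objfpc}  { objfpc (and delphi) mode enables the implicit Result variable }

{ Hypothetical function: sums the integers 1..n. }
function SumTo(n: Integer): Int64;
var
  i: Integer;
begin
  Result := 0;               { Result is not declared in the var section }
  for i := 1 to n do
    Result := Result + i;    { it can be read as well as assigned }
end;

begin
  WriteLn(SumTo(100));
end.
```

In FPC's default mode, Result is not enabled, so the same function would have to assign to its own name (SumTo := ...) instead, or turn the feature on with {$modeswitch result}.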