Here you are (unix only) 
Thanks tetrastes, that is surely the smoking gun!
For those following along from home, tetrastes' code delivers a substantial speed-up for 100 million iterations (but the wrong answer if you total the results):
real 0m1.017s
user 0m1.017s
sys 0m0.000s
Significantly, tetrastes's method can produce what might be the right answers with a hybrid system! These two blocks call the m library (libm) directly using single precision, but with either a Double or a Single accumulator. You will notice that a Single accumulator is only useful up to 100k iterations!
Accumulator  Iterations  Total        Time
Double       1E5         5.14395E+04  2 ms
Double       1E6         5.14395E+05  12 ms
Double       1E7         5.14395E+06  115 ms
Double       1E8         5.14395E+07  1.15 s
Double       1E9         5.14395E+08  12.3 s
Single       1E5         5.14633E+04  2 ms     (100k iterations, close to the Double total)
Single       1E6         5.07848E+05  11 ms
Single       1E7         5.00785E+06  0.1 s
Single       1E8         1.67772E+07  1 s
Single       1E9         1.67772E+07  10.4 s   (1,000,000,000 iterations, but the total is wrong!)
For reference, the "uses math" version with Double and 100 million iterations (1E8) gives the right answer but takes 6 seconds.
Conclusion: depending on your app, FPC "math" might not do single precision in any useful manner. If you need that speed-up and are really, really certain you can accept its limits (the compiler does not check for float overflow or loss of precision), go directly to libm.so. Maybe consider using Single for the calculations but keeping (very big or very small) totals in a Double?
Davo