In the case of FPC vs C it is also a matter of platform: x86_64 and i386 have more optimizations than other platforms.
Another note is that C compilers often have more aggressive default settings, whereas FPC is conservative by default.
So you must also test on the same platform and with the same CPU/FPU instruction sets. This can make a big difference here, e.g. for copy operations that use SSEx instructions instead of the plain CPU instructions. The former is what C often selects, whereas FPC selects the CPU by default (but CAN use SSEx).