Don't forget to mark it as inline, and to configure the compiler (via the Lazarus menus or on the command line) to actually inline.
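To make that concrete, here is a minimal sketch of what both steps could look like. The function name Square and the program name are made up for illustration. The {$inline on} directive and the -Si command-line switch are the FPC ways to enable inlining that I know of; the compiler still treats inline as a hint, so it may decline to inline in some cases.

```pascal
program InlineSketch;
{$mode objfpc}
{$inline on}   // enable inlining from source; the command-line equivalent is -Si

// Hypothetical helper, marked inline so the compiler may expand it at the call site.
function Square(X: Integer): Integer; inline;
begin
  Result := X * X;
end;

var
  i: Integer;
  Total: Int64;
begin
  Total := 0;
  for i := 1 to 1000 do
    Total := Total + Square(i);
  WriteLn(Total);
end.
```

From the command line this would be built with something like "fpc -Si InlineSketch.pas".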
As for why it is slower: if you translate foreign concepts into a different language, there is a chance that the result is not the best approach for that language.
It might also be down to unrelated parts, such as stdio being slower by default, or memory management being more conservative, so that growing arrays in small increments is slower.
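As an illustration of the array point, a rough sketch of growing a dynamic array geometrically instead of one element at a time. The names and sizes are invented for the example, and whether this matters in your case depends on what the original code actually does.

```pascal
program GrowSketch;
{$mode objfpc}

var
  Items: array of LongInt;
  Count, i: Integer;
begin
  Count := 0;
  SetLength(Items, 16);              // start with some spare capacity
  for i := 1 to 100000 do
  begin
    // Double the capacity when full instead of calling SetLength per element.
    if Count = Length(Items) then
      SetLength(Items, Length(Items) * 2);
    Items[Count] := i;
    Inc(Count);
  end;
  SetLength(Items, Count);           // trim to the number of elements actually used
  WriteLn(Count, ' items stored');
end.
```

If the final size is known up front, a single SetLength before the loop is simpler still.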
Benchmarking is an art. We once had a fairly major issue with a large-scale FPC user saying the code was slower than Kylix/Delphi. It turned out that the difference in the speed of the random() used in the tests was what created most of the gap, not the algorithm or the code generation.
(FPC's random() is of higher quality, but slower.)
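One way to keep random() out of the measurement is to generate the test data before the timer starts. A minimal sketch, assuming a trivial summing loop stands in for the code under test; the names, the seed and the data size are arbitrary.

```pascal
program BenchSketch;
{$mode objfpc}
uses
  SysUtils;

const
  N = 1000000;

var
  Data: array of LongInt;
  i: Integer;
  Sum: Int64;
  T0, T1: QWord;
begin
  RandSeed := 12345;                 // fixed seed so runs are repeatable
  SetLength(Data, N);
  for i := 0 to N - 1 do
    Data[i] := Random(1000);         // random() cost stays outside the timed region

  T0 := GetTickCount64;
  Sum := 0;
  for i := 0 to N - 1 do             // stand-in for the code being measured
    Sum := Sum + Data[i];
  T1 := GetTickCount64;

  WriteLn('sum=', Sum, ' ms=', T1 - T0);
end.
```

Note that the random sequences of different compilers will differ even with the same seed, so for a cross-compiler comparison it is safer to feed both programs identical data, for example read from a file.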