@cpicanco
Yes, we know about this video. It is very funny, but it has some quirks and oversimplifications.
This is why benchmarks, especially micro-benchmarks, are not very reliable.
TL;DR: Only real profiling finds the bottlenecks; guessing does not.
In mORMot, we mostly optimized the core from actual performance data taken on production sites, doing real work.
The mormot.core.log unit has built-in features to measure the time elapsed on real work, and so to do true profiling.
And our regression tests are not just unit tests: they include a lot of near-real-world scenarios, which give a good hint about potential performance regressions (or improvements).
For FPC, the fact that its compiler and linker parts are simpler/less optimized than gcc/llvm counterparts makes it less likely to suffer from layout biases.
But for the challenge discussed in this forum thread, when you go from 26MB/s to 900MB/s, or when you see 10 times less memory usage running the benchmark from this thread, it is fair to say that the comparison means something.