<warning: feature is for freaks>
It is well known that a nested procedure which accesses local variables of its parent declared above it does not produce optimal code. Make sure you align the stack with {$codealign localmin=<usually the native pointer size>}.
If you instead declare the procedure outside and make it "is nested", it is better optimized. Performance is best if it does not use any of the parent's local variables at all. The same holds for a local procedure that takes what it needs as parameters and is declared above the parent's local variables. Note that "is nested" needs {$modeswitch nestedprocvars}; see the user manual.
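To make the layouts above concrete, here is a minimal sketch. The names SumA, SumB, SumC, RepeatN and TStepProc are hypothetical, chosen only for illustration, and the program shows the shape of each layout without measuring anything. Reading "declare a procedure outside and declare it is nested" as "a routine declared outside that receives the parent's nested procedure through an is nested parameter" is an assumption on my part.

```pascal
program NestedLayouts;

{$mode objfpc}
{$modeswitch nestedprocvars}   // required for "is nested"

type
  { Hypothetical procedural type that can also hold nested routines. }
  TStepProc = procedure(I: Integer) is nested;

{ Layout A: the local procedure is declared after the parent's locals
  and reads/writes them through the parent's frame (the slow case). }
function SumA(N: Integer): Integer;
var
  I, Acc: Integer;

  procedure Step;
  begin
    Acc := Acc + I;
    Inc(I);
  end;

begin
  Acc := 0;
  I := 1;
  while I <= N do
    Step;
  Result := Acc;
end;

{ Layout B: the local function is declared above the parent's locals,
  so it cannot see them; everything it needs arrives as parameters. }
function SumB(N: Integer): Integer;

  function Add(A, B: Integer): Integer;
  begin
    Result := A + B;
  end;

var
  I, Acc: Integer;
begin
  Acc := 0;
  for I := 1 to N do
    Acc := Add(Acc, I);
  Result := Acc;
end;

{ Layout C (one reading of the text): the loop lives in a routine declared
  outside, which takes the parent's nested procedure as an "is nested" parameter. }
procedure RepeatN(N: Integer; Step: TStepProc);
var
  I: Integer;
begin
  for I := 1 to N do
    Step(I);
end;

function SumC(N: Integer): Integer;
var
  Acc: Integer;

  procedure AddTo(I: Integer);
  begin
    Acc := Acc + I;   // still touches a parent local, but via the nested procvar
  end;

begin
  Acc := 0;
  RepeatN(N, @AddTo);
  Result := Acc;
end;

begin
  WriteLn(SumA(10));  // 55
  WriteLn(SumB(10));  // 55
  WriteLn(SumC(10));  // 55
end.
```

All three compute the same result; which layout is fastest on a given target is something to measure with and without the localmin setting.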
FPC at least lets you do something about it. Delphi, for example, has the same speed problems with local procedures declared after local variables, and there they can't be fixed!
The cause, as far as I understand it, is that local procedures (their entry points at least) are also allocated on the stack, and the stack is byte-aligned by default. So you could also pad the stack by hand, but {$codealign localmin=...}, which sets the minimum alignment of local variables, does that for you.
That way you can make sure the procedure is called on a natural boundary. Try it: it can have a huge effect. As far as I know, "is nested" works in a similar way, but it takes some getting used to, and accessing local variables through it is awkward.
This may not be totally correct, but it is close, and aligning the stack works magic.
For beginners on the subject or as a guideline:
32 bit platform: {$codealign localmin=4}
64 bit platform: {$codealign localmin=8}
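If the same source is compiled for both 32- and 64-bit targets, the two guideline values can be picked with conditional compilation instead of being edited by hand. A minimal sketch, relying on the CPU32/CPU64 symbols that FPC predefines; sending every non-64-bit target (including 16-bit ones) to the 32-bit value is an assumption of this sketch, not a recommendation.

```pascal
{ Choose localmin from the target's pointer size.
  CPU64 is predefined by FPC on 64-bit targets; every other
  target falls into the else branch and gets the 32-bit value. }
{$ifdef CPU64}
  {$codealign localmin=8}
{$else}
  {$codealign localmin=4}
{$endif}
```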
Codealign is a local switch, so it can be combined with {$push}/{$pop} and applied on a very fine-grained basis.
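A minimal sketch of that fine-grained use, assuming you want the stricter alignment only for one routine that uses a nested procedure while the rest of the program keeps the defaults. The routine names SumPlain and SumAligned are hypothetical, and localmin=8 assumes a 64-bit target; the program only shows where the directives go and does not time anything.

```pascal
program ScopedCodeAlign;

{$mode objfpc}

{ Compiled with the default alignment settings. }
function SumPlain(N: Integer): Int64;
var
  I: Integer;
  Acc: Int64;
begin
  Acc := 0;
  for I := 1 to N do
    Acc := Acc + I;
  Result := Acc;
end;

{$push}                    // save the current local switches
{$codealign localmin=8}    // raise the minimum alignment of local variables
function SumAligned(N: Integer): Int64;
var
  I: Integer;
  Acc: Int64;

  procedure Step;
  begin
    Acc := Acc + I;        // nested access to the parent's locals
    Inc(I);
  end;

begin
  Acc := 0;
  I := 1;
  while I <= N do
    Step;
  Result := Acc;
end;
{$pop}                     // restore the previous settings for the rest of the file

begin
  WriteLn(SumPlain(100));    // 5050
  WriteLn(SumAligned(100));  // 5050
end.
```

Wrapping the whole routine, header to final end, keeps the changed setting from leaking into code that follows.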