You are comparing global vs local vars.
Rewrite the test.
Case 1: your deep nesting (each procedure nested inside the previous one)
Case 2: all nested procedures sit one level deep, directly inside OuterProcedure
That means, in the deep nest ("TEstIf"):
# [67] OuterVariable1 := i;
movq -8(%rbp),%rax
movq -8(%rax),%rax
movq -8(%rax),%rax
movq -8(%rax),%rax
movq -16(%rbp),%rdx
movq %rdx,-8(%rax)
And in the flat nest:
# [171] OuterVariable1 := i;
movq -8(%rbp),%rax
movq -16(%rbp),%rdx
movq %rdx,-8(%rax)
That cuts the frame-pointer loads (the "movq -8(%rbp),%rax" plus the three "movq -8(%rax),%rax" that follow it) from four to one.
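For reference, the two layouts can be sketched roughly like this. It is only a minimal sketch: OuterProcedure and OuterVariable1 are the names from the listings above, while TestFlat, TestDeep, Level1..Level3 and the loop count of 1000 are made up for illustration and are not taken from the original test. The exact instructions emitted will depend on FPC version and optimization level.

```pascal
program NestSketch;
{$mode objfpc}

procedure OuterProcedure;
var
  OuterVariable1: Int64;

  { Flat nest: one level below OuterProcedure, so a single
    parent-frame hop is enough to reach OuterVariable1. }
  procedure TestFlat;
  var
    i: Int64;
  begin
    for i := 1 to 1000 do
      OuterVariable1 := i;
  end;

  { Deep nest: TestDeep sits four levels below OuterProcedure,
    so each write has to walk four parent frames. }
  procedure Level1;
    procedure Level2;
      procedure Level3;
        procedure TestDeep;
        var
          i: Int64;
        begin
          for i := 1 to 1000 do
            OuterVariable1 := i;
        end;
      begin
        TestDeep;
      end;
    begin
      Level3;
    end;
  begin
    Level2;
  end;

begin
  OuterVariable1 := 0;
  TestFlat;
  WriteLn('after flat: ', OuterVariable1);
  OuterVariable1 := 0;
  Level1;
  WriteLn('after deep: ', OuterVariable1);
end;

begin
  OuterProcedure;
end.
```

Both loops do the same assignment; the only difference is how many enclosing frames lie between the loop and OuterVariable1.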
FPC 3.3.1 / 64 bit windows
      deep nest   / flat nest
-O1   2.6 seconds / 1.7 seconds
-O4   1.9 seconds / 0.7 seconds
FPC 3.0.4 / 64 bit windows
-O1   2.6 seconds / 1.8 seconds
Still a difference.
The deep nest takes about 1.5 times as long at -O1 and roughly 2.5 to 2.7 times as long at -O4 (1.9 / 0.7), rather than a factor of ~17.
Of course the flat nest still keeps one level of frame walking. But even if that were eliminated, I doubt it would account for a factor of 10.
No idea why you get such different numbers.
Your un-nested version (global var), on my machine:
FPC 3.0.4 / 64 bit windows
-O1   1.8 seconds
-O4   0.7 seconds
So that is about the same speed as with all procs nested one level deep.
- all tests done with BILLION
- Intel i7-8700K
- timings by GetTickCount64
- all runs done twice or more, to spot any exceptions (none found)
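A timing loop along those lines could look roughly like the sketch below. It assumes GetTickCount64 from the SysUtils unit; the BILLION constant value, TestFlat and the output format are my own placeholders, not the code that produced the numbers above.

```pascal
program TimeSketch;
{$mode objfpc}

uses
  SysUtils;  { provides GetTickCount64 }

const
  BILLION = 1000000000;  { assumed value for the loop count }

procedure OuterProcedure;
var
  OuterVariable1: Int64;
  StartTick: QWord;

  { Hypothetical flat-nest test body: one level below OuterProcedure. }
  procedure TestFlat;
  var
    i: Int64;
  begin
    for i := 1 to BILLION do
      OuterVariable1 := i;
  end;

begin
  OuterVariable1 := 0;
  StartTick := GetTickCount64;
  TestFlat;
  { GetTickCount64 counts milliseconds, so divide by 1000 for seconds. }
  WriteLn('flat nest: ', (GetTickCount64 - StartTick) / 1000.0 : 0 : 1, ' seconds');
end;

begin
  OuterProcedure;
end.
```

GetTickCount64 has coarse resolution, which is fine for runs that last a second or more, and is one more reason to run each case at least twice.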