Recent

Author Topic: Optimization suggestion for nested variable access  (Read 9699 times)

440bx

  • Hero Member
  • *****
  • Posts: 3944
Optimization suggestion for nested variable access
« on: May 24, 2020, 02:41:13 am »
Hello,

Following the discussion in this thread, https://forum.lazarus.freepascal.org/index.php/topic,49459.0.html
I looked into FPC's "stack flattening" abilities.  To make a short story really short: FPC does not "flatten" nested procedure/function stacks (that's no surprise, no one claimed it did.)

However, there is a simple optimization in that case that FPC could do.  Consider the following small program:
Code: Pascal  [Select][+][-]
  1. {$APPTYPE CONSOLE}
  2.  
  3. program TestNestedVariableAccess;   { level 1 }
  4.  
  5.   procedure OuterProcedure();       { level 2 }
  6.   var
  7.     OuterVariable1, OuterVariable2, OuterVariable3 : integer;
  8.  
  9.     procedure Level3();             { level 3 }
  10.  
  11.       procedure Level4();           { level 4 }
  12.  
  13.         procedure Level5();         { level 5 }
  14.  
  15.           procedure Level6();       { level 6 }
  16.           begin
  17.             writeln('Level6');
  18.  
  19.             OuterVariable1 := 1;
  20.             OuterVariable2 := 2;
  21.             OuterVariable3 := 3;
  22.  
  23.             writeln('OuterVariable1 : ', OuterVariable1);
  24.             writeln('OuterVariable2 : ', OuterVariable2);
  25.             writeln('OuterVariable3 : ', OuterVariable3);
  26.           end;
  27.  
  28.         begin
  29.           writeln('Level5');
  30.           Level6();
  31.         end;
  32.  
  33.       begin
  34.         writeln('Level4');
  35.         Level5();
  36.       end;
  37.  
  38.     begin
  39.       writeln('level 3');
  40.       Level4();
  41.     end;
  42.  
  43.   begin
  44.     Level3();
  45.   end;
  46.  
  47.  
  48. begin
  49.   OuterProcedure();
  50. end.                    
When compiled with -O4 (64bit, Windows target), it produces the following code (abbreviated):
Code: ASM  [Select][+][-]
  1. TestNestedVariableAccess.lpr:19           OuterVariable1 := 1;
  2.                                           ; walk the frame pointers
  3. 00000001000015A9 488b442420               mov    0x20(%rsp),%rax
  4. 00000001000015AE 488b4020                 mov    0x20(%rax),%rax
  5. 00000001000015B2 488b4020                 mov    0x20(%rax),%rax
  6. 00000001000015B6 488b4020                 mov    0x20(%rax),%rax
  7. 00000001000015BA c7402001000000           movl   $0x1,0x20(%rax)
  8.  
  9. TestNestedVariableAccess.lpr:20           OuterVariable2 := 2;
  10.                                           ; walk the frame pointers (again)
  11. 00000001000015C1 488b442420               mov    0x20(%rsp),%rax
  12. 00000001000015C6 488b4020                 mov    0x20(%rax),%rax
  13. 00000001000015CA 488b4020                 mov    0x20(%rax),%rax
  14. 00000001000015CE 488b4020                 mov    0x20(%rax),%rax
  15. 00000001000015D2 c7402802000000           movl   $0x2,0x28(%rax)
  16.  
  17. TestNestedVariableAccess.lpr:21           OuterVariable3 := 3;
  18.                                           ; walk the frame pointers (and again - a road well traveled by now!)
  19. 00000001000015D9 488b442420               mov    0x20(%rsp),%rax
  20. 00000001000015DE 488b4020                 mov    0x20(%rax),%rax
  21. 00000001000015E2 488b4020                 mov    0x20(%rax),%rax
  22. 00000001000015E6 488b4020                 mov    0x20(%rax),%rax
  23. 00000001000015EA c7403003000000           movl   $0x3,0x30(%rax)
  24.  
Basically, the compiler is walking the stack chain for every variable reference in that scope.  It could walk it once upon entry to the Level6() procedure to have a pointer to that target stack frame of interest (in this case, OuterProcedure()'s stack frame) and use that pointer instead of walking the stack for every reference.

I just thought I'd mention it.  It's a fairly simple optimization.

« Last Edit: May 24, 2020, 02:44:29 am by 440bx »
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

fcu

  • Jr. Member
  • **
  • Posts: 89
Re: Optimization suggestion for nested variable access
« Reply #1 on: May 24, 2020, 07:03:42 am »
i think fpc couldn't inlined nested proc/func , so stack frame is always built in each call   

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Optimization suggestion for nested variable access
« Reply #2 on: May 24, 2020, 07:45:07 am »
i think fpc couldn't inlined nested proc/func , so stack frame is always built in each call
The fact that it didn't inline is fine.  The problem is that the compiler walks the stack frames more than is necessary.  One walk at procedure entry is all it should need, not one per reference.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Optimization suggestion for nested variable access
« Reply #3 on: May 24, 2020, 02:10:57 pm »
Hello,

Following the discussion in this thread, https://forum.lazarus.freepascal.org/index.php/topic,49459.0.html
I looked into FPC's "stack flattening" abilities.  To make a short story really short: FPC does not "flatten" nested procedure/function stacks (that's no surprise, no one claimed it did.)

However, there is a simple optimization in that case that FPC could do.  Consider the following small program:
Code: Pascal  [Select][+][-]
  1. {$APPTYPE CONSOLE}
  2.  
  3. program TestNestedVariableAccess;   { level 1 }
  4.  
  5.   procedure OuterProcedure();       { level 2 }
  6.   var
  7.     OuterVariable1, OuterVariable2, OuterVariable3 : integer;
  8.  
  9.     procedure Level3();             { level 3 }
  10.  
  11.       procedure Level4();           { level 4 }
  12.  
  13.         procedure Level5();         { level 5 }
  14.  
  15.           procedure Level6();       { level 6 }
  16.           begin
  17.             writeln('Level6');
  18.  
  19.             OuterVariable1 := 1;
  20.             OuterVariable2 := 2;
  21.             OuterVariable3 := 3;
  22.  
  23.             writeln('OuterVariable1 : ', OuterVariable1);
  24.             writeln('OuterVariable2 : ', OuterVariable2);
  25.             writeln('OuterVariable3 : ', OuterVariable3);
  26.           end;
  27.  
  28.         begin
  29.           writeln('Level5');
  30.           Level6();
  31.         end;
  32.  
  33.       begin
  34.         writeln('Level4');
  35.         Level5();
  36.       end;
  37.  
  38.     begin
  39.       writeln('level 3');
  40.       Level4();
  41.     end;
  42.  
  43.   begin
  44.     Level3();
  45.   end;
  46.  
  47.  
  48. begin
  49.   OuterProcedure();
  50. end.                    
When compiled with -O4 (64bit, Windows target), it produces the following code (abbreviated):
Code: ASM  [Select][+][-]
  1. TestNestedVariableAccess.lpr:19           OuterVariable1 := 1;
  2.                                           ; walk the frame pointers
  3. 00000001000015A9 488b442420               mov    0x20(%rsp),%rax
  4. 00000001000015AE 488b4020                 mov    0x20(%rax),%rax
  5. 00000001000015B2 488b4020                 mov    0x20(%rax),%rax
  6. 00000001000015B6 488b4020                 mov    0x20(%rax),%rax
  7. 00000001000015BA c7402001000000           movl   $0x1,0x20(%rax)
  8.  
  9. TestNestedVariableAccess.lpr:20           OuterVariable2 := 2;
  10.                                           ; walk the frame pointers (again)
  11. 00000001000015C1 488b442420               mov    0x20(%rsp),%rax
  12. 00000001000015C6 488b4020                 mov    0x20(%rax),%rax
  13. 00000001000015CA 488b4020                 mov    0x20(%rax),%rax
  14. 00000001000015CE 488b4020                 mov    0x20(%rax),%rax
  15. 00000001000015D2 c7402802000000           movl   $0x2,0x28(%rax)
  16.  
  17. TestNestedVariableAccess.lpr:21           OuterVariable3 := 3;
  18.                                           ; walk the frame pointers (and again - a road well traveled by now!)
  19. 00000001000015D9 488b442420               mov    0x20(%rsp),%rax
  20. 00000001000015DE 488b4020                 mov    0x20(%rax),%rax
  21. 00000001000015E2 488b4020                 mov    0x20(%rax),%rax
  22. 00000001000015E6 488b4020                 mov    0x20(%rax),%rax
  23. 00000001000015EA c7403003000000           movl   $0x3,0x30(%rax)
  24.  
Basically, the compiler is walking the stack chain for every variable reference in that scope.  It could walk it once upon entry to the Level6() procedure to have a pointer to that target stack frame of interest (in this case, OuterProcedure()'s stack frame) and use that pointer instead of walking the stack for every reference.

I just thought I'd mention it.  It's a fairly simple optimization.

Looks like the compiler is putting the CPU on an exercise program.

Why does it repeat the same code over and over... ?

This looks like the example I presented with multiple variables being initialized.

Most likely the compiler would generate different code if those variables were actually, but who knows..

Responses should be interesting  %)

The only true wisdom is knowing you know nothing

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: Optimization suggestion for nested variable access
« Reply #4 on: May 24, 2020, 03:13:46 pm »
Seems a task for CSE. Either the CSE misses it, or this is not within the scope of the CSE.

Delphi doesn't optimize this either.

jamie

  • Hero Member
  • *****
  • Posts: 6090
Re: Optimization suggestion for nested variable access
« Reply #5 on: May 24, 2020, 04:36:54 pm »
Seems a task for CSE. Either the CSE misses it, or this is not within the scope of the CSE.

Delphi doesn't optimize this either.
Just ran that through D, the output code is totally different.
The only true wisdom is knowing you know nothing

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Optimization suggestion for nested variable access
« Reply #6 on: May 24, 2020, 07:38:33 pm »
It could walk it once upon entry to the Level6() procedure to have a pointer to that target stack frame of interest (in this case, OuterProcedure()'s stack frame) and use that pointer instead of walking the stack for every reference.

I just thought I'd mention it.  It's a fairly simple optimization.

If my memory serves me correctly, that sort of frame cache is sometimes referred to as a "display". I think I've used a (Modula-2) compiler that did this since some targets weren't as good at frame chasing as x86, and I believe I've seen it in hardware as well.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

BeniBela

  • Hero Member
  • *****
  • Posts: 905
    • homepage
Re: Optimization suggestion for nested variable access
« Reply #7 on: May 24, 2020, 07:58:45 pm »
Fpc always sucks at optimization

Seems a task for CSE. Either the CSE misses it, or this is not within the scope of the CSE.

Delphi doesn't optimize this either.
Just ran that through D, the output code is totally different.

I just ran it through my Delphi 4

It gives the same construction

So at least fpc is not worse than a 20 year old compiler

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Optimization suggestion for nested variable access
« Reply #8 on: May 24, 2020, 08:23:45 pm »
If my memory serves me correctly, that sort of frame cache is sometimes referred to as a "display". I think I've used a (Modula-2) compiler that did this since some targets weren't as good at frame chasing as x86, and I believe I've seen it in hardware as well.
You got that right.  Some machines had hardware support to keep track of the display (a rather nice feature to have in hardware.)

Without hardware support, walking the chain of stack frames is usually less overhead than maintaining a complete display in software (N. Wirth ran into that.)  When doing it in software, which is what FPC does, it's important for the compiler to implement a limited display structure (a simple cache of selected stack frame pointers) to prevent having to walk the stack chain for every variable reference.  That's a simple optimization too and, depending on the number of references and the nesting depth, it can make a noticeable difference.


I just ran it through my Delphi 4
It gives the same construction
So at least fpc is not worse than a 20 year old compiler
This is speculation but, I suspect that part of the reason that PC class Pascal compilers don't optimize nested stack frame accesses is because of OOP.    I believe (just a belief) that nested functions/procedures became less common when Pascal got "OOP-ed".
« Last Edit: May 24, 2020, 08:25:18 pm by 440bx »
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Optimization suggestion for nested variable access
« Reply #9 on: May 24, 2020, 10:13:34 pm »
If my memory serves me correctly, that sort of frame cache is sometimes referred to as a "display". I think I've used a (Modula-2) compiler that did this since some targets weren't as good at frame chasing as x86, and I believe I've seen it in hardware as well.
You got that right.  Some machines had hardware support to keep track of the display (a rather nice feature to have in hardware.)

If my memory is correct, I've seen one with a 6' square of LEDs showing the display state. I think the idea was that since it changed somewhat more slowly than the actual CPU registers etc. it was useful eyecandy.

http://www.cs.berkeley.edu/~culler/courses/cs252-s05/papers/burroughs.pdf figure 5 etc.

http://users.monash.edu.au/~ralphk/burroughs.html (note the size of the disc drive platter halfway down that page)

Quote
Without hardware support, walking the chain of stack frames is usually less overhead than maintaining a complete display in software (N. Wirth ran into that.)  When doing it in software, which is what FPC does, it's important for the compiler to implement a limited display structure (a simple cache of selected stack frame pointers) to prevent having to walk the stack chain for every variable reference.  That's a simple optimization too and, depending on the number of references and the nesting depth, it can make a noticeable difference.

Looking at it from a slightly different viewpoint, it would be interesting to know how well the CPU caches the stack, and whether it could be encouraged to handle a per-thread block earmarked as the  display in the most effective way possible.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 11382
  • FPC developer.
Re: Optimization suggestion for nested variable access
« Reply #10 on: May 24, 2020, 10:47:27 pm »
This is speculation but, I suspect that part of the reason that PC class Pascal compilers don't optimize nested stack frame accesses is because of OOP.    I believe (just a belief) that nested functions/procedures became less common when Pascal got "OOP-ed".

Well. Delphi maybe (*), but FPC existed for years as non-OOP compiler. And D is also OOP afaik, so why doesn't it hold there?

While there is some truth in it, it is an overgeneralization

(*) one should wonder however what the 32-bit Delphi compiler (backend) was based on though. It might predate the Delphi dialect, technology wise.

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Optimization suggestion for nested variable access
« Reply #11 on: May 25, 2020, 12:18:23 am »
http://www.cs.berkeley.edu/~culler/courses/cs252-s05/papers/burroughs.pdf figure 5 etc.

http://users.monash.edu.au/~ralphk/burroughs.html (note the size of the disc drive platter halfway down that page)
Good stuff in that pdf.  The HTML page was a nice walk down the binary memory lane. :)

Looking at it from a slightly different viewpoint, it would be interesting to know how well the CPU caches the stack, and whether it could be encouraged to handle a per-thread block earmarked as the  display in the most effective way possible.
Yes, that is definitely something worth knowing.  To get an idea, I wrote the following two simple test programs:

Simple if (no nested accesses):
Code: Pascal  [Select][+][-]
  1. {$APPTYPE CONSOLE}
  2.  
  3. program TestSimpleIf;
  4.   { program to test the time consumed to test against some limits             }
  5.  
  6.   const
  7.     MILLION1   =   1000000;
  8.     MILLION10  =  10000000;
  9.     MILLION100 = 100000000;
  10.  
  11.     BILLION    = 1000 * MILLION1;
  12.  
  13.     LOOP_LIMIT = BILLION;
  14.  
  15.   var
  16.     VariableLimit                   : qword;
  17.  
  18.     Variable1, Variable2, Variable3 : qword;
  19.  
  20.   procedure TestIf();
  21.   var
  22.     i : qword;
  23.  
  24.   begin
  25.     writeln('constant limit, global variables - Loop STARTING');
  26.  
  27.     for i := 1 to LOOP_LIMIT do
  28.     begin
  29.       Variable1 := i;
  30.       Variable2 := i;
  31.       Variable3 := i;
  32.  
  33.       if i >= LOOP_LIMIT then
  34.       begin
  35.  
  36.         writeln('constant limit, global variables - Loop DONE ', i);
  37.         writeln;
  38.       end;
  39.     end;
  40.   end;
  41.  
  42.   procedure TestIfVariableLimit();
  43.   var
  44.     i : qword;
  45.  
  46.   begin
  47.     writeln('Variable limit, global variables - Loop STARTING');
  48.  
  49.     for i := 1 to VariableLimit do
  50.     begin
  51.       Variable1 := i;
  52.       Variable2 := i;
  53.       Variable3 := i;
  54.  
  55.       if i >= VariableLimit then
  56.       begin
  57.         writeln('Variable limit, global variables - Loop DONE ', i);
  58.         writeln;
  59.       end;
  60.     end;
  61.   end;
  62.  
  63.   procedure TestIfVariableLimitLocal();
  64.   var
  65.     i : qword;
  66.  
  67.   begin
  68.     writeln('Variable limit, local variables - Loop STARTING');
  69.  
  70.     for i := 1 to VariableLimit do
  71.     begin
  72.       Variable1 := i;
  73.       Variable2 := i;
  74.       Variable3 := i;
  75.  
  76.       if i >= VariableLimit then
  77.       begin
  78.         writeln('Variable limit, local variables - Loop DONE ', i);
  79.         writeln;
  80.       end;
  81.     end;
  82.   end;
  83.  
  84.  
  85. begin
  86.   VariableLimit := LOOP_LIMIT;
  87.  
  88.   TestIfVariableLimit();
  89. end.
  90.  
For a billion iterations, that program runs in just a tad over a second (note: that includes the load time).

Nested if (access to nested variables):
Code: Pascal  [Select][+][-]
  1. {$APPTYPE CONSOLE}
  2.  
  3. program TestNestedIf;
  4.   { program to test the time consumed to test against some limits             }
  5.  
  6.   const
  7.     MILLION1      =   1000000;
  8.     MILLION10     =  10000000;
  9.     MILLION100    = 100000000;
  10.  
  11.     BILLION       = 1000 * MILLION1;
  12.     BILLION5      =    5 * BILLION;
  13.  
  14.     LOOP_LIMIT    = BILLION;
  15.  
  16.   var
  17.     VariableLimit : qword;
  18.  
  19.   procedure OuterProcedure();                            { level 2 }
  20.   var
  21.     OuterVariable1, OuterVariable2, OuterVariable3 : qword;
  22.  
  23.  
  24.  
  25.     procedure Level3();                                  { level 3 }
  26.  
  27.       procedure Level4();                                { level 4 }
  28.  
  29.         procedure Level5();                              { level 5 }
  30.  
  31.           procedure TestIfVariableLimit();               { level 6 }
  32.           var
  33.             { FPC doesn't accept a nested variable as a            }
  34.             { "for" counter                                        }
  35.  
  36.             i : qword;
  37.  
  38.           begin
  39.             writeln('Variable limit - Loop STARTING');
  40.  
  41.             for i := 1 to VariableLimit do
  42.             begin
  43.               OuterVariable1 := i;
  44.               OuterVariable2 := i;
  45.               OuterVariable3 := i;
  46.  
  47.               if i >= VariableLimit then
  48.               begin
  49.                 writeln('Variable limit - Loop DONE ', i);
  50.                 writeln;
  51.               end;
  52.             end;
  53.           end;
  54.  
  55.           procedure TestIf();                            { level 6 }
  56.           var
  57.             i : qword;
  58.  
  59.           begin
  60.             writeln('Loop STARTING');
  61.  
  62.             for i := 1 to LOOP_LIMIT do
  63.             begin
  64.               OuterVariable1 := i;
  65.               OuterVariable2 := i;
  66.               OuterVariable3 := i;
  67.  
  68.               if i >= LOOP_LIMIT then
  69.               begin
  70.                 writeln('Loop DONE ', i);
  71.                 writeln;
  72.               end;
  73.             end;
  74.           end;
  75.  
  76.           procedure Level6();                            { level 6 }
  77.           begin
  78.             writeln('Level 6');
  79.             TestIfVariableLimit();
  80.           end;
  81.  
  82.         begin
  83.           writeln('Level 5');
  84.           Level6();
  85.         end;
  86.  
  87.       begin
  88.         writeln('Level 4');
  89.         Level5();
  90.       end;
  91.  
  92.     begin
  93.       writeln('level 3');
  94.       Level4();
  95.     end;
  96.  
  97.   begin
  98.     Level3();
  99.   end;
  100.  
  101.  
  102. begin
  103.   VariableLimit := LOOP_LIMIT;
  104.  
  105.   OuterProcedure();
  106. end.
  107.  
This one takes a little over 17 seconds.  IOW, it is about 17 times _slower_ than the program that has direct access to the variables it sets.  Walking the stack frame chain is expensive!.

NOTE: I use TCCLE which can time execution from the command line.  For this reason there is no timing mechanism in the program itself.



While there is some truth in it, it is an overgeneralization
It was speculation anyway.  I really don't know if it is an over-generalization or not but, I do believe that since classes/objects' data is accessed with just one de-reference, that is a disincentive for the compiler implementors to optimize nested variables accesses because the mantra has become "make it an object".




ETA:
The original test program from which those two are derived was meant to test how much time an "if" test took.  That's why the "if" is in the loop instead of outside.


« Last Edit: May 25, 2020, 12:22:31 am by 440bx »
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Optimization suggestion for nested variable access
« Reply #12 on: May 25, 2020, 01:32:52 am »
You are comparing global vs local vars.

Rewrite the test.

Case 1: your nested
Case 2: all nested are nested level 1 directly in OuterProcedure

That means in deep nested: ("TEstIf")
Code: Pascal  [Select][+][-]
  1. # [67] OuterVariable1 := i;
  2.         movq    -8(%rbp),%rax
  3.         movq    -8(%rax),%rax
  4.         movq    -8(%rax),%rax
  5.         movq    -8(%rax),%rax
  6.         movq    -16(%rbp),%rdx
  7.         movq    %rdx,-8(%rax)
  8.  

And in flat nested
Code: Pascal  [Select][+][-]
  1. # [171] OuterVariable1 := i;
  2.         movq    -8(%rbp),%rax
  3.         movq    -16(%rbp),%rdx
  4.         movq    %rdx,-8(%rax)
  5.  

reducing the "movq   -8(%rbp),%rax" to one quarter.

FPC 3.3.1  / 64 bit windows
             deep nest / flat nest
-O1       2.6 seconds / 1.7 seconds
-O4       1.9               /  0.7

FPC 3.0.4  / 64 bit windows
-O1       2.6 seconds / 1.8 seconds

Still a difference.
It about  half as fast (factor 2.5 with O4). Rather that factor ~17.
Of course it keeps one level of stack walking. But even if that was eliminated,  I doubt that makes factor 10 ?

No idea why you get such different numbers

Your un-nested (global var), on my machine
FPC 3.0.4  / 64 bit windows
-O1      1.8
-O4      0.7

So that is about the same speed as all procs with 1 level nested.


- all tests done with BILLION
- intel I7 8700K
- timings by gettickcount64
- all runs done twice or more, to spot any exceptions (none found)


« Last Edit: May 25, 2020, 01:36:24 am by Martin_fr »

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Optimization suggestion for nested variable access
« Reply #13 on: May 25, 2020, 01:48:38 am »
You are comparing global vs local vars.
That was the point of the test.  To measure the overhead of walking the stack chain to reach the variables.

As the nesting level increases, so does the performance loss.  Walking the stack chain only once upon entry to the function/procedure would eliminate that loss.  Note, it would be still be a bit slower because there would always be one (1) additional access through the cached frame pointer but, that loss would be small and, it would be almost fully independent of the nesting level.  Almost fully, because the time consumed by the initial walk upon entry depends on the nesting level.

(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9791
  • Debugger - SynEdit - and more
    • wiki
Re: Optimization suggestion for nested variable access
« Reply #14 on: May 25, 2020, 02:21:06 am »
You are comparing global vs local vars.
That was the point of the test.  To measure the overhead of walking the stack chain to reach the variables.

As the nesting level increases, so does the performance loss.  Walking the stack chain only once upon entry to the function/procedure would eliminate that loss.  Note, it would be still be a bit slower because there would always be one (1) additional access through the cached frame pointer but, that loss would be small and, it would be almost fully independent of the nesting level.  Almost fully, because the time consumed by the initial walk upon entry depends on the nesting level.

Well, yes I wasn't 100% correct either.

Access to global vars is different even to accessing none-nested locals (though probably not much of a time diff).
Therefore both tests would need locals. One would need nested, and the other un-nested.

But missing the other point.
Even doing the exact same test as you did, I do not get the same results: For me the factor was 2 or 2.5 instead of 17.
How to get your numbers?



Btw, I copied the test as it is => So I probably only run "TestIfVariableLimit". The others are not called.



 

TinyPortal © 2005-2018