Recent

Author Topic: [SOLVED] Poor optimization of constant folding  (Read 5427 times)

lagprogramming

  • Sr. Member
  • ****
  • Posts: 390
[SOLVED] Poor optimization of constant folding
« on: September 27, 2023, 10:53:39 am »
According to FPC's Programmer's guide:
Quote
11.1.1
Constant folding
In Free Pascal, if the operand(s) of an operator are constants, they will be evaluated at compile time.
Example
x:=1+2+3+6+5;
will generate the same code as
x:=17;
So, I thought arithmetic operations of constants are computed at compile time.
The following program should be self explanatory.
Code: Pascal  [Select][+][-]
  1. program ConstantFoldingOptimization;
  2. {$OPTIMIZATION LEVEL3}
  3. function Foo(a:integer):integer;
  4. begin
  5.   result:=-a*2;
  6.   {negl %edi
  7.    leal (%edi,%edi,1),%eax}
  8. end;
  9.  
  10. function Bar(a:integer):integer;
  11. begin
  12.   result:=a*(-2);
  13.   {movl  %edi,%eax
  14.    imull $-2,%eax}
  15. end;
  16.  
  17. function Baz(a:integer):integer;
  18. begin
  19.   result:=a*(-2)*(-1)*(-1)*(-1)*(-1)*(3);
  20.   {movl %edi,%eax
  21.   imull $-2,%eax
  22.   negl  %eax
  23.   negl  %eax
  24.   negl  %eax
  25.   negl  %eax}
  26. end;
  27.  
  28. function Wow(a:integer):integer;
  29. begin
  30.   result:=a*(-2)*(-1)*(-1)*(-1)*(-1)*(3);
  31.   {movl %edi,%eax
  32.   imull $-2,%eax
  33.   negl  %eax
  34.   negl  %eax
  35.   negl  %eax
  36.   negl  %eax
  37.   leal  (%eax,%eax,2),%eax}
  38. end;
  39.  
  40. var
  41.   z:integer;
  42. begin
  43.   z:=paramcount;
  44.   writeln(Foo(z));
  45.   writeln(Bar(z));
  46.   writeln(Baz(z));
  47.   writeln(Wow(z));
  48. end.
If constant folding optimization has been disabled temporarily, I have to say that it takes quite some time to enable it again.
Maybe this optimization is not disabled, maybe it's just incomplete.  :-\
« Last Edit: November 17, 2023, 01:24:56 pm by lagprogramming »

Fibonacci

  • Full Member
  • ***
  • Posts: 219
  • #PDK
Re: Poor optimization of constant folding
« Reply #1 on: September 27, 2023, 11:06:49 am »
Its computed at compile time only if whole expression is constant

Code: ASM  [Select][+][-]
  1. # [5] result:=a*(-2)*(-1)*(-1)*(-1)*(-1)*(3);
  2.         imull   $-2,%eax
  3.         negl    %eax
  4.         negl    %eax
  5.         negl    %eax
  6.         negl    %eax
  7.         leal    (%eax,%eax,2),%eax

Code: ASM  [Select][+][-]
  1. # [5] result := (-1)*(-1)*(-1)*(-1)*(3);
  2.         movl    $3,%edx
  3. # [6] result := a*(-2)*result;
  4.         imull   $-2,%eax
  5.         imull   %edx,%eax

runewalsh

  • Jr. Member
  • **
  • Posts: 72
Re: Poor optimization of constant folding
« Reply #2 on: September 27, 2023, 11:42:56 am »
Reported here. Until the fix in mid-2030s, you can force the folding with parentheses:

Code: Pascal  [Select][+][-]
  1. result:=a*((-2)*(-1)*(-1)*(-1)*(-1)*(3));

Fibonacci

  • Full Member
  • ***
  • Posts: 219
  • #PDK
Re: Poor optimization of constant folding
« Reply #3 on: September 27, 2023, 11:49:10 am »

440bx

  • Hero Member
  • *****
  • Posts: 3624
Re: Poor optimization of constant folding
« Reply #4 on: September 27, 2023, 11:53:55 am »
Until the fix in mid-2030s, you can force the folding with parentheses:

Code: Pascal  [Select][+][-]
  1. result:=a*((-2)*(-1)*(-1)*(-1)*(-1)*(3));
or better yet, calculate the value of the constant along with a comment as to how it was calculated (if necessary for clarity)
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 SP1 64bit.

runewalsh

  • Jr. Member
  • **
  • Posts: 72
Re: Poor optimization of constant folding
« Reply #5 on: September 27, 2023, 12:02:28 pm »
calculate the value of the constant along with a comment

Better to declare a const if you are that distrustful of folding.

lagprogramming

  • Sr. Member
  • ****
  • Posts: 390
Re: Poor optimization of constant folding
« Reply #6 on: September 28, 2023, 12:44:30 pm »
Until the fix in mid-2030s, you can force the folding with parentheses:

Code: Pascal  [Select][+][-]
  1. result:=a*((-2)*(-1)*(-1)*(-1)*(-1)*(3));
or better yet, calculate the value of the constant along with a comment as to how it was calculated (if necessary for clarity)
@runewalsh
Additional parenthesis is a nice trick.

@440bx
:) Reminds me of the following lines in LCL's customdrawnwinapi.inc
Code: Pascal  [Select][+][-]
  1. //lFontSize:= MulDiv(lFontSize,72,ftFont.DPI); // convert points to pixels
  2. lFontSize := Round(ftFont.TextHeight(Str) * 0.75);// ToDo: Find out why this 75% factor works
:) Most likely a developer precomputed that 0.75 constant value and other developers could not trace the arithmetic to source.
Fpgui has:
Code: Pascal  [Select][+][-]
  1. function TFPGUIWinAPIFont.GetFontSize: integer;
  2. begin
  3.   Result:=(-FFontHeight * 72) div 96;
  4. end;
72/96=0.75

Thaddy

  • Hero Member
  • *****
  • Posts: 13250
Re: Poor optimization of constant folding
« Reply #7 on: September 28, 2023, 03:48:27 pm »
declaring as inline, provided all elements are constant..
I actually get compliments for being rude... (well, Dutch, but that is the same)

ccrause

  • Hero Member
  • *****
  • Posts: 799
Re: Poor optimization of constant folding
« Reply #8 on: September 28, 2023, 05:04:46 pm »
Its computed at compile time only if whole expression is constant
The simplification of constant folding happens in taddnode.simplify (at least the initial simplification).  Here the left and right nodes of an operator are inspected to check if both are constant and of the same type, then the folding is applied. Since this simplification only looks at the immediate left and right nodes of the operand, opportunities farther away is not visible. 
Code: Pascal  [Select][+][-]
  1.   b := 10;
  2.   a := -1 * b * -1;
With -O2 or lower the following code is generated:
Code: ASM  [Select][+][-]
  1. # [9] a := -1 * b *  -1;
  2.         movslq  %esi,%rbx
  3.         negq    %rbx
  4.         negq    %rbx  
With -O3 or higher:
Code: ASM  [Select][+][-]
  1. # [9] a := -1 * b *  -1;
  2.         movl    $10,%esi  
So there is another optimization that can further simplify the expression, although the following example doesn't fold at -O4:
Code: ASM  [Select][+][-]
  1. # [15] a := b * -1 * -1;
  2.         movslq  %edi,%rbx
  3.         negq    %rbx
  4.         negq    %rbx

A workaround is to create sub expressions that can be folded before taddnode.simplify is called:
Code: ASM  [Select][+][-]
  1. # [18] a := b * (-1 * -1);
  2.         movslq  %esi,%rsi

One could perhaps expand the scope of the simplification to search one node wider (deeper?) to cover the above case, but that will then fail to fold the following example:
Code: Pascal  [Select][+][-]
  1. a = -1 * b * -1 * b * -1;
Note: this example does get simplified all the way to a constant value at -O3.

I think the current situation where seemingly obvious opportunities for constant folding are missed is caused by the fairly simplistic heuristics used by the compiler.  Perhaps someone can come up with a tweak to this heuristic that covers more situations without incurring a significant speed penalty.

440bx

  • Hero Member
  • *****
  • Posts: 3624
Re: Poor optimization of constant folding
« Reply #9 on: September 28, 2023, 07:18:03 pm »
I doubt what I am going to mention can be applied to FPC without significant effort.  The solution to a large number of expression optimization problems, including constant folding, is to evaluate the expression _symbolically_. 

This not only coalesces all constants (such as the above examples) but allows for the recognition of common algebraic patterns that can be substituted for equivalent but simpler (read faster) patterns.

A little bit of Mathematica in the compiler ;)
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 SP1 64bit.

Warfley

  • Hero Member
  • *****
  • Posts: 1469
Re: Poor optimization of constant folding
« Reply #10 on: September 29, 2023, 12:13:25 am »
Doing something like this is not as easy as one may think. Whe you as a human look at "x * 2 * 3 * 4" you probably see a (singular) chain of multiplications, basically one variable and then some tail thats multiplied onwards. But for the compiler * is a left associative binary operation. So the compiler sees "(((x*2)*3)*4)".
This is then parsed into the following binary tree:
Code: Pascal  [Select][+][-]
  1.         *
  2.        / \
  3.       *   4
  4.      / \
  5.     *   3
  6.    / \
  7.   x   2
For the compiler every single * node is a non constant node, because the left side of the multiplication is always a subtree containing a variable. To be able to reckognize that it is equivalent to one variable multipled with a constant subtree, this tree must be transformed into a right associative tree:
Code: Pascal  [Select][+][-]
  1.    *
  2.   / \
  3.  x   *
  4.     / \
  5.    2   *
  6.       / \
  7.      3   4

The operations here are pretty simple, as it's a simple tree invert, but there are more complex trees:
Code: Pascal  [Select][+][-]
  1. x * (2 * y) * 3
  2.      *
  3.     / \
  4.    *   3
  5.   / \
  6.  x   *
  7.     / \
  8.    2   y

Of course there are options to make this easier, e.g. you could use a flattend n-ary tree to reason over it, which makes it much, as you can then simply remove the non constant subtrees. While this works for such trivial examples, as soon as you have mixed multiplication and addition this won't work anymore.
Then you would need to normalize your formular, by repeatedly applying the distributive law. But this is in exp-space (and exp-time). E.g.
Code: Pascal  [Select][+][-]
  1.   (a * 2) + (b * 3) + (c * 4)
  2. = (a + b + c) * (2 + b + c) * (a + 3 + c) * (2 + 3 + c) * (a + b + 4) * (2 + b + 4) * (a + 3 + 4) * (2 + 3 + 4)
So a sum of x terms consisting of products of n factors you will get x^n terms in when normalizing using the distributive law (in this case 2^3=8). And all of this just to find a single constant subexpression that is not worth optimizing away because it blows this formular up so much
So what would happen for complex arithmetic expressions is, you would hit compile, then your FPC freezes for quite some time and then crash because it ran out of memory.

Of course there are heuristics and other things, for example GCC compiles the following non trivial expression:
Code: C  [Select][+][-]
  1. int compute(int num) {
  2.     return 1 + (num * 3) + (4 * 5);
  3. }
Into the single instruction:
Code: ASM  [Select][+][-]
  1. lea     eax, [rdi+21+rdi*2]
But this comes at a cost, and that cost is that C compilation is really really slow. Compiling FPC with FPC takes a minute or so. Compiling GCC with GCC takes around 6 hours.

The FPC is a very fast compiler compared to many C compilers like Clang or GCC. But the reason is not because the FPC is so much better optimized, but because it simply does much less stuff. And this sort of optimization is the sort of stuff FPC does not do compared to it's C counterparts
« Last Edit: September 29, 2023, 12:18:01 am by Warfley »

Fibonacci

  • Full Member
  • ***
  • Posts: 219
  • #PDK
Re: Poor optimization of constant folding
« Reply #11 on: September 29, 2023, 04:11:46 am »
Compiling GCC with GCC takes around 6 hours.

What :o

circular

  • Hero Member
  • *****
  • Posts: 4077
    • Personal webpage
Re: Poor optimization of constant folding
« Reply #12 on: September 29, 2023, 05:46:03 am »
Sometimes, deep optimizations might give a different result. Consider an obvious example:
Code: Pascal  [Select][+][-]
  1. result := (a div 3) * 3;
  2.  
It is clear here that simplifying to a would remove the effect aligning to a multiple of 3. Obviously, the compiler would not optimize in this case. But sometimes the effect can be more subtle.

For example, with range of values can be reached in intermediate values or the result is different due to the precision of floating point values. Such optimization could be depend on a compiler option for backward compatibility.

Putting things that can be optimized inside brackets or in a constant is a tradeoff to say where the compiler can simplify.
Conscience is the debugger of the mind

Leledumbo

  • Hero Member
  • *****
  • Posts: 8717
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Poor optimization of constant folding
« Reply #13 on: September 29, 2023, 07:02:04 am »
Compiling GCC with GCC takes around 6 hours.

What :o
The full source package indeed takes around that long even with a powerful CPU, a lot of fast RAM and NVMe SSD. The language, the compiler and the build system all contribute to its slow build time. However, if you just want to build the C and C++ part, it "only" takes 30-40 minutes, standard libraries included. Of course that number still looks pale compared to bootstrapping both fpc and lazarus that takes less than 5 minutes on my machine (I add -O3 to my build script, so it's already a voor).

440bx

  • Hero Member
  • *****
  • Posts: 3624
Re: Poor optimization of constant folding
« Reply #14 on: September 29, 2023, 08:00:38 am »
The full source package indeed takes around that long even with a powerful CPU, a lot of fast RAM and NVMe SSD.
I've wondered for quite some time how long it takes to do a full build of Windows 10 or 11.  That's 25 million+ lines of code, even on a really fast machine, it must be a rather "noticeable" amount of time.
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 SP1 64bit.

 

TinyPortal © 2005-2018