Forum > FPC development

[SOLVED] Poor optimization of constant folding

<< < (3/5) > >>

Warfley:
Doing something like this is not as easy as one may think. Whe you as a human look at "x * 2 * 3 * 4" you probably see a (singular) chain of multiplications, basically one variable and then some tail thats multiplied onwards. But for the compiler * is a left associative binary operation. So the compiler sees "(((x*2)*3)*4)".
This is then parsed into the following binary tree:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---        *       / \      *   4     / \    *   3   / \  x   2For the compiler every single * node is a non constant node, because the left side of the multiplication is always a subtree containing a variable. To be able to reckognize that it is equivalent to one variable multipled with a constant subtree, this tree must be transformed into a right associative tree:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---   *  / \ x   *    / \   2   *      / \     3   4
The operations here are pretty simple, as it's a simple tree invert, but there are more complex trees:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---x * (2 * y) * 3     *    / \   *   3  / \ x   *    / \   2   y
Of course there are options to make this easier, e.g. you could use a flattend n-ary tree to reason over it, which makes it much, as you can then simply remove the non constant subtrees. While this works for such trivial examples, as soon as you have mixed multiplication and addition this won't work anymore.
Then you would need to normalize your formular, by repeatedly applying the distributive law. But this is in exp-space (and exp-time). E.g.

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---  (a * 2) + (b * 3) + (c * 4)= (a + b + c) * (2 + b + c) * (a + 3 + c) * (2 + 3 + c) * (a + b + 4) * (2 + b + 4) * (a + 3 + 4) * (2 + 3 + 4)So a sum of x terms consisting of products of n factors you will get x^n terms in when normalizing using the distributive law (in this case 2^3=8). And all of this just to find a single constant subexpression that is not worth optimizing away because it blows this formular up so much
So what would happen for complex arithmetic expressions is, you would hit compile, then your FPC freezes for quite some time and then crash because it ran out of memory.

Of course there are heuristics and other things, for example GCC compiles the following non trivial expression:

--- Code: C  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---int compute(int num) {    return 1 + (num * 3) + (4 * 5);}Into the single instruction:

--- Code: ASM  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---lea     eax, [rdi+21+rdi*2]But this comes at a cost, and that cost is that C compilation is really really slow. Compiling FPC with FPC takes a minute or so. Compiling GCC with GCC takes around 6 hours.

The FPC is a very fast compiler compared to many C compilers like Clang or GCC. But the reason is not because the FPC is so much better optimized, but because it simply does much less stuff. And this sort of optimization is the sort of stuff FPC does not do compared to it's C counterparts

Fibonacci:

--- Quote from: Warfley on September 29, 2023, 12:13:25 am ---Compiling GCC with GCC takes around 6 hours.

--- End quote ---

What :o

circular:
Sometimes, deep optimizations might give a different result. Consider an obvious example:

--- Code: Pascal  [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---result := (a div 3) * 3; It is clear here that simplifying to a would remove the effect aligning to a multiple of 3. Obviously, the compiler would not optimize in this case. But sometimes the effect can be more subtle.

For example, with range of values can be reached in intermediate values or the result is different due to the precision of floating point values. Such optimization could be depend on a compiler option for backward compatibility.

Putting things that can be optimized inside brackets or in a constant is a tradeoff to say where the compiler can simplify.

Leledumbo:

--- Quote from: Fibonacci on September 29, 2023, 04:11:46 am ---
--- Quote from: Warfley on September 29, 2023, 12:13:25 am ---Compiling GCC with GCC takes around 6 hours.

--- End quote ---

What :o

--- End quote ---
The full source package indeed takes around that long even with a powerful CPU, a lot of fast RAM and NVMe SSD. The language, the compiler and the build system all contribute to its slow build time. However, if you just want to build the C and C++ part, it "only" takes 30-40 minutes, standard libraries included. Of course that number still looks pale compared to bootstrapping both fpc and lazarus that takes less than 5 minutes on my machine (I add -O3 to my build script, so it's already a voor).

440bx:

--- Quote from: Leledumbo on September 29, 2023, 07:02:04 am ---The full source package indeed takes around that long even with a powerful CPU, a lot of fast RAM and NVMe SSD.

--- End quote ---
I've wondered for quite some time how long it takes to do a full build of Windows 10 or 11.  That's 25 million+ lines of code, even on a really fast machine, it must be a rather "noticeable" amount of time.

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version