<snip> ... "comparing programming languages' performance"
<snip> ... but that thread had some relevant and interesting tips ...
Compiler algorithms for code optimization are a subject I really enjoy. I'll throw a few things out here as food for thought for others who enjoy it as well.
The first thing to realize is that the language's grammar can have quite an impact on how simple or complicated optimizing code can be.
The more information the compiler has, and the more accurate and precise that information is, the more likely it is to be able to perform optimizations it couldn't do otherwise. The amount and quality of the information built into the language's grammar is key. In that regard, a strongly typed language has the advantage over a loosely typed language: because the language is strongly typed, the information is usually dependable.
Let's see an example of how the language grammar affects code optimizations. Standard C does _not_ allow nested functions/procedures whereas Pascal does. Nested functions and procedures are a superb feature for producing code that is well organized and easy to understand _but_ from an optimization viewpoint they create a problem. The problem is that if a nested function/procedure accesses a variable that belongs to one of its containing functions/procedures, then the access to that variable is indirect. The generated code has a pointer to the parent function's/procedure's stack frame; it uses that pointer to get to the stack frame, then adds the variable's offset within that frame to reach the variable. If a procedure/function is nested n-deep, the code has to "walk" through those stack frame pointers until it reaches the desired stack frame. Obviously that takes time and code, so although it is a very nice feature, from a performance viewpoint it doesn't produce ideal results.
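To make the pointer walk concrete, here is a rough sketch in C of what a compiler might emit for a three-level nest (root contains mid, mid contains leaf). The frame structs, the static_link field and the function names are all made up for illustration; a real Pascal compiler keeps this in its own frame layout and may use a different scheme altogether.

```c
/* Hypothetical frame layouts for three nesting levels.
   Each nested frame carries a pointer to its parent's frame. */
struct root_frame { int total; };
struct mid_frame  { struct root_frame *static_link; int step; };
struct leaf_frame { struct mid_frame  *static_link; };

/* Innermost routine: updates `total`, which lives two levels up. */
static void leaf(struct mid_frame *parent)
{
    struct leaf_frame self = { parent };
    /* Two hops (leaf -> mid -> root), then the field offset. */
    self.static_link->static_link->total += self.static_link->step;
}

/* Middle routine: owns `step` and calls the innermost routine twice. */
static void mid(struct root_frame *parent, int step)
{
    struct mid_frame self = { parent, step };
    leaf(&self);
    leaf(&self);
}

/* Outermost routine: owns `total`. */
int root(int step)
{
    struct root_frame self = { 0 };
    mid(&self, step);
    return self.total;
}
```

Every extra nesting level adds one more pointer dereference to the access in leaf, which is the cost described above.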
I'm going to be "politically incorrect" and state that OOP is quite likely a way (a rather poor one) around that problem. By putting the data in the heap instead of the stack and passing a hidden pointer (self) to it (sort of mimicking the stack frame pointer chain), the compiler doesn't have to walk a chain of pointers to access the object's/class's fields. That's a way of "flattening" the stack: simply not using the stack and using the heap instead (which causes plenty of problems).
What's notable is that, _in the absence of recursion_ (emphasized because that's very important), the compiler could "flatten" the stack and produce code for all nested functions/procedures that is relative to the root function. That optimization isn't simple, but it's not really all that complicated either. What makes it difficult is that the code generator and optimizer often have a limited (often very limited) view of the surrounding code, and that optimization may require the compiler to inspect _thousands_, maybe even tens of thousands (or more), of instructions. In most cases, the architecture of the compiler makes such optimizations impossible.
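As a rough idea of what "flattened" could look like, here is a sketch in C where every local of a small nest (a root routine, a middle routine and an innermost routine) sits in one block laid out relative to the root. The struct and function names are invented for the example; this is only one way a compiler might arrange it.

```c
/* All locals of the whole nest, laid out in one block owned by the root.
   There is exactly one slot per variable, which is why recursion breaks
   this scheme: a second live activation of mid_flat would need its own
   copy of `step`. */
struct flat_frame {
    int total;  /* the root routine's local    */
    int step;   /* the middle routine's local  */
};

/* Innermost routine: one pointer, fixed offsets, no chain to walk. */
static void leaf_flat(struct flat_frame *f)
{
    f->total += f->step;
}

/* Middle routine: stores its local in the shared block. */
static void mid_flat(struct flat_frame *f, int step)
{
    f->step = step;
    leaf_flat(f);
    leaf_flat(f);
}

/* Root routine: owns the single flattened frame. */
int root_flat(int step)
{
    struct flat_frame f = { 0, 0 };
    mid_flat(&f, step);
    return f.total;
}
```

No matter how deep the nesting goes, every variable is a single known offset from one base pointer.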
Other optimizations that can result in substantial improvements are logical optimizations. Here is an example: take a statement like "x := a^2 - b^2;". If the compiler knew a little algebra, it could see that this is the same as "x := (a + b) * (a - b);" (with floating point the two can round differently, so a compiler has to be careful about when it makes the swap). The first statement requires two multiplications (presuming no hardware power instruction) and one subtraction. The second one requires one addition, one subtraction and only one multiplication. On most architectures it will be faster. Logical optimizations are wonderful stuff, but they can definitely extend compile times.
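The two forms can be written side by side in C as a small sketch. The function names are made up, and unsigned integers are assumed so that both forms agree exactly even when intermediate results wrap around.

```c
/* Direct form: two multiplications and one subtraction. */
unsigned diff_squares_naive(unsigned a, unsigned b)
{
    return a * a - b * b;
}

/* Factored form: one addition, one subtraction, one multiplication.
   Unsigned arithmetic is modulo a power of two, so the algebraic
   identity a*a - b*b == (a + b) * (a - b) holds for every input. */
unsigned diff_squares_factored(unsigned a, unsigned b)
{
    return (a + b) * (a - b);
}
```

For example, both functions return 40 for a = 7 and b = 3.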
The Pascal grammar isn't perfect, but it's really good. From an optimization viewpoint, it can produce better results than a language like C, but optimizing Pascal source code will require more memory and a compiler architecture that is different from the typical one.
I recall there being an old implementation of Pascal (I believe used in one of the supercomputer centers in the U.S.) that was looked at and rejected as a basis for the new language that became C (my memory attaches Kernighan's name to that story, though C itself was designed by Dennis Ritchie) because that compiler kept all of its symbol tables in memory. That required far more memory than the rather minimal amount available on the PDP-class machines of the day, but I strongly suspect the reason that compiler kept everything in memory at all times was to allow it to perform optimizations that would not be possible otherwise.
Today (April 2020), a PC-class computer can have as many as 64 computing cores and 128GB (that's 131,072 _megabytes_) of memory, yet compilers, with a few exceptions, are still being designed and written as if computers only had 32K of memory.