### Bookstore

 Computer Math and Games in Pascal (preview) Lazarus Handbook

### Author Topic: Maximum inline Chain  (Read 649 times)

#### Warfley

• Sr. Member
• Posts: 303
##### Maximum inline Chain
« on: July 26, 2020, 03:28:31 am »
Hey, I recently discovered something interesting, and that is that the maximum number of inline chains is pretty low. Take this example:
Code: Pascal  [Select][+][-]
1. program Project1;
2.
3. {\$mode objfpc}{\$H+}
4.
5. function Foo(x, y: Integer): Integer; inline;
6. begin
7.   Result := (x + y) div 2;
8. end;
9.
10. function Bar(x, y: Integer): Integer; inline;
11. begin
12.   Result := Foo(x, x + y);
13. end;
14.
15. function FooBar(x, y: Integer): Integer; inline;
16. begin
17.   Result := Bar(y div 2, x*2);
18. end;
19.
20. function BarFoo(x, y: Integer): Integer; inline;
21. begin
22.   Result := FooBar(x*x, y*y);
23. end;
24.
25. var
26.   x, y: Integer;
27. begin
30.   WriteLn(FooBar(x,y));
31.   WriteLn(BarFoo(x,y));
32. end.
33.

So this is the simplest form of inlining I can think of, nothing but wrapping another function for certain input manipulations, but with FPC 3.2 I already get the message:
Code: Pascal  [Select][+][-]
1. project1.lpr(31,22) Note: Call to subroutine "function Foo(x:LongInt;y:LongInt):LongInt;" marked as inline is not inlined
Meaning that after only 4 function calls the FPC can't inline this function anymore. This is bad, really bad, because it penalizes the use of functions for simple expressions.

My goal is to write as readable code as possible, and complex expressions are rather hard to read, so I very often split them up into sub expressions and put them in functions. I use such one line functions like the average function above (Foo) very frequently and this basically means that after only 4 layers of such functions, the inlining won't take place anymore. And 4 layers of nested functions can be reached fairly quickly. From a simple test on O0 I found that not inlining for such a function would result in around 50% longer execution time for this code, so this is a really harsh punishment for something that is encouraged by nearly every coding guideline.
Even more this let's me think about other functions I am using like TStringHelper.Length, which are also marked as inline, and are often used by me in expressions.

So my question is, why is it so low?
Sure at some depth there must be a stop on how deep the compiler can inline functions (simply due to technical reasons, there is no infinite amount of memory), but 4 seems to be really low to me. The FPC never took that much memory on my system, even during the compilation of lazarus, the whole buildsystem costs at most a few hundered megabytes of memory (for which the vast majority is attributed to make and not the fpc). Every modern PC has at least 4 gigs of ram, even single board computers like the raspberry pi has 2 gb upwards. So why isn't this threshold set higher, it does not need to be exorbitantly high, but 32 for example would probably cover most of the code out there, while still should result in reasonably low memory usage.
Or it could be configurable, I have 32 gigs of RAM in my working machine, I would not mind the fpc taking up 30 of that 32 gigs if the resulting code would be better.

But 4 is just way to small, I've now written 3 programs with 3.2 and reached that limit in every project on multiple occasions in functions which where basically just wrappers for other functions to ease usage and make code clearer.

PS: besides the technical detail that having a function call for a simple wrapper or something similar is pretty wasteful, the "xxx marked as inline not inlined" message is also pretty annoying in these cases, because there is nothing one can do
« Last Edit: July 26, 2020, 03:42:19 am by Warfley »

#### ASBzone

• Sr. Member
• Posts: 461
• Automation leads to relaxation...
##### Re: Maximum inline Chain
« Reply #1 on: July 26, 2020, 04:30:40 am »
Isn't inlining essentially a request or recommendation to the compiler, and not a directive?
-ASB: https://www.BrainWaveCC.com

Lazarus v2.0.11 r63516 / FPC v3.2.1-r46879 (via FpcUpDeluxe) -- Windows 64-bit install w/32-bit cross-compile
Primary System: Windows 10 Pro x64, Version 2004 (Build 19041.508)
Other Systems: Windows 10 Pro x64, Version 2004 or greater

#### 440bx

• Hero Member
• Posts: 1994
##### Re: Maximum inline Chain
« Reply #2 on: July 26, 2020, 05:48:24 am »
I'm not fond of the idea of nesting inline function/procedures, particularly because it has a tendency to obfuscate code.

In the sample program you showed, the call to BarFoo yields the expression
Code: Pascal  [Select][+][-]
1.   (y * y) div 2 + (x * x);
That expression seems a lot simpler and easier to understand than going through four levels of inlined function calls, not to mention all the calculations that can easily be removed.
« Last Edit: July 26, 2020, 05:51:05 am by 440bx »
FPC v3.0.4 and Lazarus 1.8.2 on Windows 7 64bit.

#### Warfley

• Sr. Member
• Posts: 303
##### Re: Maximum inline Chain
« Reply #3 on: July 26, 2020, 02:35:23 pm »
Isn't inlining essentially a request or recommendation to the compiler, and not a directive?

Yes, sure, but if the compiler doesn't inline these things, this means that a. I need to restrict the usage of functions or b. for performance critical work I simply can't use FPC. I mean my current projects, while the message "xxx marked inline not inlined" is annoying, are not performance critical, so who cares, but I actually have another project I wanted to rewrite (which handles a few gigabytes of raw data for visualization and needs to be pretty fast) and I am now currently thinking that because of this I might actually need to work with C++ and QT (which is a pain in the ass but seems necessary). And I don't want that, I really like Pascal and Lazarus.

I'm not fond of the idea of nesting inline function/procedures, particularly because it has a tendency to obfuscate code.

In the sample program you showed, the call to BarFoo yields the expression
Code: Pascal  [Select][+][-]
1.   (y * y) div 2 + (x * x);
That expression seems a lot simpler and easier to understand than going through four levels of inlined function calls, not to mention all the calculations that can easily be removed.
Sure the example was just something I made up for an easy to write and reproducable example. One of my real examples is from a Hashset I've wrote:
Code: Pascal  [Select][+][-]
1. function THashSet.Add(constref AItem: T): Boolean;
2. begin
4.   Result := not Contains(AItem);
5.   If Result then FHashTable.Add(AItem);
6. end;
7.
9. var
11. begin
14.     IncreaseSize
15.   else if loadFactor < MinLoadFactor then
16.     DecreaseSize;
17. end;
18.
19. function THashSet.GetCurrentLoadFactor: Double;
20. begin
21.   Result := Size / Capacity;
22. end;
23.
24. function THashSet.GetSize: Double;
25. begin
26.   Result := FHashTable.ElementCount;
27. end;
28.
29. function THashTable.GetElementCount: SizeInt;
30. begin
31.   Result := FElementList.Size;
32. end;

So lets go through it one by one. Add only is 3 lines of code, with the real work being done in FHashTable.Add, so inlining would be sensible, but this function needs to be provided, because otherwise  THashSet would be useless. CheckLoadfactorAndRehash is again only a few lines, but as this is something that is most likely for someone looking into the Add function not of interest (while it might be of interest that this will happen, how the rehashing is implemented is not of interest most of the time), so having it in a separate functions makes sense. GetCurrentLoadFactor and GetSize are both properties so the need to be functions, but as they are only single liners, they should also be inlined. Lastly GetElementCount is a property of the underlying HashTable datastructure and needs to be provided because of visibility reasons.

So here we have an example with 5 functions, all of them should be inlined, with removing any of these functions would lead to code thats arguably worse to read.

And now we can even go a step further, my HashMap implementation, basically is just a wrapper for a HashSet used with a Key-Value pair, meaning this will add another layer of inlinable wrapper functions to every single function chain in that HashSet.

On top of that I've also built an LargeObjectHashset and LargeObjectHashMap, which has the objects in a separate location (so they don't need to be copied on rehash, but only their references), this adds another layer of inline functions that basically are just wrappers.

Another example from the very same project, the HashSet calls .Hash of the type T given to get the hash, which is in most cases an inlinable function. So for a HashMap this is THashMapEntry.Hash which is just a wrapper for THashMapEntry.Key.Hash. If the key is now of type TPair (which I use pretty frequently) this is also just a wrapper to TPair.First.Hash and TPair.Second.Hash.
So here we already have a chain of 3 inlinable functions, just for hash computation and which simply *can't* be reduced, because the generic class doesn't know what the target type will be
« Last Edit: July 26, 2020, 02:43:47 pm by Warfley »

#### MarkMLl

• Hero Member
• Posts: 1219
##### Re: Maximum inline Chain
« Reply #4 on: July 26, 2020, 06:51:49 pm »
...after only 4 function calls the FPC can't inline this function anymore. This is bad, really bad, because it penalizes the use of functions for simple expressions.

See the function check_inlining in compiler/ncal.pas

MarkMLl
Turbo Pascal v1 on CCP/M-86, multitasking with LAN and graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.

#### Warfley

• Sr. Member
• Posts: 303
##### Re: Maximum inline Chain
« Reply #5 on: July 26, 2020, 09:50:31 pm »
See the function check_inlining in compiler/ncal.pas

MarkMLl

May I ask what exactly the nodecount of function is? (I don't know anything about the fpcs internals)

#### MarkMLl

• Hero Member
• Posts: 1219
##### Re: Maximum inline Chain
« Reply #6 on: July 26, 2020, 10:02:34 pm »
See the function check_inlining in compiler/ncal.pas

MarkMLl

May I ask what exactly the nodecount of function is? (I don't know anything about the fpcs internals)

Time to start learning then >:-)

I got to that with ten minutes of grepping, and you might possibly find that even without a detailed understanding of what's going on changing that prominent embedded constant has an interesting effect. (Roughly translated: I don't claim to know much more than you do :-)

The compiler parses input and generates nodes in a tree, rather than outputting assembler or some form of object file immediately. Each function generates a certain number of nodes, depending on its complexity; see any elementary book on compilers. There's a variable that increments/decrements depending on inline depth. The compiler chooses whether to honour an "include" hint based on the number of nodes currently in the tree, not on a fixed depth. So for simple functions an include depth of four is honoured, make one of them a bit more complex and that will reduce to three and so on.

MarkMLl
« Last Edit: July 26, 2020, 10:11:34 pm by MarkMLl »
Turbo Pascal v1 on CCP/M-86, multitasking with LAN and graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.