Author Topic: Faster code execution idiom  (Read 1232 times)

lagprogramming

  • Sr. Member
  • ****
  • Posts: 407
Faster code execution idiom
« on: November 28, 2023, 11:59:56 am »
For faster code execution, which programming style (idiom) is better?
Code: Pascal  [Select][+][-]
function Foo:pointer;
begin
  Result:=Function1;
  if Result=nil then Result:=Function2;
  if Result=nil then Result:=Function3;
  if Result=nil then Result:=Function4;
end;
vs.
Code: Pascal  [Select][+][-]
function Bar:pointer;
begin
  Result:=Function1;
  if Result=nil then
  begin
    Result:=Function2;
    if Result=nil then
    begin
      Result:=Function3;
      if Result=nil then Result:=Function4;
    end;
  end;
end;
Function Bar should give many CPUs better conditional jump prediction than Foo, right? When is Bar slower than Foo? I'm asking because the compiler can target multiple CPU architectures, and I'd like to know whether I can write down a recommended Pascal programming practice for this kind of situation.
I've noticed that core developers of both Free Pascal and Lazarus prefer the Foo programming style (idiom).

RayoGlauco

  • Full Member
  • ***
  • Posts: 193
  • Beers: 1567
Re: Faster code execution idiom
« Reply #1 on: November 28, 2023, 12:13:32 pm »
I think the second version will be a little faster, because the first version makes useless comparisons (when Function1 returns non-nil, for example).
What about this:
Code: Pascal  [Select][+][-]
function Another:pointer;
begin
  if Function1<>nil then exit(Function1);
  if Function2<>nil then exit(Function2);
  if Function3<>nil then exit(Function3);
  Result:=Function4;
end;
« Last Edit: November 28, 2023, 12:20:02 pm by RayoGlauco »
To err is human, but to really mess things up, you need a computer.

RayoGlauco

  • Full Member
  • ***
  • Posts: 193
  • Beers: 1567
Re: Faster code execution idiom
« Reply #2 on: November 28, 2023, 12:33:34 pm »
Yet another version. Maybe the optimizer will make all of them result in the same assembly code. I don't know much about optimization and multithreading, but I think code with simpler logic should not be penalized by these things.
Code: Pascal  [Select][+][-]
function AnotherOne:pointer;
begin
  if Function1<>nil then Result:=Function1
  else if Function2<>nil then Result:=Function2
  else if Function3<>nil then Result:=Function3
  else Result:=Function4;
end;
« Last Edit: November 28, 2023, 12:43:50 pm by RayoGlauco »
To err is human, but to really mess things up, you need a computer.

MathMan

  • Sr. Member
  • ****
  • Posts: 405
Re: Faster code execution idiom
« Reply #3 on: November 28, 2023, 01:32:54 pm »
Quote from: lagprogramming on November 28, 2023, 11:59:56 am
For faster code execution, which programming style (idiom) is better?
Code: Pascal  [Select][+][-]
function Foo:pointer;
begin
  Result:=Function1;
  if Result=nil then Result:=Function2;
  if Result=nil then Result:=Function3;
  if Result=nil then Result:=Function4;
end;
vs.
Code: Pascal  [Select][+][-]
function Bar:pointer;
begin
  Result:=Function1;
  if Result=nil then
  begin
    Result:=Function2;
    if Result=nil then
    begin
      Result:=Function3;
      if Result=nil then Result:=Function4;
    end;
  end;
end;
Function Bar should give many CPUs better conditional jump prediction than Foo, right? When is Bar slower than Foo? I'm asking because the compiler can target multiple CPU architectures, and I'd like to know whether I can write down a recommended Pascal programming practice for this kind of situation.
I've noticed that core developers of both Free Pascal and Lazarus prefer the Foo programming style (idiom).

From my perspective this is a really tricky question - and I'm not sure that it can be answered in a general way.

I'm not following compiler development too closely (and haven't had a chance to look at the produced asm yet), but I see a fair chance that the compiler can completely linearise 'Foo' with the conditional move instructions available on modern CPU architectures. I would give it a lower chance of doing the same with 'Bar'. However, I would not derive a general rule from that either, as the compiler (and the CPU architectures) are evolving constantly.

Besides that, one needs to consider the use case of 'Foo'/'Bar' - how often is it called, how often do the pointers to 'Function1-4' change, etc. Especially the latter can be a challenge for branch predictors, because the branching is then purely data dependent. In addition, the retrieved function pointer is presumably called later on. Depending on your CPU architecture - size of the branch target buffer, etc. - that call can induce a more severe pipeline stall and slowdown than the incorrectly predicted branches inside 'Foo'/'Bar' itself.

Cheers,
MathMan

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10561
  • Debugger - SynEdit - and more
    • wiki
Re: Faster code execution idiom
« Reply #4 on: November 28, 2023, 02:32:42 pm »
If you ask such a question, then the answer (if one can be given at all) only holds for
- a specific compiler
- at a specific version
- with the exact compilation settings known
- for a specific target computer (exact CPU model, and sometimes other hardware)
....


In general, each "if" costs time.
- In the first example you always execute all 3 ifs (and all 3 comparisons), even if "Function1" returned non-nil.
- In the 2nd example (cascaded begin/end) you stop comparing as soon as you have a non-nil result.

So that means: if all functions return nil/non-nil at random, then the 2nd example executes fewer comparisons/branches on average => faster.

However, depending on how this is translated, the resulting machine code may influence the CPU's branch predictor in different ways... or it may not; that really can't be told from the Pascal code alone.
If it does, then depending on the pattern in which nil/non-nil is returned, either of the 2 solutions may be faster.

But up to this point, the 2nd one is likely either faster or equal.
(You probably need to run it several million times before you can notice a difference, and even then we are talking about fractions of a second.)
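
A minimal sketch of such a micro-benchmark (the loop count, the use of SysUtils.GetTickCount64 and the Foo/Bar names are assumptions for illustration, not a definitive measuring setup):
Code: Pascal  [Select][+][-]
uses SysUtils;

// Hypothetical timing harness - assumes Foo and Bar are the functions above.
procedure TimeIt;
var
  i: Integer;
  t0: QWord;
  p: Pointer;
begin
  t0 := GetTickCount64;
  for i := 1 to 100*1000*1000 do   // many iterations, or the difference drowns in noise
    p := Foo;
  WriteLn('Foo: ', GetTickCount64 - t0, ' ms');

  t0 := GetTickCount64;
  for i := 1 to 100*1000*1000 do
    p := Bar;
  WriteLn('Bar: ', GetTickCount64 - t0, ' ms');
end;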



Furthermore, either version of the code could be treated differently (better or worse) by the compiler's optimizer.

I don't think that fpc 3.2.n will make a difference there, but...



Also, regarding the nested begin/end version => it changes the amount of code each condition has to skip => the conditional jumps become 1) longer and 2) end on different addresses.
Depending on the exact CPU (even different models from the same manufacturer), this may or may not influence execution speed (for better or worse).




But in the end, for all practical purposes, if I were writing code that needed to perform fast (counting every nanosecond), I would go for the nested begin/end version.


MathMan

  • Sr. Member
  • ****
  • Posts: 405
Re: Faster code execution idiom
« Reply #5 on: November 28, 2023, 02:39:54 pm »
@Martin_fr - thanks for this more elaborate roll-out of what I also tried to convey. I definitely agree on all of your points.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10561
  • Debugger - SynEdit - and more
    • wiki
Re: Faster code execution idiom
« Reply #6 on: November 28, 2023, 02:48:20 pm »
Quote from: MathMan on November 28, 2023, 02:39:54 pm
@Martin_fr - thanks for this more elaborate roll-out of what I also tried to convey. I definitely agree on all of your points.

It gets even worse. Even if a specific choice is faster on a given platform, it could still appear slower in a benchmark - even with all controllable factors being equal.

On some CPUs, the location of the code in memory (relative to certain boundaries) may affect the speed. That means changes in a different procedure (one that is not even called while the time is measured) can affect the time taken during a benchmark (or in a real-life run).
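
For illustration, FPC's {$CODEALIGN} directive can be used to experiment with such placement effects (the alignment values below are arbitrary examples, not a recommendation):
Code: Pascal  [Select][+][-]
// Illustrative only: shifting where procedures, jump targets and loops start
// relative to cache-line boundaries can change timings on some CPUs.
{$CODEALIGN PROC=16}   // align procedure entry points to 16 bytes
{$CODEALIGN JUMP=16}   // align jump targets to 16 bytes
{$CODEALIGN LOOP=8}    // align loop headers to 8 bytes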

Josh

  • Hero Member
  • *****
  • Posts: 1344
Re: Faster code execution idiom
« Reply #7 on: November 28, 2023, 02:49:07 pm »
would this optimize better?
Code: [Select]
function Foo:pointer;
begin
  Result:=function1;
  if Result<>nil then exit;
  Result:=function2;
  if Result<>nil then exit;
  Result:=function3;
  if Result<>nil then exit;
  Result:=function4;
end;
« Last Edit: November 28, 2023, 03:03:28 pm by Josh »
The best way to get accurate information on the forum is to post something wrong and wait for corrections.

MathMan

  • Sr. Member
  • ****
  • Posts: 405
Re: Faster code execution idiom
« Reply #8 on: November 28, 2023, 03:06:19 pm »
Quote from: Martin_fr on November 28, 2023, 02:48:20 pm
Quote from: MathMan on November 28, 2023, 02:39:54 pm
@Martin_fr - thanks for this more elaborate roll-out of what I also tried to convey. I definitely agree on all of your points.

It gets even worse. Even if a specific choice is faster on a given platform, it could still appear slower in a benchmark - even with all controllable factors being equal.

On some CPUs, the location of the code in memory (relative to certain boundaries) may affect the speed. That means changes in a different procedure (one that is not even called while the time is measured) can affect the time taken during a benchmark (or in a real-life run).

I sometimes use what's called a "superoptimizer" - a tool that takes asm source, generates allowed instruction shuffles from it, measures execution time, and thereby helps to find "ideal" code arrangements. On older CPU architectures I used to see up to 20% improvements; nowadays I'm happy to see even 5%. So implementation quirks seem to be vanishing from the architectures.

Today, when trying to optimize something, I'm much more worried about data layouts, minimising copies, etc. The code produced by modern compilers usually wastes less time than a bad data design does - imho.
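
To illustrate the data-layout point, a minimal hypothetical sketch (types and sizes are made up, just to show the idea): scanning a densely packed array of keys touches far fewer cache lines than scanning the same keys spread across large records.
Code: Pascal  [Select][+][-]
program LayoutDemo;
type
  TBigRec = record
    Key: Integer;
    Payload: array[0..63] of Byte;   // rarely needed while searching
  end;
var
  Recs: array of TBigRec;   // keys spread ~68 bytes apart; a scan drags payloads through the cache
  Keys: array of Integer;   // the same keys in a parallel array, densely packed
  i, Hits: Integer;
begin
  SetLength(Recs, 1000000);
  SetLength(Keys, 1000000);
  Hits := 0;
  // The equivalent loop over Recs[i].Key does the same comparisons
  // but touches many more cache lines, so it tends to run slower.
  for i := 0 to High(Keys) do
    if Keys[i] = 42 then Inc(Hits);
  WriteLn(Hits);
end.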

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10561
  • Debugger - SynEdit - and more
    • wiki
Re: Faster code execution idiom
« Reply #9 on: November 28, 2023, 03:08:50 pm »
Quote from: Josh on November 28, 2023, 02:49:07 pm
would this optimize better?
Code: [Select]
function Foo:pointer;
begin
  Result:=function1;
  if Result<>nil then exit;
  Result:=function2;
  if Result<>nil then exit;
  Result:=function3;
  if Result<>nil then exit;
  Result:=function4;
  if Result<>nil then exit;
end;

Within the controllable factors, probably similar to the nested begin/end version.

However, it inverts the comparison logic (unless the compiler internally changes it in either version).
This may affect the branch predictor. And if it does, then it depends on which results are most likely from each function.

I have no idea how fpc handles this. Afaik some compilers generate code assuming that the "then" branch is more likely than the "else" branch (though I'm not sure what is assumed in the absence of an "else").

So (if code is indeed generated that way) the question is at which percentage each function returns nil or non-nil, and therefore how often the branch predictor gets it right.




Again, on any modern CPU the difference will be nanoseconds, if there is any at all. You need to run millions (if not billions) of iterations to get a measurable difference. But if you run that code that often, you also spend time in the called functions that often => assuming they are actually doing work, they will consume much more time. So in terms of time gained, you are talking about tiny fractions of a percent.





The real optimization potential may lie elsewhere.

For example, if each of the called functions starts with something like
Code: Pascal  [Select][+][-]
function func_n: Pointer;
begin
  if SomeVal <> CONST_EXP then exit(nil);
  //... lots of code to process
end;

Then you may (in many, but not all cases) want that condition to be executed before the function is called.

But you don't necessarily want to copy and paste it to every place where the function is called...

So you do
Code: Pascal  [Select][+][-]
function func_n: Pointer; inline;
begin
  if SomeVal <> CONST_EXP then exit(nil);
  Result := Internally_do_func_n;
end;

function Internally_do_func_n: Pointer;
begin
  //... lots of code to process
end;

So then the compiler can inline that one condition for you.

Again, even for this, you need the code to be called many times in a loop, or you won't notice the difference.

And it only makes sense if the condition has some likelihood of returning without calling Internally_do_func_n.
Otherwise you just make the code at the call site larger => and increasing the size of a loop body can (sometimes) also slow things down.
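
A sketch of what the call site then effectively looks like once func_n is inlined (conceptual only, using the names from the example above; the code the compiler actually emits will differ):
Code: Pascal  [Select][+][-]
// Conceptually, "Result := func_n;" at the call site behaves like:
if SomeVal <> CONST_EXP then
  Result := nil                      // cheap pre-check, no call at all
else
  Result := Internally_do_func_n;    // the expensive body is only called when needed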

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 10561
  • Debugger - SynEdit - and more
    • wiki
Re: Faster code execution idiom
« Reply #10 on: November 28, 2023, 03:17:06 pm »
Quote from: MathMan on November 28, 2023, 03:06:19 pm
I sometimes use what's called a "superoptimizer" - a tool that takes asm source, generates allowed instruction shuffles from it, measures execution time, and thereby helps to find "ideal" code arrangements. On older CPU architectures I used to see up to 20% improvements; nowadays I'm happy to see even 5%. So implementation quirks seem to be vanishing from the architectures.

Today, when trying to optimize something, I'm much more worried about data layouts, minimising copies, etc. The code produced by modern compilers usually wastes less time than a bad data design does - imho.

And by optimizing for one specific CPU (the one you benchmark on) you could de-optimize for others => if you distribute an app, it's not enough to be optimized for one single CPU. Rather aim for a fair above-average result on many CPUs.

There was some recent discussion on the fpc mailing list. I did not follow it all the way, but if I got it right it was about lea versus mov+add => and depending on the CPU, either could be faster.



Mind that fpc trunk had some work done on those kinds of asm-instruction-order/replacement optimizations (peephole optimization).

While many of my apps haven't seen much difference, one of them is (iirc) 3% faster with fpc trunk (on -O4).

 
