Recent

Author Topic: performance problem with Free Pascal 2.6.x  (Read 12128 times)

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: performance problem with Free Pascal 2.6.x
« Reply #15 on: November 04, 2014, 09:02:11 pm »
I know modern CPU do a lot of pre-fetch and prediction stuff (I don't know the details)

The above code, has for many people been faster. Even with on a modern CPU, I would expect it at worst to be of equal speed.

But then I am no expect on this topic.

Out of interest which modern cpu feature did you think of?
- "branch prediction" should not be affected (the branch at the end of the loop stays the same)
- data-fetching/caching/cache lines should also be the same, as the same data, as accessed in the same order.




lagprogramming

  • Sr. Member
  • ****
  • Posts: 405
Re: performance problem with Free Pascal 2.6.x
« Reply #16 on: November 04, 2014, 10:58:46 pm »
@Martin_fr
   I'm not an expert either but regarding the presented code I've noticed:
1)If I change "For n:= 1 to Number do" to "For n:=0 to Number-1 do" and change "[n-1]" with "[n]" inside the loop
and
If I change "For k:= 1 to Number do" to "For k:=0 to Number-1 do" and change "[k-1]" with "[k]" inside the loop
then the code will be executed slower, although to me, it should run faster due to the removal of "-1"s. Probably it's due to the initial assignment at the beginning of the "for" loops, I didn't looked at the code produced.

2)Situations like the following one are known. If I completely remove variable Trapezoid and replace
Code: [Select]
ZHelp:= ZHelp + Trapezoid;with
Code: [Select]
ZHelp:= ZHelp + ((abs(ZArray[k-1]-ZArray[k])/2 + Min(ZArray[k-1], ZArray[k])) * (XValues[k]-XValues[k-1]));again, the code is executed slower.

3)As far as I remember I've been surprised to notice that declaring a new variable "XValuesdouble:double;"
and changing
Code: [Select]
For k:=0 to Number-1 do
    ZArray[k]:= ZValues[k] * A * exp(-Pi*power(A/1000*(XValues[k]-XValues[n]), 2));

with

Code: [Select]
XValuesdouble:=XValues[n];
For k:=0 to Number-1 do
    ZArray[k]:= ZValues[k] * A * exp(-Pi*power(A/1000*(XValues[k]-XValuesdouble), 2));

again lead to a decrease of code execution speed.

Using level 3 optimization level.
However, It's a very strange CPU I have here, many things don't work as expected.

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: performance problem with Free Pascal 2.6.x
« Reply #17 on: November 05, 2014, 12:22:22 am »
It may not be your CPU, if cou change variables, you may change, how fpc allocates registers.

And if an important value, no longer stays in a register, then execution speed drops.

As far the for loop, I would have to look at the generated code.

As for my example with the pointer, see the bug I linked, it was used as a fixed there, and worked well.

Leledumbo

  • Hero Member
  • *****
  • Posts: 8747
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: performance problem with Free Pascal 2.6.x
« Reply #18 on: November 05, 2014, 10:08:41 am »
instead of
Code: [Select]
  for i = 0 to 100 do begin
    x := x + someArrayOfDouble[i];
  end;

do
Code: [Select]
  var p: ^Double; // pointer to array element type

  p := @someArrayOfDouble[0]; // point to first element
  for i = 0 to 100 do begin
    x := x + p^;
    inc(p); // point to next element
  end;

I am not sure, if and/or when fpc does optimize this, but it may not, on not always.:

In the first loop, the byte address of each array element has to be calculated as
  byteAddressOfDouble = index *sizeof(double)

In the 2nd loop, this multiplication is no longer needed.


   I understand your approach but you might have a surprise regarding modern processors. The second code proposed by you will run slower than the first one on many new processors.  I have this problem when trying to optimize loops for zero comparisons. It's very tricky. :)
No, the second one will always run faster ATM. FPC simply doesn't optimize such access yet (not even in trunk AFAIR). Moreover, for loops are complex. It's much slower than while / repeat - until. This should help more:
Code: [Select]
  var p,q: ^Double; // pointer to array element type

  p := @someArrayOfDouble[0]; // point to first element
  q := @someArrayOfDouble[100]; // point to last element
  while p <= q do begin
    x := x + p^;
    inc(p); // point to next element
  end;

« Last Edit: November 05, 2014, 10:11:31 am by Leledumbo »

lagprogramming

  • Sr. Member
  • ****
  • Posts: 405
Re: performance problem with Free Pascal 2.6.x
« Reply #19 on: November 05, 2014, 05:41:30 pm »
   Because it might be related to the subject, the project attached shows me the following values:
qq1 31251 ms.
qq2 21762 ms.
qq1_nested 22134 ms.
qq2_nested 31257 ms.
   Something has to be wrong because qq1 is the same as qq1_nested and qq2 is the same as qq2_nested.
Am I missing something regarding the code? If not, do you have the similar results?
   I might be too tired now. %)

prino

  • New Member
  • *
  • Posts: 14
Re: performance problem with Free Pascal 2.6.x
« Reply #20 on: November 09, 2014, 05:19:39 pm »
A side note: If I'm right that:
abs(A-B)/2 + Min(A, B)

equals:
(A+B)/2
Sadly, that's not always true. The first equation will not overflow for very large integers, the second will!

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 9794
  • Debugger - SynEdit - and more
    • wiki
Re: performance problem with Free Pascal 2.6.x
« Reply #21 on: November 09, 2014, 05:42:01 pm »
   Something has to be wrong because qq1 is the same as qq1_nested and qq2 is the same as qq2_nested.
Am I missing something regarding the code? If not, do you have the similar results?
   I might be too tired now. %)

Use -al as compiler option, and compare generated assembler

Btw, they are not the same: Nested functions have a hidden parameter containing a pointer to the stackframe of the outer func/proc. Even so this is not used, it is present, and therefore it may affect how registers are allocated.
Plus your measuring includes the time to call the procedure, and that also includes passing the hidden param.

engkin

  • Hero Member
  • *****
  • Posts: 3112
Re: performance problem with Free Pascal 2.6.x
« Reply #22 on: November 10, 2014, 01:08:51 am »
A side note: If I'm right that:
abs(A-B)/2 + Min(A, B)

equals:
(A+B)/2
Sadly, that's not always true. The first equation will not overflow for very large integers, the second will!
Prino, welcome to Lazarus/FPC forum.  :)

Although A and B in this case are of type double, you still have a valid point. With integer numbers in general, would it be different if B were a negative number? Don't you think that A-B would overflow, if A and B were of different signs?

lagprogramming

  • Sr. Member
  • ****
  • Posts: 405
Re: performance problem with Free Pascal 2.6.x
« Reply #23 on: November 10, 2014, 05:17:54 pm »
   Something has to be wrong because qq1 is the same as qq1_nested and qq2 is the same as qq2_nested.
Am I missing something regarding the code? If not, do you have the similar results?
   I might be too tired now. %)

Use -al as compiler option, and compare generated assembler

Btw, they are not the same: Nested functions have a hidden parameter containing a pointer to the stackframe of the outer func/proc. Even so this is not used, it is present, and therefore it may affect how registers are allocated.
Plus your measuring includes the time to call the procedure, and that also includes passing the hidden param.

qq1 31251 ms.
qq2 21762 ms.
qq1_nested 22134 ms.
qq2_nested 31257 ms.

Thank you. It appears it's not related to the subject so I won't continue, however:
1) The assembler code appears fine.
2) The idea wasn't why the nested procedures were slower, but why one of them was much faster(qq1_nested).
3) Most probably I've hit the "slow if then else" anomaly again. I think it may be related to the alignment(memory addresses) of the functions/procedures, conditional jumps...  :-\ I don't know exactly...

 

TinyPortal © 2005-2018