Author Topic: performance problem with Free Pascal 2.6.x  (Read 9375 times)

Muso

  • New Member
  • *
  • Posts: 10
performance problem with Free Pascal 2.6.x
« on: November 04, 2014, 02:23:57 pm »
As my old Delphi 7 no longer works correctly with Win 7 Professional 64-bit, I switched to Free Pascal with Lazarus. I see differences in the speed of the compiled programs: programs using the same code are slower than with Delphi. Here is a simple routine:

Code: [Select]
procedure Filter(const A: double);
var ZHelp,Trapezoid : double;
    k,n, Number : integer;
    ZArray,ZArrayFilter : array of double;
 
begin
 For n:= 1 to Number do
  begin
   ZHelp:= 0;
   For k:= 1 to Number do
    ZArray[k-1]:= ZValues[k-1] * A * exp(-Pi*power(A/1000*(XValues[k-1]-XValues[n-1]), 2));
   // integrate (trapezoidal rule)
   For k:= 1 to Number-1 do
   begin
    Trapezoid:= (abs(ZArray[k-1]-ZArray[k])/2 + Min(ZArray[k-1], ZArray[k]))
                * (XValues[k]-XValues[k-1]); //trapezoid  = triangle + rectangle
    ZHelp:= ZHelp + Trapezoid;
   end; //end for k
   ZArrayFilter[n-1]:= ZHelp / 1000;
 end; //end for n (Filter)
end;

My arrays have up to 30,000 values and Number is in the same range. For Number:=10800 Delphi needs 17 s while Free Pascal 2.6 needs 20 s (18% slower). (I cannot compile using Delphi 7 anymore, so my times were measured by hand with a stopwatch.)
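Hand-stopped times are hard to compare when the difference is only a few seconds. A minimal sketch of timing a routine from inside the program, assuming the standard SysUtils and DateUtils units; the procedure Work and the program name are hypothetical stand-ins for the real Filter call:

```pascal
program TimeFilter;
{$mode delphi}

uses
  SysUtils, DateUtils;

// Hypothetical stand-in for the routine being measured (e.g. Filter).
procedure Work;
var
  i: Integer;
  s: Double;
begin
  s := 0;
  for i := 1 to 50000000 do
    s := s + Sqrt(i);
  // Print the result so the loop has an observable effect.
  WriteLn('checksum: ', s:0:3);
end;

var
  t0: TDateTime;
begin
  t0 := Now;   // wall-clock time before the call
  Work;
  // MilliSecondsBetween returns the elapsed whole milliseconds.
  WriteLn('elapsed: ', MilliSecondsBetween(Now, t0), ' ms');
end.
```

The resolution of Now is limited (typically several milliseconds), which should be sufficient for runs that take seconds.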

I read in this thread: http://free-pascal-general.1045716.n5.nabble.com/code-optimization-td2848157.html
that Free Pascal is much slower than Delphi because it has problems if the array size is not a power of 2. But this info is from 2010 and refers to Free Pascal 2.2.

So my question is: what can I do to make the code at least as fast as with Delphi 7? I already use the compiler options
-Mdelphi -O3
and also tried
-Cfsse2 and -Cfsse3
without any gain in speed. I also read that there is an optimization called "fastmath" but Lazarus doesn't provide such a compiler option.

Any ideas?
« Last Edit: November 04, 2014, 02:26:03 pm by Muso »

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7510
Re: performance problem with Free Pascal 2.6.x
« Reply #1 on: November 04, 2014, 02:32:42 pm »
It is not the array size but the ELEMENT size that has to be a power of two. You seem to use doubles, which have size 8, which is 2**3, so a power of 2.

The posts you link to seem to indicate that Delphi does some form of strength reduction in such cases to avoid repeated multiplications (I can't remember seeing D7 do that, but that is what they seem to say).

I don't really see anything that can be improved quickly, so you would probably have to compare the generated assembler.

Or just try the development version.


Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5717
    • wiki
Re: performance problem with Free Pascal 2.6.x
« Reply #2 on: November 04, 2014, 03:43:15 pm »
Look at:
http://bugs.freepascal.org/view.php?id=10275

Afaik fpc trunk optimizes more, but I do not know how much of this issue is covered.

One thing you can try (if not solved otherwise) for the accesses to

ZArray[k] and ZArray[k-1]

before the loop:
ZArrayElementPointer := @ZArray[0];

in the loop:
inc(ZArrayElementPointer);


marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7510
Re: performance problem with Free Pascal 2.6.x
« Reply #3 on: November 04, 2014, 03:49:36 pm »
Afaik SSA is still a private branch Florian works on from time to time.

Never

  • Sr. Member
  • ****
  • Posts: 409
  • OS:Win7 64bit / Lazarus 1.4
Re: performance problem with Free Pascal 2.6.x
« Reply #4 on: November 04, 2014, 04:12:16 pm »
Replace every division var / some_num with the corresponding multiplication and see if this makes a difference.
Example: A/1000 = A*0.001
Edit: also, [ A/1000 ] is constant inside the loops, so calculate it once before the for loop.
« Last Edit: November 04, 2014, 04:30:48 pm by Never »
Νέπε Λάζαρε λάγγεψων οξωκά ο φίλοσ'ς αραεύσε

wp

  • Hero Member
  • *****
  • Posts: 6370
Re: performance problem with Free Pascal 2.6.x
« Reply #5 on: November 04, 2014, 04:43:54 pm »
Quote
As my old Delphi 7 no longer works correctly with Win 7 Professional 64-bit
I don't want to advertise Delphi here, but I have D7 running on Win 7-64bit, and it is running fine. The only thing to take care of is not to install it to Program Files, but to any other folder to which you have write access - I have it in C:\Delphi7.
Lazarus trunk / fpc 3.0.4 / all 32-bit on Win-10

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7510
Re: performance problem with Free Pascal 2.6.x
« Reply #6 on: November 04, 2014, 04:49:20 pm »
Quote
As my old Delphi 7 no longer works correctly with Win 7 Professional 64-bit
I don't want to advertise Delphi here, but I have D7 running on Win 7-64bit, and it is running fine. The only thing to take care of is not to install it to Program Files, but to any other folder to which you have write access - I have it in C:\Delphi7.

And install the help supplement for the .hlp help?

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: performance problem with Free Pascal 2.6.x
« Reply #7 on: November 04, 2014, 05:07:37 pm »
A side note: If I'm right that:
abs(A-B)/2 + Min(A, B)

equals:
(A+B)/2

This part from the second inner loop:
Code: [Select]
    Trapezoid:= (abs(ZArray[k-1]-ZArray[k])/2 + Min(ZArray[k-1], ZArray[k]))
                * (XValues[k]-XValues[k-1]); //trapezoid  = triangle + rectangle

Could be changed to:
Code: [Select]
    Trapezoid:= (ZArray[k-1]+ZArray[k])/2
                * (XValues[k]-XValues[k-1]); //trapezoid  = triangle + rectangle
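The identity can be seen by cases: if A >= B, then abs(A-B)/2 + Min(A, B) = (A-B)/2 + B = (A+B)/2, and the case A < B is symmetric. A small sketch that also checks it numerically on random values; the program and variable names are made up for illustration:

```pascal
program CheckTrapezoidIdentity;
{$mode delphi}

uses
  Math;

var
  i: Integer;
  A, B, OldForm, NewForm, Diff, MaxDiff: Double;
begin
  RandSeed := 42;   // fixed seed so the run is repeatable
  MaxDiff := 0;
  for i := 1 to 1000000 do
  begin
    // Random values in [-100, 100), so both signs are covered.
    A := Random * 200 - 100;
    B := Random * 200 - 100;
    OldForm := Abs(A - B) / 2 + Min(A, B);   // triangle + rectangle
    NewForm := (A + B) / 2;                  // mean of the two values
    Diff := Abs(OldForm - NewForm);
    if Diff > MaxDiff then
      MaxDiff := Diff;
  end;
  // Any nonzero value here should be on the order of rounding error.
  WriteLn('largest difference: ', MaxDiff);
end.
```

The two forms are mathematically equal, but they round differently, so results may differ in the last bits.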

wp

  • Hero Member
  • *****
  • Posts: 6370
Re: performance problem with Free Pascal 2.6.x
« Reply #8 on: November 04, 2014, 05:09:01 pm »
Quote
help supplement for the .hlp help
Yes, if I remember correctly hlp files don't work in Win7 out of the box, but MS has WinHlp32 as a separate download (http://www.microsoft.com/en-us/download/details.aspx?id=91), I guess that's what you mean by "help supplement". Working perfectly.

Never

  • Sr. Member
  • ****
  • Posts: 409
  • OS:Win7 64bit / Lazarus 1.4
Re: performance problem with Free Pascal 2.6.x
« Reply #9 on: November 04, 2014, 06:02:16 pm »
@engkin ***thumbs up***

Muso

  • New Member
  • *
  • Posts: 10
Re: performance problem with Free Pascal 2.6.x
« Reply #10 on: November 04, 2014, 06:28:13 pm »
Quote
abs(A-B)/2 + Min(A, B)
equals:
(A+B)/2

Oh, stupid me. Many thanks!!! Now the whole routine is more than 10 % faster. (It is now as fast as Delphi was with the old code, timed by hand.)

Quote
[ A/1000 ] is a const

Thanks! I had overlooked this.

Quote
running on Win 7-64bit, and it is running fine. The only thing to take care of is not to install it to Program Files
MS has WinHlp32 as a separate download

Many thanks!!! Now I can use Delphi under Win 7.

Summary: I have learned how to improve the logic of the code and how to run Delphi under Win 7, but Free Pascal is still slower than Delphi. Maybe there is something one can do?

I will nevertheless stay with Lazarus/Free Pascal. The IDE is more comfortable in my opinion, and the killer feature for me is the platform independence.

Many thanks again to all people who helped.

Muso

  • New Member
  • *
  • Posts: 10
Re: performance problem with Free Pascal 2.6.x
« Reply #11 on: November 04, 2014, 06:31:14 pm »
Quote
One thing you can try (if not solved otherwise) for the accesses to

ZArray[k] and ZArray[k-1]

before the loop:
ZArrayElementPointer := @ZArray[0];

in the loop:
inc(ZArrayElementPointer);

Thanks. I must admit that I don't understand what you mean. (I have never worked with pointers in Pascal.) How would the resulting code look, and why might a pointer be a better solution?

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 5717
    • wiki
Re: performance problem with Free Pascal 2.6.x
« Reply #12 on: November 04, 2014, 07:03:27 pm »
instead of
Code: [Select]
  for i := 0 to 100 do begin
    x := x + someArrayOfDouble[i];
  end;

do
Code: [Select]
  var p: ^Double; // pointer to array element type

  p := @someArrayOfDouble[0]; // point to first element
  for i := 0 to 100 do begin
    x := x + p^;
    inc(p); // point to next element
  end;

I am not sure if and/or when fpc optimizes this, but it may not, or not always:

In the first loop, the byte address of each array element has to be calculated as
  byteAddressOfDouble = baseAddressOfArray + index * sizeof(double)

In the 2nd loop, this multiplication is no longer needed.
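To make the two variants concrete for a dynamic array, here is a self-contained sketch. The program name, the array Data and its size are assumptions made for illustration, and whether the pointer version is actually faster depends on the compiler version and the CPU, so it should be measured rather than assumed:

```pascal
program IndexVsPointer;
{$mode delphi}

const
  N = 1000;

var
  Data: array of Double;
  p: ^Double;          // pointer to the array element type
  i: Integer;
  SumIndexed, SumPointer: Double;
begin
  SetLength(Data, N);
  for i := 0 to N - 1 do
    Data[i] := i * 0.5;

  // Variant 1: indexed access. Each Data[i] needs the element
  // address base + i * SizeOf(Double), unless the compiler optimizes it.
  SumIndexed := 0;
  for i := 0 to N - 1 do
    SumIndexed := SumIndexed + Data[i];

  // Variant 2: pointer walk. Inc on a typed pointer advances by
  // SizeOf(Double), so no multiplication is needed per element.
  SumPointer := 0;
  p := @Data[0];       // point to the first element
  for i := 0 to N - 1 do
  begin
    SumPointer := SumPointer + p^;
    Inc(p);            // point to the next element
  end;

  // Both loops add the same values in the same order.
  WriteLn(SumIndexed:0:1, ' ', SumPointer:0:1);
end.
```

Note that the pointer must not be dereferenced past the last element; in the sketch the final Inc leaves it one past the end, but it is not read there.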

engkin

  • Hero Member
  • *****
  • Posts: 2513
Re: performance problem with Free Pascal 2.6.x
« Reply #13 on: November 04, 2014, 07:19:09 pm »
@Never, thanks!

@Muso, your code is fine and easy to understand. That's how I write mine. I know I am not answering your question directly, but any speed gained here might still help in general, so I hope you don't mind.

The first inner loop can accept some change, as well. For instance, instead of Power in:
Code: [Select]
    ZArray[k-1]:= ZValues[k-1] * A * exp(-Pi*power(A/1000*(XValues[k-1]-XValues[n-1]), 2));

you can use IntPower for integer exponent values. Power and IntPower have the same accuracy:
Code: [Select]
    ZArray[k-1]:= ZValues[k-1] * A * exp(-Pi*intpower(A/1000*(XValues[k-1]-XValues[n-1]), 2));

For an exponent of 2, I think a simple multiplication should be faster with (or without?) optimization.
Code: [Select]
  Ad1000 := A/1000;
..
    deltaX := Ad1000*(XValues[k-1]-XValues[n-1]);
    ZArray[k-1]:= ZValues[k-1] * A * exp(-Pi*deltaX*deltaX);
But it does not have the same accuracy.
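Putting the suggestions from this thread together (A/1000 hoisted out of the loops, the square written as a multiplication, and the simplified trapezoid), the routine might look roughly like the sketch below. The function signature is an assumption: the original procedure takes XValues, ZValues and Number from an outer scope, while here the arrays are passed in and the result is returned. It also assumes both input arrays have the same length.

```pascal
program FilterSketch;
{$mode delphi}

type
  TDoubleArray = array of Double;

// Hypothetical signature; assumes Length(ZValues) = Length(XValues).
function Filter(const XValues, ZValues: TDoubleArray;
  const A: Double): TDoubleArray;
var
  k, n, Count: Integer;
  Ad1000, deltaX, ZHelp: Double;
  ZArray: TDoubleArray;
begin
  Count := Length(XValues);
  SetLength(Result, Count);
  SetLength(ZArray, Count);
  Ad1000 := A / 1000;                 // loop-invariant, computed once
  for n := 0 to Count - 1 do
  begin
    for k := 0 to Count - 1 do
    begin
      deltaX := Ad1000 * (XValues[k] - XValues[n]);
      // deltaX * deltaX replaces power(..., 2)
      ZArray[k] := ZValues[k] * A * Exp(-Pi * deltaX * deltaX);
    end;
    // integrate (trapezoidal rule), using (a + b) / 2 for the mean height
    ZHelp := 0;
    for k := 1 to Count - 1 do
      ZHelp := ZHelp + (ZArray[k - 1] + ZArray[k]) / 2
                       * (XValues[k] - XValues[k - 1]);
    Result[n] := ZHelp / 1000;
  end;
end;

var
  X, Z, F: TDoubleArray;
  i: Integer;
begin
  // Tiny made-up input, only to show the call.
  SetLength(X, 5);
  SetLength(Z, 5);
  for i := 0 to 4 do
  begin
    X[i] := i;
    Z[i] := 1.0;
  end;
  F := Filter(X, Z, 500.0);
  for i := 0 to High(F) do
    WriteLn(F[i]:0:6);
end.
```

The results can differ from the original routine in the last digits, since the rearranged expressions round differently.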

lagprogramming

  • Full Member
  • ***
  • Posts: 159
Re: performance problem with Free Pascal 2.6.x
« Reply #14 on: November 04, 2014, 08:05:34 pm »
Quote
instead of
Code: [Select]
  for i := 0 to 100 do begin
    x := x + someArrayOfDouble[i];
  end;

do
Code: [Select]
  var p: ^Double; // pointer to array element type

  p := @someArrayOfDouble[0]; // point to first element
  for i := 0 to 100 do begin
    x := x + p^;
    inc(p); // point to next element
  end;

I am not sure if and/or when fpc optimizes this, but it may not, or not always:

In the first loop, the byte address of each array element has to be calculated as
  byteAddressOfDouble = baseAddressOfArray + index * sizeof(double)

In the 2nd loop, this multiplication is no longer needed.


I understand your approach, but modern processors might surprise you. The second code you proposed will run slower than the first one on many new processors. I have this problem when trying to optimize loops for zero comparisons. It's very tricky. :)