Author Topic: [SOLVED] Pascal performance on polynomial benchmark slower than expected  (Read 18168 times)

Never

  • Sr. Member
  • ****
  • Posts: 409
  • OS:Win7 64bit / Lazarus 1.4
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #15 on: October 14, 2014, 01:31:05 pm »
Code:
      for J := 0 to 99 do begin
        Mu := (Mu + 2.0) / 2.0; // <-- These values are the same on every pass (one per J): calculate them once outside the loop and, as suggested, replace "/ 2.0" with "* 0.5"
       
      end;
Code:
program project1;
{$mode objfpc}

uses
  windows,
  interfaces,
  sysutils;

var
  z: integer;
  _start:longint;

  function DoIt(const x: double): double;
  var
    Mu: double = 10.0;
    Su: double;
    I, J, N: integer;
    var aP : array [0..99] of double;
  begin
    _start:=GetTickCount; // resets the global timer on every call, so the final output times only the last DoIt call
    N := 500000;
    result := 0;
    for J := 0 to 99 do begin
        Mu := (Mu + 2.0) * 0.5;
       aP[j]:=Mu;
    end;
    for I := 0 to N - 1 do begin
      Su := 0.0;
      for J := 0 to 99 do begin
       // Mu := (Mu + 2.0) / 2.0;
        Su := x * Su + aP[j];
      end;
      result := result + Su;
    end;
  end;

begin
  _start:=GetTickCount;
  for z := 1 to 10 do writeln(DoIt(0.2));
  writeln(inttostr(GetTickCount-_Start));
  readln;
end.

Edit: I just ran it on a P6200 (1.8 GHz, 2 GB RAM); the result is 358 (milliseconds, as reported by GetTickCount).
« Last Edit: October 14, 2014, 01:47:49 pm by Never »
Hey Lazarus, jump outside, your friend is looking for you

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 670
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #16 on: October 14, 2014, 01:44:16 pm »
Quote
Now, what makes me think there is something wrong on my system is that another user on the Gambas user list (using FPC 2.6.2-8 [2014/01/22] for x86_64 -- older than mine) reported times of 5.376s and 4.172s for Pascal and Gambas, respectively -- showing Pascal to be only marginally slower; not two times slower.

Yes, I have a slow system:
Intel(R) Pentium(R) 4 CPU 2.40GHz, 1G RAM
Mageia 3 (Linux), Kernel 3.10.54, KDE4 Desktop
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
The difference in speed between what you and the other person observed is caused by the fact that FPC for x86-64 automatically uses the SSE unit for floating-point code (because all x86-64 CPUs support it), while FPC for i386 doesn't (FPC's default target CPU is still the i386 itself).

Add the -Cfsse2 parameter (and possibly -Cppentium4, although I doubt that will change much), and you'll probably see a significant speedup on the i386.

botster

  • New Member
  • *
  • Posts: 18
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #17 on: October 14, 2014, 09:01:07 pm »
@engkin: Lol. Yes, that assembly version of the function DoIt definitely sped it up! It isn't really an equivalent program. But, it was fun to try it out.

@Jonas Maebe:  You were right, targeting the CPU with "-Cppentium4" did not make much of a change, if any. The "-Cfsse2" compiler command-line option is equivalent to the in-code compiler directive "{$FPUTYPE SSE2}", isn't it? For testing, I like to create different source versions so I can keep track of which executable is which. So, I used the directive instead of the command-line option.

Using that directive did not increase performance over not using it; 15.552s and 15.583s, respectively. I think that may be due to what http://www.freepascal.org/docs-html/prog/progsu92.html#x100-990001.3.9 says about the $E switch, "Under linux and most unix’es, the kernel takes care of the coprocessor support, so this switch is not necessary on those platforms." Wouldn't that also mean that targeting the co-processor would also not be necessary?


The biggest performance gain was obtained by changing the divide by 2 into a multiply by 0.5: from 21.306s down to 15.583s. (I had my browser running during my initial tests.:-[ ) And I also changed that in the Gambas program (which made no appreciable difference there), so it is still equivalent.
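A small sketch of why the two programs stay equivalent after this change: 0.5 is an exact power of two, so dividing by 2.0 and multiplying by 0.5 should round to the same double. The program name and the variables ByDiv, ByMul and Mismatches are made up for this illustration; the recurrence is the one from the benchmark.

```pascal
program halfcheck;
{$mode objfpc}

var
  Mu, ByDiv, ByMul: Double;
  J, Mismatches: Integer;
begin
  Mu := 10.0;
  Mismatches := 0;
  for J := 0 to 99 do
  begin
    ByDiv := (Mu + 2.0) / 2.0;   // original form from the benchmark
    ByMul := (Mu + 2.0) * 0.5;   // rewritten form
    if ByDiv <> ByMul then
      Inc(Mismatches);           // counts any step where the two forms differ
    Mu := ByMul;
  end;
  WriteLn('mismatches: ', Mismatches);
  WriteLn('final Mu:   ', Mu:0:15);
end.
```

If the mismatch count is zero, as expected, the rewrite changes only the speed and not the computed coefficients. This does not hold for divisors that are not powers of two (for example, x / 3.0 and x * (1.0 / 3.0) can differ in the last bit).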

That brings the ratio to 1.37:1, whereas the other Gambas user's performance ratio would be 1.29:1. I think that's pretty darn close. And it could be that the difference is amplified on my system due to how relatively slow it is overall.

Thank you everyone. I have learned a few things about code optimization and compiler options.

I still have a couple of questions, though.

1. Why is floating point division slower than floating point multiplication? I'm not sure how it could be, but is it related to the direction of bit-shifting in the registers?

2. In the optimized code that DelphiFreak posted, the variable "result" was not declared in the "var" section prior to being used in the main body of function DoIt. FPC did not complain and the program executed without errors. Why is that? I thought that all vars had to be declared before use.
Lee

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 670
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #18 on: October 14, 2014, 09:59:31 pm »
Quote
@Jonas Maebe:  You were right, targeting the CPU with "-Cppentium4" did not make much of a change, if any. The "-Cfsse2" compiler command-line option is equivalent to the in-code compiler directive "{$FPUTYPE SSE2}", isn't it? For testing, I like to create different source versions so I can keep track of which executable is which. So, I used the directive instead of the command-line option.

Using that directive did not increase performance over not using it; 15.552s and 15.583s, respectively.
Did you also use -O2 at the same time?

Quote
I think that may be due what http://www.freepascal.org/docs-html/prog/progsu92.html#x100-990001.3.9 says about the $E switch, "Under linux and most unix’es, the kernel takes care of the coprocessor support, so this switch is not necessary on those platforms." Wouldn't that also mean that targeting the co-processor would also not be necessary?
No.

Quote
1. Why is floating point division slower than floating point multiplication? I'm not sure how it could be, but is it related to the direction of bit-shifting in the registers?
Dividing is more than just bit shifting with floating point (and even with integers, it's more than just bit shifting even when dividing by a power of two in case the numerator's base type is signed).
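To illustrate the integer remark, here is a minimal sketch comparing div with a plain arithmetic shift. It assumes the SarLongint function from FPC's System unit (arithmetic shift right on a LongInt) is available in your compiler version; the program name is made up.

```pascal
program signeddiv;
{$mode objfpc}

var
  N: LongInt;
begin
  N := -7;
  WriteLn(N div 2);           // -3: div truncates toward zero
  WriteLn(SarLongint(N, 1));  // -4: an arithmetic shift rounds toward minus infinity
  N := 7;
  WriteLn(N div 2);           // 3
  WriteLn(SarLongint(N, 1));  // 3: for non-negative values the two agree
end.
```

Because the two results differ for negative odd values, a compiler that implements a signed division by a power of two with shifts has to add a correction step, which is why it is "more than just bit shifting".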

Quote
2. In the optimized code that DelphiFreak posted, the variable "result" was not declared in the "var" section prior to being used in the main body of function DoIt. FPC did not complain and the program executed without errors. Why is that? I thought that all vars had to be declared before use.
http://www.freepascal.org/docs-html/ref/refse81.html
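In short, and as a sketch rather than a quote from that page: in {$mode objfpc} (and Delphi mode) every function has an implicitly declared variable named Result of the function's return type, so it needs no entry in the "var" section. The function names below are made up for illustration.

```pascal
program resultdemo;
{$mode objfpc}

function Sum3(a, b, c: Integer): Integer;
begin
  Result := a + b;        // Result is declared implicitly by the compiler
  Result := Result + c;   // and can be read back like any local variable
end;

function Twelve: Integer;
begin
  Exit(12);               // Exit with an argument sets the return value and leaves
end;

begin
  WriteLn(Sum3(1, 2, 3)); // 6
  WriteLn(Twelve);        // 12
end.
```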

botster

  • New Member
  • *
  • Posts: 18
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #19 on: October 15, 2014, 06:15:40 am »
Quote
@Jonas Maebe:  You were right, targeting the CPU with "-Cppentium4" did not make much of a change, if any. The "-Cfsse2" compiler command-line option is equivalent to the in-code compiler directive "{$FPUTYPE SSE2}", isn't it? For testing, I like to create different source versions so I can keep track of which executable is which. So, I used the directive instead of the command-line option.

Using that directive did not increase performance over not using it; 15.552s and 15.583s, respectively.
Did you also use -O2 at the same time?

No, I did not.

I just tested again with four different versions of the source. One with no optimization directives (only the "{$mode objfpc}" directive), one with "{$OPTIMIZATION LEVEL2}", another with "{$FPUTYPE SSE2}", and finally one with "{$FPUTYPE SSE2}{$OPTIMIZATION LEVEL2}".

The respective execution times were: 0m16.426s, 0m15.964s, 0m15.698s, & 0m16.080s.

I also ran the Gambas program again, and it completed in 0m12.090s. That's a ratio of about 1.33:1 which is real close to the 1.29:1 performance ratio reported by another Gambas user. I feel more confident, now, that there isn't something terribly wrong with my system. I just have to make sure my web browser isn't running if I want performance from other applications  ::)

Thank you for your help.
Lee

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 670
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #20 on: October 15, 2014, 10:19:16 am »
{$optimization level2} is the same as -Oolevel2, which does almost nothing by itself. Please literally use -O2 (or higher). There is no equivalent source level directive that you can use.
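A sketch of how the two settings could be split between source and command line, assuming a 32-bit x86 target: the FPU type goes into the source as a directive, while -O2 is passed to the compiler. The program name, the Horner function and the Coef array are made up for this illustration; the loop bodies follow the benchmark.

```pascal
program polyflags;
{$mode objfpc}

// Assumption: x86-64 builds already use SSE for floating point,
// so the directive is applied only when compiling for i386.
{$ifdef CPUI386}
  {$FPUTYPE SSE2}   // source-level counterpart of -Cfsse2
{$endif}

// -O2 has no source-level equivalent; pass it on the command line:
//   fpc -O2 polyflags.pas

function Horner(const X: Double): Double;
var
  Coef: array[0..99] of Double;
  Mu, Su: Double;
  J: Integer;
begin
  // Precompute the coefficients once, as in the optimized benchmark.
  Mu := 10.0;
  for J := 0 to 99 do
  begin
    Mu := (Mu + 2.0) * 0.5;
    Coef[J] := Mu;
  end;
  // Evaluate the polynomial with Horner's scheme.
  Su := 0.0;
  for J := 0 to 99 do
    Su := X * Su + Coef[J];
  Result := Su;
end;

begin
  WriteLn(Horner(0.2):0:6);
end.
```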

Leledumbo

  • Hero Member
  • *****
  • Posts: 8112
  • Programming + Glam Metal + Tae Kwon Do = Me
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #21 on: October 15, 2014, 10:55:09 am »
You make my hands itchy to test myself. My system is 64-bit Manjaro Linux using kernel 3.17 with KDE 4.14 desktop running on Core i5 4200u, 8 GB of RAM 1333 MHz. Here's a result of the latest code in this thread, modified to remove time calculation and I/O from the program:
Code:
program project1;

{$mode objfpc}

uses
  sysutils;

var
  z: integer;

  function DoIt(const x: double): double;
  var
    Mu: double = 10.0;
    Su: double;
    I, J, N: integer;
    var aP : array [0..99] of double;
  begin
    N := 500000;
    result := 0;
    for J := 0 to 99 do begin
        Mu := (Mu + 2.0) * 0.5;
       aP[j]:=Mu;
    end;
    for I := 0 to N - 1 do begin
      Su := 0.0;
      for J := 0 to 99 do begin
       // Mu := (Mu + 2.0) / 2.0;
        Su := x * Su + aP[j];
      end;
      result := result + Su;
    end;
  end;

begin
  for z := 1 to 10 do DoIt(0.2);
end.
Results:
Quote
$ fpc test.pas
Hint: End of reading config file /etc/fpc.cfg
Target OS: Linux for x86-64
Compiling test.pas
Linking test
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
35 lines compiled, 0.1 sec
1 hint(s) issued

$ time ./test
real    0m2.640s
user    0m2.637s
sys     0m0.000s

$ fpc -O2 test.pas
Hint: End of reading config file /etc/fpc.cfg
Target OS: Linux for x86-64
Compiling test.pas
Linking test
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
35 lines compiled, 0.1 sec
1 hint(s) issued

$ time ./test
real    0m1.358s
user    0m1.357s
sys     0m0.000s

$ fpc -O3 test.pas
Hint: End of reading config file /etc/fpc.cfg
Target OS: Linux for x86-64
Compiling test.pas
Linking test
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
35 lines compiled, 0.1 sec
1 hint(s) issued

$ time ./test
real    0m1.389s
user    0m1.387s
sys     0m0.000s

$ fpc -O4 test.pas
Hint: End of reading config file /etc/fpc.cfg
Target OS: Linux for x86-64
Compiling test.pas
Linking test
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
35 lines compiled, 0.1 sec
1 hint(s) issued

$ time ./test
real    0m1.386s
user    0m1.383s
sys     0m0.003s

$ fpc -O4 -Ooloopunroll test.pas
Hint: End of reading config file /etc/fpc.cfg
Target OS: Linux for x86-64
Compiling test.pas
Linking test
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
35 lines compiled, 0.1 sec
1 hint(s) issued

$ time ./test
real    0m1.417s
user    0m1.410s
sys     0m0.000s
I'm using a trunk build that is about a month old. My conclusion is that, apart from the step from no optimization to -O2, further optimizations don't change much. The differences between the -O levels are negligible, and I ran the tests with Chrome (two tabs) and an MP3 player running.

Fred vS

  • Hero Member
  • *****
  • Posts: 1675
    • miXimum is the DJ's best friend
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #22 on: October 15, 2014, 01:52:18 pm »
Quote
http://www.freepascal.org/docs-html/ref/refse81.html

=>

Quote
Function MyFunction : Integer; 
begin 
  Exit(12); 
end;

Yep, many thanks for that tip, which I did not know!  :-[

Fred
I use Lazarus 1.8.0 32/64 and FPC 3.0.3 32/64 on Linux Mint Mate 17 32/64, Windows 10, Windows 7 32/64, Windows XP 32,  FreeBSD 64 and Mac OS X Snow Leopard 32.
Widgetset: fpGUI, MSEgui, Win32, GTK2, Qt, Carbon.

https://github.com/fredvs

botster

  • New Member
  • *
  • Posts: 18
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #23 on: October 15, 2014, 09:11:40 pm »
Quote
{$optimization level2} is the same as -Oolevel2, which does almost nothing by itself. Please literally use -O2 (or higher). There is no equivalent source level directive that you can use.

Oh.

The documentation is confusing me.

http://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58 speaks of LEVELx optimizations.

`fpc -?` says:
Quote
  -O<x>  Optimizations:
      -O-        Disable optimizations
      -O1        Level 1 optimizations (quick and debugger friendly)
      -O2        Level 2 optimizations (-O1 + quick optimizations)
      -O3        Level 3 optimizations (-O2 + slow optimizations)
      -Oa<x>=<y> Set alignment
      -Oo[NO]<x> Enable or disable optimizations, see fpc -i for possible values
      -Op<x>     Set target cpu for optimizing, see fpc -i for possible values
      -OW<x>     Generate whole-program optimization feedback for optimization <x>, see fpc -i for possible values
      -Ow<x>     Perform whole-program optimization <x>, see fpc -i for possible values
      -Os        Optimize for size rather than speed

So there are two, different sets of LEVELx optimizations?


http://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58 also says, "This switch is also activated by the -Ooxxx command line switch."

From `fpc -?`:
Quote
-Oo[NO]<x> Enable or disable optimizations, see fpc -i for possible values

Yet `fpc -i` does not list any LEVELx option:
Quote
Supported Optimizations:
  REGVAR
  UNCERTAIN
  STACKFRAME
  PEEPHOLE
  ASMCSE
  LOOPUNROLL
  TAILREC
  CSE

http://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58 also lists a FASTMATH option which, when I tried that, caused the compiler to abort saying it was an invalid option.

I find this confusing. So, should I primarily follow the local documentation ahead of the online documentation?


I ran four different tests this time using the code I originally posted, except with "Mu :=  (Mu + 2.0) / 2.0;" changed to "Mu :=  (Mu + 2.0) * 0.5;", which is the version I have been using ever since it was suggested. All tests used command-line options with no source level directives other than $MODE.

Quote
$ fpc polynom.pas
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling polynom.pas
Linking polynom
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
42 lines compiled, 0.1 sec

$ time ./polynom
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006

real    0m15.333s
user    0m14.450s
sys     0m0.002s


$ fpc -O2 polynom.pas
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling polynom.pas
Linking polynom
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
42 lines compiled, 0.1 sec

$ time ./polynom
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006

real    0m16.119s
user    0m15.425s
sys     0m0.007s


$ fpc -Cfsse2 polynom.pas
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling polynom.pas
Linking polynom
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
42 lines compiled, 0.1 sec

$ time ./polynom
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006

real    0m15.783s
user    0m14.911s
sys     0m0.006s


$ fpc -Cfsse2 -O2 polynom.pas
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling polynom.pas
Linking polynom
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?
42 lines compiled, 0.1 sec

$ time ./polynom
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006
 1.25000000000000E+006

real    0m8.789s
user    0m8.046s
sys     0m0.003s

Summary:

Time results with command-line optimization options:
None: 15.333s
"-O2": 16.119s
"-Cfsse2": 15.783s
"-Cfsse2 -O2": 8.789s

So, using either "-O2" or "-Cfsse2" alone does not do much. But, using the two together does indeed make a huge difference!

I guess I hadn't learned as much as I previously thought I had. :-[

Thank you!
« Last Edit: October 15, 2014, 09:14:44 pm by botster »
Lee

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 670
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #24 on: October 15, 2014, 09:39:00 pm »
Quote
{$optimization level2} is the same as -Oolevel2, which does almost nothing by itself. Please literally use -O2 (or higher). There is no equivalent source level directive that you can use.

Oh.

The documentation is confusing me.

http://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58 speaks of LEVELx optimizations.
I've removed all mention of level1/2/3 from that page in the svn version of the documentation, and explicitly added that {$optimization KKK} is equivalent to -OoKKK.

Quote
So there are two, different sets of LEVELx optimizations?
The -O1/2/3 command line parameters enable various sets of -OoKKK options. Among those are -Oolevel1/2/3 for -O1/2/3 respectively, but as mentioned before, those options do very little by themselves. They are not mentioned in the help because they would probably only confuse people.

Quote
http://www.freepascal.org/docs-html/prog/progsu58.html#x65-640001.2.58 also lists a FASTMATH option which, when I tried that, caused the compiler to abort saying it was an invalid option.
That's an option that will only be available in the next release, which somehow found its way into the documentation of the current release.

Quote
So, using either "-O2" or "-Cfsse2" alone does not do much. But, using the two together does indeed make a huge difference!
True, with the caveat "for this particular program".

botster

  • New Member
  • *
  • Posts: 18
Re: Pascal performance on polynomial benchmark slower than expected
« Reply #25 on: October 16, 2014, 12:15:20 am »
Quote
True, with the caveat "for this particular program".

Right. I forgot about that part  ;-)

Thank you for the information and clarifications.

And I thank everyone for their assistance. You have all been very helpful.  :)
« Last Edit: October 16, 2014, 12:18:42 am by botster »
Lee

DelphiFreak

  • Full Member
  • ***
  • Posts: 246
    • Fresh sound.
Re: [SOLVED] Pascal performance on polynomial benchmark slower than expected
« Reply #26 on: October 17, 2014, 08:10:18 am »
Summary
"""""""""""""
You started with:

real    0m38.576s
user    0m20.965s
sys     0m0.063s

and ended with:

real    0m8.789s
user    0m8.046s
sys     0m0.003s

by changing some compiler options.

World is ok again :-) FPC is faster than Gambas.

Sam



Linux Mint 19.1, Lazarus 2.0, Windows 7&10, Delphi 7, Delphi 10.3 Rio

botster

  • New Member
  • *
  • Posts: 18
Re: [SOLVED] Pascal performance on polynomial benchmark slower than expected
« Reply #27 on: October 17, 2014, 09:22:08 am »
Quote
Summary
"""""""""""""
You started with:

real    0m38.576s
user    0m20.965s
sys     0m0.063s

and ended with:

real    0m8.789s
user    0m8.046s
sys     0m0.003s

by changing some compiler options.

World is ok again :-) FPC is faster than Gambas.

Sam

No, Sam, sorry. I started with real: 0m38.576s and got to real: 15.333s by closing my web browser when I ran the tests :-[. I got from 0m15.333s to 0m8.789s by changing some compiler options.

But not to worry, FPC is still faster than Gambas ;-)  (But not by much! :P )

 :-X
« Last Edit: October 17, 2014, 09:38:42 am by botster »
Lee

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 670
Re: [SOLVED] Pascal performance on polynomial benchmark slower than expected
« Reply #28 on: October 17, 2014, 09:41:47 am »
Quote
But not to worry, FPC is still faster than Gambas ;-)  (But not by much! :P )

That's to be expected on a program like this, if Gambas uses a good Just-in-Time compiler (which apparently it does). A program containing a single tight loop is the ideal case for Just-in-Time compilers, as you only have a little overhead (compiling a single code fragment once, or maybe twice in case you use incremental optimization) compared to the number of times the code is executed. If you have similar code that can be optimised better in the presence of profiling information, the JiT version can even easily outperform static compilation.