[SOLVED] Pascal performance on polynomial benchmark slower than expected

botster:
Hi All,

Being a Linux user, I have been working with Gambas (BASIC), which has a visual GUI designer as Lazarus does. It is a nice app, but it does not do cross-platform compiling. So, I recently took another look at Lazarus.

I became curious how a Free Pascal compiled program would perform against a Gambas 'just in time' compiled program. I thought it would blow Gambas out of the water, so to speak. It did not.

Please understand that I am not trying to be a troll. I just think there is something in my system that makes an FPC compiled program not perform as it should. And, I am hoping someone might be able to help me find what that is.

Here's what I did. I took the Gambas polynomial benchmark program from http://gambaswiki.org/wiki/doc/benchmark/polynom and converted it to Pascal, compiled it with `fpc polynom.pas`, and then timed its execution with the Linux 'time' command (`time ./polynom`).

The time information of the output was:
real    0m38.576s
user    0m20.965s
sys     0m0.063s

For the Gambas program, executed with `time gbs3 -f -c polynom.gambas`, the time info was:
real    0m17.093s
user    0m9.604s
sys     0m0.066s

That is over twice as fast as the precompiled Pascal program. The '-f' option invokes the Just-In-Time compiler, and the '-c' option ignores the compile cache to force a compile. So, the time for the Gambas program includes compile time.
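
As a quick check of the "over twice as fast" figure, the ratios can be computed from the two `time` outputs above. This is only a sketch: the variable names and the `ratio` helper are made up for illustration, and the numbers are simply copied from the outputs.

```shell
#!/bin/sh
# Timings in seconds, copied from the two `time` outputs above.
pascal_real=38.576; pascal_user=20.965
gambas_real=17.093; gambas_user=9.604

# ratio A B: print A/B with two decimals (hypothetical helper; awk does the float math).
ratio() { LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

real_ratio=$(ratio "$pascal_real" "$gambas_real")
user_ratio=$(ratio "$pascal_user" "$gambas_user")
# Share of wall-clock time the Pascal run actually spent in user mode.
pascal_cpu_share=$(ratio "$pascal_user" "$pascal_real")

echo "real: Pascal/Gambas = $real_ratio"
echo "user: Pascal/Gambas = $user_ratio"
echo "Pascal user/real    = $pascal_cpu_share"
```

Both runs spent only a little over half of their wall-clock time in user mode, which may mean something else was competing for the CPU during the measurement; if so, the user times are probably the fairer comparison.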

Now, what makes me think there is something wrong on my system is that another user on the Gambas user list (using FPC 2.6.2-8 [2014/01/22] for x86_64 -- older than mine) reported times of 5.376s and 4.172s for Pascal and Gambas, respectively -- showing Pascal to be only marginally slower; not two times slower.

Yes, I have a slow system:
Intel(R) Pentium(R) 4 CPU 2.40GHz, 1G RAM
Mageia 3 (Linux), Kernel 3.10.54, KDE4 Desktop
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
Gambas 3.5.4

Here's the Pascal program:

--- Code: ---program Polynom;
{$mode objfpc}

var
  z : integer;

function DoIt(x : double) : double;
  var Mu : double = 10.0;
  var Pu, Su : double;
  var I, J, N : integer;
  var aPoly : array [0..99] of double;

begin
  N := 500000;
  Pu := 0;

  For I := 0 To N-1 do
  begin
    For J := 0 To 99 do
    begin
      Mu :=  (Mu + 2.0) / 2.0;
      aPoly[J] := Mu;
    end;
    Su := 0.0;
    For J := 0 To 99 do
    begin
      Su := X * Su + aPoly[J];
    end;
    Pu := Pu + Su;
  end;

  DoIt := Pu;
end;

Begin
  For z := 1 To 10 do
  begin
    writeln( DoIt(0.2) );
  end;
End.

--- End code ---

I could not attach my "fpc.cfg" file. So I've included it here:

--- Code: ---#
# Config file generated by fpcmkcfg on 27-9-14 - 03:58:23
# Example fpc.cfg for Free Pascal Compiler
#

# ----------------------
# Defines (preprocessor)
# ----------------------

#
# nested #IFNDEF, #IFDEF, #ENDIF, #ELSE, #DEFINE, #UNDEF are allowed
#
# -d is the same as #DEFINE
# -u is the same as #UNDEF
#

#
# Some examples (for switches see below, and the -? helppages)
#
# Try compiling with the -dRELEASE or -dDEBUG on the commandline
#

# For a release compile with optimizes and strip debuginfo
#IFDEF RELEASE
  -O2
  -Xs
  #WRITE Compiling Release Version
#ENDIF

# For a debug version compile with debuginfo and all codegeneration checks on
#IFDEF DEBUG
  -gl
  -Crtoi
  #WRITE Compiling Debug Version
#ENDIF

# assembling
#ifdef darwin
# use pipes instead of temporary files for assembling
-ap
# path to Xcode 4.3+ utilities (no problem if it doesn't exist)
-FD/Applications/Xcode.app/Contents/Developer/usr/bin
#endif

# ----------------
# Parsing switches
# ----------------

# Pascal language mode
#      -Mfpc      free pascal dialect (default)
#      -Mobjfpc   switch some Delphi 2 extensions on
#      -Mdelphi   tries to be Delphi compatible
#      -Mtp       tries to be TP/BP 7.0 compatible
#      -Mgpc      tries to be gpc compatible
#      -Mmacpas   tries to be compatible to the macintosh pascal dialects
#
# Turn on Object Pascal extensions by default
#-Mobjfpc

# Assembler reader mode
#      -Rdefault  use default assembler
#      -Ratt      read AT&T style assembler
#      -Rintel    read Intel style assembler
#
# All assembler blocks are AT&T styled by default
#-Ratt

# Semantic checking
#      -S2        same as -Mobjfpc
#      -Sc        supports operators like C (*=,+=,/= and -=)
#      -Sa        include assertion code.
#      -Sd        same as -Mdelphi
#      -Se<x>     error options. <x> is a combination of the following:
#         <n> : compiler stops after <n> errors (default is 1)
#         w   : compiler stops also after warnings
#         n   : compiler stops also after notes
#         h   : compiler stops also after hints
#      -Sg        allow LABEL and GOTO
#      -Sh        Use ansistrings
#      -Si        support C++ styled INLINE
#      -Sk        load fpcylix unit
#      -SI<x>     set interface style to <x>
#         -SIcom    COM compatible interface (default)
#         -SIcorba  CORBA compatible interface
#      -Sm        support macros like C (global)
#      -So        same as -Mtp
#      -Sp        same as -Mgpc
#      -Ss        constructor name must be init (destructor must be done)
#      -Sx        enable exception keywords (default in Delphi/ObjFPC modes)
#
# Allow goto, inline, C-operators, C-vars
-Sgic

# ---------------
# Code generation
# ---------------

# Uncomment the next line if you always want static/dynamic units by default
# (can be overruled with -CD, -CS at the commandline)
#-CS
#-CD

# Set the default heapsize to 8Mb
#-Ch8000000

# Set default codegeneration checks (iocheck, overflow, range, stack)
#-Ci
#-Co
#-Cr
#-Ct

# Optimizer switches
# -Os        generate smaller code
# -Oa=N      set alignment to N
# -O1        level 1 optimizations (quick optimizations, debuggable)
# -O2        level 2 optimizations (-O1 + optimizations which make debugging more difficult)
# -O3        level 3 optimizations (-O2 + optimizations which also may make the program slower rather than faster)
# -Oo<x>     switch on optimalization x. See fpc -i for possible values
# -OoNO<x>   switch off optimalization x. See fpc -i for possible values
# -Op<x>     set target cpu for optimizing, see fpc -i for possible values

#ifdef darwin
#ifdef cpui386
-Cppentiumm
-Oppentiumm
#endif
#endif

# -----------------------
# Set Filenames and Paths
# -----------------------

# Both slashes and backslashes are allowed in paths

# path to the messagefile, not necessary anymore but can be used to override
# the default language
#-Fr/usr/lib/fpc/$fpcversion/msg/errore.msg
#-Fr/usr/lib/fpc/$fpcversion/msg/errorn.msg
#-Fr/usr/lib/fpc/$fpcversion/msg/errores.msg
#-Fr/usr/lib/fpc/$fpcversion/msg/errord.msg
#-Fr/usr/lib/fpc/$fpcversion/msg/errorr.msg

# searchpath for units and other system dependent things
-Fu/usr/lib/fpc/$fpcversion/units/$fpctarget
-Fu/usr/lib/fpc/$fpcversion/units/$fpctarget/*
-Fu/usr/lib/fpc/$fpcversion/units/$fpctarget/rtl

#IFDEF FPCAPACHE_1_3
-Fu/usr/lib/fpc/$fpcversion/units/$fpctarget/httpd13/
#ELSE
#IFDEF FPCAPACHE_2_0
-Fu/usr/lib/fpc/$fpcversion/units/$fpctarget/httpd20
#ELSE
-Fu/usr/lib/fpc/$fpcversion/units/$fpctarget/httpd22
#ENDIF
#ENDIF

# searchpath for fppkg user-specific packages
-Fu~/.fppkg/lib/fpc/$fpcversion/units/$FPCTARGET/*

# path to the gcclib
#ifdef cpui386
-Fl/usr/lib/gcc/i586-mageia-linux-gnu/4.7.2
#endif
#ifdef cpux86_64
-Fl/usr/lib/gcc/i586-mageia-linux-gnu/4.7.2
#endif

# searchpath for libraries
#-Fl/usr/lib/fpc/$fpcversion/lib
#-Fl/lib;/usr/lib
-Fl/usr/lib/fpc/$fpcversion/lib/$FPCTARGET

# searchpath for tools
-FD/usr/lib/fpc/$fpcversion/bin/$FPCTARGET

#IFNDEF CPUI386
#IFNDEF CPUAMD64
#DEFINE NEEDCROSSBINUTILS
#ENDIF
#ENDIF

#IFNDEF Linux
#DEFINE NEEDCROSSBINUTILS
#ENDIF

# binutils prefix for cross compiling
#IFDEF FPC_CROSSCOMPILING
#IFDEF NEEDCROSSBINUTILS
  -XP$FPCTARGET-
#ENDIF
#ENDIF


# -------------
# Linking
# -------------

# generate always debugging information for GDB (slows down the compiling
# process)
#      -gc        generate checks for pointers
#      -gd        use dbx
#      -gg        use gsym
#      -gh        use heap trace unit (for memory leak debugging)
#      -gl        use line info unit to show more info for backtraces
#      -gv        generates programs tracable with valgrind
#      -gw        generate dwarf debugging info
#
# Enable debuginfo and use the line info unit by default
#-gl

# always pass an option to the linker
#-k-s

# Always strip debuginfo from the executable
-Xs


# -------------
# Miscellaneous
# -------------

# Write always a nice FPC logo ;)
-l

# Verbosity
#      e : Show errors (default)       d : Show debug info
#      w : Show warnings               u : Show unit info
#      n : Show notes                  t : Show tried/used files
#      h : Show hints                  s : Show time stamps
#      i : Show general info           q : Show message numbers
#      l : Show linenumbers            c : Show conditionals
#      a : Show everything             0 : Show nothing (except errors)
#      b : Write file names messages   r : Rhide/GCC compatibility mode
#          with full path              x : Executable info (Win32 only)
#      v : write fpcdebug.txt with     p : Write tree.log with parse tree
#          lots of debugging info
#
# Display Info, Warnings and Notes
-viwn
# If you don't want so much verbosity use
#-vw

--- End code ---

I have searched both the web and this forum for optimizations related to floating point numbers, but came up with nothing useful.

Thank you for any clues to guide me.

Lee

Blaazen:
Try compiling with a higher optimization level:

--- Code: ---fpc polynom.pas -O3
--- End code ---
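
To see what the optimization level alone does, one could build the same source at each level and time each binary. A rough sketch, assuming `polynom.pas` is in the current directory and `fpc` is on the PATH; the output names (`polynom-O1` and so on) are made up for this example:

```shell
#!/bin/sh
src=polynom.pas

for opt in -O1 -O2 -O3; do
  out="polynom${opt}"   # polynom-O1, polynom-O2, polynom-O3
  if command -v fpc >/dev/null 2>&1 && [ -f "$src" ]; then
    # -o<name> sets the executable name; -Xs strips debug info.
    if fpc "$opt" -Xs -o"$out" "$src" >/dev/null; then
      echo "== $opt"
      time "./$out" >/dev/null
    fi
  else
    # Nothing to build here, so just show the command.
    echo "would run: fpc $opt -Xs -o$out $src"
  fi
done
```

Keeping one binary per level makes it easy to rerun the timings several times and to compare file sizes afterwards.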

Blaazen:
Also, try replacing

--- Code: ---... / 2.0;
--- End code ---
with

--- Code: ---... * 0.5;
--- End code ---
Maybe it is done automatically, I'm not sure.
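
On whether the two forms can differ numerically: 0.5 is exactly representable in binary floating point, so dividing by 2.0 and multiplying by 0.5 should give identical doubles. The sketch below checks that with awk (which computes in doubles) on the same `Mu` recurrence as the benchmark. It says nothing about whether FPC performs the substitution itself.

```shell
#!/bin/sh
# Run the Mu recurrence from DoIt and count iterations where the
# two forms disagree.
mismatches=$(LC_ALL=C awk 'BEGIN {
  mu = 10.0; bad = 0
  for (i = 0; i < 1000; i++) {
    a = (mu + 2.0) / 2.0
    b = (mu + 2.0) * 0.5
    if (a != b) bad++
    mu = a
  }
  print bad
}')
echo "mismatches: $mismatches"
```

So the change is safe for the result; any difference would be in speed only, since a floating-point divide is typically slower than a multiply.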

serbod:
Lazarus 1.2.4 with FPC 2.6.4
(Win7-32, Intel Duo T5250, 2Gb RAM)

Your code in a default GUI application (Create new... Application):
time: 00:00:10.681

No debugging, with -O3 optimization added:
time: 00:00:22.319  (Why?!)

-Or (use register variables), {$MAXFPUREGISTERS 5}
time: 00:00:10.602  (Why?)

Looking at the assembler output, the variables are not kept in registers, not even the loop counter.

Update: with * 0.5 instead of / 2.0:
time: 00:00:10.371
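
To repeat that look at the generated code, FPC can keep its assembler output: `-al` writes `polynom.s` with the source lines interleaved. The sketch below also counts x87 load/store instructions as a crude indicator of how much floating-point traffic there is. The `count_fpu_loads_stores` helper and its pattern are made up here and only make sense for i386 AT&T-syntax output.

```shell
#!/bin/sh
# Count x87 load/store mnemonics (fld*, fst*) in an AT&T-syntax listing.
count_fpu_loads_stores() {
  grep -c -E '^[[:space:]]+f(ld|st)p?[slt]?[[:space:]]' "$1"
}

if command -v fpc >/dev/null 2>&1 && [ -f polynom.pas ]; then
  # -a keeps the assembler file; the l suffix adds source lines to it.
  fpc -O3 -al polynom.pas >/dev/null
  echo "x87 loads/stores in polynom.s: $(count_fpu_loads_stores polynom.s)"
else
  echo "would run: fpc -O3 -al polynom.pas"
fi
```

Comparing the count between two builds (for example with and without -O3) gives a rough idea of whether the optimizer changed the inner loops at all.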

jarto:
Tested on Linux Mint with Intel(R) Core(TM) i5 CPU U 470 @ 1.33GHz

fpc 2.6.4, 64 bit, -O3, smart linking, no debug:
real   0m6.324s
user   0m6.319s
sys   0m0.000s

fpc 2.7.1, 64 bit, -O3, smart linking, no debug:
real   0m5.818s
user   0m5.808s
sys   0m0.004s

fpc 2.7.1, 64 bit, -O3, smart linking, no debug, a few tweaks to the code:
real   0m5.165s
user   0m5.159s
sys   0m0.004s

If an FPC program runs really slowly, it is almost always because you are compiling with debugging on. Check whether the compiled binary is about 35k or a lot bigger.
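
That size check can be scripted. A small sketch, assuming the binary is `./polynom` as in the first post; the `size_kib` helper is invented for this example:

```shell
#!/bin/sh
# size_kib FILE: print the file size in KiB, rounded up.
size_kib() {
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 1023) / 1024 ))
}

bin=./polynom
if [ -f "$bin" ]; then
  echo "$bin: $(size_kib "$bin") KiB"
  # For ELF binaries, `file` reports "stripped" or "not stripped".
  file "$bin"
else
  echo "$bin not found; compile it first (fpc polynom.pas)"
fi
```

A binary that is far larger than expected, or that `file` reports as "not stripped", is a hint that debug information is still being generated.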
