[SOLVED] Pascal performance on polynomial benchmark slower than expected
Hi All,
Being a Linux user, I have been working with Gambas (BASIC), which has a visual GUI designer much like Lazarus does. It is a nice app, but it does not do cross-platform compiling. So I recently took another look at Lazarus.
I became curious how a Free Pascal compiled program would perform against a Gambas 'just in time' compiled program. I thought it would blow Gambas out of the water, so to speak. It did not.
Please understand that I am not trying to be a troll. I just think there is something in my system that makes an FPC compiled program not perform as it should. And, I am hoping someone might be able to help me find what that is.
Here's what I did. I took the Gambas polynomial benchmark program, converted it to Pascal, compiled it with `fpc polynom.pas`, and then timed its execution with the Linux 'time' command (`time ./polynom`).
The time information of the output was:
real 0m38.576s
user 0m20.965s
sys 0m0.063s
For the Gambas program, executed with `time gbs3 -f -c polynom.gambas`, the time info was:
real 0m17.093s
user 0m9.604s
sys 0m0.066s
That is more than twice as fast as the precompiled Pascal program. The '-f' option invokes the Just-In-Time compiler, and the '-c' option ignores the compile cache to force a compile. So the time for the Gambas program includes compile time.
Now, what makes me think there is something wrong on my system is that another user on the Gambas user list (using FPC 2.6.2-8 [2014/01/22] for x86_64 -- older than mine) reported times of 5.376s and 4.172s for Pascal and Gambas, respectively -- showing Pascal to be only marginally slower; not two times slower.
Yes, I have a slow system:
Intel(R) Pentium(R) 4 CPU 2.40GHz, 1G RAM
Mageia 3 (Linux), Kernel 3.10.54, KDE4 Desktop
Free Pascal Compiler version 2.6.4 [2014/03/07] for i386
Gambas 3.5.4
Here's the Pascal program:
--- Code: ---program Polynom;
{$mode objfpc}

var
  z : integer;

function DoIt(x : double) : double;
var
  Mu : double = 10.0;
  Pu, Su : double;
  I, J, N : integer;
  aPoly : array [0..99] of double;
begin
  N := 500000;
  Pu := 0;
  for I := 0 to N-1 do
  begin
    // Build the 100 polynomial coefficients.
    for J := 0 to 99 do
    begin
      Mu := (Mu + 2.0) / 2.0;
      aPoly[J] := Mu;
    end;
    // Evaluate the polynomial at x with Horner's scheme.
    Su := 0.0;
    for J := 0 to 99 do
      Su := x * Su + aPoly[J];
    Pu := Pu + Su;
  end;
  DoIt := Pu;
end;

begin
  for z := 1 to 10 do
    writeln( DoIt(0.2) );
end.
--- End code ---
I could not attach my "fpc.cfg" file. So I've included it here:
--- Code: ---#
# Config file generated by fpcmkcfg on 27-9-14 - 03:58:23
# Example fpc.cfg for Free Pascal Compiler
# ----------------------
# Defines (preprocessor)
# ----------------------
# nested #IFNDEF, #IFDEF, #ENDIF, #ELSE, #DEFINE, #UNDEF are allowed
# -d is the same as #DEFINE
# -u is the same as #UNDEF
# Some examples (for switches see below, and the -? helppages)
# Try compiling with the -dRELEASE or -dDEBUG on the commandline
# For a release compile with optimizes and strip debuginfo
#WRITE Compiling Release Version
# For a debug version compile with debuginfo and all codegeneration checks on
#WRITE Compiling Debug Version
# assembling
#ifdef darwin
# use pipes instead of temporary files for assembling
# path to Xcode 4.3+ utilities (no problem if it doesn't exist)
# ----------------
# Parsing switches
# ----------------
# Pascal language mode
# -Mfpc free pascal dialect (default)
# -Mobjfpc switch some Delphi 2 extensions on
# -Mdelphi tries to be Delphi compatible
# -Mtp tries to be TP/BP 7.0 compatible
# -Mgpc tries to be gpc compatible
# -Mmacpas tries to be compatible to the macintosh pascal dialects
# Turn on Object Pascal extensions by default
# Assembler reader mode
# -Rdefault use default assembler
# -Ratt read AT&T style assembler
# -Rintel read Intel style assembler
# All assembler blocks are AT&T styled by default
# Semantic checking
# -S2 same as -Mobjfpc
# -Sc supports operators like C (*=,+=,/= and -=)
# -Sa include assertion code.
# -Sd same as -Mdelphi
# -Se<x> error options. <x> is a combination of the following:
# <n> : compiler stops after <n> errors (default is 1)
# w : compiler stops also after warnings
# n : compiler stops also after notes
# h : compiler stops also after hints
# -Sg allow LABEL and GOTO
# -Sh Use ansistrings
# -Si support C++ styled INLINE
# -Sk load fpcylix unit
# -SI<x> set interface style to <x>
# -SIcom COM compatible interface (default)
# -SIcorba CORBA compatible interface
# -Sm support macros like C (global)
# -So same as -Mtp
# -Sp same as -Mgpc
# -Ss constructor name must be init (destructor must be done)
# -Sx enable exception keywords (default in Delphi/ObjFPC modes)
# Allow goto, inline, C-operators, C-vars
# ---------------
# Code generation
# ---------------
# Uncomment the next line if you always want static/dynamic units by default
# (can be overruled with -CD, -CS at the commandline)
# Set the default heapsize to 8Mb
# Set default codegeneration checks (iocheck, overflow, range, stack)
# Optimizer switches
# -Os generate smaller code
# -Oa=N set alignment to N
# -O1 level 1 optimizations (quick optimizations, debuggable)
# -O2 level 2 optimizations (-O1 + optimizations which make debugging more difficult)
# -O3 level 3 optimizations (-O2 + optimizations which also may make the program slower rather than faster)
# -Oo<x> switch on optimalization x. See fpc -i for possible values
# -OoNO<x> switch off optimalization x. See fpc -i for possible values
# -Op<x> set target cpu for optimizing, see fpc -i for possible values
#ifdef darwin
#ifdef cpui386
# -----------------------
# Set Filenames and Paths
# -----------------------
# Both slashes and backslashes are allowed in paths
# path to the messagefile, not necessary anymore but can be used to override
# the default language
# searchpath for units and other system dependent things
# searchpath for fppkg user-specific packages
# path to the gcclib
#ifdef cpui386
#ifdef cpux86_64
# searchpath for libraries
# searchpath for tools
# binutils prefix for cross compiling
# -------------
# Linking
# -------------
# generate always debugging information for GDB (slows down the compiling
# process)
# -gc generate checks for pointers
# -gd use dbx
# -gg use gsym
# -gh use heap trace unit (for memory leak debugging)
# -gl use line info unit to show more info for backtraces
# -gv generates programs tracable with valgrind
# -gw generate dwarf debugging info
# Enable debuginfo and use the line info unit by default
# always pass an option to the linker
# Always strip debuginfo from the executable
# -------------
# Miscellaneous
# -------------
# Write always a nice FPC logo ;)
# Verbosity
# e : Show errors (default) d : Show debug info
# w : Show warnings u : Show unit info
# n : Show notes t : Show tried/used files
# h : Show hints s : Show time stamps
# i : Show general info q : Show message numbers
# l : Show linenumbers c : Show conditionals
# a : Show everything 0 : Show nothing (except errors)
# b : Write file names messages r : Rhide/GCC compatibility mode
# with full path x : Executable info (Win32 only)
# v : write fpcdebug.txt with p : Write tree.log with parse tree
# lots of debugging info
# Display Info, Warnings and Notes
# If you don't want so much verbosity use
--- End code ---
I have searched both the web and this forum for optimizations related to floating point numbers, but came up with nothing useful.
Thank you for any clues to guide me.
Try compiling with a higher optimization level:
--- Code: ---fpc polynom.pas -O3
--- End code ---
Also, try replacing
--- Code: ---... / 2.0;
--- End code ---
with
--- Code: ---... * 0.5;
--- End code ---
The compiler may already do this automatically; I'm not sure.
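If you want to convince yourself that swapping / 2.0 for * 0.5 cannot change the benchmark's numbers, here is a minimal sketch that runs the Mu recurrence from DoIt both ways and counts differences. The program name HalfCheck and the variables MuDiv, MuMul and Mismatches are made up for illustration. Since halving is exact in binary floating point, I would expect it to report zero mismatches, but run it on your own system rather than take my word for it.

--- Code: ---program HalfCheck;
{$mode objfpc}

var
  MuDiv, MuMul : double;
  J, Mismatches : integer;

begin
  MuDiv := 10.0;
  MuMul := 10.0;
  Mismatches := 0;
  // Same recurrence as in DoIt, once with / 2.0 and once with * 0.5.
  for J := 0 to 99 do
  begin
    MuDiv := (MuDiv + 2.0) / 2.0;
    MuMul := (MuMul + 2.0) * 0.5;
    if MuDiv <> MuMul then
      Inc(Mismatches);
  end;
  writeln('mismatches: ', Mismatches);
end.
--- End code ---

This only checks that the results agree. Whether the multiply is actually faster on a given CPU is a separate question that only timing can answer.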
Lazarus 1.2.4 with FPC 2.6.4
(Win7-32, Intel Duo T5250, 2Gb RAM)
Your code in a default GUI application (Create new.. application)
time: 00:00:10.681
no debugging, added -O3 optimizations
time: 00:00:22.319 (Why?!)
-Or (use register variables), {$MAXFPUREGISTERS 5}
time: 00:00:10.602 (Why?)
Looking at the assembler output: the variables are not kept in registers, not even the loop counter.
Upd: * 0.5 instead of / 2.0
time: 00:00:10.371
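For anyone who wants to take timings inside the program instead of with the shell's 'time' command, here is a minimal sketch. It is only one way to do it and not necessarily how the numbers above were measured. Work is a hypothetical stand-in for the benchmark; swap in DoIt from the first post. Now and FormatDateTime come from the SysUtils unit.

--- Code: ---program TimeIt;
{$mode objfpc}

uses
  SysUtils;   // Now, FormatDateTime

// Hypothetical stand-in workload; replace with DoIt from the first post.
function Work(x : double) : double;
var
  I : integer;
  S : double;
begin
  S := 0.0;
  for I := 1 to 10000000 do
    S := x * S + 1.0;
  Work := S;
end;

var
  Start, Elapsed : TDateTime;
  R : double;

begin
  Start := Now;
  R := Work(0.2);
  // Take the elapsed time before printing, so output is not timed.
  Elapsed := Now - Start;
  writeln('result: ', R);
  // A TDateTime difference is a fraction of a day; formatting it as a
  // time of day is valid for runs shorter than 24 hours.
  writeln('time: ', FormatDateTime('hh:nn:ss.zzz', Elapsed));
end.
--- End code ---

Note that this measures wall-clock time, so it corresponds to the 'real' line of the 'time' output, not to 'user'.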
Tested on Linux Mint with Intel(R) Core(TM) i5 CPU U 470 @ 1.33GHz
fpc 2.6.4, 64 bit, -O3, smart linking, no debug:
real 0m6.324s
user 0m6.319s
sys 0m0.000s
fpc 2.7.1, 64 bit, -O3, smart linking, no debug:
real 0m5.818s
user 0m5.808s
sys 0m0.004s
fpc 2.7.1, 64 bit, -O3, smart linking, no debug, a few tweaks to the code:
real 0m5.165s
user 0m5.159s
sys 0m0.004s
Almost always, if an FPC-compiled program runs really slowly, it was compiled with debug info on. Check whether the compiled binary is about 35k or a lot bigger.
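Besides looking at the file size, you can let the program report how it was built. The sketch below is an assumption-laden example, not a definitive check: it assumes {$IFOPT} reflects switches coming from the command line or fpc.cfg (D for debug info, R for range checks, Q for overflow checks), which I believe is the case but have not verified on 2.6.4. The program name BuildInfo is made up. Compile it with exactly the same options and fpc.cfg as the benchmark.

--- Code: ---program BuildInfo;
{$mode objfpc}

begin
  // Compiler version and target CPU, inserted at compile time.
  writeln('compiler: FPC ', {$I %FPCVERSION%}, ' for ', {$I %FPCTARGETCPU%});

  {$IFOPT D+}
  writeln('debug info:      on');
  {$ELSE}
  writeln('debug info:      off');
  {$ENDIF}

  {$IFOPT R+}
  writeln('range checks:    on');
  {$ELSE}
  writeln('range checks:    off');
  {$ENDIF}

  {$IFOPT Q+}
  writeln('overflow checks: on');
  {$ELSE}
  writeln('overflow checks: off');
  {$ENDIF}
end.
--- End code ---

If any of these show up as "on" without you asking for them, the fpc.cfg in use is the likely source.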