
Topic: FreePascal 3.2 benchmark implementation with results comparing with C++ etc.

Rochus

  • Jr. Member
  • **
  • Posts: 50
In case you're interested: I implemented a FreePascal (FP) version of the Are-we-fast-yet benchmark suite. The source code and a report with the results so far are here: https://github.com/rochus-keller/Are-we-fast-yet/tree/main/FreePascal

Currently, only the microbenchmarks are implemented; the macrobenchmarks are work in progress. You still might want to have a look at the source code, especially the Storage.pas implementation.

The Are-we-fast-yet benchmark suite has its origins in scientific studies of the performance of various dynamic programming languages (see [1] and [2]). It can be used to compare different programming languages as well as different implementations of the same programming language. For someone who builds compilers, this is an important tool, e.g. to assess the effectiveness of optimization measures. In contrast to other benchmark suites such as The Computer Language Benchmarks Game [3], the goal is not to win at all costs, but to compare as fairly and objectively as possible.

[1] https://github.com/smarr/are-we-fast-yet
[2] https://stefan-marr.de/papers/dls-marr-et-al-cross-language-compiler-benchmarking-are-we-fast-yet/
[3] https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html


EDIT: I attached a report comparing all four optimization levels with the unoptimized build.
« Last Edit: August 12, 2023, 01:49:53 am by Rochus »

Rochus

Here is another interesting result. I added the numbers of the unoptimized C++ benchmark.

The report is attached.

As we can see, the unoptimized C++ is about 10% faster than the unoptimized FP implementation. The optimized C++, on the other hand, is about 70% faster than the O4 optimized FP implementation.

The GCC 4.8.2 optimizer therefore causes a speedup factor of 2.4, whereas the FPC 3.2.2 optimizer causes a factor of 1.6.

Rochus

Here are even more interesting results. The above results - as documented by the referenced report - were measured on a Linux x86 machine (Intel Core Duo L9400 1.86GHz, 4GB RAM).

Meanwhile I repeated some of the measurements on a Linux x86_64 machine (Debian Linux 11 x86_64 running in VMware Workstation 17 Player on Windows 11; 12th Gen Intel Core i3-1215U, 1200 MHz, 4 cores, 4GB RAM).

The report is attached.

As we can calculate from the numbers, the unoptimized C++ is about 20% faster than the unoptimized FP implementation (compared to 10% on x86). The optimized C++, on the other hand, is about 80% faster than the O4 optimized FP implementation (compared to 70% on x86).

The GCC 10.2.1 optimizer causes a speedup factor of 1.9 (compared to 2.4 with GCC 4.8 on x86), whereas the FPC 3.2.2 optimizer causes a factor of 1.4 (compared to 1.6 on x86).

It is also interesting to note that, relative to LuaJIT, the optimized C++ version runs 34% faster on x86 than on x64. Likewise, the optimized FP version runs 40% faster on x86 than on x64 relative to LuaJIT.

Rochus

Here is another interesting finding concerning the performance of memory management and dynamic arrays.

My first implementation of Storage.pas, which achieved an average time of 4373us when compiled with -O4 and run on my x86 test machine, used the following data structure:

      
Code: Pascal
    type
      TreePtr = ^Tree;
      TreeList = array of TreePtr;   // dynamic array of pointers, one heap allocation per element
      Tree = object
        sub: TreeList;
        constructor init;
        destructor deinit;           // disposes every element of sub
      end;

Tree.deinit calls dispose for all subs. When I remove the constructor and destructor and do the deletions elsewhere, the average time goes down to 3800us; so the constructor/destructor calls are responsible for a slowdown of ~13%.
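For illustration, here is a minimal sketch of the pointer-based variant described above, with one New per element and a destructor that disposes all subs. It is not the actual Storage.pas code: BuildTree, its depth and width parameters, and the objfpc mode are assumptions made only for this example.

```pascal
program PointerTree;
{$mode objfpc}

type
  TreePtr = ^Tree;
  TreeList = array of TreePtr;
  Tree = object
    sub: TreeList;
    constructor init;
    destructor deinit;
  end;

constructor Tree.init;
begin
  sub := nil;
end;

destructor Tree.deinit;
var
  i: Integer;
begin
  // One Dispose call per child; each call runs the child's destructor first.
  for i := 0 to High(sub) do
    if sub[i] <> nil then
      Dispose(sub[i], deinit);
  sub := nil;
end;

// Hypothetical helper: builds a tree with `width` children per node.
function BuildTree(depth, width: Integer): TreePtr;
var
  i: Integer;
begin
  New(Result, init);               // one heap allocation per node
  if depth > 0 then
  begin
    SetLength(Result^.sub, width);
    for i := 0 to width - 1 do
      Result^.sub[i] := BuildTree(depth - 1, width);
  end;
end;

var
  root: TreePtr;
begin
  root := BuildTree(3, 4);
  WriteLn('children of root: ', Length(root^.sub));
  Dispose(root, deinit);           // walks the whole tree
end.
```

Every node costs one New and one Dispose in addition to the SetLength for its child array, which is the per-element overhead the measurements above refer to.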

When I only allocate without deleting anything, the average time goes down to 2300us; so iterating over all Tree instances and calling dispose() on each causes a slowdown of ~65%!
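The percentages above can be recomputed from the quoted timings. The following small program is only a sketch of that arithmetic; the function names are made up for this example. The ~13% figure is the share of the 4373us run that disappears without constructor/destructor, and the ~65% figure is the extra time of the 3800us run relative to the 2300us allocate-only run.

```pascal
program Slowdown;
{$mode objfpc}

// Extra time of `slow` relative to `fast`, in percent.
function SlowdownPercent(slow, fast: Double): Double;
begin
  Result := (slow / fast - 1.0) * 100.0;
end;

// Share of the `slow` time that is saved by the `fast` variant, in percent.
function SavedPercent(slow, fast: Double): Double;
begin
  Result := (1.0 - fast / slow) * 100.0;
end;

begin
  // Timings in microseconds, taken from the post.
  WriteLn('ctor/dtor share of 4373us:     ', SavedPercent(4373, 3800):0:1, '%');
  WriteLn('dispose loop vs allocate-only: ', SlowdownPercent(3800, 2300):0:1, '%');
end.
```

With these inputs the two values come out at roughly 13.1% and 65.2%.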

In the C++ version I had an array of Tree, where Tree was embedded by value in the array and thus didn't require additional allocations/deallocations per sub element. Unfortunately I wasn't able to reproduce this in Pascal at first, because the syntax didn't seem to allow using the type by value inside its own object declaration. But fortunately user BeniBela demonstrated how to properly do this in FreePascal (see this thread: https://forum.lazarus.freepascal.org/index.php?topic=64275.0). So now I'm using this data structure:

      
Code: Pascal
    type
      Tree = object
        type TreeList = array of Tree;
      public
        sub: TreeList;
      end;

The trick apparently is a local type declaration. The result is an average time of 2514us (instead of 4373us, corresponding to an overall speed-up of ~3%). So the solution with a dynamic allocation per array element is about 51% slower (3800us vs. 2514us; about 74% slower when the constructor/destructor version with 4373us is used as the reference) than the solution where Tree is embedded in the array by value.
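A minimal sketch of how such a by-value tree can be built follows. Again this is not the actual Storage.pas code: BuildTree and its parameters are hypothetical, and the objfpc mode is an assumption for this example.

```pascal
program ValueTree;
{$mode objfpc}

type
  Tree = object
    type TreeList = array of Tree;  // local type: elements are Tree values
  public
    sub: TreeList;
  end;

// Hypothetical helper: fills `node` with `width` children per level.
procedure BuildTree(var node: Tree; depth, width: Integer);
var
  i: Integer;
begin
  if depth <= 0 then
    Exit;
  SetLength(node.sub, width);       // one allocation holds all children by value
  for i := 0 to width - 1 do
    BuildTree(node.sub[i], depth - 1, width);
end;

var
  root: Tree;
begin
  BuildTree(root, 3, 4);
  WriteLn('children of root: ', Length(root.sub));
  root.sub := nil;                  // releasing the array releases the nested arrays too
end.
```

Compared to an array of pointers, there is no New or Dispose per element here; the only allocations are the dynamic arrays themselves, and they are released by the compiler-managed finalization of sub.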

I also checked the performance of the approach recommended by user runewalsh, where the array of Tree is replaced by ^Tree and allocated by GetMem(count * sizeof(Tree)). This approach is tricky and not idiomatic at all (because it disregards language support for the required feature altogether and corresponds rather to the C than to the Pascal style of programming). But it is understandable why it is used: the resulting average time for the same benchmark goes down to 1010us (instead of 2514us, corresponding to an overall speed-up of ~7%). The idiomatic dynamic array solution therefore takes 2.5 times as long, i.e. is about 150% slower than the non-idiomatic low-level solution based on GetMem/FreeMem.
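To make the low-level approach concrete, here is a rough sketch of a tree whose children live in one raw block obtained from GetMem. It is not runewalsh's or the benchmark's actual code: the record layout, the count field, BuildTree and FreeTree are assumptions for this example. Since the record contains no managed fields, the uninitialized memory returned by GetMem can be filled in directly.

```pascal
program RawTree;
{$mode objfpc}
{$pointermath on}

type
  PTree = ^Tree;
  Tree = record
    sub: PTree;      // raw block of `count` children, or nil
    count: Integer;
  end;

// Hypothetical helper: fills `node` with `width` children per level.
procedure BuildTree(var node: Tree; depth, width: Integer);
var
  i: Integer;
begin
  node.sub := nil;
  node.count := 0;
  if depth <= 0 then
    Exit;
  node.sub := GetMem(width * SizeOf(Tree));  // one untyped block, no initialization
  node.count := width;
  for i := 0 to width - 1 do
    BuildTree(node.sub[i], depth - 1, width);
end;

// Hypothetical helper: frees the children bottom-up, then the block itself.
procedure FreeTree(var node: Tree);
var
  i: Integer;
begin
  for i := 0 to node.count - 1 do
    FreeTree(node.sub[i]);
  if node.sub <> nil then
    FreeMem(node.sub);
  node.sub := nil;
  node.count := 0;
end;

var
  root: Tree;
begin
  BuildTree(root, 3, 4);
  WriteLn('children of root: ', root.count);
  FreeTree(root);
end.
```

The length bookkeeping and the release of memory, which the dynamic array does automatically, have to be done by hand here; that is the price for avoiding the managed-type overhead.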

All in all, these microbenchmarks have already revealed areas where FreePascal wastes performance unnecessarily compared to C++, and where the compiler could still be improved.

If you have any comments or questions, send me an email or post an issue to https://github.com/rochus-keller/are-we-fast-yet/.
« Last Edit: August 19, 2023, 01:45:29 am by Rochus »

 
