Recent

Author Topic: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)  (Read 4013 times)

Akira1364

  • Hero Member
  • *****
  • Posts: 539
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

I made a modified version of the multi-threaded version originally posted in this thread by Nitorami:
http://forum.lazarus.freepascal.org/index.php?topic=39935.0

and submitted it after clarifying with them that it would be ok to do so.

Edit: as I've stated in another comment below, I came up with a further-modified version after this one and sent it in, and FPC is now at first place!
« Last Edit: March 09, 2019, 12:41:17 am by Akira1364 »

dbannon

  • Hero Member
  • *****
  • Posts: 756
    • tomboy-ng, a rewrite of the classic Tomboy
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #1 on: March 08, 2019, 02:12:39 am »
Way, way up  indeed !

Nice One !

 :D

Davo
Lazarus 2, Linux (and reluctantly Win10, OSX)
My Project - https://github.com/tomboy-notes/tomboy-ng

Thaddy

  • Hero Member
  • *****
  • Posts: 9190
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #2 on: March 08, 2019, 09:26:02 am »
Looking at the cores there is still a bit room to improve even a bit more. Good job. Have you tried installing another memory manager as well? Also note there are some nice x86_64 optimizations in trunk.
I have a feeling that many more "competition" examples for FPC are less than optimal.
Again: good job!
also related to equus asinus.

silvestre

  • New Member
  • *
  • Posts: 43
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #3 on: March 08, 2019, 11:15:00 am »
¬°Freepascal power ;) !

 It would be great to add the latest optimizations in compilation...The first position is very near! Great work and good press for this community.
« Last Edit: March 08, 2019, 12:11:35 pm by silvestre »

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7508
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #4 on: March 08, 2019, 11:21:51 am »
Looking at the cores there is still a bit room to improve even a bit more. Good job. Have you tried installing another memory manager as well? Also note there are some nice x86_64 optimizations in trunk.
I have a feeling that many more "competition" examples for FPC are less than optimal.
Again: good job!

Most work was afaik done by eg. neli and some others before they switched it to allow explicit threading.

BrunoK

  • Full Member
  • ***
  • Posts: 190
  • Retired programmer
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #5 on: March 08, 2019, 11:43:16 am »
Tried to run the code as published in https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-fpascal-3.html,   

gives me :
Code: Pascal  [Select]
  1. Project BinaryTrees raised exception class 'External: SIGSEGV'.
  2.  
  3.  In file 'BinaryTrees.pas' at line 66:
  4. MakeTree(Depth, MP), MP),
  5.  

Linux :
  DISTRIB_DESCRIPTION="Linux Mint 19.1 Tessa"   

Windows gives the same error.

What can be wrong in my installation ?
Lazarus trunk r. 62137/27.10.2019 (+/- patches regarding TScrollBar, IntitalSetupDialog, Options.Environment options, SearchResults).  Lazarus 3.0.6 raw from svn.
FPC 3.0.4 32 bits. (+heaptrc with leaked ClassName+Revisited TList) , Windows 10 Pro x64 (v. 1903 / 18362.418)

Thaddy

  • Hero Member
  • *****
  • Posts: 9190
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #6 on: March 08, 2019, 11:51:24 am »
Mode?....< that's a bit of a sigh, I may change this to  grumpy mode >:D when that is the case....>
« Last Edit: March 08, 2019, 11:55:11 am by Thaddy »
also related to equus asinus.

BrunoK

  • Full Member
  • ***
  • Posts: 190
  • Retired programmer
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #7 on: March 08, 2019, 12:14:12 pm »
@Grumpy.

Did you actually compile and run the program.

If mode delphi is not specified it doesn't compile because of AdvancedRecords usage.

Still waiting for FreeAndNil badness article.
Lazarus trunk r. 62137/27.10.2019 (+/- patches regarding TScrollBar, IntitalSetupDialog, Options.Environment options, SearchResults).  Lazarus 3.0.6 raw from svn.
FPC 3.0.4 32 bits. (+heaptrc with leaked ClassName+Revisited TList) , Windows 10 Pro x64 (v. 1903 / 18362.418)

Akira1364

  • Hero Member
  • *****
  • Posts: 539
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #8 on: March 08, 2019, 02:54:08 pm »
If mode delphi is not specified it doesn't compile because of AdvancedRecords usage.

I originally had

Code: Pascal  [Select]
  1. {$mode ObjFPC}
  2. {$modeswitch AdvancedRecords}

in there. The Benchmarks game maintainer guy doesn't like in-source compiler directives for some reason though, so he changed it to use

Code: [Select]
-Mdelphi
instead. The -MDelphi is visible in the output at the bottom of the page, however.

What command line argument were you using with it to indicate

Code: [Select]
MaxDepth
though?

It would be great to add the latest optimizations in compilation...

I agree, but I doubt it would be possible to convince the maintainer to switch to an arbitrary FPC trunk build. So hopefully the next "stable" version comes out soonish...
« Last Edit: March 08, 2019, 03:07:54 pm by Akira1364 »

BrunoK

  • Full Member
  • ***
  • Posts: 190
  • Retired programmer
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #9 on: March 08, 2019, 04:08:32 pm »
@Akira1364  02:54:08 pm
Thank you. I had missed the parameter 21 (ParamCount). So it used the default 32.

Interesting "benchmark".

A case where Linux seems much faster than windows.

As in general -O1 + -OoREGVAR seems to be a satisfying optimization compromize.
Lazarus trunk r. 62137/27.10.2019 (+/- patches regarding TScrollBar, IntitalSetupDialog, Options.Environment options, SearchResults).  Lazarus 3.0.6 raw from svn.
FPC 3.0.4 32 bits. (+heaptrc with leaked ClassName+Revisited TList) , Windows 10 Pro x64 (v. 1903 / 18362.418)

Akira1364

  • Hero Member
  • *****
  • Posts: 539
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #10 on: March 08, 2019, 04:17:26 pm »
A case where Linux seems much faster than windows.

I actually consistently get around 1.8 seconds or so with an input of 21 on both Linux and Windows, using the same hardware (Haswell i7-4770k CPU.) This is with native 64-bit FPC for both, to be clear.

Note also that the machine the benchmarks run on for the site is somewhat outdated... (Core 2 Q6600 CPU.)
« Last Edit: March 08, 2019, 04:19:27 pm by Akira1364 »

Nitorami

  • Sr. Member
  • ****
  • Posts: 368
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #11 on: March 08, 2019, 05:38:32 pm »
@Akira: Thanks for submitting this, I guess I would have dragged it on forever...

A  note on the inflationary use of the inline modifier: It really only makes sense for very short leaf functions. Certainly not for MakeTree, which is used recursively and cannot be inlined in the first place.

The benefit of inlining on moden processors is dubious to me anyway. Sometimes it even makes code slower; this probably has to do with how the CPU optimises the program flow.

@Thaddy
Quote
I have a feeling that many more "competition" examples for FPC are less than optimal.

Not sure. I have tinkered with a few, and could not do a lot. An example is fannkuch-redux, where I may get 10%...20% off. The problem is that the effect of changes is often not predictable, it's try and error, and the result may be different on different systems.
The core of fannkuch is the flip function. This is another example where you get faster code without inlining. Putting some local variables on the heap rather than the stack also results in a small but disctinct improvement, but only on win64; on win32 the effect is exactly opposite. On Linux, it may again be different (cannot test this).
Thus, even after a lot of manual optimization, we cannot really tell how the code will perform on a different target system. The gcc compiler may simply be better in producing the best code on each specific system.... That said, I am talking about really small effects here, 10-20% up or down, not too far from the performance of gcc, and irrelevent for a real world program.

What's really poor is FPC's performance in the Mandelbrot benchmark, factor ten slower than gcc, and also much slower than many others, Rust, Swift, C#, .NET, Java, even LISP and Haskell. And it cannot be improved - at least I did not manage it - simply because FPC lacks support of vector processing, and gives away factor 8 in performance.





Akira1364

  • Hero Member
  • *****
  • Posts: 539
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #12 on: March 08, 2019, 05:59:17 pm »
@Akira: Thanks for submitting this, I guess I would have dragged it on forever...

No problem!

A  note on the inflationary use of the inline modifier: It really only makes sense for very short leaf functions. Certainly not for MakeTree, which is used recursively and cannot be inlined in the first place.

People say this often, but I'm not sure I really agree in many cases. The thing to keep in mind is that inlining does not actually mean just literally copying and pasting the assembler generated for the function body into different places.

In my experience, a lot of the time (perhaps most of the time), an inlined function looks absolutely nothing like what was generated for the standalone body as it gets optimized in the particular context (i.e. combined with what surrounds it and such.)

This often actually not only improves speed but results in smaller, not larger, executables overall I've found.

That said, about the MakeTree thing, yeah, I noticed it was pointless after I submitted it but it doesn't really matter either way as it just does nothing.

The benefit of inlining on moden processors is dubious to me anyway.

I really don't think it is. IMO FPC is way too conservative about it, if you compare what it typically generates (or even attempts to generate) with regards to inlining to just about any other "native code" compiler.

Especially the fact that "compilerprocs" (even ones not written in assembly) are never ever inlined is rather less than ideal, in my mind.
« Last Edit: March 08, 2019, 06:07:35 pm by Akira1364 »

Akira1364

  • Hero Member
  • *****
  • Posts: 539
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #13 on: March 08, 2019, 09:18:31 pm »
Ok, so, FPC is now number one! I came up with another revised version, and submitted it earlier today.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

One thing I noticed when making this second version, also: recursive functions are in fact inlinable, and can even inline themselves.
« Last Edit: March 09, 2019, 12:39:19 am by Akira1364 »

howardpc

  • Hero Member
  • *****
  • Posts: 3183
Re: FPC's Binary Trees score for "Benchmarks Game" just went way up! :)
« Reply #14 on: March 08, 2019, 09:35:41 pm »
Kudos to Akira1364 and FPC developers!