Lazarus

Title: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: srcstorm on March 27, 2017, 11:55:37 am

As you know, Delphi 10.2 Tokyo was released recently. An author at codingforspeed.com had a pre-release version of Delphi and published the results of an integer performance comparison test some time ago:
https://codingforspeed.com/integer-performance-comparison-for-c-c-delphi

According to this test, the latest version of Delphi performs better than Visual C++ on the Win64 target. The test measures both single-threaded and multi-threaded performance, and Delphi has a significant lead in both.

Since then the official release has been announced, and I installed the Starter Edition. The source code for this test is on Github (https://github.com/DoctorLai/CountingPrime), so I decided to give it a try. Meanwhile, Visual Studio 2017 was also released, so now we can see whether Visual C++ 2017 brings any improvements.

Test setup:
RAD Studio 10.2 Starter, Delphi Version 25.0.26309.314
>> Supports only Win32 target.
Visual Studio Community 2017, Version 15.0.0+26228.9, .NET Framework Version 4.6.01586
>> Supports Win32 and Win64 targets for C++, only Any CPU for C#.
Lazarus 1.6.4 64-bit, SVN Revision 54278
>> Supports Win64 target, Win32 target needs add-on.

Desktop PC:
Windows 10 64-bit, Build 1607
AMD A10-7860K@3.8 GHz
16 GB 2133 MHz DDR3 RAM

I tried to use the latest versions of the development environments. On the particular CPU I am using, I got these results:

--- Any CPU ---
C# Serial 11344
C# Parallel 2921

--- Win32 ---
C++ Serial 11656
C++ Parallel 2937, 2984
Delphi Serial 11406
Delphi Parallel 3094, 2890, 2969
Lazarus Serial 11750
Lazarus Parallel 4656

--- Win64 ---
C++ Serial 11687
C++ Parallel 2922, 2969
Delphi Trial Serial 11203
Delphi Trial Parallel 3157, 2828, 2922
Lazarus Serial 11359
Lazarus Parallel 4250

Conclusion:
In Delphi, there were three different methods of implementing concurrency for the tested calculation; in C++, the author came up with two. Although Delphi Starter doesn't support the Win64 target, the Win32 test alone showed a very promising result. Lazarus also follows Delphi closely. The parallel code didn't work in Lazarus, so I only ran the serial test, and it is 2.8% faster than C++ on the Win64 target (11359 vs. 11687). I attached the Lazarus code I used.
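
For readers who have not opened the repository, the workload is counting primes below a limit. The sketch below shows that kind of integer-heavy loop in C++, assuming plain trial division (the thread later mentions an "IsPrime" function built around "x mod i = 0"). The names is_prime and count_primes and the range arguments are illustrative assumptions, not the code from the CountingPrime repository.

```cpp
#include <cassert>
#include <cstdint>

// Trial division: n is prime if no i with 2 <= i and i*i <= n divides it.
// The product is computed in 64 bits so it cannot overflow near INT32_MAX.
bool is_prime(std::int32_t n) {
    if (n < 2) return false;
    for (std::int32_t i = 2; static_cast<std::int64_t>(i) * i <= n; ++i) {
        if (n % i == 0) return false;  // one integer division per candidate divisor
    }
    return true;
}

// Serial count of primes in the half-open range [lo, hi).
std::int32_t count_primes(std::int32_t lo, std::int32_t hi) {
    assert(lo <= hi);
    std::int32_t count = 0;
    for (std::int32_t n = lo; n < hi; ++n) {
        if (is_prime(n)) ++count;
    }
    return count;
}
```

Almost all of the time in a loop like this goes into the remainder operation, which is why the code generated for it matters so much for the timings.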

You can also post your test results, especially if you have 64-bit Delphi, so we can get a better idea of the current situation. If you like, you can post results of other benchmarks such as SciMark too. It would be nice to have a broader perspective on how the latest compiler versions perform.

When it comes to performance, Pascal compilers compete with each other. There is no other competition ;)

Edit:
Using the code suggested by ykot, I ran the multi-threading test for Lazarus and updated the results. Compared to the other products, the latest stable version of Lazarus has mediocre parallel processing speed.

Edit2:
I discovered that the RAD Studio 10.2 Trial includes the Delphi Win64 compiler, so I added results for Delphi Win64.

You can download Delphi 10.2 Tokyo Starter here; it has no time limit, but it supports only the Win32 target:
https://www.embarcadero.com/products/delphi/starter/promotional-download

RAD Studio 10.2 Tokyo Architect Trial includes Delphi Win64 compiler and all cross-compilers:
https://www.embarcadero.com/products/rad-studio/start-for-free

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Thaddy on March 27, 2017, 12:27:51 pm

What were your compiler settings?

FPC has more options and when I ran the test compiled with:

fpc -CX -XXs -Sv -CfSSE41 -CpATHLON64 -OpATHLON64 -Mobjfpc -OoFASTMATH -O4 benchint.lpr

It was another 4% faster compared to the default fpc -CX -XXs -Mobjfpc -O2 benchint.lpr

On my (very slow AMD E-2500) laptop FPC actually did better (23844) than Berlin (24719) 8-)

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: srcstorm on March 27, 2017, 12:59:06 pm

@Thaddy,

This is not an FPC test. Only Lazarus is tested. Modern IDEs have Debug and Release profiles. We switch to Release mode on each IDE. No other setting is modified. So this is a test of what you get "out of the box".

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Thaddy on March 27, 2017, 01:20:03 pm

@srcstorm

It is not a Lazarus program.
Lazarus uses the FPC compiler.
These compiler settings can all be set in Lazarus. What a stupid remark. >:D >:D

The performance comes from the compiler, not from Lazarus. You are testing performance, NOT an editor.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: marcov on March 27, 2017, 01:39:33 pm

I never had a practical use for parallel for in ten years of programming (while my apps are generally multithreaded).

I always wonder a bit why people think it is so great? Anybody have real world examples?

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: HeavyUser on March 27, 2017, 05:55:46 pm

Quote from: marcov on March 27, 2017, 01:39:33 pm

I never had a practical use for parallel for in ten years of programming (while my apps are generally multithreaded).

I always wonder a bit why people think it is so great? Anybody have real world examples?

google search? the human genome project? SETI?

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Thaddy on March 27, 2017, 08:32:41 pm

Yeah, but you can donate a RPi for that and let it do its job....

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 27, 2017, 09:46:11 pm

srcstorm, you should really be testing Delphi's TParallel against C++11 threads with lambdas, since that's what it is, not against OpenMP or other similar extensions, much less running it through CLI.

Also, any chance of throwing a comparison with FreePascal version compiled with -O4 to the mix?
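
What "C++11 threads with lambdas" could look like for this workload is sketched below. It is only an illustration under stated assumptions: is_prime is a plain trial-division helper, count_primes_parallel and its worker-count parameter are hypothetical names, and none of it is taken from the attachments or the CountingPrime repository.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical trial-division helper (assumed, not the benchmark's own code).
bool is_prime(std::int32_t n) {
    if (n < 2) return false;
    for (std::int32_t i = 2; static_cast<std::int64_t>(i) * i <= n; ++i) {
        if (n % i == 0) return false;
    }
    return true;
}

// Counts primes in [0, limit) using plain std::thread objects and lambdas.
std::int64_t count_primes_parallel(std::int32_t limit, unsigned workers) {
    workers = std::max(1u, workers);  // treat 0 as "one worker"
    std::vector<std::int64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        // Worker w tests w, w + workers, w + 2*workers, ... so that cheap
        // (small) and expensive (large) candidates are spread across threads.
        pool.emplace_back([&partial, w, workers, limit] {
            std::int64_t local = 0;  // accumulate locally, publish once
            for (std::int64_t n = w; n < limit; n += workers) {
                if (is_prime(static_cast<std::int32_t>(n))) ++local;
            }
            partial[w] = local;
        });
    }
    for (auto& t : pool) t.join();
    std::int64_t total = 0;
    for (std::int64_t c : partial) total += c;
    return total;
}
```

A caller would typically pass std::thread::hardware_concurrency() as the worker count. Some toolchains need -pthread (or an equivalent option) to link this.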

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Thaddy on March 27, 2017, 09:58:17 pm

Quote from: ykot on March 27, 2017, 09:46:11 pm

Also, any chance of throwing a comparison with FreePascal version compiled with -O4 to the mix?

No, because he thinks Lazarus IS a compiler. Well, we all know that that's not the case....
And YES, because I did... 8)

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Leledumbo on March 27, 2017, 10:31:23 pm

Quote from: marcov on March 27, 2017, 01:39:33 pm

I never had a practical use for parallel for in ten years of programming (while my apps are generally multithreaded).

I always wonder a bit why people think it is so great? Anybody have real world examples?

It's more about ease of use, I guess. "Modern" programmers think that managing threads (or processes, they don't really care about the backend) manually is cumbersome and time consuming, so if they have a built-in solution they will prefer it regardless of any overhead they can't control; at least that's what my CTO thinks.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 27, 2017, 10:49:12 pm

Made a benchmark in a real hurry of Delphi 10.1 x64 vs Visual Studio 2015 x64. Visual Studio x64 is actually faster (running inside a VM with 8 processors enabled on a Linux host, Core i7 6700K, 2400 MHz DDR4 RAM, VMware Player).

(edit) Updated the post, taking the source code for calculating prime numbers from this GitHub (https://github.com/DoctorLai/CountingPrime) page, which is where the OP's article took it from. This actually increases the gap between Delphi and MSVC.

(edit2) Added a FreePascal implementation using parallel procedures (http://wiki.freepascal.org/Parallel_procedures). The loop function, however, is not inlined, so I am not sure this is the optimal approach; there is likely a better way of doing it.

I've executed each sample application 4 times and averaged the times; the output is shown in the screenshot (if you're not logged in, attachments don't seem to show up).

Delphi project used "Release" (optimizations on), MSVC used /Ox /GL, FreePascal used -O4 -OoLoopUnroll -Sv.
Results so far:

Delphi 10.1 x64: 968.75 ms
FreePascal 3.1.1 (trunk): 2679.75 ms
Visual Studio 2015 x64: 721.25 ms

So MSVC seems to be around 34% faster than Delphi, whereas FreePascal source seems to have some room for optimizations (please feel free to adjust the source code).

I've updated attachments, including latest sources. Original post is quite surprising because Delphi's native compiler is getting rather old. I suppose that even more complex benchmarks would actually increase the gap. My guess is that OP has compared Delphi's TParallel test code (which is just a wrapper for native threads) against actual parallel languages, which provide much greater flexibility at the expense of some minimal overhead. This is not a fair comparison.
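
The "run several times and average" procedure, and the arithmetic behind the 34% figure, can be sketched as follows. This is an illustrative harness, not the code from the attachments; average_ms and percent_faster are hypothetical names, and the only numbers used are the two averages quoted above.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` the given number of times and returns the mean wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double average_ms(Work&& work, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}

// How much faster the quicker build is, as a percentage of its own time:
// (slower / faster - 1) * 100.
double percent_faster(double slower_ms, double faster_ms) {
    assert(faster_ms > 0.0);
    return (slower_ms / faster_ms - 1.0) * 100.0;
}
```

With the averages above, percent_faster(968.75, 721.25) comes out at roughly 34.3, which matches the "around 34% faster" statement. Note that the same pair of numbers also means Delphi took about 34% longer, not that MSVC used 34% less time.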

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: srcstorm on March 28, 2017, 06:58:33 pm

Quote from: ykot on March 27, 2017, 10:49:12 pm

Made a benchmark in a real hurry of Delphi 10.1 x64 vs Visual Studio 2015 x64.

Delphi 10.1 and Visual C++ 2015 are things of the past. Did you steal them from a museum? Still, I adopted your code and updated the first post.

The Lazarus project I used for multi-threading test is in the attachment.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Blestan on March 28, 2017, 07:38:24 pm

things from the past????????
delphi 10.2 tokyo is 10 days old:))))
it's very important that you learn to read and understand the text before you start to post/write :)))
hahahahah

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Martin_fr on March 28, 2017, 08:11:46 pm

Quote from: Thaddy on March 27, 2017, 01:20:03 pm

@srcstorm

It is not a Lazarus program.
Lazarus uses the FPC compiler.
These compiler settings can all be set in Lazarus. What a stupid remark. >:D >:D

The performance comes from the compiler, not from Lazarus. You are testing performance, NOT an editor.

You clearly misread his comment.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 28, 2017, 08:20:38 pm

I have written a FreePascal version of the test that uses native threads directly, and recompiled the same application with Delphi. Test conditions, configuration and target are the same. I have attached updated sources for FreePascal, Delphi and MSVC. If you recompile each of them, please make sure to enable all optimization options: Release mode for Delphi, "-OoLoopUnroll -OoFastMath -Sv -CpCoreI -CfSSE42 -OpCoreI" for FreePascal and "-Ox -GL" for MSVC.

Still, I'm getting the following figures for x64 target:

FreePascal (native): ~2400 ms
FreePascal (MTProcs): ~2840 ms
Delphi (native): ~1030 ms
Delphi (TParallel class): ~1010 ms
MSVC: ~710 ms

Out of curiosity, for 32-bit target:

FreePascal (native): ~1030 ms
Delphi (TParallel class): ~1030 ms
MSVC: ~730 ms

Srcstorm, I'm not really sure how you are compiling the projects, but your benchmarks seem rather bogus: in my tests, Delphi is roughly 50% slower than the corresponding Visual Studio build on both Win32 and Win64 targets. Also, I doubt there have been any changes to the Win32/Win64 Delphi compilers in Delphi 10.2 (in fact, likely none since the release of Delphi XE 2), so performance is likely the same for both Delphi 10.1 and 10.2.

However, I still don't understand why the FreePascal version is much slower for the x64 target, even when using threads directly via the "TThread" class. Perhaps the issue is actually in how the "IsPrime" function gets compiled?

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 28, 2017, 08:45:06 pm

Furthermore, recompiling this project (https://github.com/DoctorLai/CountingPrime/blob/master/vs.cpp) from the same author also gives around 730 ms for both 32-bit and 64-bit targets, still about 50% faster than Delphi. I guess the author(s) of the published results are compiling a debug build or something.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: marcov on March 28, 2017, 08:47:34 pm

Quote from: Leledumbo on March 27, 2017, 10:31:23 pm

It's more about ease of use, I guess. "Modern" programmers think that managing threads (or processes, they don't really care about the backend) manually is cumbersome and time consuming, so if they have a built-in solution they will prefer it regardless of any overhead they can't control; at least that's what my CTO thinks.

I know, but even then you first need to have a problem that is actually divisible enough, with large enough chunks to be sane.

I assume a smart implementation could make the chunks larger (dividing 10000 parallel items into 10 chunks of 1000 items). But that requires a lot of support from the language (to recognize that items/chunks are computationally small and thus the switching overhead is large).
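
The chunking idea (hand each worker one large contiguous range instead of scheduling thousands of tiny items) can be sketched as below. This is a minimal illustration; make_chunks is a hypothetical helper, not part of any parallel-for library mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits [0, item_count) into at most chunk_count contiguous half-open
// ranges whose sizes differ by at most one item.
std::vector<std::pair<std::size_t, std::size_t>>
make_chunks(std::size_t item_count, std::size_t chunk_count) {
    assert(chunk_count > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t base = item_count / chunk_count;   // minimum chunk size
    const std::size_t extra = item_count % chunk_count;  // first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t c = 0; c < chunk_count && begin < item_count; ++c) {
        const std::size_t size = base + (c < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + size);
        begin += size;
    }
    return chunks;
}
```

For the numbers in the post, make_chunks(10000, 10) yields ten ranges of 1000 items each, so the scheduling overhead is paid ten times rather than ten thousand times.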

p.s. My CEO thinks every problem can be solved by writing a very broad outline on the back of a beer coaster and then handing it to me, and then counter every resistance with cliches like "think positive". That doesn't work either :-)

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Laksen on March 28, 2017, 10:18:43 pm

The difference between x86_64 and i386 for fpc is because of the mod operation. On x86_64 it is always calculated as 64-bit, which is about 2-3 times slower than a 32-bit division.

If you do (int64_t)x % (int64_t)y in gcc and clang you get the same performance decrease as well.
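
The two operand widths being compared can be written out as a pair of small C++ functions. This is a sketch for experimenting in a compiler explorer; mod32 and mod64 are hypothetical names. Whether the second one really compiles to a 64-bit divide is an assumption to check in the generated assembly, because an optimizer may notice that both operands are sign-extended 32-bit values and narrow the operation.

```cpp
#include <cassert>
#include <cstdint>

// Remainder at 32-bit width; x86-64 compilers typically emit a 32-bit idiv.
std::int32_t mod32(std::int32_t x, std::int32_t y) {
    assert(y != 0);
    assert(!(x == INT32_MIN && y == -1));  // this one case overflows at 32 bits
    return x % y;
}

// Same operands widened to 64 bits before the remainder is taken. The
// result always fits in 32 bits because |x % y| < |y|.
std::int32_t mod64(std::int32_t x, std::int32_t y) {
    assert(y != 0);
    const std::int64_t wide =
        static_cast<std::int64_t>(x) % static_cast<std::int64_t>(y);
    return static_cast<std::int32_t>(wide);
}
```

Both functions return the same value for every valid input, so any timing difference between them comes purely from the instruction the compiler selects.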

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 29, 2017, 12:05:49 am

Laksen, thanks for explaining it. I'm looking at the generated assembly:

Code:

	addl	$1,%r9d
	movslq	%ebx,%rax
	movslq	%r9d,%r8
	cqto
	idivq	%r8
	testq	%rdx,%rdx
	jne	.Lj13

So it does seem to be using the "idivq" instruction. However, looking at the code that gcc/clang produce for the same function on the x64 target (https://godbolt.org/g/Tr49rh), they seem to generate the following:

Code:

        mov     eax, ebx
        cdq
        idiv    ecx
        test    edx, edx
        je      .L5

I suppose that since they use "idiv" instead of "idivq", it might be faster. Is there any way to tell FreePascal to do that?

P.S. Using (int64_t)x % (int64_t)y in gcc/clang still seems to produce "idiv" instead of "idivq" (https://godbolt.org/g/wN0l0J)?

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 29, 2017, 12:45:09 am

I've modified FreePascal source adding "Modulo" function:

Code: Pascal

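{$AsmMode Intel}
// 32-bit signed remainder. Assumes the Win64 calling convention,
// where X arrives in ECX and Y in EDX.
function Modulo(const X, Y: Integer): Integer; assembler;
asm
  mov eax, ecx   // dividend X into EAX
  mov ecx, edx   // keep divisor Y, because CDQ overwrites EDX
  cdq            // sign-extend EAX into EDX:EAX
  idiv ecx       // 32-bit divide: quotient in EAX, remainder in EDX
  mov eax, edx   // return the remainder
end;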
 

Then, after changing "(x mod i = 0)" to "Modulo(x, i) = 0" in the "IsPrime" function, the resulting timing on the x64 platform is around 960 ms, slightly faster than Delphi (which doesn't seem to benefit from that Modulo function).

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: srcstorm on March 30, 2017, 11:46:48 am

@ykot,

Why don't you do yourself a favor and install the latest version:
https://www.embarcadero.com/products/delphi/starter/promotional-download

The good news is that RAD Studio 10.2 Tokyo Architect Trial contains literally everything, including the Delphi Win64 compiler:
https://www.embarcadero.com/products/rad-studio/start-for-free

Nothing can be less important than your opinions. Running tests with older versions of compilers yields technically meaningless data. But maybe one can compare Delphi 10.2 to 10.1, and Visual C++ 2017 to 2015, to measure improvements.

After I tested your code, I uninstalled Starter Edition and installed the Trial so now we can see Win64 results too.

--- Win32 ---
C++ (Native) 2937
Delphi (Native) 2984
Delphi (Parallel) 2906
Lazarus (Native) 2859
Lazarus (MTProcs) 4593

--- Win64 ---
C++ (Native) 2914
Delphi (Native) 2828
Delphi (Parallel) 2844
Lazarus (Native) 3813
Lazarus (Native - modulo hack) 3016
Lazarus (MTProcs) 4296

Starter and Trial Win32 results are identical, so my previous test numbers are valid. As you see, Delphi Win64 compiler scales nicely over Win32.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: hnb on March 30, 2017, 12:53:01 pm

Quote from: ykot on March 29, 2017, 12:05:49 am

I suppose that since they use "idiv" instead of "idivq", it might be faster. Is there any way to tell FreePascal to do that?

Please use http://bugs.freepascal.org .

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 30, 2017, 05:58:48 pm

Quote from: srcstorm on March 30, 2017, 11:46:48 am

Nothing can be less important than your opinions.

Is that an insult? Because that blog's publication AND your post just sound like marketing hype, but if one gets to the bottom of it, the benchmarks actually show quite the opposite. The company must be desperate for sales...

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: srcstorm on March 30, 2017, 07:05:03 pm

@ykot,

Your opinions, prejudices, impressions have no value for me. Get lost.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: ykot on March 31, 2017, 03:52:44 pm

Quote from: srcstorm on March 30, 2017, 07:05:03 pm

Your opinions, prejudices, impressions have no value for me. Get lost.

You must be burning inside, but there's no reason to cry. What did you expect, that nobody would dare to verify your tests? Why don't you try to re-run the tests, but this time enable proper optimization options in Visual Studio? The universe is full of surprises.

Title: Re: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#
Post by: Akira1364 on January 10, 2022, 08:13:31 pm

This is an old thread, but I want to note that this bit in the FPC source file:

Code: Pascal

{$IFDEF UNIX}
cthreads, cmem,
{$ENDIF}

should have been this:

Code: Pascal

cmem,
{$IFDEF UNIX}
cthreads,
{$ENDIF}

While CThreads is Unix-specific, the memory manager implementation in CMem is completely cross-platform, and it makes an absolutely massive positive performance difference for multi-threaded applications on Windows as well. Basically always use CMem when compiling an application with FPC that makes a lot of use of multi-threading, regardless of your platform.