* * *

Author Topic: Integer Performance Test: Delphi 10.2 Tokyo outperforms Visual C++ and Visual C#  (Read 6212 times)

ykot

  • Full Member
  • ***
  • Posts: 136
Furthermore, this project from the same author, once recompiled, also runs in around 730 ms for both 32-bit and 64-bit targets, which is still 50% faster than Delphi. I guess the author(s) of the published results are compiling a debug build or something.

marcov

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 5875
It's more about ease of use, I guess. "Modern" programmers think that managing threads (or processes, I don't really care about the backend) manually is cumbersome and time consuming, so if they have a built-in solution they will prefer it regardless of any overhead they can't control. At least that's what my CTO thinks.

I know, but even then you first need a problem that is actually divisible enough, with large enough chunks, to be sane.


I assume a smart implementation could make the chunks larger (dividing 10000 parallel items into 10 chunks of 1000 items). But that requires a lot of support from the language, which has to recognize that the items/chunks are computationally small and that the switching overhead is therefore large.
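To make the chunking idea concrete, here is a minimal C sketch of splitting N items into a fixed number of contiguous ranges, so that each worker gets one large chunk instead of many tiny tasks. The names `chunk_range` and `chunk_bounds` are made up for this illustration; they are not from MTProcs, the Delphi parallel library, or any other library mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of item indices handed to one worker.
   Hypothetical type, introduced only for this sketch. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_range;

/* Split n_items into n_chunks nearly equal contiguous ranges and return
   the range for chunk number `index` (0-based). When n_items is not a
   multiple of n_chunks, the first (n_items % n_chunks) chunks each get
   one extra item. */
chunk_range chunk_bounds(size_t n_items, size_t n_chunks, size_t index)
{
    assert(n_chunks > 0 && index < n_chunks);

    size_t base  = n_items / n_chunks;   /* minimum items per chunk */
    size_t extra = n_items % n_chunks;   /* chunks that get one more */

    chunk_range r;
    r.begin = index * base + (index < extra ? index : extra);
    r.end   = r.begin + base + (index < extra ? 1 : 0);
    return r;
}
```

With 10000 items and 10 chunks, `chunk_bounds(10000, 10, 0)` covers indices 0 to 999 and `chunk_bounds(10000, 10, 9)` covers 9000 to 9999. Each worker would then loop over its own range, so the scheduling cost is paid 10 times rather than 10000 times.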

p.s. My CEO thinks every problem can be solved by writing a very broad outline on the back of a beer coaster and then handing it to me, and then counter every resistance with cliches like "think positive". That doesn't work either :-)
« Last Edit: March 28, 2017, 08:50:39 pm by marcov »

Laksen

  • Hero Member
  • *****
  • Posts: 596
    • J-Software
The difference between x86_64 and i386 for FPC is due to the mod operation. On x86_64 it is always calculated as a 64-bit division, which is about 2-3 times slower than a 32-bit division.

If you do (int64_t)x % (int64_t)y in gcc and clang you get the same performance decrease as well.
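For anyone who wants to try this, below is a rough C sketch of a trial-division primality test in two variants: one takes the remainder on 32-bit operands, the other widens both operands to `int64_t` first. The benchmark's actual IsPrime routine is not shown in this thread, so `is_prime_mod32`, `is_prime_mod64` and `check_variants_agree` are hypothetical stand-ins. Whether the compiler really emits a 64-bit division for the second variant depends on the compiler, its version and the optimization flags, so check the generated assembly rather than assuming it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trial division with the remainder taken on 32-bit operands. */
bool is_prime_mod32(int32_t x)
{
    if (x < 2) return false;
    /* i <= x / i is an overflow-safe way of writing i * i <= x */
    for (int32_t i = 2; i <= x / i; i++) {
        if (x % i == 0) return false;
    }
    return true;
}

/* Same algorithm, but both operands are widened to 64 bits before the
   remainder is taken, mirroring (int64_t)x % (int64_t)y. */
bool is_prime_mod64(int32_t x)
{
    if (x < 2) return false;
    for (int32_t i = 2; i <= x / i; i++) {
        if ((int64_t)x % (int64_t)i == 0) return false;
    }
    return true;
}

/* Sanity check: both variants must agree on every value below `limit`. */
void check_variants_agree(int32_t limit)
{
    for (int32_t x = 0; x < limit; x++) {
        assert(is_prime_mod32(x) == is_prime_mod64(x));
    }
}
```

The two functions return identical results, so any timing difference between them comes from the code generated for the remainder, not from the algorithm.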

ykot

  • Full Member
  • ***
  • Posts: 136
Laksen, thanks for explaining it. I'm looking at the generated assembly:

Code:
addl $1,%r9d
movslq %ebx,%rax
movslq %r9d,%r8
cqto
idivq %r8
testq %rdx,%rdx
jne .Lj13

So it does seem to be using the "idivq" instruction. However, looking at the same code for the x64 target from gcc/clang, they seem to generate the following:

Code:
        mov     eax, ebx
        cdq
        idiv    ecx
        test    edx, edx
        je      .L5

I suppose since they use "idiv" instead of "idivq", it might be faster. Is there any way to tell FreePascal to do that?

P.S. Using (int64_t)x % (int64_t)y in gcc/clang still seems to use "idiv" instead of "idivq"?

« Last Edit: March 29, 2017, 12:09:44 am by ykot »

ykot

  • Full Member
  • ***
  • Posts: 136
I've modified the FreePascal source, adding a "Modulo" function:

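Code: Pascal
{$AsmMode Intel}
// Win64 calling convention: X arrives in ECX, Y in EDX.
function Modulo(const X, Y: Integer): Integer; assembler;
asm
  mov eax, ecx   // dividend X into EAX
  mov ecx, edx   // save divisor Y before CDQ overwrites EDX
  cdq            // sign-extend EAX into EDX:EAX
  idiv ecx       // 32-bit signed divide; remainder lands in EDX
  mov eax, edx   // return the remainder
end;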

Then, after changing "(x mod i = 0)" to "Modulo(x, i) = 0" in the "IsPrime" function, the resulting timing on the x64 platform is around 960 ms, slightly faster than Delphi (which doesn't seem to benefit from that Modulo function).

srcstorm

  • New member
  • *
  • Posts: 21
@ykot,

Why don't you do yourself a favor and install the latest version:
https://www.embarcadero.com/products/delphi/starter/promotional-download

Good news is RAD Studio 10.2 Tokyo Architect Trial contains literally everything, including Delphi Win64 compiler:
https://www.embarcadero.com/products/rad-studio/start-for-free

Nothing can be less important than your opinions. Running tests with older versions of compilers yields technically meaningless data. But maybe one can compare Delphi 10.2 to 10.1, or Visual C++ 2017 to 2015, to measure improvements.

After I tested your code, I uninstalled Starter Edition and installed the Trial so now we can see Win64 results too.

--- Win32 ---
C++ (Native) 2937
Delphi (Native) 2984
Delphi (Parallel) 2906
Lazarus (Native) 2859
Lazarus (MTProcs) 4593

--- Win64 ---
C++ (Native) 2914
Delphi (Native) 2828
Delphi (Parallel) 2844
Lazarus (Native) 3813
Lazarus (Native - modulo hack) 3016
Lazarus (MTProcs) 4296


Starter and Trial Win32 results are identical, so my previous test numbers are valid. As you see, Delphi Win64 compiler scales nicely over Win32.


hnb

  • Full Member
  • ***
  • Posts: 202
I suppose since they use "idiv" instead of "idivq", it might be faster. Is there any way to tell FreePascal to do that?

Please use http://bugs.freepascal.org .
Checkout NewPascal initiative and donate beer - ready to use tuned FPC compiler + Lazarus for mORMot project

best regards,
Maciej Izak

ykot

  • Full Member
  • ***
  • Posts: 136
Nothing can be less important than your opinions.
Is that an insult? Because that blog's publication AND your post just sound like marketing hype, but if one gets to the bottom of it, the benchmarks actually show quite the opposite. The company must be desperate for sales...

srcstorm

  • New member
  • *
  • Posts: 21
@ykot,

Your opinions, prejudices, impressions have no value for me. Get lost.

ykot

  • Full Member
  • ***
  • Posts: 136
Your opinions, prejudices, impressions have no value for me. Get lost.
You must be burning inside, but there's no reason to cry. What did you expect, that nobody would dare to verify your tests? Why don't you try re-running the tests, but this time with proper optimization options enabled in Visual Studio? The universe is full of surprises.

laciroeye

  • Newbie
  • Posts: 2
This version is no problem at all.

 
