
Author Topic: Apple M1 performance  (Read 3869 times)

ChrisR

  • Full Member
  • ***
  • Posts: 247
Apple M1 performance
« on: November 19, 2020, 11:13:22 pm »
For those who are interested, I wrote up a performance summary for the Apple M1. This includes both C code (pigz, niimath) and Lazarus code (MRIcroGL). I wrote from the perspective of a scientist working with brain imaging, where we work with large datasets that do not fit in cache.

https://github.com/neurolabusc/AppleSiliconForNeuroimaging

Jonas Maebe

  • Hero Member
  • *****
  • Posts: 1058
Re: Apple M1 performance
« Reply #1 on: November 20, 2020, 12:19:50 am »
Could you try compiling MRIcroGL with the LLVM backend to compare the performance with FPC's internal code generator? See https://wiki.freepascal.org/LLVM for instructions. Use -ClvXcode-11.0 as the target LLVM version.

ChrisR

Re: Apple M1 performance
« Reply #2 on: November 22, 2020, 09:10:55 pm »
Jonas-
 This is an excellent point. The FPC values provided use optimization (-O3) for x86-64 but disable optimization for the M1 (as the peephole optimizer still has some subtle bugs). I now include values for LLVM with -O3 optimization. The results here are sensational. Informally, I also explored FPC with -O3, as the bugs do not appear in the functions used by the benchmark, and found that its performance falls between FPC without optimization and LLVM with -O3.
 Thanks for your help getting FPC running. I will update the wiki with more hints on enabling LLVM on this new hardware.

Jonas Maebe

Re: Apple M1 performance
« Reply #3 on: November 22, 2020, 10:19:40 pm »
I've added a warning on the LLVM wiki page that should solve the problem you encountered. FWIW, since the LLVM backend is also supported for macOS/x86-64 and Linux/x86-64, you can also try it there. I expect smaller performance gains there, although that depends on how much LLVM is able to vectorise. You may want to use -Cfavx2 there as well (also when using plain FPC, although I don't expect it to have much of an impact there).

Edit: you may also want to compile the AArch64 version using the FPC code generator with the options -O2 -Oonopeephole (or -O3 -Oonopeephole, if you prefer). That will enable, among other things, register variables and CSE (common subexpression elimination) but disable the (buggy) peephole optimiser. That way you'll still get most of the performance gains that FPC has to offer (I doubt the peephole optimiser makes a big difference at this point). Another option you may want to try, both with and without LLVM, is -Oofastmath (it's not part of -O3).
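
As a rough sketch, the option combinations mentioned in this post could be tried side by side like this. The source file name bench.pas and the helper names (run, build_*) are placeholders for illustration, not something from this thread, and the LLVM line assumes an FPC that was built with LLVM support as described on the wiki. By default the script only prints the command lines; set DRY_RUN=0 to actually invoke the compiler.

```shell
#!/bin/sh
# Sketch only: prints the fpc invocations discussed above.
# bench.pas is a hypothetical source file, not taken from the thread.
SRC="${SRC:-bench.pas}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = also run them

# Echo the command line, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# FPC's own code generator, optimised but with the peephole optimiser off.
build_fpc()      { run fpc -O2 -Oonopeephole "$SRC"; }

# Same idea at -O3, plus fast-math, which -O3 does not enable by itself.
build_fpc_fast() { run fpc -O3 -Oonopeephole -Oofastmath "$SRC"; }

# LLVM backend (assumes an LLVM-enabled FPC build, see the wiki page).
build_llvm()     { run fpc -ClvXcode-11.0 -O3 "$SRC"; }

# x86-64 only: plain FPC with AVX2 as the FPU target.
build_x86_avx2() { run fpc -O3 -Cfavx2 "$SRC"; }

build_fpc
build_fpc_fast
build_llvm
```

Timing each resulting binary on the same dataset should then show how much of the gap between unoptimised FPC and LLVM is closed by each setting.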
« Last Edit: November 22, 2020, 11:35:39 pm by Jonas Maebe »

mr-highball

  • Full Member
  • ***
  • Posts: 233
    • Highball Github
Re: Apple M1 performance
« Reply #4 on: November 23, 2020, 04:16:08 am »
Interesting read 👍

jwdietrich

  • Hero Member
  • *****
  • Posts: 1232
    • formatio reticularis
Re: Apple M1 performance
« Reply #5 on: November 25, 2020, 09:30:46 am »
This is important and interesting. Please keep us updated.
function GetRandomNumber: integer; // xkcd.com
begin
  GetRandomNumber := 4; // chosen by fair dice roll. Guaranteed to be random.
end;

http://www.formatio-reticularis.de

Lazarus 2.2.6 | FPC 3.2.2 | PPC, Intel, ARM | macOS, Windows, Linux

MathMan

  • Sr. Member
  • ****
  • Posts: 325
Re: Apple M1 performance
« Reply #6 on: November 25, 2020, 11:05:43 am »
Quote from jwdietrich: "This is important and interesting. Please keep us updated."

It looks like the M1 is a very capable core indeed. The first technically oriented reports from real-world use are appearing. One prominent example is the following statement from GMP lead developer Torbjörn Granlund: https://gmplib.org/list-archives/gmp-discuss/2020-November/006600.html

"The GMP project got a low-end Apple Mac Mini M1 in order to make sure GMP works for arm-macos systems.

We had a major surprise from the GMP performance of these CPUs!

No other CPU runs GMP this well.  Almost every inner loop runs at < 1 cycle/limb.  That includes mpn_mul_1, but not the most important loop mpn_addmul_1.  And that is before any attempt at optimising things for the M1.

The 3.2 GHz M1 in our system takes the #2 spot in the GMPbench top-list. The #1 spot is an AMD Ryzen, but that runs at 4.4 GHz.

..."

Regards,
MathMan

 
