Apple M1 performance

ChrisR

Full Member
Posts: 247

Apple M1 performance

« on: November 19, 2020, 11:13:22 pm »

For those who are interested, I wrote up a performance summary for the Apple M1. This includes both C code (pigz, niimath) and Lazarus code (MRIcroGL). I wrote from the perspective of a scientist working with brain imaging, where we work with large datasets that do not fit in cache.

https://github.com/neurolabusc/AppleSiliconForNeuroimaging


Jonas Maebe

Hero Member
Posts: 1058

Re: Apple M1 performance

« Reply #1 on: November 20, 2020, 12:19:50 am »

Could you try compiling MRIcroGL with the LLVM backend to compare the performance with FPC's internal code generator? See https://wiki.freepascal.org/LLVM for instructions. Use -ClvXcode-11.0 as the target LLVM version.


ChrisR

Full Member
Posts: 247

Re: Apple M1 performance

« Reply #2 on: November 22, 2020, 09:10:55 pm »

Jonas-
This is an excellent point. The FPC values provided use optimization (-O3) for x86-64 but disable optimization for the M1 (as the peephole optimizer still has some subtle bugs). I now include values for LLVM with -O3 optimization. The results here are sensational. Informally, I also tried FPC with -O3, since the bugs do not appear in the functions used by the benchmark, and found that its performance falls between FPC without optimization and LLVM with -O3.
Thanks for your help getting FPC running. I will update the wiki with more hints on enabling LLVM on this new hardware.


Jonas Maebe

Hero Member
Posts: 1058

Re: Apple M1 performance

« Reply #3 on: November 22, 2020, 10:19:40 pm »

I've added a warning on the LLVM wiki page that should solve the problem you encountered. FWIW, since the LLVM backend is also supported for macOS/x86-64 and Linux/x86-64, you can also try it there. I expect smaller performance gains there, although that depends on how much LLVM is able to vectorise. You may want to use -Cfavx2 there as well (also when using plain FPC, although I don't expect it to have much of an impact there).

Edit: you may also want to compile the AArch64 version using the FPC code generator with the options -O2 -Oonopeephole (or -O3 -Oonopeephole, if you prefer). That will enable, among other things, register variables and CSE but disable the (buggy) peephole optimiser. That way you'll still get most of the performance gains that FPC has to offer (I doubt the peephole optimiser makes a big difference at this point). Another option you may want to try, both with and without LLVM, is -Oofastmath (it's not part of -O3).
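For reference, the option combinations mentioned in this thread could be collected roughly as follows. This is only a sketch: `myprogram.pas` is a placeholder file name rather than the actual MRIcroGL project file, the script prints the command lines instead of running them, and a Lazarus project would more likely take these flags through its custom compiler options than through a direct `fpc` call.

```shell
#!/bin/sh
# Placeholder source file (assumption); substitute the real main program.
SRC="myprogram.pas"

# FPC code generator on AArch64: optimisations on, peephole optimiser off.
FPC_NOPEEPHOLE="fpc -O3 -Oonopeephole $SRC"

# Same, plus fast math, which -O3 does not enable by itself.
FPC_FASTMATH="fpc -O3 -Oonopeephole -Oofastmath $SRC"

# LLVM backend; requires an LLVM-enabled FPC build (see the wiki page).
FPC_LLVM="fpc -O3 -ClvXcode-11.0 $SRC"

# x86-64 only: allow the compiler to use AVX2 instructions.
FPC_AVX2="fpc -O3 -Cfavx2 $SRC"

# Dry run: print each command line. Run the printed lines to build.
for cmd in "$FPC_NOPEEPHOLE" "$FPC_FASTMATH" "$FPC_LLVM" "$FPC_AVX2"; do
    echo "$cmd"
done
```

Building the same benchmark once per line gives a like-for-like comparison of the internal code generator and the LLVM backend.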

« Last Edit: November 22, 2020, 11:35:39 pm by Jonas Maebe »


mr-highball

Full Member
Posts: 233

Re: Apple M1 performance

« Reply #4 on: November 23, 2020, 04:16:08 am »

Interesting read 👍


Cheers 🍻

-Highball

threading library - https://github.com/mr-highball/ezthreads
pas2js gui library - https://github.com/mr-highball/nyx
crypto trading - https://github.com/mr-highball/simplebot-support
json decorators - https://github.com/mr-highball/ezjson

jwdietrich

Hero Member
Posts: 1232

Re: Apple M1 performance

« Reply #5 on: November 25, 2020, 09:30:46 am »

This is important and interesting. Please keep us updated.


function GetRandomNumber: integer; // xkcd.com
begin
GetRandomNumber := 4; // chosen by fair dice roll. Guaranteed to be random.
end;

http://www.formatio-reticularis.de

Lazarus 2.2.6 | FPC 3.2.2 | PPC, Intel, ARM | macOS, Windows, Linux

MathMan

Sr. Member
Posts: 325

Re: Apple M1 performance

« Reply #6 on: November 25, 2020, 11:05:43 am »

Quote from: jwdietrich on November 25, 2020, 09:30:46 am

This is important and interesting. Please keep us updated.

It looks like the M1 is a very capable core indeed. The first technically oriented reports from users are starting to appear. One of the more prominent is this statement from GMP lead developer Torbjörn Granlund: https://gmplib.org/list-archives/gmp-discuss/2020-November/006600.html

"The GMP project got a low-end Apple Mac Mini M1 in order to make sure GMP works for arm-macos systems.

We had a major surprise from the GMP performance of these CPUs!

No other CPU runs GMP this well. Almost every inner loop runs at < 1 cycle/limb. That includes mpn_mul_1, but not the most important loop mpn_addmul_1. And that is before any attempt at optimising things for the M1.

The 3.2 GHz M1 in our system takes the #2 spot in the GMPbench top-list. The #1 spot is an AMD Ryzen, but that runs at 4.4 GHz.

..."

Regards,
MathMan

