Threaded application not faster on macOS, unlike on Windows and Linux
han:
I have added threads to my application to speed it up. On Windows and Linux it now runs about twice as fast as the single-threaded version, but not on macOS. On an Intel Mac the processing speed is about the same, and for an M-processor Mac I get reports that it is much slower than the single-threaded version.
Furthermore, System.CPUCount reports one CPU in the debugger, but Activity Monitor shows more: 6 or 7 threads.
Does anybody have an idea what could cause this poor performance and how to fix it? I have tried including cmem in the .lpr file, but it doesn't help.
Thaddy:
What fpc version are you on?
If you happen to be on trunk you can test with this:
--- Code: Pascal ---
program simplethread2.pas;
{$mode objfpc}{$I-}{$modeswitch anonymousfunctions}
uses {$ifdef unix}cthreads,{$endif}sysutils,classes;
var
  Sync:IReadWriteSync;
  WorkerThread1, WorkerThread2, ControllerThread:TThread;
  start:int64;
begin
  Sync:=TMultiReadExclusiveWriteSynchronizer.create;
  Sync.BeginWrite;
  start := gettickcount64;
  Sync.EndWrite;
  writeln('Program start time saved');
  WorkerThread1:=TThread.executeinthread(procedure
    var
      s:string = 'Hey Main, I am still busy....';
      i:integer;
    begin
      for i := 0 to 100000 do
        if i mod 100 = 0 then
        begin
          Sync.BeginWrite;
          writeln(s);
          Sync.EndWrite;
        end;
      Sync.BeginRead;
      writeln('WorkerThread1 done', ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  WorkerThread2:=TThread.executeinthread(procedure
    var
      s:string = 'Hey, One,Still chatting?....';
      i:integer;
    begin
      for i := 0 to 100000 do
        if i mod 100 = 0 then
        begin
          Sync.BeginWrite;
          writeln(s);
          Sync.EndWrite;
        end;
      Sync.BeginRead;
      writeln('WorkerThread2 done', ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  ControllerThread:=TThread.executeinthread(procedure
    begin
      writeln('Test the ControllerThread');
      WorkerThread1.waitfor;
      WorkerThread2.waitfor;
      Sync.BeginRead;
      writeln('ControllerThread done, handing over to main', ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  writeln('perform main workload');
  sleep(5000);
  writeln('Main workload finished', ' took:', gettickcount64 - start);
  if not ControllerThread.finished then
    ControllerThread.waitfor;
  writeln('Main program finished in ', gettickcount64 - start);
end.
--- End code ---
Result should be that the main thread is finished in equal time or slightly more than the longest running thread.
It should be much faster on Apple M-series or any Linux-based AArch64 machine compared to Windows.
This test code uses exactly 4 threads (the main thread, two workers and the controller), which matches 4 hardware cores.
If you use more threads than the hardware has cores, you can get results like the ones you described.
The sync primitives are not strictly necessary here.
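To illustrate the point about not starting more threads than there are cores, here is a minimal sketch. The helper name ClampWorkerCount is hypothetical and not part of the RTL; it only shows the idea of capping a requested thread count at whatever core count was detected, and it assumes the detected value may be wrong (0 or less), in which case it falls back to 1.

```pascal
program capthreads;
{$mode objfpc}{$H+}

// Hypothetical helper (not an RTL function): never ask for more
// worker threads than the number of detected cores.
function ClampWorkerCount(Requested, DetectedCores: Integer): Integer;
begin
  // Treat a failed or nonsensical detection as a single core.
  if DetectedCores < 1 then
    DetectedCores := 1;
  // Always run at least one worker.
  if Requested < 1 then
    Requested := 1;
  if Requested > DetectedCores then
    Result := DetectedCores
  else
    Result := Requested;
end;

begin
  writeln(ClampWorkerCount(8, 4)); // more threads requested than cores: prints 4
  writeln(ClampWorkerCount(2, 4)); // fewer threads than cores: prints 2
  writeln(ClampWorkerCount(8, 0)); // detection failed: prints 1
end.
```

Note that if the detected core count is itself too low, this cap limits the program to fewer threads than the machine could run, so the detection has to be reliable first.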
cdbc:
Hi Thaddy
Nice little example there, Me Likey 8)
Regards Benny
han:
Thanks Thaddy,
I have tested your program, but it didn't show the difference between threaded and unthreaded. I have now noticed that my problem is caused by System.CPUCount returning 1 on my Mac. I got a suggestion to use the little unit lazarus/components/multithreadprocs/mtpcpu.pas to detect the number of logical processors, and now I get a speed-up of 2 with 4 logical processors. The second mistake I made was to put the x86_64 executable in my ARM installer. :(
Is there something other than unit mtpcpu that I could use?
If System.CPUCount is not available on macOS, why is this function not excluded so that you get a compile error?
My test program is attached.
Update: On Windows, both unit mtpcpu and System.CPUCount report 12 processors, so both seem to report the logical processors of my 6-core AMD Ryzen.
Secondly, I noticed that System.CPUCount does not work on a native Linux computer, but in my Linux virtual machine running in VMPlayer under Windows it reports 2 logical processors. Weird.
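A small sketch of how the two values could be compared and combined. It assumes the multithreadprocs package directory is on the unit search path and uses GetSystemThreadCount from mtpcpu. The helper DetectLogicalCpus is a hypothetical name: it simply takes the larger of the two reported values, so a platform where System.CPUCount returns 1 still gets the mtpcpu result.

```pascal
program cpucountcheck;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif}
  mtpcpu; // from lazarus/components/multithreadprocs (assumed to be on the unit path)

// Hypothetical helper: use the larger of the two reported counts,
// and never return less than 1.
function DetectLogicalCpus: Integer;
var
  FromRtl, FromMtp: Integer;
begin
  FromRtl := Integer(System.CPUCount); // reported as 1 on the Mac in this thread
  FromMtp := GetSystemThreadCount;     // value reported by unit mtpcpu
  if FromMtp > FromRtl then
    Result := FromMtp
  else
    Result := FromRtl;
  if Result < 1 then
    Result := 1;
end;

begin
  writeln('System.CPUCount      = ', System.CPUCount);
  writeln('GetSystemThreadCount = ', GetSystemThreadCount);
  writeln('Using                = ', DetectLogicalCpus);
end.
```

Running this on each target platform shows directly where the two sources disagree, which is easier to report than a timing difference.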
Thaddy:
--- Quote from: han on March 11, 2025, 12:55:27 pm ---Thanks Thaddy,
I have tested your program but it didn't show the difference between threaded and unthreaded.
--- End quote ---
It should perform twice as many tasks as the threadless version in the same time, and on my very recent (2024) M4 Mac mini it does.
It is also at least 10 times faster than on a similarly clocked Windows machine and on a par with Linux.
I use that code for benchmarking in a slightly different guise.
Btw: I can reproduce the core count issue. That seems like a bug to me.