
Topic: Threaded application in MacOS not faster as in Windows and Linux

han

  • Full Member
  • ***
  • Posts: 137
I have added threads to my application to speed it up. On Windows and Linux it now runs about twice as fast as the single-threaded version, but not on macOS. On an Intel Mac the processing speed is about the same, and for an M-processor Mac I get reports that it is much slower than the single-threaded version.

Furthermore, System.CPUCount indicates one CPU in the debugger, but the Activity Monitor shows more: 6 or 7 threads.

Does anybody have an idea what could cause this poor performance and how to fix it? I have tried including cmem in the .lpr file, but it doesn't help.

Thaddy

  • Hero Member
  • *****
  • Posts: 19241
  • Glad to be alive.
Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #1 on: March 11, 2025, 08:41:46 am »
What fpc version are you on?
If you happen to be on trunk you can test with this:
Code: Pascal
program simplethread2;
{$mode objfpc}{$I-}
{$modeswitch anonymousfunctions}
uses {$ifdef unix}cthreads,{$endif}sysutils,classes;
var
  Sync:IReadWriteSync;
  WorkerThread1,
  WorkerThread2,
  ControllerThread:TThread;
  start:int64;
begin
  Sync:=TMultiReadExclusiveWriteSynchronizer.create;
  // record the start time under the write lock
  Sync.BeginWrite;
  start := gettickcount64;
  Sync.EndWrite;
  writeln('Program start time saved');
  // first worker: prints a message on every 100th iteration
  WorkerThread1:=TThread.executeinthread(procedure
    var
      s:string = 'Hey Main, I am still busy....';
      i:integer;
    begin
      for i := 0 to 100000 do if i mod 100 = 0 then
      begin
        Sync.BeginWrite;
        writeln(s);
        Sync.EndWrite;
      end;
      Sync.BeginRead;
      writeln('WorkerThread1 done',  ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  // second worker: same workload, different message
  WorkerThread2:=TThread.executeinthread(procedure
    var
      s:string = 'Hey, One,Still chatting?....';
      i:integer;
    begin
      for i := 0 to 100000 do if i mod 100 = 0 then
      begin
        Sync.BeginWrite;
        writeln(s);
        Sync.EndWrite;
      end;
      Sync.BeginRead;
      writeln('WorkerThread2 done',  ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  // controller: waits for both workers, then reports
  ControllerThread:=TThread.executeinthread(procedure
    begin
      writeln('Test the ControllerThread');
      WorkerThread1.waitfor;
      WorkerThread2.waitfor;
      Sync.BeginRead;
      writeln('ControllerThread done, handing over to main', ' took:', gettickcount64 - start);
      Sync.EndRead;
    end);
  // the main thread's own workload is simulated with a 5 second sleep
  writeln('perform main workload');
  sleep(5000);
  writeln('Main workload finished', ' took:', gettickcount64 - start);
  if not ControllerThread.finished then ControllerThread.waitfor;
  writeln('Main program finished in ', gettickcount64 - start);
end.
The result should be that the main thread finishes in the same time as, or slightly later than, the longest-running thread.
It should be much faster on Apple M-series or any Linux-based AArch64 machine than on Windows.

This test code uses exactly 4 threads (main, controller and two workers), one per hardware core.
If you use more threads than the hardware has cores, you can get results like the ones you described.
The sync primitives are not strictly necessary here.
« Last Edit: March 11, 2025, 08:51:34 am by Thaddy »
objects are fine constructs. You can even initialize them with constructors.

cdbc

  • Hero Member
  • *****
  • Posts: 2807
    • http://www.cdbc.dk
Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #2 on: March 11, 2025, 09:28:14 am »
Hi Thaddy
Nice little example there, Me Likey  8)
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE6/QT6 -> FPC Release -> Lazarus Release &  FPC Main -> Lazarus Main

han

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #3 on: March 11, 2025, 12:55:27 pm »
Thanks Thaddy,

I have tested your program, but it didn't show the difference between threaded and unthreaded. I have now noticed that my problem is caused by System.CPUCount returning 1 on my Mac. I got a suggestion to use the little unit lazarus/components/multithreadprocs/mtpcpu.pas to detect the number of logical processors, and now I get a gain of 2 for 4 logical processors. The second mistake I made was adding the x86_64 executable to my ARM installer.   :(

Is there something other than unit mtpcpu that I could use?

If System.CPUCount is not available on macOS, why is this function not excluded so that you get a compile error?

Attached is my test program.

Update: On Windows, both unit mtpcpu and System.CPUCount report 12 processors. So both seem to report the logical processors of my 6-core AMD Ryzen.
Secondly, I noted that System.CPUCount does not work on a native Linux computer, but in my Linux virtual machine running in VMPlayer under Windows it reports 2 logical processors. Weird.
« Last Edit: March 11, 2025, 02:23:56 pm by han »
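
For anyone who wants to repeat the comparison described in the post above, here is a minimal sketch that prints both values side by side. It assumes the Lazarus package multithreadprocs is on the unit search path so that unit mtpcpu can be found, and that GetSystemThreadCount is the detection function exported by that unit. The program name is made up for this example.

```pascal
program cpucountcheck;
{$mode objfpc}{$H+}
// Assumption: the multithreadprocs package (unit mtpcpu) is on the unit path.
uses
  mtpcpu;
begin
  // RTL value; per this thread it can be the fallback value 1
  writeln('System.CPUCount      = ', System.CPUCount);
  // value detected by the mtpcpu unit mentioned above
  writeln('GetSystemThreadCount = ', GetSystemThreadCount);
end.
```

If the two lines differ on a given platform, the RTL is probably using its fallback there.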

Thaddy

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #4 on: March 11, 2025, 03:31:36 pm »
Quote
Thanks Thaddy,

I have tested your program, but it didn't show the difference between threaded and unthreaded.

It should perform twice as many tasks as the threadless version in the same time, and on my very recent (2024) M4 Mac mini it does.
It is also at least 10 times faster than a similarly clocked Windows machine and on a par with Linux.
I use that code for benchmarking in a slightly different guise.

Btw: I can reproduce the core count issue. That seems a bug to me.

han

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #5 on: March 11, 2025, 04:03:43 pm »
Quote
Btw: I can reproduce the core count issue. That seems a bug to me.

Should a so-called "issue" be raised?

Thaddy

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #6 on: March 11, 2025, 04:56:17 pm »
I need to research it a bit. Only one CPU is reported, but by "CPU" we usually mean cores, and this machine has 10 of them (well, 4 main + 6 workers).

cdbc

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #7 on: March 11, 2025, 04:57:00 pm »
Hi
The core count issue seems to be fixed in trunk... At least in my Linux flavour.
eta:
3.2.2 reports 1, due to fallback function.
3.3.1 reports 4, due to GetCpuCount being implemented...

Regards Benny
« Last Edit: March 11, 2025, 04:59:05 pm by cdbc »

PascalDragon

  • Hero Member
  • *****
  • Posts: 6396
  • Compiler Developer
Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #8 on: March 11, 2025, 08:57:47 pm »
Quote
If System.CPUCount is not available on macOS, why is this function not excluded so that you get a compile error?

Because code expects that property to exist, and thus an implementation that returns a sane default value (in this case 1) is better than code not compiling. You can be very sure that the very next bug report would be “CPUCount missing on platform $XYZ”.

Quote
The core count issue seems to be fixed in trunk... At least in my Linux flavour.
eta:
3.2.2 reports 1, due to fallback function.
3.3.1 reports 4, due to GetCpuCount being implemented...

For macOS it's not implemented. It needs someone with knowledge about macOS to implement it.
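
As a rough idea of what a macOS query could look like, here is a sketch that asks the C library's sysctlbyname for "hw.logicalcpu". This is not the RTL's implementation and not a proposed patch: the local external declaration, the helper name DetectLogicalCPUs and the program name are assumptions made for this example. On other platforms the helper simply returns System.CPUCount.

```pascal
program maccpucount;
{$mode objfpc}{$H+}
uses
  ctypes;

{$ifdef darwin}
type
  PSizeT = ^SizeUInt;  // local alias standing in for C's size_t*

// Local binding to the C library function (declared here for the sketch):
//   int sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
//                    void *newp, size_t newlen);
function sysctlbyname(name: PAnsiChar; oldp: Pointer; oldlenp: PSizeT;
  newp: Pointer; newlen: SizeUInt): cint; cdecl; external 'c';
{$endif}

// Hypothetical helper, not part of the RTL.
function DetectLogicalCPUs: LongWord;
{$ifdef darwin}
var
  n: cint;
  len: SizeUInt;
{$endif}
begin
  // start from the RTL value (1 where no implementation exists)
  Result := System.CPUCount;
  {$ifdef darwin}
  n := 0;
  len := SizeOf(n);
  // only trust the sysctl answer if the call succeeded and is positive
  if (sysctlbyname('hw.logicalcpu', @n, @len, nil, 0) = 0) and (n > 0) then
    Result := LongWord(n);
  {$endif}
end;

begin
  writeln('Logical CPUs: ', DetectLogicalCPUs);
end.
```

Note that this counts logical CPUs and does not distinguish between the different core types of an M-series processor.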

dbannon

  • Hero Member
  • *****
  • Posts: 3821
    • tomboy-ng, a rewrite of the classic Tomboy
Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #9 on: March 12, 2025, 05:41:47 am »
han, I suggest that, on the Mac, you assume 4 cores. That will give you a useful speedup on a system with lots of cores. Most real threaded code does not scale particularly well beyond 4 anyway.

It's unlikely your app will be used on a one- or two-core Mac, but if it is, it will still run almost as fast as if you had chosen '1'. It's a guess, but probably a reasonably safe one.

Davo
Lazarus 3, Linux (and reluctantly Win10/11, OSX Monterey)
My Project - https://github.com/tomboy-notes/tomboy-ng and my github - https://github.com/davidbannon
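
The suggestion above could be sketched like this: keep the detected count where it looks real, and substitute the guess of 4 only on macOS when the fallback value 1 comes back. The helper name WorkerCount, the constant AssumedCores and the program name are made up for this example.

```pascal
program assumefour;
{$mode objfpc}{$H+}

const
  AssumedCores = 4;  // the guess suggested above, not a measured value

// Hypothetical helper: replaces the fallback value on macOS only.
function WorkerCount(Detected: LongWord): LongWord;
begin
  Result := Detected;
  {$ifdef darwin}
  if Detected <= 1 then
    Result := AssumedCores;
  {$endif}
end;

begin
  writeln('Workers to start: ', WorkerCount(System.CPUCount));
end.
```

On platforms other than macOS the detected value passes through unchanged, so a real single-core machine elsewhere still gets one worker.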

han

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #10 on: March 12, 2025, 08:35:13 am »
Thanks all for the feedback.

Quote
Because code expects that property to exist, and thus an implementation that returns a sane default value (in this case 1) is better than code not compiling. You can be very sure that the very next bug report would be “CPUCount missing on platform $XYZ”.
I would have expected at least a compiler warning/hint message such as "System.CPUCount not implemented and always returns 1". Then it can be addressed rather than staying hidden.

Quote
For macOS it's not implemented. It needs someone with knowledge about macOS to implement it.
I tried the latest trunk and it is not working on macOS. It has been available since 2008 in unit lazarus/components/multithreadprocs/mtpcpu.pas. I will raise an issue to request an implementation.

Quote
han, I suggest that, on the Mac, you assume 4 cores. That will give you a useful speedup on a system with lots of cores. Most real threaded code does not scale particularly well beyond 4 anyway.
I have considered that, but so far my code seems very scalable up to at least 10 processors, each handling a slice of the same image array. The processing speed seems to improve linearly, by about 0.4 × the number of CPUs. I'm sure I will hit limitations somewhere in the process, but I haven't seen them yet.
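
To illustrate what "slices of the same image array" can mean in practice, here is a minimal sketch that divides the rows of an image among a number of workers. It is not taken from the application discussed here: the helper SliceBounds, the program name and the sample numbers (1000 rows, 4 workers) are assumptions for this example.

```pascal
program sliceimage;
{$mode objfpc}{$H+}

// Hypothetical helper: worker Index (0-based) handles rows First..Last.
// The slices are contiguous, do not overlap and together cover every row.
procedure SliceBounds(RowCount, Workers, Index: Integer;
  out First, Last: Integer);
begin
  First := Integer((Int64(RowCount) * Index) div Workers);
  Last  := Integer((Int64(RowCount) * (Index + 1)) div Workers) - 1;
end;

var
  i, a, b: Integer;
begin
  for i := 0 to 3 do
  begin
    SliceBounds(1000, 4, i, a, b);
    writeln('worker ', i, ': rows ', a, '..', b);
  end;
end.
```

Because each worker only touches its own rows, the workers need no locking while they process their slice.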



han

  • Full Member
  • ***
  • Posts: 137
« Last Edit: March 12, 2025, 08:25:59 pm by han »

marcov

  • Administrator
  • Hero Member
  • *
  • Posts: 12893
  • FPC developer.
Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #12 on: March 12, 2025, 09:31:54 am »
The automatic detection should be overridable in the first place. New processors with mixed types of cores are coming out all the time (including from Intel, with two LP cores on the I/O die).

han

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #13 on: March 12, 2025, 09:40:43 am »
Quote
The automatic detection should be overridable in the first place. New processors with mixed types of cores are coming out all the time (including from Intel, with two LP cores on the I/O die).

Do you suggest the current detection will not work properly for some new processors?

marcov

Re: Threaded application in MacOS not faster as in Windows and Linux
« Reply #14 on: March 12, 2025, 03:02:48 pm »
Quote
The automatic detection should be overridable in the first place. New processors with mixed types of cores are coming out all the time (including from Intel, with two LP cores on the I/O die).

Quote
Do you suggest the current detection will not work properly for some new processors?

I don't know the exact status. But I do know that new processors come out during a release cycle, and that it is easier to simply override if necessary rather than wait for a new release.
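
One way to make the detection overridable, as a sketch only: let an environment variable take precedence and fall back to automatic detection when it is absent or invalid. The variable name MYAPP_THREADS, the helper EffectiveThreadCount and the program name are made up for this example. An application could just as well use a command-line switch or a settings entry.

```pascal
program overridecount;
{$mode objfpc}{$H+}
uses
  sysutils;

// Hypothetical helper: a user-supplied value wins over automatic detection.
function EffectiveThreadCount: Integer;
begin
  // an unset or non-numeric variable yields 0 here
  Result := StrToIntDef(GetEnvironmentVariable('MYAPP_THREADS'), 0);
  if Result < 1 then
    Result := Integer(System.CPUCount);
end;

begin
  writeln('Using ', EffectiveThreadCount, ' threads');
end.
```

With this in place, a user on a new processor type can correct the count without waiting for a new compiler release.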

 
