Recent

Author Topic: Multithreading isn't faster, than singlethreading?  (Read 1535 times)

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 844
Multithreading isn't faster, than singlethreading?
« on: February 22, 2020, 08:55:58 pm »
I have simple program. It queries current state inside critical section and then processes it. If it matches some criteria - it's sent to UI via Synchronize. It happens not so often, so it shouldn't be bottleneck and limit overall performance of program. Every time, when state is queried, it's incremented. So, next time it will be new state. It also has counter, that counts every state increment. It's monitored and reset every second via TTimer event, so it shows states per second (SPS) ratio. I do it in thread to avoid UI blocking.

And I have 4 core + HT processor, so I wanted to try to boost performance via adding more threads. I know, how threads work, but I use them very rarely, so I've never encountered such problems. Problem is - performance decreases instead of increasing. Single threaded variant shows 1.2M SPS at 12% CPU load. First of all, I dunno, why it's not 100%. Multithreaded variant shows around 800K SPS at 20% CPU load. Adding more threads doesn't change anything - 20% load simply distributes between them without any SPS increase. But Task Manager and Process Explorer show, that at least threads distribute between cores.

I don't think, that state query and increase is bottleneck, as state processing should take much longer. So, adding more threads should show some performance increase. What is wrong with it?
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

MarkMLl

  • Hero Member
  • *****
  • Posts: 6676
Re: Multithreading isn't faster, than singlethreading?
« Reply #1 on: February 22, 2020, 09:02:48 pm »
You've told us nothing about what OS etc. you're running. For all we know you might be on Windows '95...

Try chopping out the Synchronize and see if that's the bottleneck. If it is, then replace it by QueueAsyncCall() since it doesn't block.

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

lainz

  • Hero Member
  • *****
  • Posts: 4460
    • https://lainz.github.io/
Re: Multithreading isn't faster, than singlethreading?
« Reply #2 on: February 22, 2020, 09:08:11 pm »
Check that you're not creating and destroying the threads. Reuse them and it will run faster.

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 844
Re: Multithreading isn't faster, than singlethreading?
« Reply #3 on: February 22, 2020, 09:09:49 pm »
You've told us nothing about what OS etc. you're running. For all we know you might be on Windows '95...

Try chopping out the Synchronize and see if that's the bottleneck. If it is, then replace it by QueueAsyncCall() since it doesn't block.

MarkMLl
It's Win7 64bit. No, it's not Synchronize, because complete removal of it doesn't help. Increasing thread priority helps a little bit, but not so much.
Check that you're not creating and destroying the threads. Reuse them and it will run faster.
No, I don't. I just create N threads, that do follwing:
Code: [Select]
while not Terminated do begin
  State := QueryState;
  NewState := ProcessState(State);
  if CheckState(NewState) then Synchronize(@SendToUI);
end;
I.e. QueryState works the same way, as in single threaded mode, but ProcessState should be processed via several cores simultaneously, that should boost performance on multi-core processor.

Currently I have to use Synchronize, because NewState is variable in thread class, so it shouldn't change between state processing and calling SendToUI, that can happen in case of async call. I can use some state list, but it would overcomplicate things.
« Last Edit: February 22, 2020, 09:32:29 pm by Mr.Madguy »
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

440bx

  • Hero Member
  • *****
  • Posts: 3944
Re: Multithreading isn't faster, than singlethreading?
« Reply #4 on: February 22, 2020, 10:12:52 pm »
I don't have a clear picture of what the process you described is doing but, what concerns me in what you stated is that the threads are sending something to the UI.

That gives the impression that the threads are going to block until the UI processes whatever the threads produced for it.  If that's the case, it's no surprise that adding threads doesn't increase performance, they'll all be waiting for the UI to process whatever they handed to it.

That's my guess based on what I understood from your description.
(FPC v3.0.4 and Lazarus 1.8.2) or (FPC v3.2.2 and Lazarus v3.2) on Windows 7 SP1 64bit.

TheLastCayen

  • Jr. Member
  • **
  • Posts: 81
Re: Multithreading isn't faster, than singlethreading?
« Reply #5 on: February 23, 2020, 05:31:38 am »
I had a similar issue with a software that tries to connect on multiple computers over the network and fetch some information... To bypass the issue, I did a simple array of mytype(the information I fetch) Every time I created a thread, I changed the length of that array and lunch the thread with the location where it can write the fetched information. The only thing my main software do is read that arry at a specific interval with a ttimer and update a gride....

There is probably a better way to do it, but it worked fine for me:)

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 844
Re: Multithreading isn't faster, than singlethreading?
« Reply #6 on: February 23, 2020, 07:55:03 am »
I don't have a clear picture of what the process you described is doing but, what concerns me in what you stated is that the threads are sending something to the UI.

That gives the impression that the threads are going to block until the UI processes whatever the threads produced for it.  If that's the case, it's no surprise that adding threads doesn't increase performance, they'll all be waiting for the UI to process whatever they handed to it.

That's my guess based on what I understood from your description.
As I've already said, I could implement state queue for UI, but currently it's not needed, because this event triggers rarely enough and I can test whole pipeline with this code commented and it doesn't help.

According to my yesterday's test there are two problems:
1) State query is actually some sort of bottleneck. With all processing code commented it can give me only around 3M SPS. I can optimize it a little bit, but I don't think it will give me much more SPS.
2) Second problem, that causes #1 - 20% CPU load limit. State query is simple, so it shouldn't be so slow, but it just can't work faster. How can I bypass it?

UPD I've optimized QueryState and now it gives 35M SPS. Each thread gives about 1M SPS and around 12% CPU load. So, I can get up to 4M SPS with 4 threads. But once CPU load hits 50%, it no longer rises, so performance starts to drop again. I know, that there is 50% per process CPU limit in Windows. I remember, that something can be done to bypass it, but I don't remember what.
« Last Edit: February 23, 2020, 08:25:58 am by Mr.Madguy »
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

balazsszekely

  • Guest
Re: Multithreading isn't faster, than singlethreading?
« Reply #7 on: February 23, 2020, 08:26:31 am »
@Mr.Madguy

You wrote:
Quote
I just create N threads, that do follwing...
What is N and what exactly QueryState is doing? I'm asking this because creating multiple threads, are only useful when the work inside the thread is not processor intensive. For processor intensive task, the general rule that N should not exceed 2xCoreNumbers, in your case 8 threads.  If you exceed that number, threads will only slow you down.

Otto

  • Full Member
  • ***
  • Posts: 226
Re: Multithreading isn't faster, than singlethreading?
« Reply #8 on: February 23, 2020, 09:31:48 am »
@ GetMen

Hello  GetMen.

How do I find the Core-Numbers (or CPU-Numbers)?
For now I have given found this: http://www.delphigroups.info/2/b2/509701.html

Code: Pascal  [Select][+][-]
  1. uses
  2.   Windows;  
  3. // ...
  4.  function GetCpuCount: Cardinal;
  5.   var
  6.     Info: SYSTEM_INFO;
  7.   begin
  8.     GetSystemInfo(Info);
  9.     Result := Info.dwNumberOfProcessors;
  10.   end;
  11.  

Otto.
« Last Edit: February 23, 2020, 10:12:45 am by Otto »
Kind regards.

Mr.Madguy

  • Hero Member
  • *****
  • Posts: 844
Re: Multithreading isn't faster, than singlethreading?
« Reply #9 on: February 23, 2020, 10:08:19 am »
@Mr.Madguy

You wrote:
Quote
I just create N threads, that do follwing...
What is N and what exactly QueryState is doing? I'm asking this because creating multiple threads, are only useful when the work inside the thread is not processor intensive. For processor intensive task, the general rule that N should not exceed 2xCoreNumbers, in your case 8 threads.  If you exceed that number, threads will only slow you down.
I've already written it. My program is simple: it has some number of states, it processes them and then checks for some criteria and outputs to UI. I could make it without any threads at all. But I needed to monitor it's state and in order to do it without slowing whole process down and without blocking main UI, I decided to implement this process in separate thread. Then I remembered about mksquashfs tool from Linux, that was using all 8 cores of my processor to boost performance and decided to try something similar. How N is determined? I can simply pick any number in UI and watch SPS monitor to achieve maximum performance.

I could implement it via something like "Each of N threads processes every Nth state". But it was hard, because I needed to save current config to file. I would need to save state for every thread and changing number of threads would mess whole config up. So I decided to simply query global state in critical section, as it would be done in single threaded case. I.e. querying state works as if it would be single threaded, but several states are processed in their own threads at the same time. And yeah. Having 4 threads is fastest now. But only because CPU load is limited by 50%. In case of 100% I would be able to use all 8 threads.
@ GetMen

Hello  GetMen.

How do I find the Core-Numbers (or CPU-Numbers)?
For now I have given found this: http://www.delphigroups.info/2/b2/509701.html
Code: Pascal  [Select][+][-]
  1. TThread.ProcessorCount
« Last Edit: February 23, 2020, 10:17:45 am by Mr.Madguy »
Is it healthy for project not to have regular stable releases?
Just for fun: Code::Blocks, GCC 13 and DOS - is it possible?

Xor-el

  • Sr. Member
  • ****
  • Posts: 404
Re: Multithreading isn't faster, than singlethreading?
« Reply #10 on: February 23, 2020, 12:05:35 pm »
@Mr.Madguy, do note that Thread.ProcessorCount  defaults to 1 on all OSes other than Windows.
To get the true cpu count, I recommend using https://github.com/Xor-el/NumCPULib4Pascal

Thaddy

  • Hero Member
  • *****
  • Posts: 14197
  • Probably until I exterminate Putin.
Re: Multithreading isn't faster, than singlethreading?
« Reply #11 on: February 23, 2020, 12:30:26 pm »
@Mr. MadGuy
Where did you get the information that Windows restricts to 50%? That is not true. You should be able to reach close to 100% on all cores.
One way to improve performance is to set an affinity mask for the thread that does the GUI stuff since it should be the least critical. See msdn.
Note threads have almost always associated overhead, but not more than 10-20%, unless you lock too much.
Specialize a type, not a var.

zamronypj

  • Full Member
  • ***
  • Posts: 133
    • Fano Framework, Free Pascal web application framework
Re: Multithreading isn't faster, than singlethreading?
« Reply #12 on: February 23, 2020, 01:53:16 pm »
Switching from single threaded to multiple thread does not magically guarantee that application will run faster in multi core processor, especially if algorithm is not parallel in nature, for example, lot of entering critical section to synchronize access between multiple threads will cause multiple threads to wait, potentially reducing performance.

I do not know inner working of your function call, so I do not know which part causing slow down but you need to identify any locks and reduce them as many as possible
« Last Edit: February 23, 2020, 01:55:17 pm by zamronypj »
Fano Framework, Free Pascal web application framework https://fanoframework.github.io
Apache module executes Pascal program like scripting language https://zamronypj.github.io/mod_pascal/
Github https://github.com/zamronypj

 

TinyPortal © 2005-2018