I have a simple program. It queries the current state inside a critical section and then processes it. If the state matches some criteria, it is sent to the UI via Synchronize. That happens rarely, so it shouldn't be a bottleneck or limit the overall performance of the program. Every time the state is queried, it is incremented, so the next query gets a new state. There is also a counter that counts every state increment. It is read and reset every second in a TTimer event, so it shows a states-per-second (SPS) rate. I do the work in a thread to avoid blocking the UI.
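A minimal sketch of the structure described above, reduced to a console program. It is an illustration under assumptions, not the actual code: the names `TWorker`, `SharedState`, `StateCount`, `MatchCount` and `ProcessState` are hypothetical, the processing step is a trivial stand-in, and the `Synchronize` call and the `TTimer` are replaced by a match counter and a one-second `Sleep`.

```delphi
program SpsSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

type
  // Hypothetical worker thread; the real program may be organized differently.
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Lock: TCriticalSection;   // guards SharedState
  SharedState: Int64 = 0;   // assumed shape of the shared state
  StateCount: Integer = 0;  // states handed out since the last reset
  MatchCount: Integer = 0;  // stand-in for "send to UI via Synchronize"
  Workers: array of TWorker;
  I: Integer;

// Stand-in for the real processing; True means "matches the criteria".
function ProcessState(State: Int64): Boolean;
begin
  Result := (State mod 1000000) = 0;
end;

procedure TWorker.Execute;
var
  State: Int64;
begin
  while not Terminated do
  begin
    // Query the current state and advance it inside the critical section.
    Lock.Acquire;
    try
      State := SharedState;
      Inc(SharedState);
    finally
      Lock.Release;
    end;
    TInterlocked.Increment(StateCount);

    // Process outside the lock; the real program would call Synchronize here.
    if ProcessState(State) then
      TInterlocked.Increment(MatchCount);
  end;
end;

begin
  Lock := TCriticalSection.Create;
  try
    SetLength(Workers, 4);  // thread count is an arbitrary choice
    for I := 0 to High(Workers) do
      Workers[I] := TWorker.Create(False);

    Sleep(1000);            // plays the role of the one-second timer
    Writeln('SPS: ', TInterlocked.Exchange(StateCount, 0));

    for I := 0 to High(Workers) do
    begin
      Workers[I].Terminate;
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
  finally
    Lock.Free;
  end;
end.
```

With a processing step this cheap, nearly all of each iteration is spent acquiring and releasing the lock, so the sketch only shows the layout of the loop and says nothing about how expensive the real processing is.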
I have a 4-core processor with Hyper-Threading, so I wanted to try to boost performance by adding more threads. I know how threads work, but I use them very rarely, so I've never run into this kind of problem. The problem is that performance decreases instead of increasing. The single-threaded variant shows 1.2M SPS at 12% CPU load. First of all, I don't know why it's not 100%. The multithreaded variant shows around 800K SPS at 20% CPU load. Adding more threads doesn't change anything: the 20% load is simply distributed among them without any increase in SPS. Task Manager and Process Explorer do show that the threads are at least spread across cores.
I don't think the state query and increment are the bottleneck, as processing a state should take much longer. So adding more threads should give some performance increase. What is wrong here?