- rvk, with respect to the code you posted: thanks for the breakdown of block sizes etc. However, what I was attempting was to emulate the equivalent of Delphi's TParallel. The idea is that I shouldn't have to break the work into blocks myself, and that it would scale dynamically.
Well, then MTProc isn't the choice for you, is it? With MTProc you need to break the tasks down into equal parts yourself.
You could put the breakdown in a separate procedure so that, from your code, it looks like a single call (emulating what Delphi does). But there are other libraries too. Maybe some of them already have this built in.
I also found another one, PasMP, which should be compatible with both FPC and Delphi. But I didn't check whether it does the breakdown for you.
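To illustrate the "breakdown in a separate procedure" idea, here is a rough sketch. It uses plain TThread rather than MTProc, and the names ParallelFor, TBlockThread and TLoopProc are made up for this example. It splits the range into equal blocks, one per thread, so the caller only sees a single call.

```pascal
program block_split_demo;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes;

type
  TLoopProc = procedure(Index: LongInt);

  // Hypothetical helper thread: runs Proc for every index in one block.
  TBlockThread = class(TThread)
  private
    FFirst, FLast: LongInt;
    FProc: TLoopProc;
  protected
    procedure Execute; override;
  public
    constructor Create(AFirst, ALast: LongInt; AProc: TLoopProc);
  end;

constructor TBlockThread.Create(AFirst, ALast: LongInt; AProc: TLoopProc);
begin
  inherited Create(True);          // created suspended, started by the caller
  FFirst := AFirst;
  FLast := ALast;
  FProc := AProc;
end;

procedure TBlockThread.Execute;
var
  i: LongInt;
begin
  for i := FFirst to FLast do      // does nothing if the block is empty
    FProc(i);
end;

// Hides the block breakdown behind one call.
procedure ParallelFor(First, Last, ThreadCount: LongInt; Proc: TLoopProc);
var
  Threads: array of TBlockThread;
  t, BlockSize, BlockFirst, BlockLast: LongInt;
begin
  // Block size rounded up, so ThreadCount blocks cover the whole range.
  BlockSize := (Last - First + ThreadCount) div ThreadCount;
  SetLength(Threads, ThreadCount);
  for t := 0 to ThreadCount - 1 do
  begin
    BlockFirst := First + t * BlockSize;
    BlockLast := BlockFirst + BlockSize - 1;
    if BlockLast > Last then
      BlockLast := Last;
    Threads[t] := TBlockThread.Create(BlockFirst, BlockLast, Proc);
    Threads[t].Start;
  end;
  for t := 0 to ThreadCount - 1 do
  begin
    Threads[t].WaitFor;
    Threads[t].Free;
  end;
end;

var
  Squares: array[1..1000] of Int64;

procedure DoLoop(Index: LongInt);
begin
  Squares[Index] := Int64(Index) * Index;   // each index writes its own slot
end;

begin
  ParallelFor(1, 1000, 8, @DoLoop);
  WriteLn(Squares[1000]);                   // prints 1000000
end.
```

The blocks are fixed up front, so this still has the problem that a thread with a "cheap" block finishes early and then sits idle.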
As far as I can see, TParallel.&For() has a separate index manager from which the worker threads fetch the next index to execute. Sort of like an array in which each index is flagged once it has been processed. The upside of this is that all threads stay busy until the end. With MTProc it is possible that one thread finishes earlier than the others and sits there doing nothing until the others are done too. I'm not aware of such worker-thread queues in FPC (but I'm not that familiar with threading in FPC).
But emulating TParallel.&For() shouldn't be that difficult to do in a small unit.
Edit:
I've attached a small project. The call from your main program is just
MultiThread(1, cMax, @DoLoop);
from parallel_thread.pas.
In parallel_thread.pas there are 8 (default) worker threads, each of which takes the next index number and executes DoLoop for it. The only downside is that it's still slow, because it acquires a lock for every single index it takes. This is very inefficient.
IndexLock.Acquire;
Inc(CurrentIndex);
DoIndex := CurrentIndex;
IndexLock.Release;
(If you comment out the lock it does seem to run much faster, and it does work for me, but I don't think it's safe: without the lock, two threads could end up with the same index.)
Delphi does this via a "TStrideManager". So, to get more speed, something other than acquiring and releasing the lock 50.000.000 times is needed. But it does prove the concept.
Does anybody have an idea how to speed that up?
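One direction that might be worth trying (a sketch only, not what the attached project does, and I don't know how Delphi's TStrideManager works internally): drop the lock and let each worker claim a whole block of indices with a single atomic add. InterlockedExchangeAdd from the System unit returns the value before the addition, so every thread gets its own starting index. The names NextIndex, cStride and Hits, and the stride of 1024, are assumptions for this demo.

```pascal
program stride_index_demo;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes;

const
  cMax     = 1000003;  // demo bound, deliberately not a multiple of cStride
  cThreads = 8;
  cStride  = 1024;     // indices claimed per atomic operation (a guess, tune it)

var
  NextIndex: LongInt = 1;            // first index nobody has claimed yet
  Hits: array[1..cMax] of Byte;      // counts how often each index was processed

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  First, Last, i: LongInt;
begin
  repeat
    // Atomically reserve indices First .. First + cStride - 1.
    // The function returns the value NextIndex had BEFORE the addition.
    First := InterlockedExchangeAdd(NextIndex, cStride);
    if First > cMax then
      Break;                         // nothing left to claim
    Last := First + cStride - 1;
    if Last > cMax then
      Last := cMax;                  // the final block may be shorter
    for i := First to Last do
      Inc(Hits[i]);                  // stand-in for DoLoop(i)
  until False;
end;

var
  Workers: array[1..cThreads] of TWorker;
  i, Bad: LongInt;
begin
  for i := 1 to cThreads do
    Workers[i] := TWorker.Create(False);
  for i := 1 to cThreads do
  begin
    Workers[i].WaitFor;
    Workers[i].Free;
  end;
  // Every index should have been processed exactly once.
  Bad := 0;
  for i := 1 to cMax do
    if Hits[i] <> 1 then
      Inc(Bad);
  WriteLn('indices not processed exactly once: ', Bad);
end.
```

With a stride of 1 this is just a lock-free version of the Acquire/Inc/Release above. A larger stride means fewer atomic operations, but a thread can be left finishing its last block while the others are already done, so the stride is a trade-off between overhead and keeping all threads busy to the end.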