* * *

Author Topic: Competing Parallel Procedures API  (Read 1072 times)

schuler

  • Jr. Member
  • **
  • Posts: 72
Competing Parallel Procedures API
« on: June 04, 2018, 05:23:55 am »
:-) Hello :-)

Although I find mtprocs very useful, I needed something with a lot less overhead. 1 millisecond of overhead is far beyond the tolerance in my neural network API.

So, I coded something similar. All threads are kept alive waiting for an event to start. Got some inspiration here:
http://www.paradicesoftware.com/blog/2014/02/dont-use-suspend-and-resume-but-dont-poll-either/

I have a thread list with "always on" threads that are sent an event when I need them to start. This is the inner core:
Code: Pascal  [Select]
  1. procedure TNeuronThread.Execute;
  2. begin
  3.   while (not Terminated) do
  4.   begin
  5.     FNeuronStart.WaitFor(INFINITE);
  6.     FNeuronStart.ResetEvent;
  7.     if (FShouldStart) then
  8.     begin
  9.       FRunning := true;
  10.       FShouldStart := false;
  11.  
  12.       FProc(FIndex, FThreadNum);
  13.  
  14.       FRunning := false;
  15.       FNeuronFinish.SetEvent;
  16.       FProcFinished := true;
  17.     end;
  18.   end;
  19. end;
  20.  

This is how the thread list is implemented:
Code: Pascal  [Select]
  1.   TNeuronThreadList = class (specialize TFPGObjectList<TNeuronThread>)
  2.   private
  3.     FStarted: boolean;
  4.   public
  5.     constructor Create(pSize: integer);
  6.  
  7.     procedure StartEngine();
  8.     procedure StopEngine();
  9.  
  10.     procedure StartProc(pProc: TNeuronProc; pBlock: boolean = true);
  11.     procedure WaitForProc(); {$IFDEF Release} inline; {$ENDIF}
  12.   end;
  13.  

When creating the list, all threads are created:
Code: Pascal  [Select]
  1. constructor TNeuronThreadList.Create(pSize: integer);
  2. var
  3.   I: integer;
  4. begin
  5.   inherited Create(true);
  6.  
  7.   FStarted := false;
  8.  
  9.   for I := 1 to pSize do
  10.   begin
  11.     Self.Add( TNeuronThread.Create(true, I));
  12.   end;
  13. end;
  14.  

The waiting is implemented via events:
Code: Pascal  [Select]
  1. procedure TNeuronThread.WaitForProc();
  2. begin
  3.   FNeuronFinish.WaitFor(INFINITE);
  4.   FNeuronFinish.ResetEvent;
  5. end;
  6.  

All threads are launched together with:
Code: Pascal  [Select]
  1. procedure TNeuronThreadList.StartProc(pProc: TNeuronProc; pBlock: boolean = true);
  2. var
  3.   I: integer;
  4.   localCount: integer;
  5. begin
  6.   localCount := Count;
  7.   if not(FStarted) then StartEngine();
  8.   for I := 0 to localCount - 1 do
  9.   begin
  10.     Self.Items[I].StartProc(pProc, I, localCount);
  11.   end;
  12.  
  13.   if pBlock then WaitForProc();
  14. end;
  15.  

USAGE

This was the original code:
Code: Pascal  [Select]
  1.     for I := 0 to FNeurons.Count - 1 do
  2.     begin
  3.       ComputeNeuronFast(I);
  4.     end;
  5.  

This is the new parallel code:
Code: Pascal  [Select]
  1. procedure TNNetConvolution.ComputeNTL(index, threadnum: integer);
  2. var
  3.   I: integer;
  4. begin
  5.   for I := 0 to FNeurons.Count - 1 do
  6.   begin
  7.     if I mod threadnum = index then
  8.     begin
  9.       ComputeNeuronFast(I);
  10.     end;
  11.   end;
  12. end;
  13.  
  14. ...
  15.  
  16. fNTL.StartProc(@ComputeNTL); // calls above code
  17.  

The full source code is located here:
https://sourceforge.net/p/cai/svncode/HEAD/tree/trunk/lazarus/libs/ntl.pas

:-) Wish everyone happy coding :-)
« Last Edit: June 04, 2018, 08:44:15 am by schuler »

Pascal

  • Hero Member
  • *****
  • Posts: 664
Re: Competing Parallel Procedures API
« Reply #1 on: June 04, 2018, 05:46:00 am »
Looks good! I do it nearly the same way with my worker threads for my multithreaded COBOL parser :D
laz trunk - fpc trunk 32bit - Windows 10 Pro x64 (1803)

schuler

  • Jr. Member
  • **
  • Posts: 72
Re: Competing Parallel Procedures API
« Reply #2 on: June 04, 2018, 06:07:37 am »
Hi Pascal - it seems that this kind of class has been reimplemented for 100s of times by plenty of people... 

ASerge

  • Hero Member
  • *****
  • Posts: 767
Re: Competing Parallel Procedures API
« Reply #3 on: June 05, 2018, 05:44:03 am »
The full source code is located here:
https://sourceforge.net/p/cai/svncode/HEAD/tree/trunk/lazarus/libs/ntl.pas
Your code is not thread-safe. For eample:
StartProc can be called by external thread, so
Code: Pascal  [Select]
  1. procedure TNeuronThread.StartProc(pProc: TNeuronProc; pIndex,
  2.   pThreadNum: Integer);
  3. begin
  4.   FNeuronStart.ResetEvent;
  5.   FNeuronFinish.ResetEvent;
  6.   FProcFinished := false;
  7.   FProc := pProc; /// <- This can be replace proc in worked thread
  8.   FIndex := pIndex;
  9.   FThreadNum := pThreadNum;
  10.   FShouldStart := true;
  11.   FNeuronStart.SetEvent;
  12. end;
For example in this:
Code: Pascal  [Select]
  1. procedure TNeuronThread.Execute;
  2. begin
  3.   while not Terminated do
  4.   begin
  5.     FNeuronStart.WaitFor(INFINITE);
  6.     FNeuronStart.ResetEvent;
  7.     if FShouldStart then
  8.     begin
  9.       FRunning := true;
  10.       FShouldStart := false;
  11.       FProc(FIndex, FThreadNum); // <- executed other proc, when wanted
  12.       FRunning := false;
  13.       FNeuronFinish.SetEvent;
  14.       FProcFinished := true;
  15.     end;
  16.   end;
  17. end;

Access to thread variables must be either from only one thread or must be protected from reuse, for example, by using Interlocked functions.

schuler

  • Jr. Member
  • **
  • Posts: 72
Re: Competing Parallel Procedures API
« Reply #4 on: June 05, 2018, 09:35:00 pm »
 :) Hello ASerge :)

Very well spotted!!!

I'll make improvements and share.

 :) Wish everyone happy coding :)

schuler

  • Jr. Member
  • **
  • Posts: 72
Re: Competing Parallel Procedures API
« Reply #5 on: June 08, 2018, 07:21:22 am »
 :( Hello  :(

Sorry to bother you about this. But I regret to say that the overhead of TNeuronThreadList in linux is horrible. Although in Windows 10 the overhead seems negligible (I can double the speed by doubling computing cores), the overhead in linux makes it useless for my use case. To make things worse, my production environment is 100% Ubuntu 14.04 or 16.04.

I wonder if anyone reading this had the same problem. In windows 10, it flies.

Anyway, life continues.

If pascal coding was too easy, it wouldn't be as fun.

Long life to Free Pascal!

Thaddy

  • Hero Member
  • *****
  • Posts: 6126
Re: Competing Parallel Procedures API
« Reply #6 on: June 08, 2018, 08:30:05 am »
Maybe instead of:
Code: Pascal  [Select]
  1. TNeuronThreadList = class (specialize TFPGObjectList<TNeuronThread>)
use
Code: Text  [Select]
  1. //uses rtl-generics
  2. TNeuronThreadList = class (specialize TTThreadList<TNeuronThread>)
Although not yet as standard in 3.0.4, but in trunk, the library is available from Maciej (HNB) for 3.0.4
If you use rtl-generics instead of FGL you will see a considerable speed increase on all platforms.

(as an aside: I would also advice to use Delphi syntax, especially for generics: it is more readable and Objpas generics syntax currently has some issues, at least in trunk)
« Last Edit: June 08, 2018, 08:38:11 am by Thaddy »
I might not give the answer that you want me to.. Peter Green 1969

taazz

  • Hero Member
  • *****
  • Posts: 5079
Re: Competing Parallel Procedures API
« Reply #7 on: June 08, 2018, 08:34:36 am »
:( Hello  :(

Sorry to bother you about this. But I regret to say that the overhead of TNeuronThreadList in linux is horrible. Although in Windows 10 the overhead seems negligible (I can double the speed by doubling computing cores), the overhead in linux makes it useless for my use case. To make things worse, my production environment is 100% Ubuntu 14.04 or 16.04.

I wonder if anyone reading this had the same problem. In windows 10, it flies.

Anyway, life continues.

If pascal coding was too easy, it wouldn't be as fun.

Long life to Free Pascal!
last time I checked linux did not support events natively. There are various implementations that mimic them with variable success, I never evaluated any of them to be able to recommend something. Try to use semaphores (binary semaphores I guess) in linux and see if that makes any difference or search for the lightest lock available and try to use that. As a rule of thumb avoid interprocess able synchro mechanisms they usually drop down to kernel mode and are heavier than the non.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Thaddy

  • Hero Member
  • *****
  • Posts: 6126
Re: Competing Parallel Procedures API
« Reply #8 on: June 08, 2018, 09:04:41 am »
Last time I checked linux did not support events natively. There are various implementations that mimic them with variable success, I never evaluated any of them to be able to recommend something. Try to use semaphores (binary semaphores I guess) in linux and see if that makes any difference or search for the lightest lock available and try to use that. As a rule of thumb avoid interprocess able synchro mechanisms they usually drop down to kernel mode and are heavier than the non.
The TEventObj from syncobjs has no real performance issues on Unix, a.f.a.i.k. it is only a conditional var and a mutex.
[edit]
Code: Pascal  [Select]
  1.       TINTRTLEvent = record
  2.         condvar: pthread_cond_t;
  3.         mutex: pthread_mutex_t;
  4.         isset: boolean;
  5.        end;
Nothing heavy there..
« Last Edit: June 08, 2018, 09:46:13 am by Thaddy »
I might not give the answer that you want me to.. Peter Green 1969

schuler

  • Jr. Member
  • **
  • Posts: 72
Re: Competing Parallel Procedures API
« Reply #9 on: June 08, 2018, 10:18:17 am »
You guys rock. Will give a try and let you know.

taazz

  • Hero Member
  • *****
  • Posts: 5079
Re: Competing Parallel Procedures API
« Reply #10 on: June 08, 2018, 10:49:58 am »
Last time I checked linux did not support events natively. There are various implementations that mimic them with variable success, I never evaluated any of them to be able to recommend something. Try to use semaphores (binary semaphores I guess) in linux and see if that makes any difference or search for the lightest lock available and try to use that. As a rule of thumb avoid interprocess able synchro mechanisms they usually drop down to kernel mode and are heavier than the non.
The TEventObj from syncobjs has no real performance issues on Unix, a.f.a.i.k. it is only a conditional var and a mutex.
[edit]
Code: Pascal  [Select]
  1.       TINTRTLEvent = record
  2.         condvar: pthread_cond_t;
  3.         mutex: pthread_mutex_t;
  4.         isset: boolean;
  5.        end;
Nothing heavy there..
well I have no idea how heavy the condvar is, but that's the way I would go first if I would try to implement them my self too. I see no reason for the mutex and the isset field though
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Thaddy

  • Hero Member
  • *****
  • Posts: 6126
Re: Competing Parallel Procedures API
« Reply #11 on: June 08, 2018, 12:40:02 pm »
The IsSet is there because the mutex can only be owned by one thread, so the easiest way to check from another thread if the mutex is already acquired is to examine IsSet. That's cheaper than to try to acquire the mutex. Since it is a boolean it is read-safe without protection and it can only be written to by the thread that has the mutex. The condvar has only a very simple locked operation on init/destroy.

A good example for mutex and condition is in this man page (in C): https://linux.die.net/man/3/pthread_cond_init
« Last Edit: June 08, 2018, 12:46:31 pm by Thaddy »
I might not give the answer that you want me to.. Peter Green 1969

taazz

  • Hero Member
  • *****
  • Posts: 5079
Re: Competing Parallel Procedures API
« Reply #12 on: June 08, 2018, 01:42:03 pm »
The IsSet is there because the mutex can only be owned by one thread, so the easiest way to check from another thread if the mutex is already acquired is to examine IsSet. That's cheaper than to try to acquire the mutex. Since it is a boolean it is read-safe without protection and it can only be written to by the thread that has the mutex. The condvar has only a very simple locked operation on init/destroy.

A good example for mutex and condition is in this man page (in C): https://linux.die.net/man/3/pthread_cond_init
well if I understand you correctly then the mutex is used to force the threads to sleep not to protect access to the internal fields?
In that case, the point of an event is, for you to wait for it to happen, there is no need for a check if it is set, the wait on an event goes something along the lines of


Code: Pascal  [Select]
  1. procedure Event.Reset;
  2. Begin
  3.   mutex.acquire;// the event is locked aka set.
  4. end;
  5. procedure Event.Signal;
  6. begin
  7.   mutex.release;
  8. end;
  9. procedure Event.WaitFor;
  10. begin
  11.   mutex.acquire; //when the event is signaled the thread will be able to continue;
  12.   if not autoreset then mutex.release; // release the acquired lock so the next thread can follow;
  13.   {else the first in will lock until the next signal;}
  14. end;
  15.  
  16.  

well you get the point I guess, that is assuming the mutex is not thread aware, something along the lines of a binary semaphore. If it is thread aware aka only the thread that acquired the lock can release it then use semaphores or some other type of lock.


PS
  thanks for the link.
« Last Edit: June 08, 2018, 01:58:30 pm by taazz »
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

Thaddy

  • Hero Member
  • *****
  • Posts: 6126
Re: Competing Parallel Procedures API
« Reply #13 on: June 08, 2018, 05:19:24 pm »
PS
  thanks for the link.
It is essentially the same code as TEventObj uses on the lowest level. Difference is the acquire code. TEventObj should be essentially faster because of that!
I might not give the answer that you want me to.. Peter Green 1969

taazz

  • Hero Member
  • *****
  • Posts: 5079
Re: Competing Parallel Procedures API
« Reply #14 on: June 09, 2018, 08:56:05 am »
PS
  thanks for the link.
It is essentially the same code as TEventObj uses on the lowest level. Difference is the acquire code. TEventObj should be essentially faster because of that!
any links to the profiling code for the TeventObj? I'd like to add it to my collection.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

 

Recent

Get Lazarus at SourceForge.net. Fast, secure and Free Open Source software downloads Open Hub project report for Lazarus