Recent

Author Topic: Lockfree_mpmc version 1.1  (Read 11058 times)

aminer

  • Hero Member
  • *****
  • Posts: 956
Lockfree_mpmc version 1.1
« on: December 12, 2011, 07:29:02 am »

Hello,


The scores of lockfree_mpmc are very good, look at the benchmarks bellow please..


Description:

Lock-free MPMC algorithm to handle queue FIFO proposed by Dariusz Mazur and Amine Moulay Ramdane ,
use only single CAS on push and single CAS on pop. It's based on a circular array.

Please look at more information and the benchmarks here:

http://pages.videotron.com/aminer/parallelqueue/parallelqueue1.htm


Language: FPC Pascal v2.2.0+ / Delphi 5+: http://www.freepascal.org/

Operating Systems: Win , Linux and Mac (x86).

Required FPC switches: -O3 -Sd -dFPC -dWin32 -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -DMSWINDOWS -$H+ -DDelphi



Thank you.

Amine Moulay Ramdane.


ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Lockfree_mpmc version 1.1
« Reply #1 on: December 12, 2011, 11:38:48 am »
Needs some "repairs" to be usable.

TParallelQueue.pop:
Code: [Select]
if index2<nbr1 then localindex:=index2+1
else localindex:=0;
if localindex > index1
                 then
                  begin
                   result:=false;
So, as soon as index1 wraps to 0 then there is nothing to pop anymore  :o

More fundamentally, a fifo is supposed to pop the items in the same order as they where pushed. The problem is that the parallel use of different fifo's won't work. Example using 4 fifo's (your benchmark):
Thread 1 pushes values 1,2,3,4,5,6 and gets pre-empted pushing 7 between
Code: [Select]
index1:=localindex;
and
Code: [Select]
result:=queues[localindex mod len].push(tm);
So the 4 fifos contain:
1234
56
The second thread pushes A,B,C,D,E,F. Since the 3rd fifo is "in use" by thread 1, A will be pushed starting with the next fifo: Result:
1234
56DA
BC E
F
If now a 3rd thread kicks in and reads the fifo, the result is clearly wrong. Even when thread 1 comes first and finishes the push of 7, the result is still wrong (D before A):
1234
56DA
BC7E
F
A similar example can be created while popping from the stack.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #2 on: December 12, 2011, 03:58:20 pm »

ludob wrote:
>The second thread pushes A,B,C,D,E,F. Since the 3rd fifo is "in use" by thread 1, >A will be pushed starting with the next fifo: Result:
>1 2 3 4
>5 6 D A
>B C  E
>F


I don't understand, how come the D is stored before A  in your example, i don't
understand it ? i think the second thread will push A,B,C,D,E,F in the order.
Can you explain more your example ?


Amine Moulay Ramdane.






ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Lockfree_mpmc version 1.1
« Reply #3 on: December 12, 2011, 04:39:37 pm »
Quote
I don't understand, how come the D is stored before A  in your example, i don't
understand it ?
Because thread 1, wanting to push data 7, has updated index1 in TParallelQueue.push but not pushed the data to the queues[localindex mod len] yet. This the assumption of the example. Thread 2 comes in, increases index1 and pushes A to the next queue (4 in my example). As a result, queue 3 has one item less. Thread 2 stores B, C and when storing D, queues[localindex mod len] , points to the same queue as thread 1 was going to use.  It has one element less and D is at the place where 7 should have been.
Code: [Select]
i think the second thread will push A,B,C,D,E,F in the order.
They are pushed in order but the unused slot in queue 3 from thread 1 leaves queue 3 with one element less. So D is in row 2 instead of row 3 of queue 3. When thread 1 comes back and finishes storing 7, all queues are back at the level they should be.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #4 on: December 12, 2011, 04:55:32 pm »

ludob wrote:
>if index2<nbr1 then localindex:=index2+1
>else localindex:=0;
>if localindex > index1
>                 then
>                  begin
>                   result:=false;

>>So, as soon as index1 wraps to 0 then there is nothing to pop anymore 


i have changed it with:

if index2 = index1
                 then
                  begin
                   result:=false;
                   count2:=0;
                   exit;
                  end;


Amine Moulay Ramdane.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #5 on: December 12, 2011, 05:01:34 pm »

ludob wrote:
> Because thread 1, wanting to push data 7, has updated >index1 in >TParallelQueue.push but not pushed the data to the queues>[localindex mod >len] yet. This the assumption of the example. Thread 2 comes >in, increases >index1 and pushes A to the next queue (4 in my example). As a >result, >queue 3 has one item less. Thread 2 stores B, C and when storing D, >queues>[localindex mod len] , points to the same queue as thread 1 was >going to use.  It has one element less and D is at the place where 7 should >have been.

I understand now..

result:=queues[localindex mod len].push(tm) etc. must be also in the critical
section... unfortunatly i haven't thought about this situation that
you described...


so just forget about parallelqueue...


Amine Moulay Ramdane






aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #6 on: December 12, 2011, 05:08:29 pm »


What do you think ludob about the other MPMC FIFO queue?

here it is:

http://pages.videotron.com/aminer/zip/lockfree_queue.zip


Amine Moulay Ramdane.





aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #7 on: December 12, 2011, 05:38:02 pm »


Please download it again , i have changed something inside the code...


http://pages.videotron.com/aminer/zip/lockfree_queue.zip


Amine Moulay Ramdane.

ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Lockfree_mpmc version 1.1
« Reply #8 on: December 12, 2011, 05:59:01 pm »
Quote
What do you think ludob about the other MPMC FIFO queue?
Well... some "small" problems:

TLockfree_MPMC.pop:
CAS isn't thread safe at all. Exemple: thread 1 wants to set head to 5, it pushes 5 on the stack and gets pre-empted by thread 2. Thread 2 pops from the fifo, head is still 4, and gets same data as thread 1 (=problem 1). Pops a few other items and leaves head at 15. Thread 1 is back and continues (remember 5 on the stack), continues with CAS and head is 5 (problem 2).

TLockfree_MPMC.push:
suppose we are at getlength -1. thread 1 pushes an item and gets pre-empted at setObject(tail,tm). Thread 2 pushes an item. getlength is still the same because tail hasn't been updated. It continues in the repeat loop and backs off. thread 2 continues and updates tail. thread 1 comes back and stores data but buffer is full. It doesn't test on getlength any more. Test on getlength should be done inside critical section.

I already made the comments before regarding the pointer to circular buffer that never gets reset. You replied that the pointer is masked. That is fine unless the unit is compiled with range checking on. 
Why not do a
Code: [Select]
tail:=(tail+1) and fMask; and forget about the mask in setObject and getObject. It is probably even faster and it will also work for a 64 bit cpu.



ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Lockfree_mpmc version 1.1
« Reply #9 on: December 12, 2011, 06:00:22 pm »
Quote
Please download it again , i have changed something inside the code...
That solves TLockfree_MPMC.push. :)

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #10 on: December 12, 2011, 06:28:29 pm »

Hello ludob,

I will ask you a question:

Please take a look at this lockfree queue:

 http://www.emadar.com/fpc/lockfree.htm


I am not satisfied with it...

As you will notice, it's not lockfree in the push() side ,  i will ask you
if you can modify it and use a DWCAS in the push side and avoid the
overflow of the buffer and give me the final source code ?


Amine Moulay Ramdane.




 

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #11 on: December 12, 2011, 06:31:38 pm »

Hello,

Or can you ludob write for us an efficient lockfree fifo queue based on
a circular buffer ?


Amine Moulay Ramdane.



aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #12 on: December 12, 2011, 07:19:02 pm »

Hello,

ludob,  i have modified fifoqueue_mpmc as you said
with tail:=(tail+1) and fmask ,  can you please tell me if all
is ok with the new source code ?

here is the source code:


http://pages.videotron.com/aminer/fifoqueue_mpmc.zip


Amine Moulay Ramdane.




aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc version 1.1
« Reply #13 on: December 12, 2011, 07:21:45 pm »


Sorry , here is the source code:

http://pages.videotron.com/aminer/zip/fifoqueue_mpmc.zip


Amine Moulay Ramdane.


ludob

  • Hero Member
  • *****
  • Posts: 1173
Re: Lockfree_mpmc version 1.1
« Reply #14 on: December 12, 2011, 08:54:51 pm »


Sorry , here is the source code:

http://pages.videotron.com/aminer/zip/fifoqueue_mpmc.zip


Amine Moulay Ramdane.


At first sight, much better. I don't see any obvious problems.

Quote
Or can you ludob write for us an efficient lockfree fifo queue based on
a circular buffer ?
A lock free fifo is more complex to implement and, yes, it is slower. That is the cost of having a maximum number of cycles to complete a push or a pop. Therefor, IMHO the  benchmark is biased in that you are comparing apples with pears. A lock isn't expensive at all in your test since you just leave the processor to another thread that is doing exactly the same. With four threads I would be surprised you fill up 4 cores. So all the lock free logic is just waste in this usage case, and it shows.
The lock free buffers you benchmark seem to be quite fast and are definitely the result of more than a few hours of programming. I have no reason to believe that I can knock together a more efficient or faster version in a moments time.

 

TinyPortal © 2005-2018