Hello all,
I have finally found why lockfree_mpmc doesn't scale.
You can get the source code of lockfree_mpmc from:
http://pages.videotron.com/aminer/
So please follow along with me.
If you take a look at the lockfree_mpmc Object Pascal
source code, you will read this on the push side:
---
function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
var lasttail,newtemp:longword;
begin
  // Refuse the push when the queue is full.
  if getlength >= fsize
  then
    begin
      result:=false;
      exit;
    end;
  result:=true;
  // Reserve a slot: each caller gets its own value of temp.
  newTemp:=LockedIncLong(temp);
  lastTail:=newTemp-1;
  setObject(lastTail,tm);
  // Spin until tail reaches our slot, then advance it past the slot.
  repeat
    if CAS(tail,lasttail,newtemp)
    then
      begin
        exit;
      end;
    asm pause end;
  until false;
end;
---
When I tested the push() side with 4 threads, I noticed that lockfree_mpmc
doesn't scale at all. In fact I got retrograde throughput, meaning
less throughput than in the single-thread test. I have finally found
why. When you use lockfree_mpmc from a single thread, the CAS reads and
updates its variables in the level 1 cache, and it's fast. With 4 threads
it gets too slow, because the cache line holding those variables has to
move between cores, so we are reading and updating through the L2 cache
and main memory.
I tried playing with the affinity mask and found that when I run two
threads on different cores that share the same level 2 cache, it scales
a little: I got more throughput with those two threads than in the
single-thread test.
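
For anyone who wants to repeat the affinity experiment, here is a minimal sketch of pinning a thread to one CPU. It assumes Free Pascal (or Delphi) on Windows and the Win32 call SetThreadAffinityMask. PinCurrentThreadToCpu is a hypothetical helper name, not part of lockfree_mpmc, and which CPU indices share a level 2 cache has to be checked on your own machine.

```pascal
program AffinityDemo;
{$ifdef FPC}{$mode objfpc}{$endif}
uses
  Windows;

// Hypothetical helper: restrict the calling thread to a single logical CPU.
// cpuIndex is zero-based. Returns True when the call succeeded.
function PinCurrentThreadToCpu(cpuIndex: integer): boolean;
begin
  // SetThreadAffinityMask returns the previous mask, or 0 on failure.
  Result := SetThreadAffinityMask(GetCurrentThread, 1 shl cpuIndex) <> 0;
end;

begin
  // Assumption: CPU 0 exists and is allowed for this process.
  if PinCurrentThreadToCpu(0) then
    writeln('pinned to CPU 0')
  else
    writeln('SetThreadAffinityMask failed, error ', GetLastError);
end.
```

In a test each worker thread would call the helper at the start of its own thread procedure, with two indices that share a level 2 cache for one run and two that do not for the other.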
I have also modified lockfree_mpmc so that it does not execute the CAS,
and so does not pull the cache line away from the other cores, when tail
and lasttail are not equal. I did this by adding the following check inside
the repeat until loop:
  if tail <> lasttail
  then
    begin
      continue;
    end;
This method gives better performance.
Here is the final code of the push() side of lockfree_mpmc.
I think I will modify the pop() side in the same way.
---
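function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
var lasttail,newtemp:longword;
begin
  // Refuse the push when the queue is full.
  if getlength >= fsize
  then
    begin
      result:=false;
      exit;
    end;
  result:=true;
  // Reserve a slot: each caller gets its own value of temp.
  newTemp:=LockedIncLong(temp);
  lastTail:=newTemp-1;
  setObject(lastTail,tm);
  repeat
    // Plain read first: skip the CAS while it is not our turn.
    // continue jumps to the until test, so the pause below is skipped here.
    if tail <> lasttail
    then
      begin
        continue;
      end;
    // tail has reached our slot: advance it past the slot.
    if CAS(tail,lasttail,newtemp)
    then
      begin
        exit;
      end;
    asm pause end;
  until false;
end;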
---
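
The check-before-CAS idea above can be tried in isolation with a small program. This is only a sketch under assumptions: it uses Free Pascal's InterlockedIncrement and InterlockedCompareExchange in place of LockedIncLong and CAS, it calls ThreadSwitch where push() uses the pause instruction, and it has no queue storage at all. TWorker, ThreadCount and OpsPerThread are names made up for the example.

```pascal
program TtasDemo;
{$mode objfpc}
uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  ThreadCount  = 4;      // assumed number of pushing threads
  OpsPerThread = 10000;  // assumed number of simulated pushes per thread

var
  temp: longint = 0;  // ticket counter, plays the role of temp in push()
  tail: longint = 0;  // published position, plays the role of tail in push()

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  k, ticket, last: longint;
begin
  for k := 1 to OpsPerThread do
  begin
    // Take a unique ticket, like newTemp:=LockedIncLong(temp).
    ticket := InterlockedIncrement(temp);
    last := ticket - 1;
    repeat
      // Plain read first; the locked CAS runs only when it is our turn.
      if tail = last then
        if InterlockedCompareExchange(tail, ticket, last) = last then
          break;
      // Yield instead of the pause instruction (a substitution for this sketch).
      ThreadSwitch;
    until false;
  end;
end;

var
  workers: array[1..ThreadCount] of TWorker;
  n: integer;
begin
  for n := 1 to ThreadCount do
    workers[n] := TWorker.Create(false);  // false = start running at once
  for n := 1 to ThreadCount do
  begin
    workers[n].WaitFor;
    workers[n].Free;
  end;
  // Every ticket advances tail by one, so both numbers should match.
  writeln('tail = ', tail, ', expected ', ThreadCount * OpsPerThread);
end.
```

To compare the two variants, remove the "if tail = last then" line so that the CAS runs on every iteration, and time both versions with the same thread count.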
But as I said before, lockfree_mpmc doesn't scale when the threads
run on different cores that DO NOT share the same cache.
That means that on my Intel Core 2 Quad Q6600 it scales only
when we use 2 threads on different cores that share the same
level 2 cache.
Thank you.
Amine Moulay Ramdane.