Hello,
We have to be smarter than that...
So follow along with me please...
The following Bakery concurrent FIFO queue algorithm has a weakness:

https://groups.google.com/d/topic/lock-free/acjQ3-89abE/discussion

Look at the consumer function:
double consumer() {
  uint32_t ver = XADD(&tail, 1);             // take a ticket
  cell& c = cells[ver & (N - 1)];            // N is a power of two
  while (LOAD(&c.ver) != ver + 1) backoff(); // wait for our turn
  double state = c.state;
  STORE(&c.ver, ver + N);                    // hand the cell back to the producer
  return state;
}
As you have noticed, once LOAD(&c.ver) == ver + 1
the thread will pop the item.. but this is a weakness,
and I will explain to you why...
With a CAS-based method, the pop() method can use a backoff mechanism
when the CAS fails under high contention, and this gives
about 3x more throughput than the Bakery algorithm on the pop()
side. I have benchmarked it, and this has been confirmed by
my benchmarks: the CAS-based method with a backoff gives
35 million pops per second on my quad-core, and the Bakery gives
12 million pops per second on my quad-core.
To alleviate this weakness of the Bakery concurrent FIFO queue,
you have to choose another model, like the following...

Imagine that you want to design an efficient threadpool. If you use
a single FIFO queue for all the consumers, it will not be so fast and
efficient. To be more efficient and fast, you have to use one concurrent
FIFO queue per consumer in your threadpool, but the producer side has to
be a single thread that pops from a concurrent FIFO queue into which
multiple threads push their items. This single producer thread of the
threadpool engine will round-robin between the queues of the consumers
and put the items in them. Since you are using a single producer thread,
variables like the tail variable above will be accessed only from the
local cache, so there will be no cache-line transfers for the tail
variable and so on. This is cheaper and faster, and I think it will give
the same throughput as CAS with a backoff on the pop side, that is
35 million pops per second on my quad-core, and this method can be
applied to the Bakery concurrent FIFO queue.
In my previous post I said that the service rate of pop() is limited
by the arrival rate of push() under high contention, but that's not
completely true, because with the threadpool engine the service rate
on the consumer side is lowered anyway, since the consumers have to
do some work on each item; so the service rate of pop() itself can be
about 3x faster by applying the above method.
Thank you,
Amine Moulay Ramdane.