
Author Topic: Lockfree_mpmc and scalability ...  (Read 8866 times)

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc and scalability ...
« Reply #15 on: May 29, 2012, 09:48:01 pm »

I wrote:
> In the two-thread scenario, you have to do a load
> from the local L2 cache in [1] and [2], and these loads make
> the S part of the Amdahl equation much bigger than
> the P part; that's why the two-thread version doesn't scale
> either.
>
> So in general I think it's not possible to make lock-free
> FIFO queues scale when the lock-free code shares variables
> between the cores, because sharing variables is so expensive.


I mean that the CAS and the sharing of variables make the S (serial) part of
lockfree_mpmc much bigger than the P (parallel) part, and by Amdahl's equation
this makes lockfree_mpmc not scalable.

So in general I think it's not possible to make lock-free
FIFO queues scale when the lock-free code shares variables
between the cores and uses CAS, because sharing variables
and using CAS are both so expensive.
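
To make the Amdahl argument concrete, here is a minimal Free Pascal sketch of the usual form of Amdahl's law, speedup = 1 / (S + (1 - S) / N), where S is the serial fraction and N the number of cores. The function name AmdahlSpeedup and the serial fractions below are assumptions chosen only for illustration; they are not measurements of lockfree_mpmc.

```pascal
program amdahl_sketch;
{$mode objfpc}

// Amdahl's law: speedup on N cores when a fraction S of the work is serial.
// AmdahlSpeedup is a hypothetical helper, not part of lockfree_mpmc.
function AmdahlSpeedup(S: Double; N: Integer): Double;
begin
  Result := 1.0 / (S + (1.0 - S) / N);
end;

begin
  // Illustrative serial fractions only (assumed, not measured).
  WriteLn('S=0.10, 4 cores: ', AmdahlSpeedup(0.10, 4):0:2);
  WriteLn('S=0.50, 4 cores: ', AmdahlSpeedup(0.50, 4):0:2);
  WriteLn('S=0.90, 4 cores: ', AmdahlSpeedup(0.90, 4):0:2);
end.
```

With these assumed values the speedup on 4 cores drops from about 3.08 (S = 0.10) to 1.60 (S = 0.50) and 1.08 (S = 0.90), which is the effect described above: once the serial part dominates, adding cores gains almost nothing.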


Thank you.

Amine Moulay Ramdane.


aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Lockfree_mpmc and scalability ...
« Reply #16 on: May 30, 2012, 12:08:03 am »

Hello,

Even the following code inside the push() method is a problem:

if getlength >= fsize then
begin
  result := false;
  exit;
end;

As you may have noticed, the getlength() method shares
variables between the cores, and this makes the 4-thread
test much slower than the single-thread test on an
Intel Core 2 Quad Q6600. I have tested it on my computer:
it is much slower because cache-to-cache transfers are costly
on the Q6600. On newer architectures that have
an L3 cache and HyperTransport, it gives the same throughput
with one thread and with four threads, so it still doesn't scale with four threads.

So the following parts share variables between the cores and make
the 4-thread test much slower than the single-thread test on the Intel Core 2 Quad Q6600:

[1]  getlength()

[1] newTemp:=LockedIncLong(temp);  ...

[2] and CAS(tail,lasttail,newtemp) also...
 

That is why I told you that CAS and sharing variables
are expensive.
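
As a rough illustration of where the shared accesses sit, here is a minimal Free Pascal sketch of a CAS-based slot reservation. It is not the lockfree_mpmc source: Head, Tail, FSize and TryReserveSlot are hypothetical names, the capacity is assumed, and the walk-through is single-threaded, so it only shows which steps touch shared state and does not measure contention.

```pascal
program cas_reserve_sketch;
{$mode objfpc}

// Hypothetical names throughout: this is NOT the lockfree_mpmc source,
// only an illustration of where a CAS-based push touches shared state.
const
  FSize = 4;            // assumed queue capacity

var
  Head: LongInt = 0;    // shared: would be advanced by consumers
  Tail: LongInt = 0;    // shared: advanced by producers

// Try to claim the next slot. Every step marked "shared" needs the
// cache line holding Head/Tail, which must move between cores when
// several threads run this at once.
function TryReserveSlot(out Slot: LongInt): Boolean;
var
  LastTail: LongInt;
begin
  Slot := -1;
  Result := False;
  repeat
    LastTail := Tail;                    // shared read
    if LastTail - Head >= FSize then     // shared read (length check)
      Exit;                              // queue full
    // Shared read-modify-write: fails and retries if another
    // producer moved Tail in the meantime.
  until InterlockedCompareExchange(Tail, LastTail + 1, LastTail) = LastTail;
  Slot := LastTail;
  Result := True;
end;

var
  S: LongInt;
begin
  // Single-threaded walk-through: claims slots until the queue is full.
  while TryReserveSlot(S) do
    WriteLn('reserved slot ', S);
  WriteLn('queue full, length = ', Tail - Head);
end.
```

Each successful reservation costs at least two shared reads and one CAS on the same counters, so every producer serializes on that cache line no matter how many cores are available.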



Thank you.

Amine Moulay Ramdane.


BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Lockfree_mpmc and scalability ...
« Reply #17 on: May 30, 2012, 06:47:11 am »
Thanks Blaazen - on a Core i7 quad core, Windows Vista x64, I get:

push1.exe
Code: [Select]
throughput is: 6045617,03 push transations/s

push4.exe
Code: [Select]
throughput is: 6148518,32 push transations/s

On the same machine, with the new programs:
Code: [Select]
push1
10001

throughput is: 3533922,42 push transations/s

push1
10001

throughput is: 3764019,74 push transations/s

push1
10001

throughput is: 3514054,97 push transations/s

push1
10001

throughput is: 3455770,71 push transations/s

push2
20002

throughput is: 3594895,92 push transations/s

push2
20002

throughput is: 3763311,55 push transations/s

push2
20002

throughput is: 3549600,87 push transations/s

push2
20002

throughput is: 3691087,08 push transations/s

push4
40004

throughput is: 4635995,11 push transations/s

push4
40004

throughput is: 4705799,53 push transations/s

push4
40004

throughput is: 4596048,00 push transations/s

push4
40004

throughput is: 4694203,45 push transations/s

 
