Recent

Author Topic: Enahnced scalable Anderson Lock  (Read 14332 times)

aminer

  • Hero Member
  • *****
  • Posts: 956
Enahnced scalable Anderson Lock
« on: October 09, 2013, 09:01:27 pm »

Hello,

I have updated my enhanced scalable Anderson Lock ,
i have corrected a bugin the enter() method

here is the what i have corrected:

==
procedure TALOCK.Enter;

var  count1,slot:long;
     
begin

repeat

count1:=LockedIncInt(count);

if count1<= FSize then break;

LockedDecInt(count);

while count>=fsize // <-- i have changed here count1 by count
do sleep(0);

sleep(0);

until false;

slot:=LockedIncInt(fcount2^.fcount2);
while Fcount1^[(slot-1) mod fsize].fcount1<>1
do sleep(0);
end;

===


Other than that if you have noticed the following

while count>=fsize
do sleep(0);

is important so that the Enter() method do not deadlock etc.
if the threads increment count beyong fsize...


You can download my enhanced scalable Anderson Lock from:

http://pages.videotron.com/aminer/


Thank you,
Amine Moulay Ramdane



aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #1 on: October 10, 2013, 01:27:04 am »

Hello,

I have come to another interresting subject, the Amdahl equation
that is equal to 1/(S+P/N)
(S:the percentage of the serial part,
P: percentage of the parallel part
N: the number of cores)

So we have to be smart, so follow with me,  i have read on some documents something that look like this: if the Serial part is 0.1% and the parallel part is 0.9% and we have 4 cores, so the Amdahl equatoin
will equal  to 1/ 0.1 + 0.225 = 3X , so this will scale to 3X, but i don't agree with this , cause i think
this Amdahl equation do not give a correct picture,so imagine that the serial part take 1 second and
the parallel part 9 seconds, that means S= 0.1% and P=0.9%,so
with 4 cores you will say that this will run four P parts in 9 seconds
and four S parts in 4 seconds this will equal 13 seconds, but the serial part will run in 4*10 , so the scalability will equal 40 seconds divide
by 13 seconds so this will scale to 3X, this is exactly what i have found
with the Amdahl equation. But the Amdhal equation is not
so precise and it doesn't give a correct picture, cause i think
the Amdhal equation is for only the ideal contention scenario as
i have just explain to you , but in a none contention scenario
you will for example run four S parts in 1 seconds and four P parts in 9 seconds this will give a scalability equal to 40 seconds divide by 10
so this will scale to 4X in a none contention scenario, hence if you have less contention it will scale better than 3X , so this is why i say that
the Amdahl equation doesn't give you a correct picture, and more than
that if in pratice the serial part is small and there is more randomness in the parallel part, you will have less contention i think, so the example that i just gave you will scale to much better than 3X , so hope you have undertood this important ideas that i am giving you. 
 


Thank you,
Amine Moulay Ramdane.





aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #2 on: October 10, 2013, 02:21:20 am »

Hello,

Please read this, they say:

"Amdahl's law, also known as Amdahl's argument,[1] is used to find the maximum expected improvement to an overall system"

read here:

http://en.wikipedia.org/wiki/Amdahl%27s_law


I think that's false, it's not the "maximum expected improvement to an overall system",  and i have explained to you why in my previous post , what i have explained is that the Amdahl equation gives you the scalability that you will have in an IDEAL CONTENTION SCENARIO , but if you have less contention it will scale much better, and in a none contention scenario it will have a perfect scalability,so  i have
proved to you and explainaed to you in my previous post that the  Amadhal equation doesn't
give a correct picture. Hope you have understood my arguments and my ideas against the Amdahl equation.



Thank you,
Amine Moulay Ramdane.



aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #3 on: October 10, 2013, 03:19:02 am »

Hello,

So if you have understood my ideas and my arguments aginst the Amdahl's equation, i have proved to you that the Amdahl's equation gives you the scalability that you will have in an IDEAL CONTENTION SCENARIO , but if you have less contention it will scale much better, and in a none contention scenario it will have a perfect scalability, so if you have a small serial part
let say of S = 0.1% so the Amdahl's equation says that your program will
scale to a maximum of 10, but that's false , it can scale much more
than that if you introduce randomness in the Parallel part to avoid
the ideal contention scenario, something like wait(random(something_like_the+time_of_your_bigger_locked_region)) or something like that and this will scale your application to much more than 10X, so the very important part is to introduce an optimal randomness of a more optimal wait , but you don't have to use sleep() for that,
just to loop around a pause asm instruction for example and this will give
you better scalability than 10X.


Thank you,
Amine Moulay Ramdane.


 

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #4 on: October 10, 2013, 03:20:26 am »

Hello,

So if you have understood my ideas and my arguments against the Amdahl's equation, i have proved to you that the Amdahl's equation gives you the scalability that you will have in an IDEAL CONTENTION SCENARIO , but if you have less contention it will scale much better, and in a none contention scenario it will have a perfect scalability, so if you have a small serial part
let say of S = 0.1% so the Amdahl's equation says that your program will
scale to a maximum of 10, but that's false , it can scale much more
than that if you introduce randomness in the Parallel part to avoid
the ideal contention scenario, something like wait(random(something_like_the+time_of_your_bigger_locked_region)) or something like that and this will scale your application to much more than 10X, so the very important part is to introduce an optimal randomness of a more optimal wait , but you don't have to use sleep() for that,
just to loop around a pause asm instruction for example and this will give
you better scalability than 10X.


Thank you,
Amine Moulay Ramdane.


 

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #5 on: October 10, 2013, 06:23:40 am »

Hello,

I will prove it to you again, so follow with me carefully please...

Let say we have 4 threads, and 4 cores, and let say that each
thread is running the same parallel code, and let say we have also a serial code inside some critical section, and let say that the parallel part is
0.1% and the serial part is 0.9%, so let say that the serial part
takes 1 second(it means 0.1%) and the parallel part takes 9 seconds(it means 0.9%), what i have tried to explain to you is that the Amadhl's law or equation  is not a correct law and it doesn't give a correct results,
here is why: so if the 4 threads are all looping and looping again
for a number of times running the same parallel code and the same serial code and let say that they are contending at the same time
for the critical section, this  is why i have called an ideal contention
scenario, so if they  are contending AT THE SAME TIME for the critical section(the serial part) they will run 4 serial parts in 4 seconds in one loop  and they will run 4 parallel parts in 9 seconds in one loop, hence it will take 13 seconds in one loop , but  if you run the parallel part and the serial part serially they will take 40 seconds, so the scalability will equal 40 seconds divide by 13 seconds = 3.07X, this is the result that gives us the Amdahl's law, it's 1 /(0.1 + 0.9/4) = 3.07X, but this is not the end of the story , so follow now carefully with me, so let divide the parallel part into 9 small parts  each equal to
1 seconds, and the serial part is equal to 1 second, so if the threads
are looping and not contending at the same time for the critical section and let say that when the threads are looping the first thread will be on the first small part equal to 1 seconds of the 9 small parts of the 9 seconds of the parallel part ,  the second thread will be on the second small part equal to 1 seconds of the 9 parts of the 9 seconds of the parallel part, and the third threads will be on the third small part equal to 1 second of the 9 parts of the 9 seconds of the parallel part , and the fourth thread will be on the fourth small part equal to 1 second of the 9 parts of the 9 seconds of the parallel part , so imagine the 4 threads
looping again and again and not changing there places like that , so
there will be no serial part at all, cause when each thread will be on the 1 second of the serial part the other threads will not be on the the same serial part, so this is a none contention scenario , so since
there is no serial part so the scalability will be perfect and
equal to 4X , so as you have noticed the Amdahl equation gives
the scalability of the ideal contention scenario all the threads
are contending at the same time so the scalability will equal 3.07X, but if they are not contending at all the scalability will be perfect
and equal to 4X, and if you have less contention this will scale
better than 3.07X,  so now i have proved to you that the Amdahl's law
doesn't give you a correct result.



Hope you have understood my arguments and my ideas against the
Amdahl's law.

And of course in my example each thread have the same number of loops and the same number of work, and each thread is running on a separate core in my example. so that you understand more my example.


Thank you,
Amine Moulay Ramdane.


aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #6 on: October 10, 2013, 06:06:06 pm »
Hello,

I think i know what is my mistake: the serial part in
the Amdahl's law is not the critical section, you must not
confuse the two.



Amine Moulay Ramdane.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #7 on: October 10, 2013, 06:33:33 pm »

I wrote:
> Hello,
>
> I think i know what is my mistake: the serial part in
> the Amdahl's law is not the critical section, you must not
> confuse the two.


But, Amine,  how can you say that the serial part of the Amdahl's law is not
the critical section ?


In my example if you don't have context switches , and you have the same
number of threads than the number of cores, and the time inside the critical
section is constant and the time inside the parallel part is constant, so the Amdahl's law can predict the worst case contention scenario , that means when all the threads are contending for the same critical section, hence you can predict the worst case contention scenario by just  calculating the time inside the critical section and the time inside the parallel part and doing the Amdahl calculation, this will give you the
exact result for the worst case contention scenario, so as you have
noticed the serial part of the Amdahl equation is not only the critical section, it's the critical section with a context, so you have to take
into consideration also the context, that means the context of the worst case contention scenario.


Thank you,
Amine Moulay Ramdane.




Scoops

  • Full Member
  • ***
  • Posts: 105
Re: Enahnced scalable Anderson Lock
« Reply #8 on: October 10, 2013, 06:46:58 pm »
Hi,

You made a mistake, we should'nt confuse things, we need to be sensible,
i think you should re-write the previous 6 posts again ? because i did'nt get
anything that you have written. No sorry i'll start again.

I'm a programmer but to be honest i can't get my head around your posts.
In my opinion what you need is to put out a nice GUI App not console, and
show the results of something, you are too vague with what your code is
supposed to do. (For me anyway, sorry i'm not perfect). I appreciate your
work and have looked at some of your code, i'm not belittling anything but
you should lighten up on the math and explain in  real terms what it does.
With real examples and results.


aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #9 on: October 10, 2013, 07:10:15 pm »

Hello,

But how can we use the Amdahl's law to predict scalability ?

You can not use the Amdahl's to predict the exact scalability,
i think , and i have explained it to you that the Amdahl's laws
in parallel computing gives you the scalability in the worst case contention scenario , so the Amdahl's calculation in parallel computing looks like the calculation of  the worst case complexity in computer science, so it can help to give the scalability
of the worst case contention scenario, so that help also
to predict worst case scalability, and if the serial part have not
a constant running time and the parallel part have not a constant runnning time you can try to find and choose a maximum bound of
the time to run the serial part and a maximum bound of the time to run the parallel part
and do the Amdahl's calculation and this will give you the scalability
for the worst case contention scenario, so the scalability will be equal
or greater than the scalability that you will find with the Amdahl's equation.



Thank you,
Amine Moulay Ramdne.


 

Scoops

  • Full Member
  • ***
  • Posts: 105
Re: Enahnced scalable Anderson Lock
« Reply #10 on: October 10, 2013, 07:29:59 pm »
That does'nt help me ?
Posting without even replying for me is called spam.

« Last Edit: October 10, 2013, 07:45:22 pm by Scoops »

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #11 on: October 10, 2013, 07:58:39 pm »

Hello,

But Amine, you have said that the Amdahl's law predict scalability
of the worst case contention scenario , so the Amdahl's calculation in parallel computing looks like the calculation of  the worst case complexity in computer science?

That's right, cause what is, in your opinion , the purpose of the Amdahl's law if it's not to predict worst case scalability by calculating the maximum bound of time time inside the critical sections and the maximum bound of the time inside the parallel part and doing the Amdahl's calculation to predict the worst case contention scenario
hence the worst case scalability, so i think this will predict that the scalability will be equal or greater to the result that we have
found with the Amdahl's law. This is the purpose of Amdahl's law:
it's also to predict the worst case scalability.



Thank you,
Amine Moulay Ramdane.
 


aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #12 on: October 10, 2013, 08:32:31 pm »

I have wrote:
"But Amine, you have said that the Amdahl's law predict scalability
of the worst case contention scenario , so the Amdahl's calculation in parallel computing looks like the calculation of  the worst case complexity in computer science?

That's right, cause what is, in your opinion , the purpose of the Amdahl's law if it's not to predict worst case scalability by calculating the maximum bound of time time inside the critical sections and the maximum bound of the time inside the parallel part and doing the Amdahl's calculation to predict the worst case contention scenario
hence the worst case scalability? so i think this will predict that the scalability will be equal or greater to the result that we have
found with the Amdahl's law. This is the purpose of Amdahl's law:
it's also to predict the worst case scalability."


To do a better scalability prediction, you have to use the
same number of threads than the number of cores, and each
thread must run a a separate core, to avoid context switches
and you have to use Locks that lower better the cache-coherence traffic,
such us the scalable Anderson Lock or the scalable MCS Lock
or my scalable and distributed fair Lock, and try to find a maximum
bound of the time inside your critical sections and maximum bound
inside of the time inside the parallel part , and do the Amdahl's calculation with them to find the result of the worst case scalability, so the scalability will equal or be greater to this result. I think  this is the purpose of the Amdahl's law: to predict the worst case scalability.



Thank you,
Amine Moulay Ramdne.



Scoops

  • Full Member
  • ***
  • Posts: 105
Re: Enahnced scalable Anderson Lock
« Reply #13 on: October 10, 2013, 08:36:01 pm »
That's all i need to know, thanks, you made your point, continue talking to yourself, bye.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Enahnced scalable Anderson Lock
« Reply #14 on: October 10, 2013, 09:35:43 pm »

Hello,

Lockfree or not to lockfree , that's the question ...

I have read here and there that lockfree algorithms are hard,
but that's not always the case, if you take a look at the following
FIFO queue that is waitfree on the push() and Lockfree on the pop()

http://pastebin.com/f72cc3cc1

You will read this:

===
public bool pop(out T state) {
      node cmp, cmptmp = m_tail;
      do {
        cmp = cmptmp;
        node next = cmp.m_next;
        if (next == null) {
          state = default(T);
          return false;
        }
        state = next.m_state;
        cmptmp = System.Threading.Interlocked.CompareExchange(ref m_tail, next, cmp);
      } while (cmp != cmptmp);
      return true;
    }
  };

===


Lockfree is easy, you will notice that between the

the "cmp = cmptmp;"

and the:

"cmptmp = System.Threading.Interlocked.CompareExchange(ref m_tail, next, cmp);"

It's as we had a critical section, cause if for example  two threads 
crossed the "cmp = cmptmp;"  , one of them will succeed and another
will retry and loop again, hence this lockfree mechanism is the same as a critical section,
but the difference between Lockfree and Lock algorithms is that
there can be no context switches inside the "cmptmp = System.Threading.Interlocked.CompareExchange(ref m_tail, next, cmp);"
but in lock algorithms you can have a context switch inside the critical section, other than that in lockfree algorithms if you have a problem with a thread and it has stopped right on "cmptmp = System.Threading.Interlocked.CompareExchange(ref m_tail, next, cmp);"
there will be still progress of the other threads , that's not the case
in Lock based algorithms , other than that Lockfree algorithms are
immune to  priority inversion, cause if a lower priority thread
is doing "System.Threading.Interlocked.CompareExchange(ref m_tail, next, cmp);" there will be no context switch , that's not the case with lock based algorithms , lock based algorithms are not immune to priority inversion, so as you have noticed Lockfree algorithms are more resiliente than Lock based algorithms, but this is not the end of the
story , Lockfree algorithms on the other hand can have too much cache-cohency traffic, much more than Lockfre based algorithms that uses scalable Locks, and more than that scalable Locks
are FIFO fair, and this is not the case with Lockfree algorithms , they
are not FIFO fair, hence they can  have starvation, so Lockfree algorithms
are not silver bullet.




Hope you have understood.



Thank you,
Amine Moulay Ramdane.

 





 

TinyPortal © 2005-2018