Recent

Author Topic: A new and scalable FIFO fair lock  (Read 29710 times)

aminer

  • Hero Member
  • *****
  • Posts: 956
My other new programming invention
« Reply #30 on: April 26, 2014, 05:36:45 pm »
Hello,

I have told you yesterday that i was thinking about my next programming invention, so here is one of my other new programming inventions, i have invented it today, my new algorithm is a concurrent FIFO queue that uses only an atomic increment on the push side and an atomic increment on the pop side, and it's starvation-free and it minimizes efficiently the cache-coherence traffic and it reduces efficiently the contention and it has more parallelism than the two locks algorithm.

Here is the explanation of my new algorithm:

http://pages.videotron.com/aminer/CQueue2.htm


And you can download the source code here:

http://pages.videotron.com/aminer/


Authors: Amine Moulay Ramdane


Description:

A very fast concurrent FIFO queue that satisfies many requirements: it has more parallelism than the two locks algorithm, it is FIFO fair , it's starvation-free on the push and the pop side, and it has more parallelism than the two locks algorithm, it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop() side when you set the wait parameter to true in the construtor: when there is no items in the queue it will not spin-wait , but it will block wait on my SemaMonitor, and when the wait parameter of the constructor is set to false, and it uses only an atomic increment on the push() side and an atomic increment on the pop() side, so it's very fast,  this concurrent FIFO queue is limited to 1000 threads on the push() side, if you want to higher that, just modify the "margin" constant in the source code, and the number of threads in the pop() side is limited by the length of the array.

You have 3 options for setting the kind of locks, just look inside defines.inc , if you want to set it for my array based lock called AMLock just uncomment the option AMLock inside defines.inc, if you want to set it for Ticket Spinlock just uncomment the option TicketSpinlock ,If you want to set it for Spinlock just uncomment the option Spinlock, the Spinlock gives better performance under contention it scored 6.4 millions of transactions per second on my 2.4 GHz Quadcore, the Ticket Spinlock option scored 3.2 millions of transactions per second on my 2.4 GHz Quadcore, the Spinlock scaled even if the number of threads are greater than the number of cores, the TicketSpinlock and AMLock don't scale when the number of threads are greater than the number of cores, the Ticket Spinlock and scalable AMLock are optimal when the number of threads are equal to the number of cores.

The size of the queue must be passed to the constructor and it must be a power of 2.

Look here at my reasonning to prove to you that my algorithm is correct: Concurrent FIFO queue.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems



Thank you,
Amine Moulay Ramdane.




aminer

  • Hero Member
  • *****
  • Posts: 956
Starvatoin freedom and the two locks algorithm
« Reply #31 on: April 27, 2014, 06:51:35 pm »
Hello,


I have just updated my concurrent FIFO queue that uses
a two locks algorithm, before i was doing something like this
on the push() side:

==

 function TWQueue.push(tm : long):boolean;
begin

result:=true;
lock1.enter;
if getlength >= fsize
  then
      begin
          result:=false;
         lock1.leave;
         exit;
      end;

    setObject(head,tm);
    head:=(head+1);
   lock1.leave;
if fwait then sema.signal;
end;

==

But the above pop() method is not starvation-free , when the queue is full and the "if getlength >= fsize" becomes true , the threads
will loop backa, hence this is not starvation-free, to be fully starvation-free here is what look my updated push() method:

==
function TWQueue.push(tm : long):boolean;
begin

result:=true;
lock1.enter;
if getlength >= fsize
  then
      begin
        while getlength >= fsize
         do sleep(0);
     end;

    setObject(head,tm);
    head:=(head+1);
   lock1.leave;
if fwait then sema.signal;
end;

===
 

When the a thread enters and the "if getlength >= fsize" is
true it will spin-wait inside the locked region until 
"if getlength >= fsize" is false, and when you set
the wait parameter to true my concurrent FIFO queue will become
fully starvation-free on both the push() side and on the pop() side.


You can download my updated concurrent FIFO queue version 1.02 that uses the two locks algorithm from:

http://pages.videotron.com/aminer/



Thank you,
Amine Moulay Ramdane.


aminer

  • Hero Member
  • *****
  • Posts: 956
Re: Starvatoin freedom and the two locks algorithm
« Reply #32 on: April 27, 2014, 06:56:34 pm »
Hello,

You have to set the wait parameter of the constructor to "true"
and set the option inside the file "defines.inc" to TicketSpinlock
or AMLock, so that my concurrent FIFO queue becomes fully
starvation-free.


Please take a look a=t the source code here:

http://pages.videotron.com/aminer/



Thank you,
Amine Moulay Ramdane.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #33 on: April 30, 2014, 10:16:07 pm »
Hello...


I have finally improved my concurrent FIFO queue and it's scoring now a much better throughput on the pop() side,
it was scoring before 3.4 millions of pop() transactions per second on my 2.4 GHz Quadcore, and now it has improved
to 5.7 millions of pop transactions per second ! it is now faster on the pop() side than the Chris Thomasson
concurrent FIFO queue that uses the bakery algorithm and that has scored 5.0 millions of pop() transactions per second
on my 2.4 GHz Quadcore, i have also improved it cause it now minimizes the cache-coherence traffic more efficiently than my previous algorithm.

I have included inside the zipfile two variants of my concurrent FIFO queue, there is a file called WQueue.pas that is only starvation-free on the pop() side and there is a file called WSFQueue.pas that is fully starvation-free.

If you want to look at my new algorithm, here it is, i have changed the previous algorithm on the pop() side:

http://pages.videotron.com/aminer/CQueue2.htm

You can download my very fast concurrent FIFO queue version 1.1
from:

http://pages.videotron.com/aminer/


Th Chriss Thomasson algorithm is here:

https://groups.google.com/forum/#!topic/lock-free/acjQ3-89abE/discussion



Thank you,
Amine Moulay Ramdane.


aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #34 on: May 06, 2014, 02:34:00 am »

Hello,


I have invented a new and more advanced algorithm of a very fast concurrent FIFO queue that has more parallelism than my previous algorithms, and this new algorithm scales better even if the number of threads are greater than the number of cores and my new algorithm has scored 20% more throughput on the pop() side than the following Chriss Thomasson algorithm that uses the bakery algorithm and the same
throughput on the push() side than the following Chriss Thomasson algorithm.


Here is the Chriss Thomasson algorithm:

https://groups.google.com/forum/#!topic/lock-free/acjQ3-89abE/discussion


Here is my new algorithm, the pop() method have changed:

http://pages.videotron.com/aminer/CQueue3.htm



A very fast concurrent FIFO Queue version 1.0

Authors: Amine Moulay Ramdane


Description:


A very fast concurrent FIFO queue that satisfies many requirements: it has more parallelism than the two locks algorithm, it is FIFO fair , it's starvation-free and it minimizes efficiently the cache-coherence traffic and it is energy efficient on the pop() side when you set the wait parameter to true in the construtor: when there is no items in the queue it will not spin-wait , but it will block wait on my SemaMonitor, and when the wait parameter of the constructor is set to false it uses only an atomic increment on the push() side and an atomic increment on the pop() side, so it's very fast, this concurrent FIFO queue is limited to 1000 threads on the push() side, if you want to higher that, just modify the "margin" constant in the source code, and the number of threads in the pop() side is limited by the length of the array, this very fast concurrent FIFO queue scales better even of the number of threads are greater than the number of cores.

You have 3 options for setting the kind of locks, just look inside defines.inc , if you want to set it for my array based lock called AMLock just uncomment the option AMLock inside defines.inc, if you want to set it for Ticket Spinlock just uncomment the option TicketSpinlock ,If you want to set it for Spinlock just uncomment the option Spinlock, the Spinlock option scored 8.5 millions of transactions per second on my 2.4 GHz Quadcore, the Ticket Spinlock option scored 8.5 millions of transactions per second on my 2.4 GHz Quadcore.

The size of the queue must be passed to the constructor and it must be a power of 2.

I have included inside the zipfile two variants of my concurrent FIFO queue, there is a file called WQueue.pas that is only starvation-free on the pop() side and there is a file called WSFQueue.pas that is fully starvation-free.

The size of the queue must be passed to the constructor and it must be a power of 2.



You can download the source code of my new algorithm of a very fast concurrent FIFO queue from:

http://pages.videotron.com/aminer/

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux (x86)...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems



Thank you,
Amine Moulay Ramdane.

vfclists

  • Hero Member
  • *****
  • Posts: 1147
    • HowTos Considered Harmful?
Re: A new and scalable FIFO fair lock
« Reply #35 on: May 06, 2014, 12:50:05 pm »
Hi,
I have noticed your regular updates about your fair lock and other concurrency related issues and would like to know what kind of areas they are applicable to. Can you give me an idea of what I might need something like that for?

Seeing these topics make me feel my Computer Science knowledge is inadequate.


Hello,

I was thinking more all those days about my scalable array based Lock
but the problem with it is that each element on the array
must have the same size of as the size of a cache-line, hence my scalable array based lock takes more memory, so i have decided to come with another new scalable FIFO fair lock that do not take too much memory and that is scalable, and you have to be smart please and have to know that on very small critical sections the TicketSpinlock is 2X times faster than my scalable FIFO fair lock, but with a little bit bigger critical section my scalable FIFO fair lock will have almost the same speed as the TicketSpinlock on a Quad Core, but will be faster when you will use more cores with mores threads with a little bigger critical section.


Description:

A scalable FIFO fair lock.

Please take a look a the test.pas Object Pascal demo inside the zipfile, compile and run it...


You can download scalable FIFO fair lock from:

http://pages.videotron.com/aminer/


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...


Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems



Thank you,
Amine Moulay Ramdane.
Lazarus 3.0/FPC 3.2.2

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #36 on: May 07, 2014, 02:44:39 am »
Hello,

I have improved more my previous algorithm of a very fast concurrent FIFO queue, the getlength() method was generating two much cache-line transfers, so i have optimize it more and getlength() is now generating much less cache-line transfers, so the throughput have improved and it
my new and improved algorithm has now 50% better throughput on the pop() side than the following Chris Thomasson concurrent FIFO queue that uses the bakery algorithm, and it has the same throughput on the push() side than the following Chriss Thomasson concurrent FIFO Queue.

Here is the Chris Thomasson algorithm:

https://groups.google.com/forum/#!topic/lock-free/acjQ3-89abE/discussion


And here is my new and improved algorithm:

http://pages.videotron.com/aminer/CQueue3.htm

I have included inside the zipfile two variants of my concurrent FIFO queue, there is a file called WQueue.pas that is only starvation-free on the pop() side and there is a file called WSFQueue.pas that is fully starvation-free.

You can download my new and improved algorithm of a very fast concurrent FIFO queue version 1.2 from:


http://pages.videotron.com/aminer/



Thank you,
Amine Moulay Ramdane.


Jurassic Pork

  • Hero Member
  • *****
  • Posts: 1290
Re: A new and scalable FIFO fair lock
« Reply #37 on: May 07, 2014, 02:54:11 am »
hello Aminer,

have you seen  :o the vfclists 's question ? i have the same question  :-[   
Quote
Can you give me an idea of what I might need something like that for?
« Last Edit: May 07, 2014, 02:59:11 am by Jurassic Pork »
Jurassic computer : Sinclair ZX81 - Zilog Z80A à 3,25 MHz - RAM 1 Ko - ROM 8 Ko

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #38 on: May 07, 2014, 03:29:08 am »

vfclists wrote:
>Hi,
>I have noticed your regular updates >about your fair lock and other >concurrency related issues and would >like to know what kind of areas they are >applicable to. Can you give me an idea of >what I might need something like that for?

>Seeing these topics make me feel my >Computer Science knowledge is >inadequate.


Jurassic Pork write:
>hello Aminer,

>have you seen  :o the vfclists 's question ? i >have the same


The concurrency area is very interresting,
i will explain to you more some concepts so that you  understand my scalable synchronization algorithms..

For exemple when a mutating integer variable (a variable that change) is used  from multiple threads, the content of this variable have to be transfered from one
core to another core, and this genrate cache-line transfer from one cor to the other, and this a cache-line transfer is expensive, so you have to optimize more your algorithm to reduce the number of variables that generate cache-line transfer
cause those variables will generate more contention and will slow your algorithm,
and if you are spin-waiting waiting for a specific content of a variable that is mutating you have to minimize efficienly the cache-coherence traffic, that means you have to do your best to transform your algorithm so that this spin-waiting on a specific content of a variable will  generate less cache-line transfers, this is what is doing my algorithms, my algorithms of my scalable sychronization algorithms and my algorithms of  efficient concurrent FIFO queues are efficient in the sense that they reduce the cache coherence traffic and they use less variables that generate cache-line transfers.


My scalable array based Lock called AMLock is a scalable Lock, scalable in the sense that with more and more cores its throughput will not drop..


My scalable RWLock is a sacalable and fast Multiple-Readers-Exclusive-Writer Lock that works across processes and threads,
scalable means that the readers will be run in parallel , so you will get more performance by parallelizing the reader side, and the reader side is scalable and efficient.

My new algorithm of a very fast concurrent FIFO queue here:

http://pages.videotron.com/aminer/CQueue3.htm


It is a concurrent FIFO queue that uses less variables that generate cache-line transfers and it minimizes the cache-coherence traffic.


Please take a look at my other projects here:

http://pages.videotron.com/aminer/



Thank you,
Amine Moulay Ramdane.






















aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #39 on: May 07, 2014, 04:17:25 am »
vfclists wrote:
>Hi,
>I have noticed your regular updates about your fair lock and other >concurrency related issues and would like to know what kind of areas >they are >applicable to. Can you give me an idea of >what I might need >something like that for?

>Seeing these topics make me feel my Computer Science knowledge is >inadequate.

Jurassic Pork wrote:
>hello Aminer,
>have you seen  :o the vfclists 's question ? i have the same


Read here:

http://forum.lazarus.freepascal.org/index.php/topic,24210.30.html


And my answer:

I have wrote:

"The concurrency area is very interresting, i will explain to you more some concepts so that you  understand my scalable synchronization algorithms..

For exemple when a mutating integer variable (a variable that change) is used  from multiple threads, the content of this variable have to be transfered from one core to another core, and this generate cache-line transfer from one cor to the other, and this a cache-line transfer is expensive, so you have to optimize more your algorithm to reduce the number of variables that generate cache-line transfers cause those variables will generate more contention and will slow your algorithm, and if you are spin-waiting waiting for a specific content of a variable that is mutating you have to minimize efficienly the cache-coherence traffic, that means you have to do your best to transform your algorithm so that this spin-waiting on a specific content of a variable will generate less cache-line transfers, this is what is doing my algorithms, my algorithms of my scalable sychronization algorithms and my algorithms of  efficient concurrent FIFO queues are efficient in the sense that they reduce the cache coherence traffic and they use less variables that generate cache-line transfers."


I will also add something important:

If you look at my very fast concurrent FIFO queue here:

http://pages.videotron.com/aminer/CQueue3.htm



As you have noticed i am not using a lock around the push()
and another lock around the pop(), and you have to understand
that for example the "setObject(lastHead,tm)" will be executed
in parallel and each thread will write the tm variable
in its write-back cache , that means it will not write immedialy
the content of the variable to the memory , but will write it latter,
so this parallelism will higher the throughput and this is what i have noticed on my benchmarks , i have got more throughput than the two-lock algorithm, but when you are using a lock around the push()
and a lock around the pop() you will not get this parallelism
so this will drop the throughput... and that's the same for
the pop() method , the "if fcount1^[lastTail and fMask].flag=1"
can be executed in parallel and the inter-communication between
the cores can be done in parallel so this will higher the throughput...
This is  why i have told you that my new algorithm of a very fast concurrent FIFO queue has more parallelism than the two-lock algorithm , and it has a much better throughput, and it will scale better with more and mor cores.


Please take a look at my other projects here:

http://pages.videotron.com/aminer/



Thank you,
Amine Moulay Ramdane.

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #40 on: May 07, 2014, 06:29:48 pm »
Hello,


I have set another website , you will find all my good projects here,  you are welcome:

https://sites.google.com/site/aminer68/



Thank you,
Amine Moulay Ramdane.

Scoops

  • Full Member
  • ***
  • Posts: 104
Re: A new and scalable FIFO fair lock
« Reply #41 on: May 07, 2014, 09:34:30 pm »
Hello,

How do you actually benchmark you code ?, compared to what  ?, i have tried but i can't
get my head around your posts or your code, with the pop this and push that and threads
and all the other gobbledegook, ok i'm not great at programming  and i want to learn i'm
sorry but i need a decent example of how this shit works or i will never use it, and i'm not
alone, as you can can see from your posts,,,,, you are talking to yourself, i don't say that
to be negative or agressive, i have asked before and i will ask again give me an app which
uses your code to help me.  (Or anyone else, please), The last time i asked you, you said that
you programmed a library not an app , i'm OK with that, but i'm still stuck, and others i think?

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #42 on: May 07, 2014, 09:49:31 pm »
Hello,

I have just updated my Winmenus to version 1.3, i have corrected
a bug and now it is stable, and now you can use the right arrow
and left arrow to scroll on the left or on the right, i have
also included a test.pas example it's a kind of file manager for
windows, and i have included a test1.pas example that is portable
on other systems.


WinMenus version 1.3
Author: Amine Moulay Ramdane


Description:

Drop-Down Menu widget using the Object Pascal CRT unit. Please look at the test.pas(a kind of file manager for windows) and test1.pas  examples inside the zip file.

Use the 'Delete' on the keyboard to delete the items
Use the 'Insert' on the keyboard to insert the items
and use the 'Up' and 'Down' and 'PageUp and 'PageDown' to scroll ..
and use the 'Tab' on the keyboard to switch between the Drop Down Menus
and 'Enter' to select an item..
and the 'Esc' on the keyboard to exit..
and the 'F1' on keyboard to delete all the items from the list and right arrow and left arrow to scroll on the left or on the right


You can download Winmenus 1.3 from:

https://sites.google.com/site/aminer68/winmenus


Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux , Unix...

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

-dUnix for Linux , Unix and Mac OSX

-dWindows for windows

Required Delphi switches: -DMSWINDOWS -$H+

For Delphi 7 to 2007 use -DDelphi switch

For Delphi XE-XE6 use the -DXE switch



Thank you,
Amine Moulay Ramdane.



Scoops

  • Full Member
  • ***
  • Posts: 104
Re: A new and scalable FIFO fair lock
« Reply #43 on: May 07, 2014, 09:52:51 pm »
Hello,

How do you actually benchmark you code ?, compared to what  ?, i have tried but i can't
get my head around your posts or your code, with the pop this and push that and threads
and all the other gobbledegook, ok i'm not great at programming  and i want to learn i'm
sorry but i need a decent example of how this shit works or i will never use it, and i'm not
alone, as you can can see from your posts,,,,, you are talking to yourself, i don't say that
to be negative or agressive, i have asked before and i will ask again give me an app which
uses your code to help me.  (Or anyone else, please), The last time i asked you, you said that
you programmed a library not an app , i'm OK with that, but i'm still stuck, and others i think?

aminer

  • Hero Member
  • *****
  • Posts: 956
Re: A new and scalable FIFO fair lock
« Reply #44 on: May 07, 2014, 10:37:58 pm »

Hello,

Winmenus is event driven, i have to explain all to you to understand more...


At first you have to create your Widget menus by executing something like this:

Menu1:=TMenu.create(5,5);


This will create a Widget menu at the coordinate (x,y) = (5,5)

After that you have to set your callbacks,
cause my Winmenus is event driven, so
you have to do it like this:

Menu1.SetCallbacks(test,insert,updown);


The SetCallback() method will set your callbacks, the first callbacks parameter is the callback that will be execute when the "Enter" key is pressed and here it is the "test" function , the second callback
is the callback that will executed when  the insert key is pressed and here above it is the function "insert',  and the third callback is the callback that will called when the up and down keys are pressed and here above it is the function "updown" , the remaining callbacks that you can
assign are the keys from F1 to F12.

After that you can add your items to the Menu by calling the AddItem()
method like this:

Menu1.AddItem(inttostr(i));


After that you will enter a loop like this , the template of this loop must look like the following, that's not difficult to understand:

Here it is:

===
repeat

textbackground(blue);
clrscr;
menu2.execute(false);
menu1.execute(false);

case i mod 2 of

1: ret:=Menu1.Execute(true);
0: ret:=Menu2.Execute(true);
end;
if ret=ctTab then inc(i);
 
 until ret=ctExit;

menu1.free;
menu2.free;

end.

==

When you execute menu1.execute(false);
with a parameter equal to false my Winmenus widget will draw your menu
without waiting for your input and events, when you set the parameter of the execute() method to true it will wait for your input and events, if the parameter of the execute method is true and the returned value of  the execute method is ctTab  that means you have pressed on the Tab key.. if the returned value is ctExit
that means you have pressed on the Escape key to exit.


That's all, please look at the test1.pas
example inside the zip and you will understand my Winmenus , it is very easy to work with.
 
You can download Winmenus 1.3 from:

https://sites.google.com/site/aminer68/winmenus



Thank you,
Amine Moulay Ramdane.



 

TinyPortal © 2005-2018