Hello,
I have implemented the Object Pascal Distributed Reader-Writer Mutex
based on the C++ Distributed Reader-Writer Mutex by Dmitry Vyukov,
in this post i will discuss the TomniMREW reader-writer lock by
Primoz Gabrijelcic.
Lets take a look at the source code and my explaination will follow:
====
unit MREW;
{$IFDEF FPC}
{$ASMMODE intel}
{$ENDIF FPC}
interface
uses sysutils,
classes;
{$I defines.inc}
type
{$IFDEF CPU64}
int = int64;
{$ENDIF CPU64}
{$IFDEF CPU32}
int = integer;
{$ENDIF CPU32}
typecache1 = array[0..15] of longword;
TOmniMREW = class
private
omrewReference: int; //Reference.Bit0 is 'writing in progress' flag
// cache:typecache1;
public
procedure EnterReadLock;
procedure EnterWriteLock;
procedure ExitReadLock;
procedure ExitWriteLock;
end; { TOmniMREW }
implementation
{ TOmniMREW }
function LockedExchangeAdd(var Target: Int; Value: Int): Integer;
asm
{$IFDEF CPU32}
// --> EAX Target
// EDX Value
// <-- EAX Result
MOV ECX, EAX
MOV EAX, EDX
// ECX Target
// EAX Value
LOCK XADD [ECX], EAX
{$ENDIF CPU32}
{$IFDEF CPU64}
// --> RCX Target
// EDX Value
// <-- EAX Result
MOV RAX, EDX
// RCX Target
// RAX Value
LOCK XADD [RCX], RAX
{$ENDIF CPU64}
end;
function CAS(var Target:int;Comp ,Exch : int): boolean;assembler;stdcall;
asm
{$IFDEF CPU64}
mov rax, comp
lock cmpxchg [Target], Exch
setz al
{$ENDIF CPU64}
{$IFDEF CPU32}
mov eax, comp
mov ecx,Target
mov edx,exch
lock cmpxchg [ecx], edx
setz al
{$ENDIF CPU32}
end; { CAS }
?
procedure TOmniMREW.EnterReadLock;
var
currentReference: int;
begin
//Wait on writer to reset write flag so Reference.Bit0 must be 0 than
increase Reference
repeat
currentReference := omrewReference AND NOT 1;
until CAS(omrewReference, currentReference,currentReference + 2);
end; { TOmniMREW.EnterReadLock }
procedure TOmniMREW.EnterWriteLock;
var
currentReference: int;
begin
//Wait on writer to reset write flag so omrewReference.Bit0 must be 0 then
set omrewReference.Bit0
repeat
currentReference := omrewReference AND NOT 1;
until CAS(omrewReference, currentReference,currentReference + 1);
//Now wait on all readers
repeat
until omrewReference = 1;
end; { TOmniMREW.EnterWriteLock }
procedure TOmniMREW.ExitReadLock;
begin
//Decrease omrewReference
LockedExchangeAdd(omrewReference, -2);
end; { TOmniMREW.ExitReadLock }
procedure TOmniMREW.ExitWriteLock;
begin
omrewReference := 0;
end; { TOmniMREW.ExitWriteLock }
end.
===
This method is using the following technique:
currentReference := omrewReference AND NOT 1;
So if omrewReference is 0 or multiple of 2 and is an even number
the reader can enter the MREW lock and increment omrewReference
by -2 , or the writer will enter and increment omrewReference by 1 and
this will stop the reader and the writer from crossing the CAS, and after
that
the writer will wait on the reader to exit by using a repeat loop
like this:
repeat
until omrewReference = 1;
and as you have noticed with this method the writers will not starve
forever,
and thie MREW lock is very fast also, but even if it's very fast it doesn't
scale
cause there is a single point of access on the CAS and this can cause
a lot of contention , also the inter-thread communication can be expensive
if
the reader's time under the MREW lock is not so signifant..
and about Distributed Reader-Writer Mutex, based on the Dmitry Vyukov C++
Distributed Reader-Writer Mutex , I have included the following
Reader-Writer
Mutexes inside this Distributed Reader-Writer mutex:
TOmniMREW a lightweight MREW that is very fast and TMultiReadExclusiveWrite
from JCL and now both of them can scale better with Distributed
Reader-Writer Mutex,
and i have modified the Dmitry Vyukov Distributed Reader-Writer Mutex, in
the first
version i have not used GetCurrentProcessor() but i have used
GetCurrentThreadID(),
and i have also provided you with a second version that scales better, to be
ableto use the second version please use the version2 in defines.inc, i have
given you a test.pas example for the first version and test1.pas for the
second version,
but don't forget to use version2 inside defines.inc, to use the second
version
just uncomment the version2 inside defines.inc and comment version1. I have
also
done a cache line alignement in TOmniMREW, this has allowed Drwlock to
scale better.
I have provided you with the source code, please take a look at the source
code to understand better.The Object Pascal Distributed Reader-Writer Mutex
is
based on the following C++ Distributed Reader-Writer Mutex by Dmitry Vyukov,
read more here:
http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutexI have also modified the Dmitry Vyukov's Distributed Reader-Writer Mutex to
use a variable number of MREWs, you can pass the number of MREWs to the
constructor like this:
drw:=TDRWLOCK.create(100);
You have four methods:
procedure wlock; // same as EnterWriteLock of TOmniMREW
procedure wunlock; // same as ExitWriteLock
procedure rlock; // same as EnterReadLock
procedure runlock; // same as ExitReadLock
and you have to pass the number of MREWs(multiple-readers-exclusive-writer)
to the constructor like this:
drw:=TDRWLOCK.create(200); // here we are creating 200 MEWs
Here is some scalability numbers:
I have used TOmniMREW of the Omnithread library and used only
EnterReadLock() and ExitReadLock() with four threads on a quad cores and
TOmniMREW gave a negative scalability of -5.51x
And when i have used the second version of Distributed Reader-Writer Mutex
using only rlock() and runlock() , it gave me +3.94x scalability with four
threads on four cores. So now it's scaling.
And about the second version , don't forget to initialize the number that
you pass to rlock() and runlock() to 0 before calling rlock() and
runlock() .
In the previous versions i have aligned the array elements on cache line
bounderies like have done Dmitry Vyukov, and it didn't work correctly when i
have tested the second version, so i have thought about that and after that
i have decided to not align the array elements on cache line bounderied but
just
add a cache line padding to TOmniMREW for example and this time it has
worked
perfectly and now the second version is scaling perfectly..
And if you have noticed , there is still a weakness with the Dmitry Vyukov
C++ Distributed Reader-Writer Mutex, cause since he is using
GetCurrentProcessorNumber() he is limiting the array of rwlocks to the
number
of avaiblable cores and this is not good i think , cause if you have many
more
threads than the avaiblable cores and there is high contention this will
cause the
performance to degrade, so i have decided to change that in my
implementation and
i have used a variable number of rwlocks/MREWs so that you can lower more
the
contention and this is better for scalability.
.You can download Distributed Reader-Writer Mutex version 1.03 from:
http://pages.videotron.com/aminer/Thank you,
Amine Moulay Ramdane.