You helped me finish a lock free producer / consumer ring buffer implementation I needed! (At least after a lot of tests, I think it is lock free....)
Well you don't call any "locks", "critical sections" or similar. So it's lock free.
It probably is deadlock-free too. Though that depends on how the code is used. After all, you don't have a "wait" method, just an "Empty" and a "Full" property, and the calling code needs to wait on those.
Since you say "lock free", I assume there are multiple threads using this?
Is that ONE, and no more than just ONE thread for writing?
And
Is that ONE, and no more than just ONE thread for reading?
I had a look at the sources...
Assuming it is ONE/ONE thread...
Even then, it's not safe....
function TRingBuffer.GetEmpty: boolean; inline;
begin
Result := FReadIndex = FWriteIndex;
end;
GetEmpty may be ok, if you don't care that it can falsely report "empty". After all, that is no different from reporting "empty" correctly, but something being pushed while the caller is still looking at the boolean result. (And the same for falsely reporting "not empty".)
function TRingBuffer.GetCapacity: PtrUInt; inline;
begin
if FReadIndex > FWriteIndex then
Result := FWriteIndex + (PtrUInt.MaxValue - FReadIndex)
else
Result := FWriteIndex - FReadIndex;
end;
That can definitely go wrong.
- If you are the ONLY reader, then between the "if greater" and the "Result :=" the other thread can have changed FWriteIndex.
- And if you are the ONLY writer, then it's vice versa.
- And if you have multiple readers or writers ....
If you know (as reader or writer) that one value is safe => get a copy of the other value into a local var => so it will not change.
That however means, you must know if GetCapacity is called by the Reader or Writer.
Well actually, get a copy of both, and it works no matter whether you are reader or writer (as long as there is only one of each).
function TRingBuffer.GetCapacity: PtrUInt; inline;
var r, w: PtrUInt;
begin
r := FReadIndex;
w := FWriteIndex;
{$PUSH}{$R-}{$Q-}
Result := (w - r) and (FMemorySize - 1); // it seems from MaskIndex that memsize is a 2^n value
{$POP}
end;
Example: MemSize = 1024; WriteIndex = 1020; ReadIndex = 1021
w - r = -1 // -1 as unsigned (64-bit) is $FFFFFFFFFFFFFFFF // $FFFFFFFFFFFFFFFF and 1023 = 1023
And indeed 1023 bytes are in use.
Of course, at the time of the "Result :=" the original may have changed... But it could change at any time after GetCapacity, while you still act on the result...
However:
- If the writer calls GetCapacity, it only cares that the free room (MemSize minus the result) will not shrink. And *the* (or even any) reader can only decrease the bytes in use, i.e. only make more room.
- If a reader calls GetCapacity, it only cares that the result is greater than zero (i.e. there is data to be read). And the writer can only increase that value, never reduce it.
So that is ok in both cases.
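The same snapshot + mask idea, sketched in C for illustration (the names `ring_t` and `ring_used` are mine, and the power-of-two size is an assumption carried over from what MaskIndex implies about FMemorySize):

```c
#include <stdint.h>

#define MEM_SIZE 1024u                 /* must be a power of two */

typedef struct {
    uint64_t write_idx;                /* advanced only by the writer */
    uint64_t read_idx;                 /* advanced only by the reader */
} ring_t;

/* Bytes currently in use. Both indices are copied into locals first,
   so the calculation works on one consistent snapshot; the unsigned
   subtraction wraps, and the mask folds it into [0, MEM_SIZE - 1]. */
static uint64_t ring_used(const ring_t *rb)
{
    uint64_t w = rb->write_idx;
    uint64_t r = rb->read_idx;
    return (w - r) & (MEM_SIZE - 1);
}
```

With the values from the example above (w = 1020, r = 1021) this gives 1023; in the plain case (say w = 10, r = 4) it gives 6.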
(Not sure if GetCapacity needs a ReadBarrier)
I haven't looked very deeply, and it also can't be judged without knowing how this is used by the calling code....
But if this is running on any modern Intel/Amd CPU...
They do some unexpected stuff.
They may access variables in memory in a different order than your code specifies. (Even if FPC kept the order in the generated asm.)
They will honour dependencies (like "if (foo <> nil) and (foo.xxx) then").
But with your indexes, they could access them in unexpected order.
It has been a while since I looked at this.... I may have it wrong again (it is mind boggling). But IIRC:
procedure TRingBuffer.WriteByte(const AValue: byte);
begin
pbyte(FMemoryData)[MaskIndex(FWriteIndex)] := AValue;
WriteBarrier; // Make sure the above byte is in memory, before you increment the index
{$PUSH}
{$Q-}
Inc(FWriteIndex);
{$POP}
end;
function TRingBuffer.ReadByte: byte;
begin
Result := pbyte(FMemoryData)[MaskIndex(FReadIndex)];
ReadBarrier; // make sure you actually read this, before the other thread gets the increased ReadIndex, and thinks it can overwrite the data.
{$PUSH}
{$Q-}
Inc(FReadIndex);
{$POP}
end;
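For comparison only (a C11 sketch of mine, not a drop-in for your unit): the usual way to express this one-writer/one-reader pattern is a release store on the index that publishes the data, paired with an acquire load on the other side. That replaces the standalone barrier calls and makes the pairing explicit:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 1024u                  /* power of two, as with MaskIndex */
#define MASK(i) ((i) & (RING_SIZE - 1))

typedef struct {
    uint8_t data[RING_SIZE];
    _Atomic uint64_t write_idx;          /* advanced only by the one writer */
    _Atomic uint64_t read_idx;           /* advanced only by the one reader */
} spsc_ring;

/* Writer side. Returns false when the buffer is full. */
static bool ring_write(spsc_ring *rb, uint8_t value)
{
    uint64_t w = atomic_load_explicit(&rb->write_idx, memory_order_relaxed);
    uint64_t r = atomic_load_explicit(&rb->read_idx, memory_order_acquire);
    if (w - r == RING_SIZE)
        return false;                    /* full */
    rb->data[MASK(w)] = value;
    /* release: the byte above is globally visible before the new index is */
    atomic_store_explicit(&rb->write_idx, w + 1, memory_order_release);
    return true;
}

/* Reader side. Returns false when the buffer is empty. */
static bool ring_read(spsc_ring *rb, uint8_t *out)
{
    uint64_t r = atomic_load_explicit(&rb->read_idx, memory_order_relaxed);
    uint64_t w = atomic_load_explicit(&rb->write_idx, memory_order_acquire);
    if (w == r)
        return false;                    /* empty */
    *out = rb->data[MASK(r)];
    /* release: the byte was consumed before the slot is handed back */
    atomic_store_explicit(&rb->read_idx, r + 1, memory_order_release);
    return true;
}
```

The acquire load of the *other* thread's index does the job of the ReadBarrier, and the release store of the *own* index does the job of the WriteBarrier.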
And more stuff like this.
Using "Interlocked...." commands (if they translate directly into asm, as they do on Intel/AMD) is really a lot simpler.
But you may get one or two extra barriers, and in an extreme case block your CPU from optimizing by computing ahead, losing one or two CPU cycles....
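Roughly what that looks like (again a C sketch, with made-up names): the Interlocked-style increment is one call that both bumps the index and acts as a full barrier, so the separate barrier line disappears:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t write_idx;   /* stand-in for FWriteIndex */

/* One call replaces "WriteBarrier; Inc(FWriteIndex);": the increment
   is atomic and (with the default seq_cst ordering) a full barrier.
   On x86 it compiles to a single LOCK XADD. Stronger than the
   one-writer case strictly needs - that is the "one or two extra
   cycles" trade-off mentioned above. */
static uint64_t publish_next(void)
{
    return atomic_fetch_add(&write_idx, 1) + 1;  /* returns the new value */
}
```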
If you have more than one thread either reading or writing (on the same buffer), then there is plenty more...