Recent

Author Topic: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?  (Read 1156 times)

kapibara

  • Hero Member
  • *****
  • Posts: 656
Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« on: December 04, 2025, 09:33:25 am »
Hi friends, I asked ChatGPT to write down my question. Both him and me tried hard to implement a LMAX Disruptor queue:

I’m investigating whether a true LMAX-style Disruptor can be implemented correctly in FreePascal (FPC 3.3.1 trunk, Linux x86_64), and I’ve run into several practical issues with the FPC memory model, atomics, and inlined assembly.

For context:

The LMAX Disruptor is a multi-producer, multi-consumer, lock-free ring buffer using:
    • CAS on sequences
    • strict ordering constraints
    • memory fences / acquire-release semantics
    • volatile loads
    • spin-wait loops on sequence counters

My question is:

Is it realistically possible to implement a correct Disruptor-style queue in FPC today?

Specifically I’m trying to understand:
1. Atomic operations
FreePascal provides InterlockedCompareExchange64 and friends, but:
    • using InterlockedCompareExchange64(x,0,0) as an atomic load seems optimized away
    • CAS inside spin-loops sometimes results in stale values here
    • loads inside loops appear to be reordered or cached
    • Interlocked operations are not clearly documented as memory barriers

Is there any reliable way in FPC to perform:
    • a sequentially consistent atomic load?
    • a release-store?
    • an acquire-load?
    • a full memory fence?

2. Memory fences
Trying these:
    • asm mfence end;
    • inline assembly inside inline functions
    • volatile keyword
    • TInterlocked.MemoryBarrier
    • FPC_ATOMICFENCE (not available)

…results in either compilation errors or does not produce correct ordering under high contention.
Does FPC expose any officially supported memory barrier or fence on x86_64 that guarantees visibility between threads?

3. Volatile semantics
C/Java rely on volatile semantics.
FreePascal appears to have no equivalent attribute.
Is there any supported way to force a variable to always be reloaded from memory (not cached in a register)?

4. Inline assembler
Simple inline asm like:
asm lfence end;
inside an inline function produces errors:
assembler not supported inside inline procedure
syntax error in operand
Even when ASMMODE INTEL is enabled.
Is inline assembler inside inline functions doomed in FPC, or am I missing a required mode switch?

5. High-contention CAS correctness
Under real load (multiple producer threads), the CAS spin-wait:
while slot.Sequence <> expected do ...
sometimes never sees updated values (i.e. stale memory reads).
Does FPC guarantee sequential consistency for InterlockedXXXX routines?
And if not, is there any way to enforce it?

Summary
I am trying to determine:
    • Is it feasible to implement a correct, high-performance LMAX Disruptor in FreePascal?
    • If yes, what exact primitives should be used for:
        ◦ atomic loads
        ◦ atomic stores
        ◦ fences
        ◦ volatile reads
    • If no, what are the current limitations?

Any low-level guidance from compiler/runtime developers would be extremely appreciated.
Thanks!
Lazarus trunk / fpc 3.2.2 / Kubuntu 24.04 - 64 bit

jamie

  • Hero Member
  • *****
  • Posts: 7493
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #1 on: December 04, 2025, 12:14:33 pm »
Lots of fancy words for such a simple task
The only true wisdom is knowing you know nothing

Khrys

  • Sr. Member
  • ****
  • Posts: 383
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #2 on: December 04, 2025, 01:24:57 pm »
The notion of acquire & release semantics stems from the C++ memory model (targeting an abstract machine).
As far as I know, FPC doesn't formally define a memory model in the way C++ does.

ReadBarrier(), WriteBarrier()  etc. are a thing and there's a  volatile  intrinsic in trunk, but apart from that I'm afraid you'll have to resort to inline assembly.

Functions that contain inline assembly currently can't be inlined. You can write assembly-only functions like this, though:

Code: Pascal  [Select][+][-]
  1. function Zero(): Integer; assembler; nostackframe;
  2. asm
  3.   xor eax, eax
  4. end;

kapibara

  • Hero Member
  • *****
  • Posts: 656
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #3 on: December 04, 2025, 01:41:57 pm »
@Jamie: Haha, yes I see what you mean, ChatGPT tends to be wordy. It turned out MPMC (multi producer, multi consumer) is not needed at all. And thats lucky because it's probably impossible to implement in Freepascal.

A SPMC (single producer, multi consumer) ringbuffer is what was needed and it's now up an running. Maybe someone can check if it needs optimization:

Code: Pascal  [Select][+][-]
  1. unit RingBufferSPMC;
  2.  
  3. {$mode objfpc}{$H+}
  4.  
  5. interface
  6.  
  7. uses
  8.   Classes, SysUtils;
  9.  
  10. type
  11.   PRingCursor = ^TRingCursor;
  12.   TRingCursor = record
  13.     Sequence: Int64;   // last consumed sequence
  14.   end;
  15.  
  16.   TRingEvent = record
  17.     Sequence: Int64;
  18.     Payload : Pointer; // user-provided pointer
  19.   end;
  20.   PRingEvent = ^TRingEvent;
  21.  
  22.   { TRingBufferSPMC }
  23.  
  24.   TRingBufferSPMC = class
  25.   private
  26.     FSize     : Int64;     // MUST be power of two
  27.     FMask     : Int64;
  28.     FBuffer   : array of TRingEvent;
  29.  
  30.     FNextSeq  : Int64;     // next slot to publish
  31.     FPublished: Int64;     // highest published seq
  32.  
  33.     FConsumers: array of PRingCursor;
  34.  
  35.     function MinConsumerSequence: Int64;
  36.   public
  37.     constructor Create(ABufferSizePowerOfTwo: Integer);
  38.     function  CreateCursor: PRingCursor;
  39.  
  40.     // Producer:
  41.     procedure Publish(APayload: Pointer); // waits if ring is full
  42.  
  43.     // Consumer helpers:
  44.     function  GetEvent(ASequence: Int64): PRingEvent;
  45.     function  LastPublished: Int64;
  46.   end;
  47.  
  48. implementation
  49.  
  50. constructor TRingBufferSPMC.Create(ABufferSizePowerOfTwo: Integer);
  51. var
  52.   I: Integer;
  53. begin
  54.   FSize := ABufferSizePowerOfTwo;
  55.   FMask := FSize - 1;
  56.   SetLength(FBuffer, FSize);
  57.  
  58.   FNextSeq   := 0;
  59.   FPublished := -1;
  60.  
  61.   for I := 0 to FSize - 1 do
  62.   begin
  63.     FBuffer[I].Sequence := -1;
  64.     FBuffer[I].Payload  := nil;
  65.   end;
  66. end;
  67.  
  68. function TRingBufferSPMC.CreateCursor: PRingCursor;
  69. var
  70.   C: PRingCursor;
  71.   N: Integer;
  72. begin
  73.   New(C);
  74.   C^.Sequence := -1;
  75.  
  76.   N := Length(FConsumers);
  77.   SetLength(FConsumers, N+1);
  78.   FConsumers[N] := C;
  79.  
  80.   Result := C;
  81. end;
  82.  
  83. function TRingBufferSPMC.MinConsumerSequence: Int64;
  84. var
  85.   I: Integer;
  86.   C: PRingCursor;
  87. begin
  88.   Result := High(Int64);
  89.   if Length(FConsumers) = 0 then
  90.   begin
  91.     // no consumers: treat as always free
  92.     Result := FNextSeq;
  93.     Exit;
  94.   end;
  95.  
  96.   for I := 0 to High(FConsumers) do
  97.   begin
  98.     C := FConsumers[I];
  99.     if C^.Sequence < Result then
  100.       Result := C^.Sequence;
  101.   end;
  102. end;
  103.  
  104. procedure TRingBufferSPMC.Publish(APayload: Pointer);
  105. var
  106.   Seq, MinSeq: Int64;
  107.   Index      : Int64;
  108. begin
  109.   while True do
  110.   begin
  111.     Seq := FNextSeq;
  112.     MinSeq := MinConsumerSequence;
  113.  
  114.     // backpressure: wait if ring is full
  115.     if (Seq - MinSeq) < FSize then
  116.       Break
  117.     else
  118.       Sleep(0); // yield CPU (option: Sleep(1))
  119.   end;
  120.  
  121.   Index := Seq and FMask;
  122.  
  123.   // write event
  124.   FBuffer[Index].Payload  := APayload;
  125.   FBuffer[Index].Sequence := Seq;
  126.  
  127.   FPublished := Seq;
  128.   Inc(FNextSeq);
  129. end;
  130.  
  131. function TRingBufferSPMC.LastPublished: Int64;
  132. begin
  133.   Result := FPublished;
  134. end;
  135.  
  136. function TRingBufferSPMC.GetEvent(ASequence: Int64): PRingEvent;
  137. var
  138.   Index: Int64;
  139. begin
  140.   Index := ASequence and FMask;
  141.   Result := @FBuffer[Index];
  142. end;
  143.  
  144. end.
  145.  
« Last Edit: December 04, 2025, 01:56:04 pm by kapibara »
Lazarus trunk / fpc 3.2.2 / Kubuntu 24.04 - 64 bit

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12026
  • Debugger - SynEdit - and more
    • wiki
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #4 on: December 04, 2025, 02:13:55 pm »
    • using InterlockedCompareExchange64(x,0,0) as an atomic load seems optimized away

That seems strange. I would consider that a bug. Maybe you can provide an example (as simplified as possible) that demonstrates this to happen?

- Also then at which level of optimization (and using {$Optimize no...} => which part of the optimizer)?.
- Does it happen with all versions of FPC 3.2.2 / 3.2.3 or 3.2.4RC / 3.3.1?

3.2.2 has some bugs in the optimizer.
3.3.1 depends on the exact commit....
3.2.3/3.2.4 do currently not have any that I know

Well, they all have an issues when doing inline with methods that themself contain inlined code....



I do have some code, that uses Interlocked, and Read/WriteBarrier. And I tested it with all sort of optimizations, and afaik it works very well (at least the part related to the interlocked/barrier).
And afaik lots of others are using it, and haven't complained (in a way that would relate to this). That code happens to be in FpDebug, so afaik really lots of people use it. (and it switches off the peephole opt in 3.2.2 for some parts of the code, due to a 3.2.2 bug, but that wasn't affecting the thread parts actually).

At least i386/x86.


About volatile / not using registers.

Afaik (but not documented) taking a pointer to a variable prevents it from being moved to a register. (because the compiler wont know when it may get changed).
And then, if that is correct, globals shouldn't go to registers either.

I don't know, if anything except local vars go into registers at all... But again, not documented, not long term safe.



Did I miss it, or did you not specify your CPU type?

cdbc

  • Hero Member
  • *****
  • Posts: 2573
    • http://www.cdbc.dk
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #5 on: December 04, 2025, 02:28:29 pm »
Hi
@kapibara: You are allocating elements 1 at the time...
If you feel like it, you could take a gander at my little 'IContainers' library, especially I/TCirQueue in 'iqueues.pas', just for inspiration...
Regards Benny
If it ain't broke, don't fix it ;)
PCLinuxOS(rolling release) 64bit -> KDE6/QT6 -> FPC Release -> Lazarus Release &  FPC Main -> Lazarus Main

kapibara

  • Hero Member
  • *****
  • Posts: 656
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #6 on: December 04, 2025, 04:44:49 pm »
@Martin_fr

CPU is Intel i5-11600K, Using only latest trunk (both laz/fpc). Good news: Read/WriteBarrier works fine, I'll bake it into the ringbuffer. Which already works, but the AI thinks Read/WriteBarrier should be used anyway for safety.

Tried to find the test programs with "optimized away" but I made so many and deleted most so unfortunately I cant find.

Thanks @Khrys and @cdbc for suggestions.
Lazarus trunk / fpc 3.2.2 / Kubuntu 24.04 - 64 bit

Martin_fr

  • Administrator
  • Hero Member
  • *
  • Posts: 12026
  • Debugger - SynEdit - and more
    • wiki
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #7 on: December 04, 2025, 05:07:21 pm »
CPU is Intel i5-11600K, Using only latest trunk (both laz/fpc). Good news: Read/WriteBarrier works fine, I'll bake it into the ringbuffer. Which already works, but the AI thinks Read/WriteBarrier should be used anyway for safety.

I haven't checked the docs, but I am certain that Interlocked do the appropriate barriers. On Intel they translate directly into asm "lock" statements. On other platform depending on what the CPU supports, they may translate into a critical section. (IIRC, I saw that somewhere in the compiler source / not 100% sure).

That means explicit Read-, Write- or ReadWriteBarriers are only needed if you use other statements to access the variable/memory.

And that is where you may have to be careful about optimization.
AFAIK, if doing dirty reads
Code: Pascal  [Select][+][-]
  1. while SomeLockVar > 0 do
  2.   for i := 0 to 99999 do ; // spent a bit of time
can get optimized to nothing.
For the compiler both loops are empty.

BTW you can use stuff like  {$optimization NOregvar} (check docs for exact wording) to avoid stuff going into registers. Not sure if {$PUSH} works with that.....
And same to disable peephole, which can remove a lot of code, if that code does nothing.
And some others.



Quote
Tried to find the test programs with "optimized away" but I made so many and deleted most so unfortunately I cant find.
Well, too bad. I can only think of 2 explanations.
It was either some oversight on your side, or a compiler bug. If the latter it would have been good to test if it still exists in 3.2.3 or 3.3.1.

My advice for stuff like that, use a revision control system. If I get an issue like that, I usually (well not always, but if I suspect it could be the compiler) create a local branch, and snapshot everything. Then I can keep working, and later return to it (also give it a fresh start of testing, on the next day).



jamie

  • Hero Member
  • *****
  • Posts: 7493
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #8 on: December 05, 2025, 12:41:14 am »
This does not look much like Tight and effective code?

https://github.com/hedzr/go-ringbuf

jamie
The only true wisdom is knowing you know nothing

Khrys

  • Sr. Member
  • ****
  • Posts: 383
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #9 on: December 05, 2025, 07:19:25 am »
CPU is Intel i5-11600K

Which already works

I'll wager that x86 CPUs' memory model is too strong for properly testing lock-free code; whenever I need to test such things I run it on an ARM machine as well (actually just my phone with  fpc  installed inside Termux :D)
The final boss regarding memory ordering is probably the DEC Alpha, though... :o

Thaddy

  • Hero Member
  • *****
  • Posts: 18707
  • To Europe: simply sell USA bonds: dollar collapses
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #10 on: December 05, 2025, 07:39:42 am »
memory fencing is implemented in the system unit
- readbarrier
- writebarrier
- readwritebarrier
- readdependencybarrier.

These operate directly on the CPU if supported.
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

jamie

  • Hero Member
  • *****
  • Posts: 7493
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #11 on: December 05, 2025, 03:40:34 pm »
I find with all these nice features, for example the "InterlockedExchange" etc, how can they actually work properly if the items you are exchanging are being prepped for the parameters of a function call to the actual swap code without some block?

 In otherwards, when I examine the ASM code, I don't see inlining or any compiler intrinsic taking place to block any possible influence of registers being changed before the exchange?

 even the TCritcalSection code can suffer from this, however, The TRTLCriticalSection seems to be better immune to this.

Jamie

 


 
The only true wisdom is knowing you know nothing

PascalDragon

  • Hero Member
  • *****
  • Posts: 6283
  • Compiler Developer
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #12 on: December 08, 2025, 11:00:36 pm »
I find with all these nice features, for example the "InterlockedExchange" etc, how can they actually work properly if the items you are exchanging are being prepped for the parameters of a function call to the actual swap code without some block?

 In otherwards, when I examine the ASM code, I don't see inlining or any compiler intrinsic taking place to block any possible influence of registers being changed before the exchange?

Normally you'd use InterlockedCompareExchange or AtomicCmpXchg, because the exchange will only happen if the value you swap is indeed the one you expected. And usually you loop until this works out (normally that should only be few iterations if any).

Thaddy

  • Hero Member
  • *****
  • Posts: 18707
  • To Europe: simply sell USA bonds: dollar collapses
Re: Possible LMAX Disruptor (MPMC ring buffer) in FreePascal?
« Reply #13 on: December 09, 2025, 07:53:41 am »
Note that reg-reg operations and instructions like mov rax, [mem] or mov [mem], rax for naturally-aligned addresses are always  atomic, so the compiler will/may optimize them. It may be that is what you see in the assembler output?
Can you give a small example where you think you saw it and think it was wrong? If it is any of the above, don't worry.
« Last Edit: December 09, 2025, 07:58:34 am by Thaddy »
If Europe sells their USA bonds the USD will collapse. Europe can affort that given average state debts. The USA can't affort that. Just an advice...

 

TinyPortal © 2005-2018