Thanks, this looks really good! (I haven't tested it.) I didn't know about the
AddRef operator. Some nitpicks:
Runs on both Windows and Linux (x86_64)
It doesn't, because InterlockedIncrement (etc) is part of the Windows API. Since this code is only running on x86-64, which has a fairly strong memory model (Intel® 64 and IA-32 Software Developer's Manual, volume 3 chapter 9) we
should be able to use the assembly instructions
MOV for reading,
LOCK INC for incrementing, and
LOCK DEC for decrementing. Don't quote me on this, though: I'm not an x86-64 assembly programmer, and certainly should not be trusted as an authority for mission-critical code.
If you do this, you
must check the flags in
TThunk.DecreaseReferenceCount (perhaps indirectly – see
Why did InterlockedIncrement/Decrement only return the sign of the result? for the NT 3.51 / 95 design).
Do not read the memory a second time to check it. Why? Assume there are two references, and consider the following sequence:
- Thread A decreases reference count (2 to 1)
- Thread B decreases reference count (1 to 0)
- Thread B detects ReferenceCount = 0, frees the resources.
- Thread A detects ReferenceCount = 0: double free!
This is not a bug in your code, only one that might be introduced by the careless programmer. If you do a straightforward
InterlockedIncrement shim (see
How does InterlockedIncrement work internally?), I
believe your code should work fine.
Your debug code performs non-atomic reads, which is UB in some high-level languages and
might be a bug on processors with a weaker memory model (but I think is fine on x86-64). For efficiency on some other architectures, you should use
InterlockedIncrementAcquire and
InterlockedDecrementRelease (does not affect x86-64).
Your code assumes that
mmap and
mprotect are implemented as numbered syscalls. This is a valid assumption for Linux, but
most UNIX-like OSs don't have a stable syscall ABI, if they even implement these as discrete syscalls. Seeing as you're only using POSIX functions, there's no need to restrict yourself to Linux! Adding a libc fallback mode for
{$ELSEIF UNIX} might be nice. Alternatively, you should make it explicitly fail to compile on unsupported systems.
Fulfillment of W^X - the code buffer is never writable and executable at the same time
Your implementation is unsafe: we can read
.Code, access
.Data, then try to use
.Code and get fireworks. Rather than storing
Executable: Boolean, could you store a separate "executable reference count", and then forbid
.Data accesses while there's an outstanding
.Code access? (You could implement that by having
.Code return an advanced record wrapper around a
PThunkInstance (or, perhaps better, a
^SizeInt Pointer pair), though I'm not sure whether this would end up expensive. It's free in Rust, but
rustc does heavier optimisations than
fpc.)