Thanks a lot for the detailed feedback! I think I overreached a bit with my code and should've stuck to a minimal example of a new reference-counted type specifically in Pascal (which operators to override etc.) and put a warning on my lack of expertise
Runs on both Windows and Linux (x86_64)
It doesn't, because InterlockedIncrement (etc) is part of the Windows API. Since this code is only running on x86-64, which has a fairly strong memory model (Intel® 64 and IA-32 Software Developer's Manual, volume 3 chapter 9) we should be able to use the assembly instructions MOV for reading, LOCK INC for incrementing, and LOCK DEC for decrementing. Don't quote me on this, though: I'm not an x86-64 assembly programmer, and certainly should not be trusted as an authority for mission-critical code.
If you do this, you must check the flags in TThunk.DecreaseReferenceCount (perhaps indirectly – see Why did InterlockedIncrement/Decrement only return the sign of the result? for the NT 3.51 / 95 design). Do not read the memory a second time to check it. Why? Assume there are two references, and consider the following sequence:
- Thread A decreases reference count (2 to 1)
- Thread B decreases reference count (1 to 0)
- Thread B detects ReferenceCount = 0, frees the resources.
- Thread A detects ReferenceCount = 0: double free!
This is not a bug in your code, only one that might be introduced by the careless programmer. If you do a straightforward InterlockedIncrement shim (see How does InterlockedIncrement work internally?), I believe your code should work fine.
I intended to use FPC's own version of this function (unfortunately named exactly the same as the Windows one), which so far hasn't provided me with any such nasty surprises - IOW I didn't even think of this as a weak point
Your debug code performs non-atomic reads, which is UB in some high-level languages and might be a bug on processors with a weaker memory model (but I think is fine on x86-64). For efficiency on some other architectures, you should use InterlockedIncrementAcquire and InterlockedDecrementRelease (does not affect x86-64).
Yeah, I should've removed the "debug code" prior to uploading... it was just a dirty hack to verify the reference-counting aspect, with no considerations about multi-threading
Your code assumes that mmap and mprotect are implemented as numbered syscalls. This is a valid assumption for Linux, but most UNIX-like OSs don't have a stable syscall ABI, if they even implement these as discrete syscalls. Seeing as you're only using POSIX functions, there's no need to restrict yourself to Linux! Adding a libc fallback mode for {$ELSEIF UNIX} might be nice. Alternatively, you should make it explicitly fail to compile on unsupported systems.
I regret including this part the most... being inexperienced with non-toy UNIX development (especially in Free Pascal, and only considering Linux) and being in a rush when I wrote this, when I couldn't find an FPC declaration of
mprotect within a few Google searches, I jumped straight to the generic syscall wrapper
Fulfillment of W^X - the code buffer is never writable and executable at the same time
Your implementation is unsafe: we can read .Code, access .Data, then try to use .Code and get fireworks. Rather than storing Executable: Boolean, could you store a separate "executable reference count", and then forbid .Data accesses while there's an outstanding .Code access? (You could implement that by having .Code return an advanced record wrapper around a PThunkInstance (or, perhaps better, a ^SizeInt Pointer pair), though I'm not sure whether this would end up expensive. It's free in Rust, but rustc does heavier optimisations than fpc.)
I was well aware of this... should've at least pointed it out
I added a warning to my original reply for future visitors of this thread. I'll try not to oversell the quality of my answers in the future