I noticed that InterlockedExchange isn't an intrinsic. It should simply be one or two instructions inserted into the ASM stream.
At least on an Intel-type processor.
I should look at Delphi; it seems I've done this before, and I remember seeing it inlined!
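For reference, here is a minimal sketch of the operation in question, written in C11 rather than Pascal so it can be compiled standalone. The name `interlocked_exchange_sketch` is hypothetical and is not the RTL or Windows function. On x86, compilers typically turn this into a single `xchg` with a memory operand, which is implicitly locked.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for InterlockedExchange (not the real RTL/API routine):
   atomically store new_value into *target and return the previous value. */
static int32_t interlocked_exchange_sketch(_Atomic int32_t *target, int32_t new_value)
{
    /* Sequentially consistent exchange; on x86 this usually becomes one xchg. */
    return atomic_exchange(target, new_value);
}
```

Whether a given compiler emits this inline or as a call is a separate question, which is what the thread is about.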
Only in very old versions maybe, before NUMA x86. But since, afaik, neither Delphi nor FPC inlines assembler functions, that is somewhat doubtful anyway.
But newer versions also support server processors with multiple memory systems, where the operation needs to go through an API call so that a NUMA-aware kernel can correct it. If you have a non-NUMA kernel, the call simply points to the x86 primitive.
However, I believe the function also corrects some cases where the lock goes over a page border, by making sure that both pages are paged in when the lock executes. So there might additionally be a simple AND test around it.
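To illustrate what such an AND test could look like, here is a rough sketch in C. It is not taken from the Delphi or FPC RTL or from any kernel. The 4 KiB page size and the name `crosses_page_border` are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is common on x86. */
#define PAGE_SIZE_ASSUMED 4096u

/* Hypothetical helper: true if the operand [addr, addr + size) spans two pages. */
static bool crosses_page_border(uintptr_t addr, size_t size)
{
    /* The AND keeps only the offset of addr inside its page. */
    uintptr_t offset = addr & (PAGE_SIZE_ASSUMED - 1u);
    return offset + size > PAGE_SIZE_ASSUMED;
}
```

A naturally aligned 4-byte operand can never trip this test, so a check like this would only matter for misaligned targets.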