• using InterlockedCompareExchange64(x,0,0) as an atomic load seems optimized away
That seems strange. I would consider that a bug. Could you provide an example (as simplified as possible) that demonstrates it?
- Also, at which optimization level does it happen (and, by using {$Optimization no...}, which part of the optimizer is responsible)?
- Does it happen with all FPC versions: 3.2.2, 3.2.3 or 3.2.4RC, and 3.3.1?
3.2.2 has some bugs in the optimizer.
3.3.1 depends on the exact commit.
3.2.3/3.2.4 currently do not have any that I know of.
Well, they all have an issue when inlining methods that themselves contain inlined code.
I do have some code that uses Interlocked and Read/WriteBarrier. I tested it with all sorts of optimizations, and afaik it works very well (at least the part related to the interlocked/barrier calls).
And afaik lots of others are using it and haven't complained (in a way that would relate to this). That code happens to be in FpDebug, so afaik really lots of people use it. (It does switch off the peephole optimizer in 3.2.2 for some parts of the code, due to a 3.2.2 bug, but that wasn't actually affecting the thread parts.)
At least on i386/x86.
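For reference, a reduced test case of the kind asked for above might start out like the sketch below. It assumes a 64-bit target, where the System unit provides InterlockedCompareExchange64 and InterlockedExchange64; the names AtomicLoad64 and Shared are made up for illustration and are not taken from your code.

```pascal
program AtomicLoadSketch;
{$mode objfpc}

var
  Shared: Int64 = 0;  // hypothetical shared variable

// Hypothetical helper: compare with 0 and, if equal, store 0.
// The value never changes either way; the call returns what was read.
function AtomicLoad64(var Target: Int64): Int64;
begin
  Result := InterlockedCompareExchange64(Target, 0, 0);
end;

begin
  InterlockedExchange64(Shared, 42);  // atomic store of a test value
  WriteLn(AtomicLoad64(Shared));      // atomic load via the idiom
end.
```

Compiling something like this at different -O levels and looking at the generated assembler (for example with -al) should show whether the call really disappears.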
About volatile / not using registers.
Afaik (but not documented), taking a pointer to a variable prevents it from being moved to a register (because the compiler won't know when it may get changed).
And then, if that is correct, globals shouldn't go to registers either.
I don't know if anything except local vars goes into registers at all... But again, this is not documented, so it is not safe to rely on long term.
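To make the pointer case concrete, here is a minimal sketch (Demo, Flag and P are hypothetical names). It only shows the pattern; it does not prove anything about what the optimizer does, and it says nothing about visibility between threads.

```pascal
program AddressTakenSketch;
{$mode objfpc}

procedure Demo;
var
  Flag: LongInt;   // local variable, normally a candidate for a register
  P: PLongInt;
begin
  Flag := 0;
  P := @Flag;      // address taken: Flag now needs a memory location
  P^ := 1;         // write through the pointer
  WriteLn(Flag);   // read through the variable itself
end;

begin
  Demo;
end.
```

Whether the compiler keeps Flag in memory for the whole procedure is exactly the undocumented part, so the generated assembler would have to be checked per compiler version.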
Did I miss it, or did you not specify your CPU type?