do ;
Is a spinlock.
sleep(0);
Is not. Because - and entirely supported by what you wrote - nobody knows what is waiting in the thread queue and ready to go.
So where " do ; " will hog the CPU until pre-emption, " do sleep(0); " will yield if there's something to yield to.
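To make the two loops concrete, here is a minimal C++ sketch. It is only an illustration: `std::this_thread::yield()` stands in for `Sleep(0)` (an assumption, since the two are not guaranteed to behave identically), and the names `wait_spinning`, `wait_yielding` and `run_demo` are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Pure busy-wait, the equivalent of "do ;": the thread never gives up
// the CPU voluntarily, only pre-emption takes it away.
inline void wait_spinning(const std::atomic<bool>& ready) {
    while (!ready.load(std::memory_order_acquire)) {
        // intentionally empty
    }
}

// Yielding wait, the equivalent of "do sleep(0);": on every iteration the
// scheduler is offered the CPU and may hand it to another ready thread.
// Assumption: std::this_thread::yield() is used in place of Sleep(0).
inline void wait_yielding(const std::atomic<bool>& ready) {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

// Hypothetical helper: a producer thread publishes a value, the calling
// thread waits for it with one of the two loops and returns what it saw.
inline int run_demo(bool yielding) {
    std::atomic<bool> ready{false};
    int payload = 0;
    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    if (yielding) {
        wait_yielding(ready);
    } else {
        wait_spinning(ready);
    }
    assert(ready.load());
    int seen = payload;  // visible because of the release/acquire pair
    producer.join();
    return seen;
}
```

Both loops end up with the same result. The difference is only in what the waiting thread does to the CPU and the scheduler while it waits.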
Yes, I know, but this is what explains the performance differences. Sleep(0) only yields the CPU if there is something else in the queue. (And it's not just about the queue: modern schedulers take CPU affinity and other factors into account, so a thread can be ready in the queue but be preferred for another CPU, and therefore not be chosen to interrupt your process.) A thread that blocks on a critical section, on the other hand, will always give up the CPU for at least one time slice.
So while you may observe a performance benefit over a critical section on a system without much load, there are two caveats. First, in this situation you are spinning your CPU at 100%, so you are trading energy consumption for reaction time, which can be an issue depending on your environment.
The second caveat is that it is only faster under low load. Under high load, when Sleep(0) actually yields, there shouldn't be much of a performance difference. Sleep(0) is a bare yield request that tells the OS nothing about what the thread is waiting for, whereas with a critical section the OS knows that the thread is waiting for the release of the CS. So the OS may prioritize the thread coming out of the CS over one that just called Sleep(0), meaning that in this situation the Sleep(0) loop may actually be slower.
So a sleep(0) loop is unreliably faster when there is low load on the system, and potentially even slower than a CS. If you want a locking mechanism that is faster than classical critical sections, this is usually solved differently. For example, the Windows API critical section combines a spin wait with a sleep wait: it first spins for a pre-defined number of iterations (the spin count, i.e. a very short time), and if the lock is not freed during the spin phase, the thread is put to sleep, which, because it yields to the scheduler, may add a delay of several milliseconds.
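As a rough illustration of the spin-then-sleep idea, here is a portable C++ sketch. It is not how the Windows critical section is implemented internally (on Windows you would simply use `InitializeCriticalSectionAndSpinCount`). The class name `SpinThenBlockLock`, the default spin count of 4000 and the helper `hammer_counter` are assumptions made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a hybrid lock: spin a bounded number of times, then block.
class SpinThenBlockLock {
public:
    // spin_count is the tunable part; 4000 is an arbitrary example value.
    explicit SpinThenBlockLock(int spin_count = 4000)
        : spin_count_(spin_count) {}

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void lock() {
        // Phase 1: spin, hoping the holder releases the lock very soon.
        for (int i = 0; i < spin_count_; ++i) {
            if (try_lock()) return;
        }
        // Phase 2: give up the CPU and sleep until an unlock wakes us.
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return try_lock(); });
    }

    void unlock() {
        assert(locked_.load());  // unlocking a free lock is a caller bug
        {
            // Taking m_ here prevents a lost wake-up for a thread that is
            // between its failed check and going to sleep. A production
            // lock would avoid this cost when nobody is waiting.
            std::lock_guard<std::mutex> guard(m_);
            locked_.store(false, std::memory_order_release);
        }
        cv_.notify_one();
    }

private:
    std::atomic<bool> locked_{false};
    std::mutex m_;
    std::condition_variable cv_;
    int spin_count_;
};

// Hypothetical helper: several threads increment a shared counter under
// the lock; the final value shows whether mutual exclusion held.
inline long hammer_counter(int threads, int iterations) {
    SpinThenBlockLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The point of the design is that the spin phase is bounded by a parameter you choose, so the behaviour does not depend on whatever else happens to be in the scheduler queue.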
If your goal is better performance than regular critical sections without relying solely on spin locks, this is the way to go, because it is reliable. In comparison, your method depends on the external state of the system and how much load it is under, something you cannot control or parameterize to ensure consistent performance.