The examples you've shown are kind of slow too. The fast implementation is checking the CPU flag, as mentioned.
Actually, yes and no.
If you take only the (quite limited) case of positive numbers (QWord, DWord, ...), then you end up with two "<" compares. If the compiler generates branchless code for that (at least for the first compare; I don't know whether it does), then you end up with
- 2 cmp
- 2 flag tests
- 1 branch at the end
versus
- 1 flag test
- 1 branch
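To make the two variants compared above concrete, here is a minimal sketch in C. It assumes the test under discussion is an overflow check on an unsigned 64-bit addition, which the thread only implies. The function names are made up for illustration, and `__builtin_add_overflow` is a GCC/Clang builtin, not something FPC provides. On x86-64 the builtin typically compiles to an add followed by a single carry-flag test, but that depends on the compiler and target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare-based check (hypothetical helper): unsigned addition wraps
 * around, so an overflow shows up as a sum smaller than an operand.
 * Two "<" compares are written out to mirror the count in the post;
 * for unsigned operands a single compare would already be enough. */
bool add_overflows_cmp(uint64_t a, uint64_t b, uint64_t *sum)
{
    *sum = a + b;
    return (*sum < a) || (*sum < b);
}

/* Flag-based check (hypothetical helper): the GCC/Clang builtin
 * stores the wrapped sum and returns true if the addition overflowed. */
bool add_overflows_flag(uint64_t a, uint64_t b, uint64_t *sum)
{
    return __builtin_add_overflow(a, b, sum);
}
```

Both functions return the same result for every input; only the generated code differs, and whether that difference is measurable is exactly the question here.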
On paper that looks massive. But given the way modern CPUs do prediction and other internal optimizations, you would need very special hand-written asm code surrounding it to actually get a difference (and you would have to run it millions of times in a tight loop with very little else in that loop).
It may differ when you compile for a RISC target....
And well, despite the above, if it were available and I had code that needed the test, I would gladly use it. I would probably want it, too. So yes, I do back the idea that it would be good to have.
Only, it will probably gain you at best (very optimistically) one or two percent.
(That is, in an app which does more than only adding, that one percent is watered down. But the idea is that with many such optimizations, each contributing a percent in a different part of the code, you will get some noticeable result.)
Btw, if you do need to get something faster (as in by all means, even if just a tiny bit), try FPC 3.3.1. Some of my test cases (which ones doesn't matter, I just use them to benchmark and compare different FPC versions) had 3% or 4% gains just from being compiled with 3.3.1 (at -O4, versus -O4 with 3.2.2 or 3.2.3).