@backprop
I think you have enough material to do your work.
If you don't have any other questions about the topic, I'll stop here.
One last suggestion: as @marcov suggested with his link, use the Sysinternals coreinfo (or better, coreinfo64) utility to check whether your system has an invariant TSC.
And if you decide to use RDTSC (which is not a bad choice at all), build something like TStopWatch on top of RDTSC for your applications, so you can adapt it to every platform you use with a single design in your sources.
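A minimal sketch of that idea, assuming FPC on x86-64 (Linux or Windows). The names TTscStopWatch and ReadTSC are made up for illustration and are not part of any library. It only counts raw TSC ticks. Converting ticks to time needs an invariant TSC and a known TSC frequency, which this sketch does not attempt.

```pascal
program tscwatch;
{$mode objfpc}{$H+}
{$modeswitch advancedrecords}
{$asmmode intel}

{$ifndef CPUX86_64}
  {$fatal This sketch assumes an x86-64 target}
{$endif}

type
  // Hypothetical stopwatch-like wrapper around the raw TSC value
  TTscStopWatch = record
    StartTicks: QWord;
    StopTicks: QWord;
    procedure Start;
    procedure Stop;
    function ElapsedTicks: QWord;
  end;

// RDTSC returns the counter in EDX:EAX; combine both halves into RAX,
// which is where a QWord function result is returned on x86-64.
function ReadTSC: QWord; assembler; nostackframe;
asm
  rdtsc
  shl rdx, 32
  or  rax, rdx
end;

procedure TTscStopWatch.Start;
begin
  StartTicks := ReadTSC;
end;

procedure TTscStopWatch.Stop;
begin
  StopTicks := ReadTSC;
end;

function TTscStopWatch.ElapsedTicks: QWord;
begin
  Result := StopTicks - StartTicks;
end;

var
  sw: TTscStopWatch;
  i: Integer;
  sum: QWord;
begin
  sum := 0;
  sw.Start;
  for i := 1 to 1000000 do
    sum := sum + QWord(i);   // the code being measured
  sw.Stop;
  WriteLn('sum = ', sum, ', elapsed TSC ticks = ', sw.ElapsedTicks);
end.
```

Only ReadTSC is platform specific, so a port to another CPU or OS would replace that one function and leave the record alone. Note that RDTSC is not a serializing instruction, so very short measurements can be skewed by out-of-order execution.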
Thank you for your input, I appreciate it. I already have my own class, made some 20 years ago, which does such measurements in Delphi, as I mentioned, and I copied and pasted some code from it here. But the main difference today is the many cores and hyper-threads per CPU and the advanced optimization at the CPU level. All I actually need is the proper way to use RDTSC, basically on Linux with FPC/Lazarus. And as I use FPC/Lazarus just for fun and for personal use, I do not see much of a problem with porting the code to other platforms, nor do I care much about Windows, especially Windows 11.
I am missing a few more things here... Since I work more in C/C++ today, what I miss is a volatile declaration for variables. I would also like to turn off optimization just for the part of the code being measured. I hope both are possible...