BsfQWORD and BsrQWORD are working as expected at full speed (same environment). Something is going wrong with PopCnt.
BTW PopCnt is overloaded; the Bsf/Bsr family is not.
Here is the original call to the RTL PopCnt, time ~ 15s:
Inc(c, PopCnt(a.bs[i] and b.bs[i]));
I made a small change, same time:
var
t: QWORD;
...
t := a.bs[i] and b.bs[i];
t := PopCnt(t);
Inc(c, t);
And now the black magic: another change. Time ~ 6.5s.
var
t: QWORD;
...
t := a.bs[i] and b.bs[i];
if t <> 0 then t := PopCnt(t);
Inc(c, t);
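
A minimal sketch for reproducing the comparison, timing the three variants side by side in one program. The array length, repeat count, fill pattern and the plain arrays a and b (standing in for a.bs and b.bs) are assumptions for illustration only, not the original test, so the absolute times will differ and the ratio may depend on compiler version, target CPU and optimization switches.

```pascal
program PopCntBench;
{$mode objfpc}{$Q-}{$R-}

uses
  SysUtils;

const
  N = 4096;        // hypothetical number of QWords per bit set
  Rounds = 20000;  // hypothetical repeat count, tune for your machine

var
  a, b: array[0..N - 1] of QWord;  // stand-ins for a.bs and b.bs
  i, r: LongInt;
  c1, c2, c3, t, start: QWord;
begin
  // Hypothetical test data; every other element of b stays zero,
  // so half of the AND results are zero.
  for i := 0 to N - 1 do
  begin
    a[i] := QWord(i + 1) * QWord($9E3779B97F4A7C15);
    if Odd(i) then
      b[i] := a[i] xor QWord($5555555555555555)
    else
      b[i] := 0;
  end;

  // Variant 1: direct call inside Inc.
  c1 := 0;
  start := GetTickCount64;
  for r := 1 to Rounds do
    for i := 0 to N - 1 do
      Inc(c1, PopCnt(a[i] and b[i]));
  WriteLn('direct call:     ', GetTickCount64 - start, ' ms');

  // Variant 2: result of the AND stored in a temporary first.
  c2 := 0;
  start := GetTickCount64;
  for r := 1 to Rounds do
    for i := 0 to N - 1 do
    begin
      t := a[i] and b[i];
      t := PopCnt(t);
      Inc(c2, t);
    end;
  WriteLn('temporary:       ', GetTickCount64 - start, ' ms');

  // Variant 3: PopCnt is skipped when the temporary is zero.
  c3 := 0;
  start := GetTickCount64;
  for r := 1 to Rounds do
    for i := 0 to N - 1 do
    begin
      t := a[i] and b[i];
      if t <> 0 then t := PopCnt(t);
      Inc(c3, t);
    end;
  WriteLn('temporary + if:  ', GetTickCount64 - start, ' ms');

  // All three variants must count the same number of bits.
  if (c1 <> c2) or (c2 <> c3) then
  begin
    WriteLn('Mismatch: ', c1, ' ', c2, ' ', c3);
    Halt(1);
  end;
  WriteLn('bit count: ', c1);
end.
```

Because half of the AND results are zero with this fill pattern, the third variant also skips half of the PopCnt calls; to separate that effect from any code generation difference, rerun with b filled so that no result is zero.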
Thanks @ccrause, @ASerge.