The whole point is to use shifting, which is as fast as + and -, instead of mul, div and especially **. And not per-bit shifting, as we're only interested in every 10 bits.
But mul, div and ** don't cost much outside a loop, particularly on a superscalar processor, and in a use case where they'll be dwarfed by the complexity of the output processing.
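For what it's worth, here is a minimal sketch of the "every 10 bits" idea, assuming the aim is to scale a byte count into 1024-based units (bytes, KiB, MiB and so on). That use case is my guess, and unit_index is a hypothetical name, not something from the thread.

```c
#include <stdint.h>

/* Hypothetical helper: counts how many times the value can be shifted
   right by 10 bits (i.e. divided by 1024) before it drops below 1024.
   0 = bytes, 1 = KiB, 2 = MiB, and so on. */
static unsigned unit_index(uint64_t bytes)
{
    unsigned idx = 0;
    while (bytes >= 1024u) {
        bytes >>= 10;   /* one shift per unit step: no mul, div or ** */
        idx++;
    }
    return idx;
}
```

For a 64-bit value the loop runs at most six times, so either approach costs next to nothing compared with formatting the result for output.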
Note this from the Stanford link:
    register unsigned int r = 0; // result of log2(v) will go here
    for (i = 4; i >= 0; i--) // unroll for speed...
    {
      if (v & b[i])
      {
        v >>= S[i];
        r |= S[i];
      }
    }
and bear in mind that most CPUs will have a barrel shifter to handle the >> operation efficiently. Some, of course, have only /one/ barrel shifter shared between two scalar ALUs, but with common sense in a simple situation like this that's no great impediment.
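The quoted loop relies on two lookup tables that aren't shown in the excerpt. Below is a self-contained sketch with the tables filled in as I recall them from the Stanford "Bit Twiddling Hacks" page, so check them against the page itself. The wrapper name ilog2 and the use of uint32_t are my own assumptions.

```c
#include <stdint.h>

/* Masks and shift amounts, as recalled from the Stanford page:
   b[i] selects the bits that lie S[i] or more positions up. */
static const uint32_t b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
static const uint32_t S[] = {1, 2, 4, 8, 16};

/* Hypothetical wrapper: returns floor(log2(v)) for v > 0, and 0 for v == 0. */
static uint32_t ilog2(uint32_t v)
{
    uint32_t r = 0;
    int i;
    for (i = 4; i >= 0; i--) {   /* five steps; could be unrolled */
        if (v & b[i]) {          /* any bit set in the upper part? */
            v >>= S[i];          /* discard the lower part */
            r |= S[i];           /* record the shift distance */
        }
    }
    return r;
}
```

Each step is one AND, one test and at most one multi-bit shift, which is exactly the sort of thing a barrel shifter handles in a single operation.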
MarkMLl