I think the problem is that there's no arithmetic support for Int64: the Int64 gets promoted to a Multi_Int_XV, and then the operation is performed in Multi_Int_XV precision.
That's more or less what is going on.
The divisor (b) always gets promoted to Multi_Int_XV, however...
When the value is < 2^32, it fits into a single 32-bit word in the Multi_Int_XV array, and it gets processed by a "fast track" loop in the division function.
But when the value is >= 2^32, it requires two or more 32-bit words in the Multi_Int_XV array, and it gets processed by the much more complex and slower "long division" loop.
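To illustrate why a single-word divisor is so much cheaper, here is a rough sketch in Python (not the library's actual code). It assumes the value is stored as a little-endian list of 32-bit words; the names to_words, from_words, short_divide and uses_fast_track are made up for this example. With a one-word divisor, the whole division is a single pass over the dividend's words, carrying the remainder down. A multi-word divisor can't be handled this way, and needs the quotient-digit estimation and correction steps of Knuth's long division instead.

```python
BASE = 1 << 32  # each word holds 32 bits (assumed layout, for illustration)

def to_words(n):
    """Split a non-negative int into little-endian 32-bit words."""
    words = []
    while True:
        words.append(n % BASE)
        n //= BASE
        if n == 0:
            return words

def from_words(words):
    """Rebuild an int from little-endian 32-bit words."""
    n = 0
    for w in reversed(words):
        n = n * BASE + w
    return n

def uses_fast_track(divisor):
    """True when the divisor fits in one 32-bit word, i.e. divisor < 2^32."""
    return len(to_words(divisor)) == 1

def short_divide(a_words, d):
    """Divide a multi-word value by a single-word divisor d (0 < d < 2^32).

    One pass from the most significant word down. Because rem < d,
    each partial quotient is guaranteed to fit in a single word.
    """
    quotient = [0] * len(a_words)
    rem = 0
    for i in range(len(a_words) - 1, -1, -1):
        cur = rem * BASE + a_words[i]
        quotient[i] = cur // d
        rem = cur % d
    return quotient, rem
```

For example, uses_fast_track(2**32 - 1) is True while uses_fast_track(2**32) is False, which is the same boundary described above.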
I've never timed them before, and it has surprised me just how much slower the "long division" loop is compared to the "fast track" loop. I'm going to do some more investigation to see if there are efficiency gains that I might have missed, or bugs I might have introduced when I "re-engineered" the Knuth algorithm. I don't think there can be any logic bugs, but there might be some "efficiency bugs" (if that makes sense).