Recent

Author Topic: New Big Integer library is almost finished  (Read 18480 times)

MathMan

  • Sr. Member
  • ****
  • Posts: 405
Re: New Big Integer library is almost finished
« Reply #210 on: November 25, 2024, 11:32:56 pm »
checking if a dividend / divisor pair entering a division is the same as last time division was activated
The reason for that is to enable existing code to work without changes... in my experience code like this is quite common:
Code: Pascal  [Select][+][-]
  1. d:= (x div y);
  2. m:= (x mod y);
And I wanted that code run as fast as possible and not have to be rewritten by the user. The overhead will be small.

I can understand that, but some mantras simply don't translate well from short to long integers. In the special case above it is not even a good idea for 'standard' integers - imho it is actually a layer 8 problem. First, there is a DivMod function in FPC (albeit only for UInt32) which delivers quotient and remainder. Even if that was not available the following is faster (and people doing arbitrary precision stuff are aware of this)

Code: Pascal  [Select][+][-]
  1. d:= (x div y);
  2. m:= x - y*d;

When you calculate the quotient for the 'Div' operator you automatically calculate the remainder too - you're just not exposing it to the user. If you look at how I addressed this, then you'll find that I also made a DivMod function available to the user. The 'Div' and 'Mod' operators then use it and flush one of the two down the drain. Looking for repeated patterns in core routines is just making the average case slow and should be avoided.

I ran some tests comparing your code to mine. Add, subtract & multiply all take approximately the same time (my code is about 25% slower but I'm happy with that).

Not sure what you measured, but on my laptop I get the following info from speedtest5 for v4.8.5

Code: Pascal  [Select][+][-]
  1. time elapsed is 0,34 seconds for 100 iterations of divide with 16384 bit integers
  2. time elapsed is 0,37 seconds for 100000 iterations of multiply with 16384 bit integers
  3. time elapsed is 1,14 seconds for 100000 iterations of add with 16384 bit integers
  4. time elapsed is 1,09 seconds for 100000 iterations of subtract with 16384 bit integers

The 1,14 seconds for 100000 iterations translate to ~26.000 clock cycles for a 16384 bit integer add. Doing the same sized addition with my bench for ArbInt I get ~1.700 clock cycles. Figures for sub are comparable - mul is worse.

I did some tests of dynamic arrays vs new/free of pointer to array. It looks to me like dynamic arrays are faster, and they get more faster as the size of the array increases (if that makes sense). The main reason for this is that new of pointer to array requires the memory to be initialised, whereas a dynamic array has its memory zero'd automagically. (There might be a trick to fix this that I'm not aware of). So I don't think my dynamic arrays are a problem.

If you look at my implementation you'll nowhere find any initialisation (zeroing) of the memory I allocate and free via pointers. Not neccessary, as I always know how many limbs in the associated memory for the data is valid. There are the 'Limit' and 'Used' figures - 'Limit' states how many limbs can actually be used via the memory pointer and 'Used' states how many of these actually contribute to the number being represented by the ArbInt. So every time you re-size your dynamic array the implied initialisation of the array contributes negative to the speed.

As you point out, my division routine is my biggest bottleneck.  Can you explain in a nutshell why your division runs so much quicker, please?

I looked at your code, and the only thing I could see is a reference to doing each calculation loop on 2 or 3 limbs/words instead just one (as I am doing). Is that the essence of what you are doing?

I'll try

- I always work with full limbs vs your routine working on half limbs <= more than twice the time because you also have twice as many memory reads & writes
- I divide by multiplication with an inverse (per limb step) vs your routine using 'Div' (per half limb step) which is much slower
- I have way less correction steps 1 in 2^63 vs 1 in 2^16 in your code <= every correction induces another full addition of divisor to (running) remainder
- I update the top three limbs of the (running) remainder separate which shortens the update of the rest via costly SubMulMag1 by two limb <= this is especially efficient for very short divisions. However, for an m/n division the main loop in DivModMagBase runs for m-n iterations.

I am considering attempting to steal your code and translate it line-by-line to work with my data structures.  But am a bit uneasy about this because of the difficulties I had translating the standard Knuth algorithm from C. Also, would you object if I did this?

Thanks.

Feel free.

ad1mt

  • Sr. Member
  • ****
  • Posts: 327
    • Mark Taylor's Home Page
Re: New Big Integer library is almost finished
« Reply #211 on: November 26, 2024, 05:44:54 pm »
I divide by multiplication with an inverse
I do understand the theory of this.
What I don't understand is how you create an inverse, when you're dealing with integers.
Please can you explain, preferably with some pascal code?
Thanks.

MathMan

  • Sr. Member
  • ****
  • Posts: 405
Re: New Big Integer library is almost finished
« Reply #212 on: November 26, 2024, 07:29:12 pm »
I divide by multiplication with an inverse
I do understand the theory of this.
What I don't understand is how you create an inverse, when you're dealing with integers.
Please can you explain, preferably with some pascal code?
Thanks.

There are two different Euclidean inverses used
  A: if you do division by a single limb divisor
  B: if you do division by a divisor with two or more limb

A is calculated in the function 'Inv2by1' and B in the function 'Inv3by2' - so much for the Pascal source.

How does it work from a mathematical point of view? The following example is for UInt32 limb type.

For A assume you want to divide by the divisor ('Dvsr') $000089ab. To minimise correction steps in the actual division later on, you first scale it up so that the most significant bit is set (you should have come across this in the Knuth algorithm too). In this case 'Scale' is 16 bit and the new 'Dvsr' is $89ab0000. Now you change the perspective on that number. Instead of seeing it as an integer 'Dvsr' you consider it as a fraction 0.'Dvsr'. For this one an inverse 1.'Inv' can be calculated via Newton approximation in three steps. This is done in 'Inv2by1', where you can also find a reference to the more detailed paper describing the process in detail. As the original 'Dvsr' was scaled by 16 bit this must also be done with the dividend, to get the correct quotient. This is done on the fly in the function 'DivModMag1D'. If the original 'Dvsr' is already of a fitting form - i.e. 'Dvsr'=$9abcdef0 no scaling is required, the 'Inv' is calculated the same way but the division is done via 'DivModMag1N'.

The 'Inv' calculated in A could be used in case of 'Dvsr' with size 2 or more limbs - simply picking the leading limb of 'Dvsr' and feed it into A. Due to Algebra this is not an optimal choice. It will produce way too many correction steps in the actual division calculation - about 1 in 3 single corrections and 1 in 100 double correction, iirc. To get a better 'Inv' the two leading limb of 'Dvsr' are taken, again scaled, and considered as representing the double limb fraction 0.'DvsrHi''DvsrLo'. For these the 'InvHi' of the double limb inverse 1.'InvHi''InvLo' is calculated in the function 'Inv3by2'. It takes the limb inverse from A as starting point and does another correction round of Newton approximation to get the correct value. This value is then used in 'DivModMagBase' for the actual calculation.

Regards,
MathMan

srvaldez

  • Full Member
  • ***
  • Posts: 117
Re: New Big Integer library is almost finished
« Reply #213 on: November 28, 2024, 11:08:36 am »
hello ad1mt
I came across a site that may interest you https://www.bearssl.org/bigint.html
near the end of that page it presents some implementation details

ad1mt

  • Sr. Member
  • ****
  • Posts: 327
    • Mark Taylor's Home Page
Re: New Big Integer library is almost finished
« Reply #214 on: November 29, 2024, 09:59:47 am »
I came across a site that may interest you https://www.bearssl.org/bigint.html
near the end of that page it presents some implementation details
@srvaldez thanks.

 

TinyPortal © 2005-2018