Forum > General
How to convert Real to Big Integer type
ad1mt:
I'm struggling to figure out the best method to convert from a real type to a big integer type.
The only way I can think of easily & reliably doing it, is to:
1- use the Int function to truncate the factional/decimal part.
2- use the FloatToDecimal procedure to return a TFloatRec record.
3- use TFloatRec.Digits to get a string containing the most significant digits
4- use TFloatRec.Exponent to get a count of total number of digits
5- adjust the digits from step 3, by adding zeros to the (least significant) end, so that the length equals the number of digits from step 4.
I don't like the conversion to an intermediate string type, it seems to me there must be a cleaner way.
Then, there are the problems of dealing with rounding errors in the least significant end of the real number. For example, here's a small test program I wrote
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program test_real_3;uses sysutils,math;varR :real;I :longint;R_FLOATREC :TFloatRec;beginR:= (1); for i:= 1 to 308 do R:= R*10;R:= Int(R);writeln('Int(R) = ',R);FloatToDecimal(R_FLOATREC, R, 99, 0);writeln('R_FLOATREC.digits = ',R_FLOATREC.digits,' R_FLOATREC.Exponent = ',R_FLOATREC.Exponent);end.The program creates a real number with the largest allowable number of digits.
The output is
--- Code: Text [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---Int(R) = 9.9999999999999981E+307R_FLOATREC.digits = 99999999999999981 R_FLOATREC.Exponent = 308The problem here is that because of rounding errors at the least significant end, the real variable does not match the original value. Is there a way round this? Do I just have to accept that the converted value might look significantly different to the original value? Is there some clever mathematical trick that might be able to recover the original (correct) value? Or maybe get closer to it than 99999999999999981000... ? Because that's a difference of 19x(10^291)... a very significant number. I've tried the Round function, but it can't handle real numbers with extremely large exponents.
Here's another small test program I wrote
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program test_real_4;uses sysutils,math;varR,R_MAN :real;I,R_EXP :longint;beginR:= (1); for i:= 1 to 308 do R:= R*10;R:= Int(R);writeln('Int(R) = ',R);frexp(R,R_MAN,R_EXP);writeln('Frexp man = ',R_MAN,' exp = ',R_EXP);end.The output is
--- Code: Text [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---Int(R) = 9.9999999999999981E+307Frexp man = 5.5626846462680024E-001 exp = 1024This is calling Frexp, but the mantissa and exponent do not make sense to me, when I compare them with the original value, or compare them with the values returned in the previous test program.
Is the mantissa in decimal, but the expononent binary? Does this mean that (in this particular example) the mantissa must be multiplied by (2 ^ 1024), or shifted left (up) by 1024 bits?
I realise that I've probably stumbled into a hornets nest of a problem, and that these problems are unavoidable side-effects of real number types. So I'm not expecting anyone to write my code for me! Just some advice will do.
Thanks.
ad1mt:
I've made some progress. By specifying the precision parameter to FloatToDecimal so that it is slightly less than the available precision of the real format, it rounds in a way that solves the problem in the cases that worried me.
For example, for double, when I do this:
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---FloatToDecimal(R_FLOATREC, Double_Var, 15, 0);it returns "1" in the digits instead of "99999999999999981" that I was getting before.
BeniBela:
The digits of a float are (mantissa * 2^exponent)
With arbitrary precision arithmetic you can just calculate that
I have implemented that here: https://github.com/benibela/bigdecimalmath/blob/master/bigdecimalmath.pas#L983-L1232
The rounding is the hard part.
With arbitrary precision, it is also easy. I just calculate the next largest/smallest number, and compare them. However, it is calcualting with 300 digits, when it somehow could do the calculation with 20 digits, which is slower
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program test_real_3;uses sysutils,math,bigdecimalmath;varR :real;I :longint;R_FLOATREC :TFloatRec;beginR:= (1); for i:= 1 to 308 do R:= R*10;R:= Int(R); writeln('Int(R) = ',R);FloatToDecimal(R_FLOATREC, R, 99, 0);writeln('R_FLOATREC.digits = ',R_FLOATREC.digits,' R_FLOATREC.Exponent = ',R_FLOATREC.Exponent); writeln(FloatToBigDecimal(r,bdffExact).toString(bdfExponent));writeln(FloatToBigDecimal(r,bdffShortest).toString(bdfExponent));end.
--- Code: ---Int(R) = 9.9999999999999981E+307
R_FLOATREC.digits = 99999999999999981 R_FLOATREC.Exponent = 308
9.9999999999999981139503267596847425176765179308926185662298078548582170379439067165044410288854031049481594743364161622187121841818187648603927125262209438639553681654618823985640760188731793867961170022535129351893330180773705244319986644578003569234231285691342840034082734135647456849389933411990123839488E307
9.999999999999998E307
--- End code ---
--- Quote from: ad1mt on December 05, 2023, 03:22:00 pm ---The problem here is that because of rounding errors at the least significant end, the real variable does not match the original value
--- End quote ---
You do not even have the original value
The multiplication is not so accurate
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program test_real_5;uses sysutils,math,bigdecimalmath;varR :real;I :longint;R_FLOATREC :TFloatRec;beginR:= 1e308; writeln('Int(R) = ',R);FloatToDecimal(R_FLOATREC, R, 99, 0);writeln('R_FLOATREC.digits = ',R_FLOATREC.digits,' R_FLOATREC.Exponent = ',R_FLOATREC.Exponent); writeln(FloatToBigDecimal(r,bdffExact).toString(bdfExponent));writeln(FloatToBigDecimal(r,bdffShortest).toString(bdfExponent));end.
--- Code: ---Int(R) = 1.0000000000000000E+308
R_FLOATREC.digits = 1 R_FLOATREC.Exponent = 309
1.00000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336E308
1.0E308
--- End code ---
MathMan:
I think you are approaching this overly complex, as you are mixing the string representation with "the value" of a floating point number - that is it's binary representation. Iirc you are using an array of uint as your basic data type for arbitrary precision integers. In that case conversion from any floating point format to big integer is pretty straight forward.
First, pls take a look at this https://en.wikipedia.org/wiki/IEEE_754 to understand the data format of binary floating point numbers. The conversion algorithm is always the same regardless of the precision and looks like
* extract the mantissa bits from the fp number
- you need a union type defined for this, that lets you access the fp number as a uint
* prepend the hidden 1 bit of the mantissa at the right bit position - i.e. 23 for binary32 or 53 for binary64
* extract the binary exponent, similar to extracting the mantissa
* analyse the exponent to see how much of the mantissa is above the point and how much is below the point
* depending on rounding mode and value below the floating point adjust the value above the floating point
- maybe you want to start with simple truncation here, because then you can simply ignore this step
* do a final shift left or right, depending on the binary exponent, to make the integer mantissa value represent the correct value
And that's it. All in all the above should translate to less then 50 LoC, if you want to include rounding. Less, if you simply truncate.
Cheers,
MathMan
kwyan:
The inaccuracy is not due to the multiplication. The root cause is the real number is not an exact representation of the original number. As you said that these problems are unavoidable side-effects of real number types.
SINGLE (a kind of real number in pascal) may have 7–8 significant digits only while DOUBLE may have 15–16 significant digits.
If the real number is 1.23 (3 significant digits in decimal number), after converted to SINGLE (in any programming language), the value may be, for example, 1.230000001239455871. The REAL of 7-8 significant digits means difference between the REAL and the actual value will be after 7-8 significant digits (In my example, the difference starts at 10th digits). The difference may be more or less than the original value.
If you do a lot of mathematical operation (e.g. ADD) of a REAL, the inaccuracy will be increased (cumulative effect). For example, this is an addition:
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program Project1;const R1 : single = 1.0/97.0;var R :single; I :longint;begin R := (0); for i:= 1 to 970000 do R:=R+R1; writeln('R = ',R); R:= Int(R); writeln('Int(R) = ',R);end.
The result is:
--- Code: Text [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---R = 1.003609473E+04Int(R) = 1.003600000E+04
Change the data type to double, the accuracy is increased:
--- Code: Pascal [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---program Project1;const R1 : double = 1.0/97.0;var R :double; I :longint;begin R := (0); for i:= 1 to 970000 do R:=R+R1; writeln('R = ',R); R:= Int(R); writeln('Int(R) = ',R);end.
The result is:
--- Code: Text [+][-]window.onload = function(){var x1 = document.getElementById("main_content_section"); if (x1) { var x = document.getElementsByClassName("geshi");for (var i = 0; i < x.length; i++) { x[i].style.maxHeight='none'; x[i].style.height = Math.min(x[i].clientHeight+15,306)+'px'; x[i].style.resize = "vertical";}};} ---[code=plain]R = 9.9999996926635504E+003Int(R) = 9.9990000000000000E+003
So my recommendation is:
For accounting system, never use real number.
For scientific calculation, accept the inaccuracy of real number.
Navigation
[0] Message Index
[#] Next page