
Topic: How to compare floating point numbers to a certain number of decimals only?

Sander

Dear people,
This concerns the FPC compiler only, I don't use Lazarus for development.
FPC 3.2.2, the environment is Win32.

My program had a bug which was traced to two double precision floats being deemed not equal,
while they "should" have been equal.
I have fixed the bug, but it led to some investigation which revealed something I would like to be able to understand.

Apparently, the number 1.9000000000000000 does not exist as a single, double or extended floating point type.

The attached is the text output of my test program which uses strtofloat() to convert a text number into a float. The float is then examined with writeln() or FloatToStrF().

As a side issue,
FloatToStrF(floatnumber,ffexponent,decimals+1,exponents) gives the same output as
writeln(floatnumber) with the caveat that FloatToStrF() produces 16 decimals for an extended
float, whereas writeln()  produces 20.
The test output was produced with writeln().

The end goal is to be able to qualify float comparisons to a certain precision, but I thought I would start at the beginning.
The beginning seems to be confusing already, any pointing in the right direction would be appreciated.

Sander



 
« Last Edit: May 11, 2026, 05:08:14 am by Sander »

Thaddy

Why?

https://www.freepascal.org/docs-html/rtl/math/samevalue.html

The epsilon value indicates the precision.
The format functions are for display and are not really meant for rounding.

Note that you should always use SameValue with explicit Epsilon, because otherwise E=0.0 is assumed and that is frankly nonsense. Example from the web:
Code: Pascal
uses Math;
var
  x, y: Double;
begin
  x := 0.1 + 0.2;
  y := 0.3;
  // Compare with an explicit tolerance (Epsilon) instead of using "=".
  if SameValue(x, y, 1E-10) then
    WriteLn('x and y are considered equal')
  else
    WriteLn('x and y are not equal');
end.
« Last Edit: May 11, 2026, 06:47:15 am by Thaddy »
objects are fine constructs. You can even initialize them with constructors.

valdir.marcos


The file "FloatTest.txt" you have attached shows only the result...

Code: Bash
$ cat ~/Downloads/FloatTest.txt

Test of text to floating point conversion
FreePascal compiler version 3.2.2
A one decimal text number is converted to a float with strtofloat()
The resulting floating point number is then examined with writeln()

Question 1: Why is 1.9 never 1.90000000000 in any data type?

FloatToStrF(d,ffexponent,decimals+1,exponents) gives the same as writeln(d)
However, FloatToStrF() produces only 16 decimals where writeln produces 20 ?

Datatype = single, 9 decimal places, 2 exponents:
1.0 ->  1.000000000E+00
1.1 ->  1.100000024E+00
1.2 ->  1.200000048E+00
1.3 ->  1.299999952E+00
1.4 ->  1.399999976E+00
1.5 ->  1.500000000E+00
1.6 ->  1.600000024E+00
1.7 ->  1.700000048E+00
1.8 ->  1.799999952E+00
1.9 ->  1.899999976E+00

Datatype = double, 16 decimal places, 3 exponents:
1.0 ->  1.0000000000000000E+000
1.1 ->  1.1000000000000001E+000
1.2 ->  1.2000000000000000E+000
1.3 ->  1.3000000000000000E+000
1.4 ->  1.3999999999999999E+000
1.5 ->  1.5000000000000000E+000
1.6 ->  1.6000000000000001E+000
1.7 ->  1.7000000000000000E+000
1.8 ->  1.8000000000000000E+000
1.9 ->  1.8999999999999999E+000

Datatype = extended, 20 decimal places, 4 exponents:
1.0 ->  1.00000000000000000000E+0000
1.1 ->  1.10000000000000000002E+0000
1.2 ->  1.20000000000000000004E+0000
1.3 ->  1.29999999999999999996E+0000
1.4 ->  1.39999999999999999998E+0000
1.5 ->  1.50000000000000000000E+0000
1.6 ->  1.60000000000000000002E+0000
1.7 ->  1.70000000000000000004E+0000
1.8 ->  1.79999999999999999996E+0000
1.9 ->  1.89999999999999999998E+0000



It'd be easier to help if you show some code.

For example:


Code: Pascal
program TestingFloatNumber;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  fSingle: Single;
  fDouble: Double;
  // NOTE: declared as Double, so every "Extended" line in the output shows
  // Double precision. Declare it as Extended to test the wider type
  // (on targets where Extended is actually wider than Double).
  fExtended: Double;

begin
  fSingle   := 1.9;
  fDouble   := 1.9;
  fExtended := 1.9;

  // Default WriteLn formatting (scientific notation, full precision of the type)
  WriteLn;
  WriteLn('Single: ', fSingle);
  WriteLn('Double: ', fDouble);
  WriteLn('Extended: ', fExtended);

  // First 8 characters of the default string conversion
  WriteLn;
  WriteLn('Copy(FloatToStr(Single), 1, 8): ', Copy(FloatToStr(fSingle), 1, 8));
  WriteLn('Copy(FloatToStr(Double), 1, 8): ', Copy(FloatToStr(fDouble), 1, 8));
  WriteLn('Copy(FloatToStr(Extended), 1, 8): ', Copy(FloatToStr(fExtended), 1, 8));

  // Fixed formatting with six decimals (for display only)
  WriteLn;
  WriteLn('FormatFloat(''0.000000'', Single): ', FormatFloat('0.000000', fSingle));
  WriteLn('FormatFloat(''0.000000'', Double): ', FormatFloat('0.000000', fDouble));
  WriteLn('FormatFloat(''0.000000'', Extended): ', FormatFloat('0.000000', fExtended));
end.

Code: Bash
Free Pascal Compiler version 3.2.3 [2026/05/08] for x86_64
Copyright (c) 1993-2026 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling TestingFloatNumber.pas
Linking TestingFloatNumber
33 lines compiled, 0.1 sec

Single:  1.899999976E+00
Double:  1.8999999999999999E+000
Extended:  1.8999999999999999E+000

Copy(FloatToStr(Single), 1, 8): 1.899999
Copy(FloatToStr(Double), 1, 8): 1.9
Copy(FloatToStr(Extended), 1, 8): 1.9

FormatFloat('0.000000', Single): 1.900000
FormatFloat('0.000000', Double): 1.900000
FormatFloat('0.000000', Extended): 1.900000

Khrys

Quote from: Sander
> My program had a bug which was traced to two double precision floats being deemed not equal,
> while they "should" have been equal.

You should never compare floats exactly (i.e. with  =  or  <>) unless you can prove that the problem at hand sidesteps precision issues.

Quote from: Sander
> Apparently, the number 1.9000000000000000 does not exist as a single, double or extended floating point type.

1.9  is indeed not representable as a binary IEEE-754 float of any precision.

Similarly to how integers are stored as a sum of powers of 2 (with non-negative exponents, e.g.  9 = %1001 = 1·2³ + 0·2² + 0·2¹ + 1·2⁰), floating-point numbers are made up of a sign, a scaling factor (a power of 2 whose exponent is stored as a binary integer) and a fractional part, which (crucially) is a binary fraction as opposed to a decimal fraction. Such fractions are also stored as a sum of powers of two, but with negative exponents only (plus an implicit 1 in most cases). Examples would be  1.5  = 1 + 2⁻¹ or  1.375  = 1 + 2⁻² + 2⁻³.

This means that a given rational number can be represented exactly only if it can be written as a fraction whose denominator is a power of 2. In your case,  1.9 = 19/10;  since 19 is a prime number, the fraction is irreducible, and because its denominator 10 = 2·5 contains the factor 5, it can't be rewritten with a power-of-2 denominator. This is analogous to fractions like  1/3  not having a finite decimal representation. Note that precision also plays into this - large denominators simply require more storage bits.

Another implication of the anatomy of floating-point numbers is that they are distributed exponentially (not uniformly) along the real number line; there are about as many floats between -1 and +1 as there are in the remaining space between negative and positive infinity. In other words, the "step size" between consecutive floats grows along with their value - for example (applicable only for 32-bit  Single),  between  8388608 (2^23)  and  16777216 (2^24),  only integers are representable (step size 1.0), while between  16777216  and  33554432 (2^25),  only even integers are exact (step size 2.0).
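
A minimal sketch of the two points above, assuming FPC in objfpc mode; the program and variable names are illustrative and not taken from the original post. It checks that a sum of negative powers of two is stored exactly, and that adding 1 to 2^24 in a Single cannot produce a new value because the step size in that range is 2.0.

```pascal
program BinaryFractionSketch;

{$mode objfpc}{$H+}

var
  a, b: Single;
  d: Double;
begin
  // 1.375 = 1 + 2^-2 + 2^-3 is a finite binary fraction, so the sum is exact.
  d := 1.0 + 0.25 + 0.125;
  WriteLn('1 + 1/4 + 1/8 equals 1.375: ', d = 1.375);

  // Between 2^24 and 2^25 the step size of Single is 2.0, so 2^24 + 1
  // has no representation of its own and is rounded when stored in b.
  a := 16777216.0;   // 2^24
  b := a + 1.0;
  WriteLn('2^24 + 1 equals 2^24 in Single: ', a = b);
end.
```

Changing the type of a and b to Double should make the second comparison report FALSE, since Double still has a step size well below 1.0 in that range.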
« Last Edit: May 13, 2026, 07:21:59 am by Khrys »

Paolo

To be sure what is going on, compare the binary representation of the numbers and not their string (human-readable) representation. Two string values can look equal even though the binary (exactly stored) values are not.
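
One way to look at the stored value, as a rough sketch: copy the eight bytes of a Double into a QWord and print them as hex. DoubleBitsHex is a hypothetical helper written for this example, not an RTL function; Move, IntToHex and FormatFloat are the standard System/SysUtils routines.

```pascal
program ShowDoubleBits;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: returns the raw 64-bit pattern of a Double as 16 hex digits.
function DoubleBitsHex(Value: Double): string;
var
  Bits: QWord;
begin
  Bits := 0;
  Move(Value, Bits, SizeOf(Value));  // copy the 8 bytes without numeric conversion
  Result := IntToHex(Bits, 16);
end;

var
  a, b, x, y: Double;
begin
  // Variables are used so that the sum is not a literal constant expression.
  a := 0.1;
  b := 0.2;
  x := a + b;
  y := 0.3;

  // Rounded to six decimals for display, the two strings look the same ...
  WriteLn('x as text: ', FormatFloat('0.000000', x));
  WriteLn('y as text: ', FormatFloat('0.000000', y));

  // ... while the stored bit patterns can be compared exactly.
  WriteLn('x bits: ', DoubleBitsHex(x));
  WriteLn('y bits: ', DoubleBitsHex(y));
  WriteLn('same bits: ', DoubleBitsHex(x) = DoubleBitsHex(y));
end.
```

The same idea works for Single with a LongWord buffer; Extended needs a 10-byte buffer on targets where it is the 80-bit type.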

Sander

Gentlemen, Thank you for your replies.

Khrys:  Your response has provided the understanding I was looking for.

The other suggestions on how to deal with the issue of precision in floating point numbers are appreciated.
I did a "multiply, round and divide" thing as a quick fix for my bug; it is good to see some other options.
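
For reference, a sketch of what such a scale-and-round comparison might look like next to SameValue. SameToDecimals is a hypothetical helper made up for this example (the actual fix is not shown in the thread); IntPower and SameValue come from the Math unit.

```pascal
program CompareToDecimals;

{$mode objfpc}{$H+}

uses
  Math;

// Hypothetical helper: True when A and B agree after scaling by 10^Decimals
// and rounding to an integer. Assumes the scaled values fit into an Int64.
function SameToDecimals(const A, B: Double; Decimals: Integer): Boolean;
var
  Scale: Double;
begin
  Scale := IntPower(10.0, Decimals);
  Result := Round(A * Scale) = Round(B * Scale);
end;

var
  a, b, c: Double;
begin
  a := 0.1;
  b := 0.2;
  c := 0.3;

  // Exact comparison versus two tolerance-based comparisons.
  WriteLn('a + b = c:                  ', (a + b) = c);
  WriteLn('equal to 6 decimals:        ', SameToDecimals(a + b, c, 6));
  WriteLn('SameValue, Epsilon 0.5E-6:  ', SameValue(a + b, c, 0.5E-6));
end.
```

Both tolerance-based checks are expected to report TRUE here. One caveat of the rounding approach: two values that are very close but sit on opposite sides of a rounding boundary (for example 1.2349999 and 1.2350001 at three decimals) round to different integers and compare as unequal, which an epsilon-based check such as SameValue avoids.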

Thanks again,
Sander

Thaddy

Sander, you should simply use SameValue with the Epsilon you expect: it is there in plain sight.

creaothceann

This helped me understand floating-point numbers the most: https://fabiensanglard.net/floating_point_visually_explained/index.html

tetrastes

Quote from: Khrys
> 1.9  is indeed not representable as an IEEE-754 float of any precision.

To be precise: not as an IEEE 754 binary float. IEEE 754-2008 introduced three decimal floating-point formats.

Warfley

Floating point numbers represent numbers in base 2 to a certain precision (a 53-bit significand for Double). Not all fractions can be represented finitely in a given base. For example, we humans use base 10, and the fraction 1/3 in base 10 is the infinite series 0.3333... But in base 3 it can be represented as 0.1
Doing a bit of maths you can see that numbers can be represented finitely in a base if their dividend shares the same prime factors as the base. Base 10 has the prime factors 2 and 5 (2*5=10), so as it does not include 3, 1/3 cannot be represented as a finite number in base 10. 5 is included, so 1/5=0.2 is finite.

So take 1.9: it's 1+9/10=1+9/(2*5). Floats use base 2, and since the prime factor 5 would be needed to write 0.9 finitely, it cannot be represented exactly as a float.
So the PC rounds to the nearest number representable with a 53-bit significand.
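
The prime-factor rule can be checked mechanically. The following is a sketch only, with Gcd and IsFiniteInBase as hypothetical helpers written for this example: a fraction has a finite expansion in a base exactly when its denominator, reduced to lowest terms, has no prime factor that the base lacks.

```pascal
program FiniteInBase;

{$mode objfpc}{$H+}

// Greatest common divisor by Euclid's algorithm (positive inputs assumed).
function Gcd(A, B: Int64): Int64;
var
  T: Int64;
begin
  while B <> 0 do
  begin
    T := A mod B;
    A := B;
    B := T;
  end;
  Result := A;
end;

// True when Num/Den (both positive) has a finite expansion in the given base.
function IsFiniteInBase(Num, Den, Base: Int64): Boolean;
var
  G: Int64;
begin
  Den := Den div Gcd(Num, Den);   // reduce the fraction to lowest terms
  G := Gcd(Den, Base);
  while G > 1 do                  // strip every factor shared with the base
  begin
    Den := Den div G;
    G := Gcd(Den, Base);
  end;
  Result := Den = 1;              // nothing left: only the base's primes occurred
end;

begin
  WriteLn('1/3   in base 10: ', IsFiniteInBase(1, 3, 10));    // FALSE
  WriteLn('1/3   in base 3:  ', IsFiniteInBase(1, 3, 3));     // TRUE
  WriteLn('1/5   in base 10: ', IsFiniteInBase(1, 5, 10));    // TRUE
  WriteLn('19/10 in base 2:  ', IsFiniteInBase(19, 10, 2));   // FALSE
  WriteLn('11/8  in base 2:  ', IsFiniteInBase(11, 8, 2));    // TRUE (1.375)
end.
```

The 19/10 case is the 1.9 from the original question: after removing the factor 2, a 5 remains in the denominator, so no finite binary expansion exists.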

Khrys

Quote from: tetrastes
> Quote from: Khrys
> > 1.9  is indeed not representable as an IEEE-754 float of any precision.
> To be precise, in IEEE 754 binary float. IEEE 754-2008 introduced three decimal floating point formats.

Good point - I edited my answer to include that.

Quote from: Warfley
> So the PC does rounding to the nearest [53] bit number

Ahh yes, rounding - another can of worms! Just remember that  SysResetFPU  and  SetExceptionMask  exist if you encounter floating-point results that differ between threads running identical code (or after loading some shared library, or when e.g. SQLite3 suddenly starts raising FPU exceptions...).
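
A rough sketch of how those two routines might be combined around a call into foreign code, assuming the goal is simply to get back to the state the program started with. The placeholder comment stands in for whatever library call disturbs the FPU; GetExceptionMask and SetExceptionMask are in the Math unit, SysResetFPU is in System.

```pascal
program FpuStateSketch;

{$mode objfpc}{$H+}

uses
  Math;

var
  SavedMask: TFPUExceptionMask;
begin
  // Remember which floating-point exceptions are currently masked.
  SavedMask := GetExceptionMask;

  // ... call into a shared library that may change the FPU state (placeholder) ...

  // Reset the FPU control settings to the RTL defaults,
  // then re-apply the exception mask that was active before the call.
  SysResetFPU;
  SetExceptionMask(SavedMask);

  WriteLn('FPU state restored');
end.
```

Whether a reset is needed at all depends on the library and platform, so treat this as a starting point for debugging rather than a fixed recipe.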

Small nitpick: I think you mixed up dividend and divisor  ;)

Sander


Yes, Wikipedia also had something to say about decimal IEEE floats. 
I suppose they are implementable with a large speed penalty, since they are not binary based.

I am very impressed with the willingness of the Lazarus community members to spend their time answering questions which often must seem trivial to you.
I am not a programmer at all; the last software I wrote before retirement was in the DOS days, using Turbo Pascal.
Recently, I picked it up again to write myself a data acquisition program, which led to the float issue.

Thanks again,
Sander





 
