=================================================================
Exploring approximations to "one tenth" in binary floating point
=================================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com
Reference: http://www.exploringbinary.com/binary-converter/
Just as 1/3 = 0.333333+ in decimal, many fractions repeat forever
in binary, including the very popular 1/10 (one tenth: 0.1 decimal).
"It may come as a surprise that terminating decimal fractions can have
repeating expansions in binary. It is for this reason that many are
surprised to discover that 0.1 + ... + 0.1, (10 additions) differs from
1 in floating point arithmetic. In fact, the only binary fractions with
terminating expansions are of the form of an integer divided by a power
of 2, which 1/10 is not." - http://en.wikipedia.org/wiki/Binary_numeral_system
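That claim is easy to check from Java (a small sketch, not part of the
original notes; the class name is made up): sums of fractions whose
denominators are powers of 2 stay exact, while ten additions of 0.1f
drift away from 1.

```java
// Sketch: terminating vs. repeating binary fractions.
public class TerminatingFractions {
    public static void main(String[] args) {
        // 3/8 = 0.375 is an integer divided by a power of 2, so every
        // value and every intermediate sum is stored exactly:
        System.out.println(0.25f + 0.125f == 0.375f);   // true

        // 1/10 is not, so each 0.1f is approximate and the errors accumulate:
        float sum = 0.0f;
        for (int i = 0; i < 10; i++) {
            sum = sum + 0.1f;
        }
        System.out.println(sum == 1.0f);                // false
    }
}
```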
The simple number 0.1 decimal translated to binary is a repeating
binary fraction that never terminates. Truncated to 27 bits (enough to
fill IEEE single precision's 24 significant bits), 0.1 decimal is
0.000110011001100110011001100 in binary. If you convert that 27-bit
truncated binary back to decimal, you get only
0.0999999940395355224609375, not 0.1. (Naturally, truncation loses some
of the value.) That's a difference of 0.0000000059604644775390625:
0.000110011001100110011001100(2) = 0.0999999940395355224609375(10)
difference from 0.1 -> 0.0000000059604644775390625(10)
If you convert 0.1 decimal to binary by rounding up instead of
truncating, you go up to the next higher binary value, which is
0.000110011001100110011001101 (27 bits), and the decimal for that is
0.100000001490116119384765625, not 0.1. (Since we rounded up, the value
is a bit more than the desired 0.1 value.) That's a difference of only
0.000000001490116119384765625:
0.000110011001100110011001101(2) = 0.100000001490116119384765625(10)
difference from 0.1 -> 0.000000001490116119384765625(10)
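Both error terms can be reproduced with exact arithmetic in Java (a
sketch, not part of the original program; the BigDecimal(double)
constructor yields the exact decimal value of a binary number):

```java
import java.math.BigDecimal;

// Sketch: compute the two approximation errors above exactly.
public class TenthErrors {
    public static void main(String[] args) {
        BigDecimal tenth     = new BigDecimal("0.1");           // exact 0.1
        BigDecimal truncated = new BigDecimal(0x1.999998p-4);   // truncated binary
        BigDecimal roundedUp = new BigDecimal(0x1.99999ap-4);   // rounded-up binary

        // prints 0.0000000059604644775390625
        System.out.println(tenth.subtract(truncated).toPlainString());
        // prints 0.000000001490116119384765625
        System.out.println(roundedUp.subtract(tenth).toPlainString());
    }
}
```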
Both truncating down and rounding up produce incorrect values. Rounding
up produces an error of only 0.000000001490116119384765625, which is
smaller than the truncation error of 0.0000000059604644775390625, so the
rounded-up binary value is a better approximation to 0.1 than the simple
truncation, and the rounded-up approximation is what you get when you
enter 0.1 into your computer. The value is a bit too big.
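You can confirm that the literal 0.1f stores exactly this rounded-up
value (a sketch, not part of the original program):

```java
import java.math.BigDecimal;

// Sketch: show the exact value stored for the Java literal 0.1f.
public class StoredTenth {
    public static void main(String[] args) {
        // widening 0.1f to double is exact, so this prints the stored value:
        System.out.println(new BigDecimal((double) 0.1f).toPlainString());
        // prints 0.100000001490116119384765625 -- the rounded-up value above

        System.out.println(0.1f == 0x1.99999a0p-4f);    // true: same float
    }
}
```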
Increasing the number of bits of precision available, e.g. by using
double-precision floating point, will make the converted approximation
closer to the original 0.1, but because 0.1 decimal is a repeating binary
fraction, it can never be exactly represented in a binary number system.
No finite amount of binary precision will ever capture it exactly.
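Java's hexadecimal-float printers make this visible: both precisions end
with the repeating pattern rounded up at their last bit (a sketch, not
part of the original program):

```java
// Sketch: more precision gets closer to 0.1 but never reaches it.
public class FloatVsDoubleTenth {
    public static void main(String[] args) {
        System.out.println(Float.toHexString(0.1f));  // 0x1.99999ap-4
        System.out.println(Double.toHexString(0.1));  // 0x1.999999999999ap-4
        // double carries 29 more significand bits, yet still ends in ...9a:
        // the repeating pattern is rounded up at the last available bit.
        System.out.println((double) 0.1f == 0.1);     // false: different roundings
    }
}
```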
The IEEE 754 Single Precision Floating Point format can only store 23+1
(normalized) binary bits of precision. A one-bit change in the least
significant bit of the approximation of 0.1 corresponds to the one-bit
changes made above. IEEE rounds up 0.1 to use 0x1.99999a0p-4.
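In Java the actual stored IEEE 754 encoding can be inspected with
Float.floatToIntBits (a sketch, not part of the original program):

```java
// Sketch: the raw IEEE 754 single-precision encoding of 0.1f:
// sign 0, biased exponent 01111011 (unbiased -4),
// fraction 10011001100110011001101 (the rounded-up mantissa bits).
public class TenthEncoding {
    public static void main(String[] args) {
        int bits = Float.floatToIntBits(0.1f);
        System.out.println(Integer.toHexString(bits));    // 3dcccccd
        System.out.println(Integer.toBinaryString(bits)); // leading 0 bits dropped
    }
}
```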
Below is what the above looks like when expressed as a Java program.
First, some normalization and conversion of the above to hexadecimal:
0.0001100110011001100110011000 = 1.10011001100110011001100 * 2**-4
= 0x1.9999980p-4 hexadecimal
= 0.0999999940395355224609375 decimal
difference from 0.1 -> 0.0000000059604644775390625(10)
0.0001100110011001100110011010 = 1.10011001100110011001101 * 2**-4
= 0x1.99999a0p-4 hexadecimal
= 0.100000001490116119384765625 decimal
difference from 0.1 -> 0.000000001490116119384765625(10)
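These two hex values are adjacent single-precision floats; their bit
patterns differ by exactly one in the last place (a sketch, not part of
the original program):

```java
// Sketch: the truncated and rounded-up values are one ulp apart.
public class OneUlpApart {
    public static void main(String[] args) {
        int down = Float.floatToIntBits(0x1.9999980p-4f); // truncated
        int up   = Float.floatToIntBits(0x1.99999a0p-4f); // rounded up (= 0.1f)
        System.out.println(up - down);                    // 1: adjacent floats

        // their difference is one unit in the last place, 2**-27:
        System.out.println(Math.ulp(0.1f) == 0x1.0p-27f); // true
    }
}
```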
public class calculateMachineEpsilonFloat {
    public static void main(String[] args) {
        float machEps = 1.0f;
        float f;

        // 0.1 is encoded in binary in a 23(+1)-bit mantissa as 0x1.99999a0p-4
        // because Java (IEEE 754 floating point) chooses the closest value,
        // and the closest approximation to 0.1 involves rounding up;
        // this approximation to 0.1 prints as "a bit more than 0.1"
        f = 0.1f;
        System.out.println( "f: " + f );
        System.out.printf( "f: %.40f\n", f );  // print 40 decimal digits
        System.out.printf( "f: %.40a\n", f );  // print 40 hex digits
f: 0.1
f: 0.1000000014901161200000000000000000000000
f: 0x1.99999a0000000000000000000000000000000000p-4
[ error is 0.000000001490116120 ]
        // to find the float that is just less than 0.1, truncate
        // the repeating binary fraction instead of rounding up;
        // this prints the approximation that is "a bit less than 0.1"
        f = 0x1.9999980p-4f;  // compare last digit with 0x1.99999a0p-4
        System.out.println( "f: " + f );
        System.out.printf( "f: %.40f\n", f );  // print 40 decimal digits
        System.out.printf( "f: %.40a\n", f );  // print 40 hex digits
f: 0.099999994
f: 0.0999999940395355200000000000000000000000
f: 0x1.9999980000000000000000000000000000000000p-4
[ error is 0.000000005960464480, which is more than 0.000000001490116120 ]
        // look at the difference between 0x1.99999a0p-4 and 0x1.9999980p-4
        f = 0.1f - 0x1.9999980p-4f;  // subtract the truncation from the rounding
        System.out.println( "f: " + f );
        System.out.printf( "f: %.40f\n", f );  // print 40 decimal digits
        System.out.printf( "f: %.40a\n", f );  // print 40 hex digits
f: 7.4505806E-9
f: 0.0000000074505805969238280000000000000000
f: 0x1.0000000000000000000000000000000000000000p-27
[ note that the difference is a single bit, viewed in hexadecimal ]
        // a loop that adds up ten tenths and doesn't get 1.0
        f = 0.0f;
        do {
            f = f + 0.1f;
            System.out.println( "count up f: " + f );
            System.out.printf( "count up f: %.40f\n", f );
            System.out.printf( "count up f: %.40a\n", f );
        } while ( f < 1.0f );
count up f: 0.1
count up f: 0.1000000014901161200000000000000000000000
count up f: 0x1.99999a0000000000000000000000000000000000p-4
count up f: 0.2
count up f: 0.2000000029802322400000000000000000000000
count up f: 0x1.99999a0000000000000000000000000000000000p-3
count up f: 0.3
count up f: 0.3000000119209289600000000000000000000000
count up f: 0x1.3333340000000000000000000000000000000000p-2
count up f: 0.4
count up f: 0.4000000059604645000000000000000000000000
count up f: 0x1.99999a0000000000000000000000000000000000p-2
count up f: 0.5
count up f: 0.5000000000000000000000000000000000000000
count up f: 0x1.0000000000000000000000000000000000000000p-1
count up f: 0.6
count up f: 0.6000000238418579000000000000000000000000
count up f: 0x1.3333340000000000000000000000000000000000p-1
count up f: 0.70000005
count up f: 0.7000000476837158000000000000000000000000
count up f: 0x1.6666680000000000000000000000000000000000p-1
count up f: 0.8000001
count up f: 0.8000000715255737000000000000000000000000
count up f: 0x1.99999c0000000000000000000000000000000000p-1
count up f: 0.9000001
count up f: 0.9000000953674316000000000000000000000000
count up f: 0x1.ccccd00000000000000000000000000000000000p-1
count up f: 1.0000001
count up f: 1.0000001192092896000000000000000000000000
count up f: 0x1.0000020000000000000000000000000000000000p0
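A common remedy for such counting loops (not in the original notes; a
hypothetical variant) is to count with an integer and derive each float
by one division, so the rounding errors never accumulate:

```java
// Sketch: integer counting avoids accumulating the 0.1f error.
public class CountTenths {
    public static void main(String[] args) {
        for (int i = 1; i <= 10; i++) {
            float f = i / 10.0f;   // one correctly-rounded operation per value
            System.out.println("count up f: " + f);
        }
        // the count ends exactly at 1.0 because 10 / 10.0f is computed exactly:
        System.out.println(10 / 10.0f == 1.0f);   // true
    }
}
```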
In floating point arithmetic, the machine epsilon (also called macheps,
machine precision, or unit roundoff) is, for a particular floating point
unit, the difference between 1 and the smallest exactly representable
number greater than one. (The machine epsilon may sometimes be defined
as the smallest positive number which, when added to 1, yields a result
other than one.) We can calculate it using a loop:
        do {
            System.out.println( "Calculating Machine epsilon: " + machEps );
            System.out.printf( "Calculating Machine epsilon: %.40f\n", machEps );
            System.out.printf( "Calculating Machine epsilon: %.40a\n", machEps );
            machEps /= 2.0f;
        } while ( (float)(1.0 + (machEps/2.0)) != 1.0 );

        System.out.println( "Calculated Machine epsilon: " + machEps );
        System.out.printf( "Calculated Machine epsilon: %.40f\n", machEps );
        System.out.printf( "Calculated Machine epsilon: %.40a\n", machEps );
    }
}
Calculating Machine epsilon: 1.0
Calculating Machine epsilon: 1.0000000000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p0
Calculating Machine epsilon: 0.5
Calculating Machine epsilon: 0.5000000000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-1
Calculating Machine epsilon: 0.25
Calculating Machine epsilon: 0.2500000000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-2
Calculating Machine epsilon: 0.125
Calculating Machine epsilon: 0.1250000000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-3
Calculating Machine epsilon: 0.0625
Calculating Machine epsilon: 0.0625000000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-4
Calculating Machine epsilon: 0.03125
Calculating Machine epsilon: 0.0312500000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-5
Calculating Machine epsilon: 0.015625
Calculating Machine epsilon: 0.0156250000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-6
Calculating Machine epsilon: 0.0078125
Calculating Machine epsilon: 0.0078125000000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-7
Calculating Machine epsilon: 0.00390625
Calculating Machine epsilon: 0.0039062500000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-8
Calculating Machine epsilon: 0.001953125
Calculating Machine epsilon: 0.0019531250000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-9
Calculating Machine epsilon: 9.765625E-4
Calculating Machine epsilon: 0.0009765625000000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-10
Calculating Machine epsilon: 4.8828125E-4
Calculating Machine epsilon: 0.0004882812500000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-11
Calculating Machine epsilon: 2.4414062E-4
Calculating Machine epsilon: 0.0002441406250000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-12
Calculating Machine epsilon: 1.2207031E-4
Calculating Machine epsilon: 0.0001220703125000000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-13
Calculating Machine epsilon: 6.1035156E-5
Calculating Machine epsilon: 0.0000610351562500000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-14
Calculating Machine epsilon: 3.0517578E-5
Calculating Machine epsilon: 0.0000305175781250000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-15
Calculating Machine epsilon: 1.5258789E-5
Calculating Machine epsilon: 0.0000152587890625000000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-16
Calculating Machine epsilon: 7.6293945E-6
Calculating Machine epsilon: 0.0000076293945312500000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-17
Calculating Machine epsilon: 3.8146973E-6
Calculating Machine epsilon: 0.0000038146972656250000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-18
Calculating Machine epsilon: 1.9073486E-6
Calculating Machine epsilon: 0.0000019073486328125000000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-19
Calculating Machine epsilon: 9.536743E-7
Calculating Machine epsilon: 0.0000009536743164062500000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-20
Calculating Machine epsilon: 4.7683716E-7
Calculating Machine epsilon: 0.0000004768371582031250000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-21
Calculating Machine epsilon: 2.3841858E-7
Calculating Machine epsilon: 0.0000002384185791015625000000000000000000
Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-22
Calculated Machine epsilon: 1.1920929E-7
Calculated Machine epsilon: 0.0000001192092895507812500000000000000000
Calculated Machine epsilon: 0x1.0000000000000000000000000000000000000000p-23
An IEEE 754 single precision floating point number has 23+1=24
bits of mantissa, including the leading unit digit. The number 1 is
represented in binary with an unbiased exponent of 0 and a mantissa of
1.00000000000000000000000. The next larger representable number has an
exponent of 0 and a mantissa of 1.00000000000000000000001, which is one
binary bit bigger. The difference between these numbers is the last
(rightmost) binary bit, 0.00000000000000000000001, or 2^-23. This is the
machine epsilon of a floating point unit which uses IEEE single-precision
floating point arithmetic. It is the smallest value that can be added
to one to yield a value other than one. Anything smaller has no effect
when added to one.
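Java exposes this value directly as Math.ulp(1.0f); a quick check (a
sketch, not part of the original program):

```java
// Sketch: verify the float machine epsilon is 2**-23.
public class EpsilonCheck {
    public static void main(String[] args) {
        float eps = Math.ulp(1.0f);                  // gap from 1.0f to the next float
        System.out.println(eps);                     // 1.1920929E-7
        System.out.println(eps == 0x1.0p-23f);       // true
        System.out.println(1.0f + eps != 1.0f);      // true: big enough to matter
        System.out.println(1.0f + eps / 2 == 1.0f);  // true: rounds back to 1.0f
    }
}
```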
--
| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/