=================================================================
Exploring approximations to "one tenth" in binary floating point
=================================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Reference: http://www.exploringbinary.com/binary-converter/

Just as 1/3 = 0.333333+ in decimal, many fractions repeat forever in
binary, including the very popular 1/10 (one tenth: 0.1 decimal).

   "It may come as a surprise that terminating decimal fractions can
   have repeating expansions in binary. It is for this reason that many
   are surprised to discover that 0.1 + ... + 0.1, (10 additions)
   differs from 1 in floating point arithmetic. In fact, the only
   binary fractions with terminating expansions are of the form of an
   integer divided by a power of 2, which 1/10 is not."
    - http://en.wikipedia.org/wiki/Binary_numeral_system

The simple number 0.1 decimal translated to binary is a repeating
binary fraction that never terminates:

    0.000110011001100110011001100...  (the pattern 0011 repeats forever)

Truncated to the 24 significant bits that fit in a single-precision
mantissa, 0.1 decimal is 0.000110011001100110011001100 in binary. If
you convert that truncated binary back to decimal, you get only
0.0999999940395355224609375, not 0.1. (Naturally, truncation loses
some of the value.) That's a difference of 0.0000000059604644775390625:

    0.000110011001100110011001100(2) = 0.0999999940395355224609375(10)
           difference from 0.1 ->      0.0000000059604644775390625(10)

If you convert 0.1 decimal to binary by rounding up instead of
truncating, you go up to the next higher binary value, which is
0.000110011001100110011001101, and the decimal for that is
0.100000001490116119384765625, not 0.1. (Since we rounded up, the
value is a bit more than the desired 0.1 value.) That's a difference
of only 0.000000001490116119384765625:

    0.000110011001100110011001101(2) = 0.100000001490116119384765625(10)
           difference from 0.1 ->      0.000000001490116119384765625(10)

Both truncating down and rounding up produce incorrect values.
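The repeating expansion above can be reproduced mechanically: multiply
the fraction by 2 and peel off the integer part (0 or 1) as the next
binary digit. Below is a minimal sketch of that algorithm; the class
and method names here are illustrative, not part of the original
program:

```java
import java.math.BigDecimal;

public class TenthBits {
    // Return the first n binary digits of a decimal fraction 0 <= x < 1,
    // using exact BigDecimal arithmetic so no rounding creeps in.
    static String bitsOf(String decimalFraction, int n) {
        BigDecimal frac = new BigDecimal(decimalFraction);
        StringBuilder bits = new StringBuilder("0.");
        for (int i = 0; i < n; i++) {
            frac = frac.multiply(BigDecimal.valueOf(2));
            if (frac.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');                   // integer part was 1
                frac = frac.subtract(BigDecimal.ONE);
            } else {
                bits.append('0');                   // integer part was 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // prints 0.0001100110011001100110011001 -- the 0011 pattern repeats
        System.out.println(bitsOf("0.1", 28));
    }
}
```

Note that the true 28th digit of the expansion is 1; a 24-bit float
mantissa cannot keep that digit, which is exactly why the machine must
choose between truncating down and rounding up.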
Rounding up produces an error of only 0.000000001490116119384765625,
which is smaller than the truncation error of
0.0000000059604644775390625, so the rounded-up binary value is a
better approximation to 0.1 than the simple truncation, and the
rounded-up approximation is therefore what you get when you enter 0.1
into your computer. The value is a bit too big.

Increasing the number of bits of precision available, e.g. by using
double-precision floating point, will make the converted approximation
closer to the original 0.1, but because 0.1 decimal is a repeating
binary fraction, it can never be exactly represented in a binary
number system. No finite amount of binary precision will ever capture
it exactly.

The IEEE 754 Single Precision Floating Point format can only store
23+1 (normalized) binary bits of precision. A one-bit change in the
least significant bit of the approximation of 0.1 corresponds to the
one-bit changes made above. IEEE rounds up 0.1 to use 0x1.99999a0p-4.

Below is what the above looks like programmed in Java.
First, some normalization and conversion of the above to hexadecimal:

    0.0001100110011001100110011000 = 1.10011001100110011001100 * 2**-4
        = 0x1.9999980p-4 hexadecimal = 0.0999999940395355224609375 decimal
          difference from 0.1 ->       0.0000000059604644775390625(10)

    0.0001100110011001100110011010 = 1.10011001100110011001101 * 2**-4
        = 0x1.99999a0p-4 hexadecimal = 0.100000001490116119384765625 decimal
          difference from 0.1 ->       0.000000001490116119384765625(10)

public class calculateMachineEpsilonFloat {
    public static void main(String[] args) {
        float machEps = 1.0f;
        float f;

        // 0.1 is encoded in binary in a 23(+1)-bit mantissa as 0x1.99999a0p-4
        // because Java (IEEE 754 floating point) chooses the closest value,
        // and the closest approximation to 0.1 involves rounding up;
        // this approximation to 0.1 prints as "a bit more than 0.1"
        f = 0.1f;
        System.out.println( "f: " + f );
        System.out.printf( "f: %.40f\n", f );  // print 40 digits of precision
        System.out.printf( "f: %.40a\n", f );  // print 40 hex digits of precision

            f: 0.1
            f: 0.1000000014901161200000000000000000000000
            f: 0x1.99999a0000000000000000000000000000000000p-4
            [ error is 0.000000001490116120 ]

        // to find the float that is just less than 0.1, truncate
        // the repeating binary fraction instead of rounding up;
        // this prints the approximation that is "a bit less than 0.1"
        f = 0x1.9999980p-4f;   // compare last hex digit with 0x1.99999a0p-4
        System.out.println( "f: " + f );
        System.out.printf( "f: %.40f\n", f );  // print 40 digits of precision
        System.out.printf( "f: %.40a\n", f );  // print 40 hex digits of precision

            f: 0.099999994
            f: 0.0999999940395355200000000000000000000000
            f: 0x1.9999980000000000000000000000000000000000p-4
            [ error is 0.000000005960464480, which is more
              than 0.000000001490116120 ]

        // look at the difference between 0x1.99999a0p-4 and 0x1.9999980p-4f
        f = 0.1f - 0x1.9999980p-4f;   // subtract truncation from the rounding
        System.out.println( "f: " + f );
        System.out.printf( "f: %.40f\n", f );  // print 40 digits of precision
        System.out.printf( "f: %.40a\n", f );  // print 40 hex digits of precision

            f: 7.4505806E-9
            f: 0.0000000074505805969238280000000000000000
            f: 0x1.0000000000000000000000000000000000000000p-27
            [ note that the difference is a single bit, viewed in hexadecimal ]

        // a loop that adds up ten tenths and doesn't reach exactly 1.0
        f = 0.0f;
        do {
            f = f + 0.1f;
            System.out.println( "count up f: " + f );
            System.out.printf( "count up f: %.40f\n", f );
            System.out.printf( "count up f: %.40a\n", f );
        } while ( f < 1.0f );

            count up f: 0.1
            count up f: 0.1000000014901161200000000000000000000000
            count up f: 0x1.99999a0000000000000000000000000000000000p-4
            count up f: 0.2
            count up f: 0.2000000029802322400000000000000000000000
            count up f: 0x1.99999a0000000000000000000000000000000000p-3
            count up f: 0.3
            count up f: 0.3000000119209289600000000000000000000000
            count up f: 0x1.3333340000000000000000000000000000000000p-2
            count up f: 0.4
            count up f: 0.4000000059604645000000000000000000000000
            count up f: 0x1.99999a0000000000000000000000000000000000p-2
            count up f: 0.5
            count up f: 0.5000000000000000000000000000000000000000
            count up f: 0x1.0000000000000000000000000000000000000000p-1
            count up f: 0.6
            count up f: 0.6000000238418579000000000000000000000000
            count up f: 0x1.3333340000000000000000000000000000000000p-1
            count up f: 0.70000005
            count up f: 0.7000000476837158000000000000000000000000
            count up f: 0x1.6666680000000000000000000000000000000000p-1
            count up f: 0.8000001
            count up f: 0.8000000715255737000000000000000000000000
            count up f: 0x1.99999c0000000000000000000000000000000000p-1
            count up f: 0.9000001
            count up f: 0.9000000953674316000000000000000000000000
            count up f: 0x1.ccccd00000000000000000000000000000000000p-1
            count up f: 1.0000001
            count up f: 1.0000001192092896000000000000000000000000
            count up f: 0x1.0000020000000000000000000000000000000000p0

In floating point arithmetic, the machine epsilon (also called
macheps, machine precision, or unit roundoff) is, for a particular
floating point unit, the difference between 1 and the smallest exactly
representable number greater than one. (The machine epsilon may
sometimes be defined as the smallest positive number which, when added
to 1, yields a result other than one.) We can calculate it using a
loop:

        do {
            System.out.println( "Calculating Machine epsilon: " + machEps );
            System.out.printf( "Calculating Machine epsilon: %.40f\n", machEps );
            System.out.printf( "Calculating Machine epsilon: %.40a\n", machEps );
            machEps /= 2.0f;
        } while ((float)(1.0 + (machEps/2.0)) != 1.0);

        System.out.println( "Calculated Machine epsilon: " + machEps );
        System.out.printf( "Calculated Machine epsilon: %.40f\n", machEps );
        System.out.printf( "Calculated Machine epsilon: %.40a\n", machEps );
    }
}

    Calculating Machine epsilon: 1.0
    Calculating Machine epsilon: 1.0000000000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p0
    Calculating Machine epsilon: 0.5
    Calculating Machine epsilon: 0.5000000000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-1
    Calculating Machine epsilon: 0.25
    Calculating Machine epsilon: 0.2500000000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-2
    Calculating Machine epsilon: 0.125
    Calculating Machine epsilon: 0.1250000000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-3
    Calculating Machine epsilon: 0.0625
    Calculating Machine epsilon: 0.0625000000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-4
    Calculating Machine epsilon: 0.03125
    Calculating Machine epsilon: 0.0312500000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-5
    Calculating Machine epsilon: 0.015625
    Calculating Machine epsilon: 0.0156250000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-6
    Calculating Machine epsilon: 0.0078125
    Calculating Machine epsilon: 0.0078125000000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-7
    Calculating Machine epsilon: 0.00390625
    Calculating Machine epsilon: 0.0039062500000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-8
    Calculating Machine epsilon: 0.001953125
    Calculating Machine epsilon: 0.0019531250000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-9
    Calculating Machine epsilon: 9.765625E-4
    Calculating Machine epsilon: 0.0009765625000000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-10
    Calculating Machine epsilon: 4.8828125E-4
    Calculating Machine epsilon: 0.0004882812500000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-11
    Calculating Machine epsilon: 2.4414062E-4
    Calculating Machine epsilon: 0.0002441406250000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-12
    Calculating Machine epsilon: 1.2207031E-4
    Calculating Machine epsilon: 0.0001220703125000000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-13
    Calculating Machine epsilon: 6.1035156E-5
    Calculating Machine epsilon: 0.0000610351562500000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-14
    Calculating Machine epsilon: 3.0517578E-5
    Calculating Machine epsilon: 0.0000305175781250000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-15
    Calculating Machine epsilon: 1.5258789E-5
    Calculating Machine epsilon: 0.0000152587890625000000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-16
    Calculating Machine epsilon: 7.6293945E-6
    Calculating Machine epsilon: 0.0000076293945312500000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-17
    Calculating Machine epsilon: 3.8146973E-6
    Calculating Machine epsilon: 0.0000038146972656250000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-18
    Calculating Machine epsilon: 1.9073486E-6
    Calculating Machine epsilon: 0.0000019073486328125000000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-19
    Calculating Machine epsilon: 9.536743E-7
    Calculating Machine epsilon: 0.0000009536743164062500000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-20
    Calculating Machine epsilon: 4.7683716E-7
    Calculating Machine epsilon: 0.0000004768371582031250000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-21
    Calculating Machine epsilon: 2.3841858E-7
    Calculating Machine epsilon: 0.0000002384185791015625000000000000000000
    Calculating Machine epsilon: 0x1.0000000000000000000000000000000000000000p-22
    Calculated Machine epsilon: 1.1920929E-7
    Calculated Machine epsilon: 0.0000001192092895507812500000000000000000
    Calculated Machine epsilon: 0x1.0000000000000000000000000000000000000000p-23

An IEEE 754 single precision floating point number has 23+1=24 bits of
mantissa, including the leading unit digit. The number 1 is
represented in binary with an unbiased exponent of 0 and a mantissa of
1.00000000000000000000000. The next largest representable number has
an exponent of 0 and a mantissa of 1.00000000000000000000001, which is
one binary bit bigger. The difference between these numbers is the
last (rightmost) binary bit, 0.00000000000000000000001, or 2**-23.
This is the machine epsilon of a floating point unit that uses IEEE
single-precision floating point arithmetic. It is the smallest value
that can be added to one to yield a value other than one.
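These claims are easy to check directly: Java's Math.ulp(1.0f) returns
exactly this machine epsilon, and adding only half of it (2**-24) to
1.0f rounds back to exactly 1.0f under round-to-nearest-even. A
minimal sketch; the class name CheckEpsilon is mine, not part of the
original program:

```java
public class CheckEpsilon {
    public static void main(String[] args) {
        float eps = Math.ulp(1.0f);     // gap between 1.0f and the next float
        System.out.println(eps);        // 1.1920929E-7, i.e. 2**-23

        // Adding the full epsilon to 1.0f produces the next float...
        System.out.println(1.0f + eps);                 // 1.0000001
        // ...but adding half of it rounds back to exactly 1.0f.
        System.out.println(1.0f + eps / 2.0f == 1.0f);  // true
    }
}
```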
Anything at or below half the machine epsilon (2**-24 here) has no
effect when added to one: it rounds back to exactly 1.0.

-- 
| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/