============================================================
Binary Integer Mathematics, unsigned, two's complement, etc.
============================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

How the computer adds integer numbers
-------------------------------------

When the ALU inside the CPU adds one to any integer value, signed or
unsigned, the same thing always happens.  Basic ALU integer math doesn't
care about sign or negative numbers.  There is no concept, at the ALU
level, of "signed" or "unsigned".  It's all just bits being added, and the
simple rules for binary addition apply.

In four-bit binary, the math (adding one to each value) looks like this:

    CPU Addition Logic for Four Bit Binary Numbers
    ------------------------------------------------
    0000 plus one (0001) is
    0001 plus one is
    0010 plus one is
    0011 plus one is
    0100 plus one is
    0101 plus one is
    0110 plus one is
    0111 plus one is   (also causes the Overflow flag to be set in the CPU)
    1000 plus one is
    1001 plus one is
    1010 plus one is
    1011 plus one is
    1100 plus one is
    1101 plus one is
    1110 plus one is
    1111 plus one is   (also causes the Carry flag to be set in the CPU)
    0000 plus one is
    0001 plus one is
    0010 plus one is
    ...etc...

The list is really a ring or wheel, joined at the ends.  There is no real
beginning or end to the repeating sequence of bit patterns; adding one
just moves to the next bit pattern around the ring.  Adding two moves two
places around the ring, etc.  Subtraction moves the other way.

The list works like a car odometer: when the maximum value (1111) is
reached, the odometer rolls over to zero (0000) and starts over again.
Rolling backward from 0000 gets you to 1111.
(For another view of how this works, see "Odometers" and "Positive and
Negative in Binary", showing the number ring, in:
http://www.cs.nmsu.edu/~pfeiffer/hc11/notes/neg.html )

As Pfeiffer says, there is no way to tell whether the bit pattern 1010 is
the result of adding eight to 0010 (i.e. adding one eight times) or the
result of subtracting eight from 0010 (i.e. subtracting one eight times).
Both move the same distance around the ring and end up at the same place.

Four bits have only 2**4 (16) possible bit patterns; that means they can
represent only 16 different things.  Eight bits can represent only 2**8
(256) different numbers; ten bits, 2**10 (1024) numbers; and so on.

Unsigned Numbers
----------------

If all 16 of these four-bit patterns represent non-negative numbers, the
range of numbers is 0 to 15 decimal.  The smallest number is zero, with
bit pattern 0000, and the largest number is 15, with bit pattern 1111:

    UNSIGNED 4-BIT NUMBERS
    ------------------------
    0000 =  0   # unsigned numbers start with all bits turned off (zero)
    0001 =  1
    0010 =  2
    0011 =  3
    0100 =  4
    0101 =  5
    0110 =  6
    0111 =  7
    1000 =  8
    1001 =  9
    1010 = 10
    1011 = 11
    1100 = 12
    1101 = 13
    1110 = 14
    1111 = 15   # unsigned numbers end with all bits turned on (15 decimal)

In four bits, adding one to 15 doesn't give 16; it wraps around to start
over at zero (0000) again.  Subtracting one from zero wraps around to give
the bit pattern for 15 (1111).  Wrapping around past zero means the math
is wrong.  As long as the adding/subtracting stays within the numbers
0 (0000) to 15 (1111) without crossing zero, the unsigned math is correct.
Any four-bit unsigned math that goes below zero or above 15 is wrong.

Signed Numbers
--------------

If we choose to have some of these 16 four-bit patterns represent negative
numbers, we can't use as many bit patterns for positive numbers.  We have
to steal some bit patterns to use as negative numbers.  Most systems for
negative numbers divide the patterns into half positive and half negative.
If the top (leftmost, most significant) bit is on, we can decide that the
bit pattern represents a negative number.  We call this top, leftmost bit
the "sign" bit.

If half the four-bit patterns have the sign bit on and are used for
negative numbers, that leaves only eight bit patterns to represent
positive numbers (with zero being counted as "positive"), and the range of
positive numbers is now only 0 to +7 (a total of eight numbers):

    0000 0001 0010 0011 0100 0101 0110 0111

The other eight bit patterns, with the sign bit on, represent eight
negative numbers, usually (but not always) the numbers from -8 up to -1:

    1000 1001 1010 1011 1100 1101 1110 1111

Where the 16 unsigned four-bit numbers range from 0 to 15, the 16 signed
four-bit numbers are half negative and half positive and range from -8 up
through zero to +7.  (Note how the range includes -8 but only +7, since
zero is counted as a "positive" number here.)

Two's Complement Signed Numbers:

Which bit patterns represent each of the negative numbers from -8 up to
-1?  If we subtract one from zero (0000), the result is the bit pattern
1111; so, 1111 must be the bit pattern for -1.  (Adding one to 1111 gives
zero, as expected.)  If we subtract one from 1111 (-1) we get 1110; so,
1110 must be the bit pattern for -2, and so on down to -8 with bit pattern
1000.  This is the "two's complement" system for negative numbers.

Subtracting one from -8 (1000) doesn't give -9; it wraps around to give
the bit pattern for +7 (0111).  Adding one to +7 (0111) doesn't give +8;
it wraps around to start over at -8 (1000) again.
    Two's Complement Signed 4-bit Numbers
    ---------------------------------------
    1000 = -8   # signed numbers start with only the sign bit turned on
    1001 = -7
    1010 = -6
    1011 = -5
    1100 = -4
    1101 = -3
    1110 = -2
    1111 = -1
    0000 =  0
    0001 = +1
    0010 = +2
    0011 = +3
    0100 = +4
    0101 = +5
    0110 = +6
    0111 = +7   # signed numbers end with everything *except* the sign bit on

With the introduction of two's complement negative numbers, the smallest
four-bit number is -8, with bit pattern 1000, and the largest number is
+7, with bit pattern 0111.  Adding one to -1 (1111) correctly gives zero
(0000).  As long as the adding/subtracting stays within the range -8
(1000) to +7 (0111), the signed math is correct.  Any four-bit signed
math that goes below -8 or above +7 is wrong.

The wrap-around is true for all two's complement number systems - adding
one to the largest positive number wraps around to give the most negative
number.  The most negative number always has an absolute value one larger
than the largest positive number, since zero is included with the
positive numbers.  Thus, if you know that the largest positive 8-bit two's
complement value is +127, you immediately know that the smallest negative
value must be -128.

For four-bit unsigned numbers, the range of sixteen bit patterns starts at
zero (0000) and goes up to 15 (1111).  Unsigned math that steps outside
this range (goes below 0000 or above 1111) is wrong - the answer wraps
around incorrectly.  Crossing from 0111 to 1000 is not an error in
unsigned math.  Crossing from 1111 to 0000 is an error in unsigned math.

For four-bit signed two's complement numbers, the range of sixteen bit
patterns starts at -8 (1000) and goes up to +7 (0111).  Signed math that
steps outside this range (goes below 1000 or above 0111) is wrong - the
answer wraps around incorrectly.  Crossing from 1111 to 0000 is not an
error in signed math.  Crossing from 0111 to 1000 is an error in signed
math.
Note how unsigned math treats the list of 16 consecutive bit patterns as
going from 0 to 15 (0000 to 1111) and signed math treats the list as the
16 patterns going from -8 to +7 (1000 to 0111).  It's the same list of
sixteen bit patterns, but the start and end points for "right answer"
mathematics are different for signed and unsigned.

Inside the CPU, for addition (or subtraction) of these 16 four-bit
numbers, it doesn't matter whether the bit pattern is considered signed or
unsigned - when you say "x = x + 1" the ALU inside the CPU just adds one
to the bits and possibly sets some CPU flags.  Signed, unsigned, same
thing.  The CPU knows nothing about negative numbers; it only does binary
addition on bit patterns.

Rule: When adding binary, octal, or hexadecimal bit patterns together,
just do the binary math and express the result in the allowed number of
bits, throwing away the carry if it doesn't fit.

Examples of binary integer math:

    binary      4 bits:  0000 + 0111 = 0111
    hexadecimal 4 bits:    0h +   7h =   7h

    binary      4 bits:  0100 + 0100 = 1000
    hexadecimal 4 bits:    4h +   4h =   8h

    binary      4 bits:  1010 + 1010 = 0100   (no room for the carry in 4 bits)
    hexadecimal 4 bits:    Ah +   Ah =   4h   (no room for the carry in 4 bits)

    binary      4 bits:  1100 + 1100 = 1000   (no room for the carry in 4 bits)
    hexadecimal 4 bits:    Ch +   Ch =   8h   (no room for the carry in 4 bits)

Hex math calculator: http://www.csgnetwork.com/hexaddsubcalc.html

    hex 16 bits:  1ABCh + 0001h = 1ABDh
    hex 16 bits:  1ABCh + 0009h = 1AC5h
    hex 16 bits:  1ABCh + 9C3Eh = B6FAh
    hex 16 bits:  1ABCh + FFFFh = 1ABBh   (no room for carry in 16 bits)
    hex 16 bits:  1ABCh + F000h = 0ABCh   (no room for carry in 16 bits)

You must pay careful attention to the number of bits used in the
mathematics and never generate an answer that has more than this number
of bits.  If the math generates a carry out of the leftmost (top, most
significant) bit, that carry is "thrown away" and a Carry flag inside the
CPU is turned on to indicate that this happened.
Status Flags for binary integer mathematics
===========================================

The ALU in the CPU has two math status flags in it: the "Carry" flag and
the "Overflow" flag.  These flags are turned ON or OFF after every integer
math operation.  The flags can be interpreted by you, the programmer, to
have meaning depending on whether you are treating the bit patterns as
unsigned (all non-negative) or signed (positive and negative numbers).
The CPU doesn't know or care about signed/unsigned - it always does the
math and always sets the two flags.  You (your program) have to care.

Carry Flag - for unsigned integer math
--------------------------------------

The "Carry" flag is set in the ALU whenever any binary mathematics causes
a "carry" out of the top (leftmost, most significant) bit, e.g. in
four-bit binary math: 1000+1000=0000 (and the Carry flag is set).  The
Carry flag indicates that the answer didn't fit in the number of bits we
are using - there was a "carry" that would have required one more bit to
represent.  You can think of that "carry" out of the top bit as turning
the Carry flag ON.  If there is no carry, the Carry flag is OFF.

If we treat the bits in a word as "unsigned" numbers (all non-negative),
we must watch to see if our arithmetic causes the Carry flag to be set,
indicating the (unsigned) result is wrong.  (We don't care about the ALU
Overflow flag when doing unsigned math.  The Overflow flag is only
relevant to signed numbers, not unsigned.  See below.)

4-bit example: 0100 + 0100 = 1000 (no carry - the answer fits in 4 bits)
  - The Carry flag is turned OFF, since no carry was generated
  - If our program asked the CPU to do unsigned math, the fact that the
    Carry flag is OFF means that the answer is correct for unsigned math.
  - In unsigned decimal: 4 + 4 = 8 (carry OFF == correct answer)

4-bit example: 1000 + 1000 = 0000 (and the Carry flag comes on in the CPU)
  - The Carry flag is turned ON, since a carry was generated
  - If our program asked the CPU to do unsigned math, the fact that the
    Carry flag is ON means that the answer is wrong for unsigned math.
  - In unsigned decimal: 8 + 8 = 0 (carry ON == wrong answer)
  - If we were allowed 5 bits, the answer would be correct as
    10000(2) = 16(10)

4-bit example: 1111 + 0011 = 0010 (and the Carry flag comes on in the CPU)
  - The Carry flag is turned ON, since a carry was generated
  - If our program asked the CPU to do unsigned math, the fact that the
    Carry flag is ON means that the answer is wrong for unsigned math.
  - In unsigned decimal: 15 + 3 = 2 (carry ON == wrong answer)
  - If we were allowed 5 bits, the answer would be correct as
    10010(2) = 18(10)

The Carry flag indicates that the unsigned binary math answer needs more
bits.

Overflow Flag - for two's complement integer math
-------------------------------------------------

The "Overflow" flag is set by the CPU whenever the ALU adds two numbers
that have the same sign bit and the result has the opposite sign bit,
e.g. in four-bit binary math: 0100+0100=1000 (and the Overflow flag is
set).  Just as with the Carry flag, the CPU always sets or clears the
Overflow flag after a binary math operation.

Just as the Carry flag tells your program that the binary math is wrong
for unsigned arithmetic, the Overflow flag has the corresponding meaning
for signed arithmetic.  The Overflow flag indicates that the signed math
is wrong - adding two numbers of the same sign should never give an
answer with the opposite sign.  The math went outside the valid range of
signed numbers.

The ALU always updates both the Carry and Overflow flags when doing
integer mathematics on two bit patterns.  Your program must choose which
flag to check depending on whether your program was doing signed or
unsigned math.
The sign bit (leftmost bit) and the Overflow flag only have meaning if we
choose to interpret the bit patterns using the two's complement number
system, which assigns half the bit patterns to negative numbers.  If we
treat the bits in a word as "two's complement" signed values, we must
watch to see if our arithmetic causes the Overflow flag to be set,
indicating the (two's complement) result is wrong.  (We don't care about
the Carry flag when doing signed, two's complement math.  The Carry flag
is only relevant to unsigned numbers, not signed.)

4-bit example: 0100 + 0100 = 1000 (and the Overflow flag comes on in the CPU)
  - The Overflow flag is turned ON, since the sign bit changed
  - If our program asked the CPU to do signed math, the fact that the
    Overflow flag is ON means that the answer is wrong for signed math.
  - In signed decimal: 4 + 4 = -8 (overflow ON == wrong answer)
  - Note that for this same math, if our program was doing unsigned math,
    the fact that the Carry flag is OFF means that the answer would be
    right!
  - In unsigned decimal: 4 + 4 = 8 (carry OFF == right answer)

Some mathematics on bit patterns gives answers that are wrong both for
unsigned and for two's complement numbers, in which case *both* the Carry
and Overflow flags will be set in the CPU:

4-bit example: 1010 + 1010 = 0100 (and both Carry and Overflow come on)
  - In unsigned decimal (no negative numbers): 10 + 10 = 4 (WRONG)
  - In signed decimal (two's complement): -6 + -6 = +4 (WRONG)

When the CPU does math on bit patterns, it doesn't know or care whether
the bits represent signed or unsigned integers.  The CPU just does the
math and sets both flags.  Your program knows whether it was using the
CPU for signed or unsigned math, so your program must check the correct
flag.
=========================
C Language Implementation
=========================

In C language with 32-bit integers:

    signed int sx = 2000000000;
    signed int sy = 2000000001;
    signed int sz = sx + sy;    /* overflow: undefined behaviour in standard
                                   C; typical two's complement hardware wraps */

    unsigned int ux = 2000000000;
    unsigned int uy = 2000000001;
    unsigned int uz = ux + uy;  /* 4000000001 fits in 32 unsigned bits */

    /* %X and %u expect unsigned, %d expects signed: cast to match */
    printf("sz is 0x%X which could be %d or %u\n", (unsigned)sz, sz, (unsigned)sz);
    printf("uz is 0x%X which could be %d or %u\n", uz, (int)uz, uz);

Output (first "0x" output is in hexadecimal):

    sz is 0xEE6B2801 which could be -294967295 or 4000000001
    uz is 0xEE6B2801 which could be -294967295 or 4000000001

The hex value 0xEE6B2801 when turned into 32-bit binary looks like this:

    1110 1110 0110 1011 0010 1000 0000 0001
       E    E    6    B    2    8    0    1

See - absolutely no difference in the two bit patterns in memory.  The bit
patterns in memory for signed and unsigned math are the same because there
is no difference in what the ALU does.  The ALU doesn't care.  Addition
and subtraction don't care about sign; it is all pure binary math on bit
patterns.

As you see above, once we have some bits in memory, we can tell our
program to interpret the bits either in an unsigned manner ("%u" - all the
bit patterns represent non-negative numbers) or in a signed manner ("%d" -
half of the bit patterns will be displayed as negative, with a leading
minus sign).  Signed/unsigned doesn't matter at the ALU level in computer
math.

Where a signed/unsigned declaration plays a big part is in *comparing*
numbers (and in bit-shifting, but save that for another day).  If we
compare the bit pattern 0xEE6B2801 (above) with zero, is the bit pattern
less than zero or much greater than zero?  *That* answer depends on
whether we declared the memory location (the variable) to be signed or
unsigned.

If we declare a 32-bit variable as unsigned, numeric comparisons treat all
4,294,967,296 bit patterns from zero up to 0xFFFFFFFF
(11111111111111111111111111111111) as non-negative numbers (zero to
4,294,967,295 for 32-bit numbers).
Unsigned comparisons will never say that any of these bit patterns is less
than zero:

    unsigned int x = 0xFFFFFFFF;
    if ( x < 0 ) { ... }    /* this is always FALSE for unsigned numbers */

If we declare a 32-bit variable as signed, we are telling our compiler
that numeric comparisons using that variable will treat half
(2,147,483,648) of those 4,294,967,296 bit patterns as negative numbers.
Any bit pattern in a signed variable that has the leftmost bit (the sign
bit) set will be interpreted and compared as "less than zero":

    signed int x = 0xFFFFFFFF;  /* out of range: implementation-defined,
                                   -1 on typical two's complement systems */
    if ( x < 0 ) { ... }    /* TRUE because x is signed and the sign bit is on */

More C code:

    signed int sz = 0xEE6B2801;    /* implementation-defined; typically -294967295 */
    unsigned int uz = 0xEE6B2801;

    /*
     * Both sz and uz contain the same bit pattern 0xEE6B2801.
     * Signed sz interprets the bits as -294967295(10).
     * Unsigned uz interprets the bits as 4000000001(10).
     */

    if ( sz < 0 )
        printf("sz is less than zero\n");
    else
        printf("sz is not less than zero\n");

    if ( uz < 0 )
        printf("uz is less than zero\n");
    else
        printf("uz is not less than zero\n");

Output:

    sz is less than zero
    uz is not less than zero

Remember that sz and uz contain exactly the same bit pattern!  It is our
program that interprets the results.

Because we declared the sz variable to be signed, the code generated by
the compiler to test whether the bit pattern in sz is less than zero tests
the sign bit (leftmost bit) of sz, notices that it is on ('1'), and
declares the bit pattern "less than zero" (signed).  All the bit patterns
with the sign bit on will be interpreted as negative numbers.

Because we declared the uz variable to be unsigned, the code generated by
the compiler to test whether the bit pattern in uz is less than zero has
no work to do at all - unsigned numbers are never less than zero.
The compiler arranges to print: "uz is not less than zero"

As a programmer, declaring integer variables as signed/unsigned is just a
convenient way of having the compiler conspire with us to treat half the
bit patterns as "less than zero".  Inside the CPU doing math
(adding/subtracting), the declaration of signed/unsigned doesn't make any
difference - the CPU just does the math and sets the Carry and Overflow
flags.  For mathematical comparisons (and bit-shifting), declaring signed
or unsigned makes a big difference.

--
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/