============================================================
Binary Integer Mathematics, unsigned, two's complement, etc.
============================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

How the computer adds integer numbers
-------------------------------------

When the ALU inside the CPU adds one to any integer value, signed or
unsigned, the same thing always happens.  Basic ALU integer math doesn't
care about sign or negative numbers.  There is no concept, at the ALU
level, of "signed" or "unsigned".  It's all just bits being added,
and the simple rules for binary addition apply.

In four bits binary, the math (adding one to each value) looks like this:

    CPU Addition Logic for Four Bit Binary Numbers
   ------------------------------------------------
    0000 plus one (0001) is
    0001 plus one is
    0010 plus one is
    0011 plus one is
    0100 plus one is
    0101 plus one is
    0110 plus one is
    0111 plus one is (also causes the Overflow flag to be set in the CPU)
    1000 plus one is
    1001 plus one is
    1010 plus one is
    1011 plus one is
    1100 plus one is
    1101 plus one is
    1110 plus one is
    1111 plus one is (also causes Carry flag to be set in the CPU)
    0000 plus one is
    0001 plus one is
    0010 plus one is  ...etc...

The list is is really a ring or wheel, joined at the ends.  There is no
real beginning or end to the repeating sequence of bit patterns; adding
one just moves to the next bit pattern around the ring.  Adding two
moves two places around the ring, etc.  Subtraction moves the other way.
The list works like a car odometer; when the maximum value (1111) is
reached, the odometer rolls over to zero (0000) and starts over again.
Rolling backward from 0000 gets you to 1111.

(For another view of how this works, see "Odometers" and
"Positive and Negative in Binary", showing the number ring, in:
http://www.cs.nmsu.edu/~pfeiffer/hc11/notes/neg.html )

As Pfeifer says, there is no way to tell whether the bit pattern 0101 is
the result of adding eight to 0010 (i.e. adding one eight times) or the
result of subtracting eight from 0010 (i.e. subtracting one eight times).
Both move the same distance around the ring and end up at the same place.

Four bits have only 2**4 (16) possible bit patterns; that means they can
represent only 16 different things.  Eight bits can only represent 2**8
(256) different numbers, etc.  Ten bits, 1024 numbers.

Unsigned Numbers
----------------

If all 16 of these four-bit patterns represent positive numbers, the range
of numbers is 0 to 15 decimal.  The smallest number is zero with binary
bit pattern 0000 and the largest number is 15 with bit pattern 1111:

    UNSIGNED 4-BIT NUMBERS
   ------------------------
    0000 =  0 # unsigned numbers start at all bits turned off (zero)
    0001 =  1
    0010 =  2
    0011 =  3
    0100 =  4
    0101 =  5
    0110 =  6
    0111 =  7
    1000 =  8
    1001 =  9
    1010 = 10
    1011 = 11
    1100 = 12
    1101 = 13
    1110 = 14
    1111 = 15 # unsigned numbers end with all bits turned on (15 decimal)

In four bits, adding one to 15 doesn't give 16, it wraps around to start
over at zero (0000) again.  Subtracting one from zero wraps around to
give the bit pattern for 15 (1111).  Wrapping around zero means the math
is wrong.  As long as the adding/subtracting stays within the numbers 0
(0000) to 15 (1111) without crossing zero, the unsigned math is correct.
Any four-bit unsigned math that goes below zero or above 15 is wrong.

Signed Numbers
--------------

If we choose to have some of these 16 four-bit patterns represent negative
numbers, we can't use as many bit patterns for positive numbers.  We have
to steal some bit patterns to use as negative numbers.

Most systems for negative numbers divide up the patterns into half
positive and half negative.  If the top (leftmost, most significant) bit
is on, we can decide that the bit pattern represents a negative number.
We call this top, leftmost bit the "sign" bit.

If half the four-bit patterns have the sign bit on and will be used
for negative numbers, that leaves only eight bit patterns to represent
positive numbers (with zero being called "positive"), and the range of
eight positive numbers is now only 0 to +7 (a total of eight numbers):

    0000 0001 0010 0011 0100 0101 0110 0111

The other eight bit patterns, with sign bit on, represent eight negative
numbers, usually (but not always) the numbers from -8 up to -1:

    1000 1001 1010 1011 1100 1101 1110 1111

Where the 16 unsigned four-bit numbers range from 0 to 15, the 16 signed
four-bit numbers are half-negative and half-positive and range from
-8 up to zero up to +7.  (Note how the range includes -8 but only +7,
since zero is considered a "positive" number here.)

Two's Complement Signed Numbers:

Which bit patterns represent each of the negative numbers from -8 up
to -1?  If we subtract one from zero (0000), the result is the bit
pattern 1111; so, 1111 must be the bit pattern for -1.  (Adding one to
1111 gives zero, as expected.)  If we subtract one from 1111 (-1) we get
1110; so, 1110 must be the bit pattern for -2, etc., down to -8 with bit
pattern 1000.  This is the "two's complement" system for negative numbers.
Subtracting one from -8 (1000) doesn't give -9, it wraps around to give
the bit pattern for +7 (0111).  Adding one to +7 (0111) doesn't give +8,
it wraps around to start over at -8 (1000) again.

    Two's Complement Signed 4-bit Numbers
   ---------------------------------------
    1000 = -8 # signed numbers start with only the sign bit turned on
    1001 = -7
    1010 = -6
    1011 = -5
    1100 = -4
    1101 = -3
    1110 = -2
    1111 = -1
    0000 =  0
    0001 = +1
    0010 = +2
    0011 = +3
    0100 = +4
    0101 = +5
    0110 = +6
    0111 = +7 # signed numbers end with everything *except* the sign bit on

With the introduction of two's complement negative numbers, the smallest
four-bit number is -8 with bit pattern 1000 and the largest number is
+7 with bit pattern 0111.  Adding one to -1 (1111) correctly gives zero
(0000).  As long as the adding/subtracting stays within the range of
numbers -8 (1000) to +7 (0111), the signed math is correct.  Any four-bit
signed math that goes below -8 or above +7 is wrong.

The wrapping-around is true for all two's complement number systems -
adding one to the highest positive number wraps around to give the most
negative number.  The most negative number always has an absolute value
one larger than the largest positive number, since zero is included with
the positive numbers.  Thus, if you know that the largest positive 8-bit
two's complement value is +127, you immediately know that the smallest
negative value must be -128.

For four-bit unsigned numbers, the range of 16 bit patterns starts at zero
(0000) and goes up to 15 (1111).  Unsigned math that steps outside this
range (goes below 0000 or above 1111) is wrong - the answer wraps around
incorrectly.  Crossing from 0111 to 1000 is not an error in unsigned math.
Crossing from 1111 to 0000 is an error in unsigned math.

For four-bit signed two's complement numbers, the range of 16 bit patterns
starts at -8 (1000) and goes up to +7 (0111).  Signed math that steps
outside this range (goes below 1000 or above 0111) is wrong - the answer
wraps around incorrectly.  Crossing from 1111 to 0000 is not an error
in signed math.  Crossing from 0111 to 1000 is an error in signed math.

Note how unsigned math treats the list of 16 consecutive bit patterns
as going from 0 to 15 (0000 to 1111) and signed math treats the list as
the 16 patterns going from -8 to +7 (1000 to 0111).  It's the same list
of sixteen bit patterns, but the start and end points for "right answer"
mathematics is different for signed and unsigned.

Inside the CPU, for addition (or subtraction) of these 16 four-bit
numbers, it doesn't matter if the bit pattern is considered signed or
unsigned - when you say "x = x + 1" the ALU inside the CPU just adds
one to the bits in memory and possibly sets some CPU flags.  Signed,
unsigned, same thing.  The CPU knows nothing about negative numbers;
it only does binary addition to bit patterns.

Rule: When adding binary, octal, or hexadecimal bit patterns together,
just do the binary math and express the result in the allowed number of
bits, throwing away the carry if it doesn't fit.

Examples of binary integer math:

   binary 4 bits:      0000 + 0111 = 0111
   hexadecimal 4 bits:   0h +   7h =   7h

   binary 4 bits:      0100 + 0100 = 1000
   hexadecimal 4 bits:   4h +   4h =   8h

   binary 4 bits:      1010 + 1010 = 0100  (no room for the carry in 4 bits)
   hexadecimal 4 bits:   Ah +   Ah =   4h  (no room for the carry in 4 bits)

   binary 4 bits:      1100 + 1100 = 1000  (no room for the carry in 4 bits)
   hexadecimal 4 bits:   Ch +   Ch =   8h  (no room for the carry in 4 bits)

   Hex math calculator: http://www.csgnetwork.com/hexaddsubcalc.html

   hex 16 bits: 1ABCh + 0001h = 1ABDh
   hex 16 bits: 1ABCh + 0009h = 1AC5h
   hex 16 bits: 1ABCh + 9C3Eh = B6FAh
   hex 16 bits: 1ABCh + FFFFh = 1ABBh (no room for carry in 16 bits)
   hex 16 bits: 1ABCh + F000h = 0ABCh (no room for carry in 16 bits)

You must pay careful attention to the number of bits used in the
mathematics and never generate an answer that has more than this number
of bits.  If the math generates a carry out of the leftmost (top, most
significant) bit, that carry is "thrown away" and a Carry flag inside
the CPU is turned on indicating that this happened.

Status Flags for binary integer mathematics
===========================================

The ALU in the CPU has two math status flags in it: the "Carry" flag and
the "Overflow" flag.  These flags are set or unset after every integer
math operation.  The flags can be interpreted by you, the programmer,
to have meaning depending on whether you are treating the bit patterns
as unsigned (all-positive), or signed (positive and negative numbers).
The CPU doesn't know or care about signed/unsigned - it always does the
math and always sets the two flags.  You (your program) have to care.

Carry Flag - for unsigned integer math
--------------------------------------

The "Carry" flag is set in the ALU whenever any binary mathematics
causes a "carry" out of the top (leftmost, sign) bit, e.g. in four bit
binary math: 1000+1000=0000 (and the carry flag is set).  The carry flag
indicates that the answer didn't fit in the number of bits we are using -
there was "carry" that would have required one more bit to represent.
You can think of that "carry" out of the top bit as setting the Carry
flag to ON.  If there is no carry, the Carry flag is OFF.

If we treat the bits in a word as "unsigned" numbers (all positive,
no negative), we must watch to see if our arithmetic causes the "carry"
flag to be set, indicating the (unsigned) result is wrong.  (We don't
care about the ALU Overflow flag when doing unsigned math.  The overflow
flag is only relevant to signed numbers, not unsigned.  See below.)

4-bit example:   0100 + 0100 = 1000 (no carry - the answer fits in 4 bits)
 - The carry flag is turned OFF, no carry, since no carry was generated
 - If our program asked the CPU to do unsigned math, the fact that the
   Carry flag is off means that the answer is correct for unsigned math.
   - In unsigned decimal: 4 + 4 = 8 (carry OFF == correct answer)

4-bit example:   1000 + 1000 = 0000 (and the carry flag comes on in the CPU)
 - The carry flag is turned ON, since carry was generated
 - If our program asked the CPU to do unsigned math, the fact that the
   Carry flag is ON means that the answer is wrong for unsigned math.
   - In unsigned decimal: 8 + 8 = 0 (carry ON == wrong answer)
   - If we were allowed 5 bits, the answer would be correct as 10000(2) = 16(10)

4-bit example:   1111 + 0011 = 0010 (and the carry flag comes on in the CPU)
 - The carry flag is turned ON, since carry was generated
 - If our program asked the CPU to do unsigned math, the fact that the
   Carry flag is ON means that the answer is wrong for unsigned math.
   - In unsigned decimal: 15 + 3 = 2 (carry ON == wrong answer)
   - If we were allowed 5 bits, the answer would be correct as 10010(2) = 18(10)

The carry flag indicates that the unsigned binary math answer needs more bits.

Overflow Flag - for two's complement integer math
-------------------------------------------------

The "Overflow" flag is set by the CPU whenever the ALU adds two numbers
together with the same sign bit, and the result has the opposite sign
bit, e.g. in four bit binary math: 0100+0100=1000 (and the overflow flag
is set).  Just as with the Carry flag, the CPU always sets the overflow
flag after a binary math operation.

Just as the Carry flag tells your program that the binary math is wrong
for unsigned arithmetic, the Overflow flag has the corresponding meaning
for signed arithmetic.  The overflow flag indicates that the signed
math is wrong - you should never add two numbers of the same sign get
an answer with the opposite sign.  The math went outside the valid range
of signed numbers.

The ALU always sets both the Carry and Overflow flags when doing integer
mathematics on two bit patterns.  Your program must choose which flag to
check depending on whether your program was doing signed or unsigned math.

The sign bit (leftmost bit) and overflow flag only have meaning if we
choose to interpret the bit patterns using the two's complement number
system, which assigns half the bit patterns to negative numbers.

If we treat the bits in a word as "two's complement" signed values,
we must watch to see if our arithmetic causes the "overflow" flag to
be set, indicating the (two's complement) result is wrong.  (We don't
care about the carry flag when doing signed, two's complement math.
The carry flag is only relevant to unsigned numbers, not signed.)

4-bit example:   0100 + 0100 = 1000 (and the overflow flag comes on in the CPU)
 - The overflow flag is turned ON, since the sign bit changed
 - If our program asked the CPU to do signed math, the fact that the
   Overflow flag is ON means that the answer is wrong for signed math.
   - In signed decimal: 4 + 4 = -8 (overflow flag ON == wrong answer)
 - Note that for this same math, if our program was doing unsigned math,
   the fact that the Carry flag is OFF means that the answer would be right!
   - In unsigned decimal: 4 + 4 = 8 (carry OFF == right answer)

Some mathematics on bit patterns gives answers that are wrong both for
unsigned and two's complement numbers, in which case *both* the carry
and overflow flags will be set in the CPU:

4-bit example: 1010 + 1010 = 0100 (and both carry and overflow come on)
  - In unsigned decimal (no negative numbers): 10 + 10 =  4 (WRONG)
  - In signed decimal (two's complement):      -6 + -6 = +4 (WRONG)

The ALU always sets both the carry and overflow flags when doing integer
mathematics on two bit patterns.  Your program must choose which flag to
check depending on whether your program was doing signed or unsigned math.

When the CPU does math on bit patterns, it doesn't know or care whether
the bits represent signed or unsigned integers.  The CPU just does the
math and sets the flags.  Your program knows whether it was using the
CPU for signed or unsigned math, and so your program must check the
correct flag.

=========================
C Language Implementation
=========================

In C language with 32-bit integers:

    signed int sx = 2000000000;
    signed int sy = 2000000001;
    signed int sz = sx + sy;

    unsigned int ux = 2000000000;
    unsigned int uy = 2000000001;
    unsigned int uz = ux + uy;

    printf("sz is 0x%x which could be %d or %u\n", sz, sz, sz);
    printf("uz is 0x%x which could be %d or %u\n", uz, uz, uz);

Output (first "0x" output is in hexadecimal):

    sz is 0xEE6B2801 which could be -294967295 or 4000000001
    uz is 0xEE6B2801 which could be -294967295 or 4000000001

The hex value 0xEE6B2801 when turned into 32-bit binary looks like this:

    11101110011010110010100000000001
       E   E   6   B   2   8   0   1

See - absolutely no difference in the two bit patterns in memory.  The bit
patterns in memory for signed and unsigned math are the same; because,
there is no difference in what the ALU does.  The ALU doesn't care.
Adding/subtracting don't care about sign; it is all pure binary math on
bit patterns.

As you see above, once we have some bits in memory, we can tell our
program to interpret the bits in either an unsigned manner ("%u" - all
the bit patterns represent non-negative numbers) or in a signed manner
("%d" - about half of the bit patterns will be displayed as negative,
with leading minus).

Signed/unsigned doesn't matter at the ALU level in computer math.  Where a
signed/unsigned declaration plays a big part is in *comparing* numbers
(and in bit-shifting; but, save that for another day).  If we compare
the bit pattern 0xEE6B2801 (above) with zero, is the bit pattern less
than zero or much greater than zero?  *That* answer depends on whether
we declared the memory location (the variable) to be signed or unsigned.

If we declare a 32-bit variable as unsigned, numeric comparisons
treat all 4,294,967,296 bit patterns from zero up to 0xFFFFFFFF
(11111111111111111111111111111111) as non-negative numbers (zero to
4,294,967,295 for 32-bit numbers).  Comparisons (unsigned) will never
say any of these patterns are less than zero:

    unsigned int x = 0xFFFFFFFF;
    if ( x < 0 ) ( ...   /* this is always FALSE for unsigned numbers */

If we declare a 32-bit variable as signed, we are saying to our compiler
that numeric comparisons using that variable will treat about half of
those 4,294,967,296 bit patterns as negative numbers.  Any bit patterns
in signed variables that have the leftmost bit (the sign bit) set,
will be interpreted and compared as "less than zero":

    signed int x = 0xFFFFFFFF;
    if ( x < 0 ) ( ...   /* TRUE because x is signed and the sign bit is on */

More C code:

    signed   int sz = 0xEE6B2801;
    unsigned int uz = 0xEE6B2801;
    /*
     * Both sz and uz contain the same bit pattern 0xEE6B2801.
     * Signed sz interprets the bits as -294967295(10).
     * Unsigned uz interprets the bits as 4000000001(10).
     */
    if ( sz < 0 )
	    printf("sz is less than zero\n");
    else
	    printf("sz is not less than zero\n");

    if ( uz < 0 )
	    printf("uz is less than zero\n");
    else
	    printf("uz is not less than zero\n");

Output:

    sz is less than zero
    uz is not less than zero

Remember that sz and uz contain exactly the same bit pattern!  It is
our program that interprets the results.

Because we declared the sz variable to be signed, the code generated
by the compiler to test whether the bit pattern in sz is less than zero
tests the sign bit (leftmost bit) on the sz memory, notices that is is on
('1') and declares the bit pattern as "less than zero" (signed).  All the
bit patterns with the sign bit on will be interpreted as negative numbers.

Because we declared the uz variable to be unsigned, the code generated
by the compiler to test whether the bit pattern in uz is less than zero
has no work to do at all - unsigned numbers are never less than zero.
The compiler arranges to print:  "uz is not less than zero"

As a programmer, declaring integer variables as signed/unsigned is
just a convenient way of having the compiler conspire with us to treat
half the bit patterns as "less than zero".  Inside the CPU doing math
(adding/subtracting), the declaration of signed/unsigned doesn't make
any difference - the CPU just does the math and sets the Carry and
Overflow flags.  For mathematical comparisons (and bit-shifting),
declaring signed or unsigned makes a big difference.

-- 
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/