============================================== Bitwise Operations, bit masking - AND, OR, NOT ============================================== - Ian! D. Allen - idallen@idallen.ca - www.idallen.com References: ECOA2e p.122 http://atrevida.comprenica.com/atrtut03.html Bitwise operations ------------------ Bitwise operators treat every bit in a word as a Boolean (two-value) variable, apply a column-wise Boolean operator, and generate a result. Unlike binary math, there is no carry or borrow - every column is independent. Examples of 4-bit bitwise AND and OR (written in C or C++ as "&" and "|"): 0110 = 6h 0110 = 6h AND 1010 = Ah OR 1010 = Ah ---- ---- EQUALS 0010 = 2h EQUALS 1110 = Eh In C language: int foo = 0x6 & 0xA; /* bitwise AND gives: foo <-- 2h */ int foo = 0x6 | 0xA; /* bitwise OR gives: foo <-- Eh */ Examples of 4-bit binary bitwise "NOT" (C language uses a tilde operator "~"): NOT 0110 equals 1001 -or- ~6h equals 9h NOT 1010 equals 0101 -or- ~Ah equals 5h NOT 0000 equals 1111 -or- ~0h equals Fh NOT 1111 equals 0000 -or- ~Fh equals 0h You will recoginze the above values from the 4-bit "hex bit flip" table. The NOT operator flips every bit. Note that the word length (number of bits) is critical when performing a NOT operation, since *every bit* in the word flips, even the bits you don't see: In C language: char x = ~0; /* result is 0xFF */ short x = ~0; /* result is 0xFFFF */ int x = ~0; /* result is 0xFFFFFFFF for 32-bit int */ int x = ~0; /* result is 0xFFFFFFFFFFFFFFFF for 64-bit */ char x = ~0xF; /* result is 0xF0 */ short x = ~0xF; /* result is 0xFFF0 */ int x = ~0xF; /* result is 0xFFFFFFF0 for 32-bit int */ int x = ~0xF; /* result is 0xFFFFFFFFFFFFFFF0 for 64-bit */ int x = ~0x1; /* result is 0xFFFFFFFE for 32-bit int */ int x = ~0xFFFF; /* result is 0xFFFF0000 for 32-bit int */ int x = ~0xFFFF0000; /* result is 0x0000FFFF for 32-bit int */ Note that in C language the bit flip of character 0xF is 0xF0 and not 0x0, since character 0xF is really stored as 0x0F (8 bits) and all 8 bits flip. If you store 0xF in a 32-bit "int", all 32 bits flip, so ~0xF = 0xFFFFFFF0. Bit Masking - turning off bits ------------------------------ Bit masking is using the bits in one word to "mask off" or select part of the range of bits in another word, using the bitwise Boolean AND operator. The 1 bits in the "mask" select which bits we want to keep in the other word, and the zero bits in the mask turn all the other corresponding bits to zeroes. In other words, the 1 bits in the mask "copy" the corresponding bits from the other word. In other words, the 1 bits are the "holes" in the mask that let the corresponding bits in the other word flow through to the result. *** EXAMPLE 1: A mask to keep just the bottom nybble (4 bits) of a byte. Create a mask that lets only the bottom 4 bits flow through: 10101010 = AAh AND 00001111 = 0Fh <-- this is the 4-bit mask -------- EQUALS 00001010 = 0Ah Note how the bits in the mask "copy" the bits above. The zeroes in the mask don't let any bits "flow down" from above. *** EXAMPLE 2: A mask to keep just the top nybble (4 bits) of a byte. Create a mask that lets only the top 4 bits flow through: 10101010 = AAh AND 11110000 = F0h <-- this is the 4-bit mask -------- EQUALS 10100000 = A0h Note how the bits in the mask "copy" the bits above. The zeroes in the mask don't let any bits "flow down" from above. *** EXAMPLE 3: ECOA2e "bit masking" example in text on p.122: Is the "4" bit set in a word? To find out, bitwise AND the word with 04h. If the resulting value is not zero, the "4" bit must have been set. if( (word & 0x4) != 0 ) ... This type of bit-logic is used frequently in computer science. Each bit in some word may be assigned some meaning, and you need to mask that bit to tell whether or not it is set in the word. The next example applies the Boolean AND technique to find out if a floating-point number has the sign bit set: *** EXAMPLE 4: A mask to keep just the sign bit of an IEEE 754 single-precision floating-point number: Create a mask that lets only the top bit flow through: BFC00000h <-- this is some IEEE S.P. floating-point value AND 80000000h <-- this is the 1-bit mask -------- EQUALS 80000000h (result non-zero means the number is negative) 3FC00000h <-- this is some IEEE S.P. floating-point value AND 80000000h <-- this is the 1-bit mask -------- EQUALS 00000000h (result zero means the number is positive) Note how the bit in the mask "copies" the bit above. The zeroes in the mask don't let any bits "flow down" from above. In C++ language (requires a cast to allow bitwise operation on float): float x = -1.5; float y = 1.5; if ( (unsigned)x & 0x80000000 ) cout << "x is negative\n"; if ( (unsigned)y & 0x80000000 ) cout << "y is negative\n"; Only "x is negative" prints. *** EXAMPLE 5: Select just the CTRL (control) bits of an ASCII character. ASCII "control" characters are the 32 values from 0-31 decimal (00 to 1F hex). It takes 5 bits to hold these characters (2**5 = 32). Control characters are named by the ASCII letters with which they are associated. A CTRL-A is just the lower 5 bits of the ASCII letter 'A' (or 'a'). A bit mask that selects just the lower 5 bits in an 8-bit byte is: 00011111 = 1Fh Create a mask that lets only the bottom 5 bits flow through: 01000001 = 41h = ASCII upper-case letter 'A' AND 00011111 = 1Fh <-- this is the 5-bit mask -------- EQUALS 00000001 = 01h = CTRL-A Note how the bits in the mask "copy" the bits above. The zeroes in the mask don't let any bits "flow down" from above. This mask works if the letter is lower- or upper-case: 01100001 = 61h = ASCII lower-case letter 'a' AND 00011111 = 1Fh <-- this is the 5-bit mask -------- EQUALS 00000001 = 01h = CTRL-A Note how the bits in the mask "copy" the bits above. The zeroes in the mask don't let any bits "flow down" from above. In C language: char a = 'a'; char ctrla = a & 0x1F; /* mask selects only lower 5 bits */ ORing in bits - turning on bits ------------------------------- The opposite of masking (turning off bits) is "ORing in" bits, where we use the bitwise Boolean OR to "turn on" one or more bits in a word. We select a value that, when ORed with some other value, "turns on" selected bits and leaves the other bits unchanged. *** EXAMPLE 6: Turn any ASCII letter into lower case. In ASCII, the difference between an upper-case letter and a lower-case letter is the value of the bit 00100000 (20h), which is turned on in the lower-case letter. If we "turn on" that bit, the upper-case letter becomes lower-case. (If the letter was already lower-case, the 20h bit is already on; turning it on makes no difference.) To make any ASCII letter lower-case, we must "turn on" the 20h bit. This is called ORing-in a bit, since we use the bitwise Boolean OR operator: 01000001 = 41h = ASCII upper-case letter 'A' OR 00100000 = 20h <-- this is the bit we want turned on -------- EQUALS 01100001 = 61h = ASCII lower-case letter 'a' In C language: char uppera = 'A'; char lowera = uppera | 0x20; /* bitwise OR with 20h */ char lowerb = 'B' | 0x20; /* bitwise OR with 20h */ ... etc ... char lowerz = 'Z' | 0x20; /* bitwise OR with 20h */ Note that *adding* is not always the same as *ORing in*. It is true that if you *add* 20h to an upper-case ASCII letter, it has the same effect as *ORin in* the 20h bit and the letter becomes lower-case, e.g. for 'A' = 41h, adding 20h gives 61h which is 'a': 'A' + 20h --> 'a' /* addition works in this special case */ 41h + 20h --> 61h The same is not true if the letter is *already* lower-case, e.g. for 'a' = 61h, adding 20h gives 81h which isn't even an ASCII character (since there are only 128 ASCII characters from 00h to 7Fh): 'a' + 20h --> 81h *** wrong - want 'a' not non-ASCII 81h *** 61h + 20h --> 81h ORing the 20h bit works properly even if the letter is already lower case. ORing never causes a carry; addition may cause a carry. *** EXAMPLE 7: Turn any ASCII letter into upper case. This is the reverse of the above process that makes a lower-case letter by turning "on" the 20h bit. If we "turn off" the 20h bit in a letter, the letter becomes upper-case. (If the letter was already upper-case, the 20h bit is already off; turning it off makes no difference.) To turn off that 20h bit, we want to create a bit mask that allows all the other bits to remain on but that sets that one 20h bit to zero, i.e. we want a mask that is the opposite (bitwise NOT) of 20h. The mask must be "0" where 20h is "1". We appply the bitwise NOT to 20h to do this: Take 00100000 = 20h -> apply NOT (bit flip) -> 11011111 = DFh. Let's try the mask on a sample ASCII lower-case letter 'a'. (ASCII 'a' has value 61h = 01100001.) Any letter would work. 01100001 = 61h = ASCII lower-case letter 'a' AND 11011111 = DFh <-- this is the bitwise NOT (bit-flip) of 20h -------- EQUALS 01000001 = 41h = ASCII upper-case letter 'A' In C language: char lowera = 'a'; char uppera1 = lowera & 0xDF; /* bitwise AND DFh */ char uppera2 = lowera & ~0x20; /* bitwise AND with ~20h = DFh */ char upperz = 'z' & ~0x20; /* bitwise AND with ~20h = DFh */ Since the difference between every ASCII lower-case and upper-case letter is always the same 20h, we can use an expression to calculate the bit mask in our program instead of specifying it explicitly. In C language: char diff = 'a' - 'A'; /* subtraction always gives 20h for ASCII */ char mask = ~diff; /* bitwise NOT of 20h gives DFh */ char lowerz = 'z'; char upperz = lowerz & mask; /* 'z' masked with DFh becomes 'Z' */ You might notice that 20h is the bit pattern for an ASCII "Space", and you might see this somewhat obscure code some day: char lowerz = 'z'; char upperz = lowerz & ~' '; /* space = 20h, ~space = DFh, 'z' -> 'Z' */ char uppera = 'A'; char lowera = uppera | ' '; /* ASCII space = 20h, 'A' -> 'a' */ Note that *subtracting* is not always the same as *masking*. It is true that if you *subtract* 20h from a lower-case ASCII letter, it has the same effect as *masking* off the 20h bit and the letter becomes upper-case, e.g. for 'a' = 61h, subtracting 20h gives 41h which is 'A': 'A' - 20h --> 'a' /* subtraction works in this special case */ 61h - 20h --> 41h The same is not true if the letter is *already* upper-case, e.g. for 'A' = 41h, subtracting 20h gives 21h which is '!': 'A' - 20h --> '!' *** wrong - want 'A' not "!' *** 41h - 20h --> 21h Masking the 20h bit works properly even if the letter is already upper case. Masking never causes a borrow; subtraction may cause a borrow. Word length and NOT ------------------- Remember that the length of the word being operated on makes a difference to the bitwise NOT operator. *ALL* the bits in the word flip, not just the ones you think you see: char x = ~0x8; /* result is 0xF7 */ short x = ~0x8; /* result is 0xFFF7 */ int x = ~0x8; /* result is 0xFFFFFFF7 for 32-bit int */ int x = ~0x8; /* result is 0xFFFFFFFFFFFFFFF7 for 64-bit int */ You can't directly code the hex "bit flip" table in C language, since the bit-flip table applies to 4-bit hex digits and the shortest data type in C language is an 8-bit char. Storing 0x8 as a char in C means storing the 8-bit value 0x08. If you bit-flip eight-bit character data 0x8, you are actually bit-flipping all eight bits of 0x08 and the result is 0xF7 not 0x7.