============================================================
The Relationship between Mnemonics, Bytes, Bits, and Opcodes
============================================================
-IAN! idallen@ncf.ca

See also: http://www.datainstitute.com/debug2.htm
And:      http://www.cit.ac.nz/smac/asm/asm_1.htm

The CPU in a digital computer responds only to a series of ones and
zeros read from memory.  (This is the von Neumann Stored Program
concept.)  The patterns of ones and zeros that determine what the CPU
is to do are called the Operation Codes, or opcodes.  One pattern of
ones and zeros stands for an ADD operation; other patterns cause the
digital circuitry to select and perform other operations (e.g.
SUBTRACT, INCREMENT, SHIFT, etc.).  An opcode together with its
operands is called an instruction, and the full set of all instructions
is referred to as the Instruction Set of the computer.

When the computer's Program Counter (PC, also called Instruction
Pointer, IP) is set to a particular memory address and the computer is
set running, the CPU starts fetching bytes from memory at that address
and interpreting the contents of those bytes (the patterns of ones and
zeros) as instructions.

A CPU is designed to interpret some number of bits read from memory as
its Operation Code.  It may be a fixed number of bits, or it may vary
depending on the type of operation being selected.  Often, an entire
byte (all 8 bits) or two bytes (16 bits) is used as an opcode.  This is
the case for many of the instructions in the x86 family of
microprocessors.

The x86 CPU has a set of 8-bit and 16-bit codes that it recognizes and
responds to.  Each different code causes a different operation to take
place inside the registers of the CPU or on the buses of the system
board.  Here are the bit patterns of three actual x86 instructions.
Each begins with an opcode byte and is followed by one or more bytes
that select or supply the operands:

   Bit Pattern  ; operation performed by CPU
   -----------  -------------------------------------------------------
   1. 10111000  ; MOVe the next two bytes into 16-bit register AX
   2. 00000101  ; ...the LSB of the number (goes in AL)
   3. 00000000  ; ...the MSB of the number (goes in AH)

   1. 00000001  ; ADD the contents of one 16-bit register to another
   2. 11000011  ; ...add AX (the source) to BX (the destination)

   1. 10001001  ; MOVe the contents of a 16-bit register to memory
   2. 00011110  ; ...the register is BX; the memory location is given
   3. 00100000  ; ...by these last
   4. 00000001  ; ...two bytes (LSB first)

We can represent the bit patterns in these bytes in hexadecimal:

   BitPattrn   HEX  ; operation performed by CPU
   ---------   ---  -------------------------------------------------
   1. 10111000  B8  ; MOVe the next two bytes into 16-bit register AX
   2. 00000101  05  ; ...the LSB of the number (goes in AL)
   3. 00000000  00  ; ...the MSB of the number (goes in AH)

   1. 00000001  01  ; ADD the contents of one 16-bit register to another
   2. 11000011  C3  ; ...add AX (the source) to BX (the destination)

   1. 10001001  89  ; MOVe the contents of a 16-bit register to memory
   2. 00011110  1E  ; ...the register is BX; the memory location is given
   3. 00100000  20  ; ...by these last
   4. 00000001  01  ; ...two bytes (LSB first)

This kind of programming is called Machine Code programming, because
the instructions are entered using the binary or hexadecimal numbers
that are the actual bit patterns to which the CPU responds.  In the
early days of computing, all computer programs had to be written using
the machine code of the particular CPU being used.

----------------------------------
The Mnemonics of Assembly language
----------------------------------

Remembering and writing the bit patterns that make up the machine code
for a CPU is difficult.  Programmers quickly developed short, symbolic
names for the various operations being performed (e.g. ADD, SUB, MUL,
DIV, SHFT, etc.) and wrote programs to turn the symbolic names into
machine code.  These programs "assembled" machine code from mnemonic
names, and came to be called Assemblers.  The mnemonic form of
programming became known as Assembly Language.
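To make the idea of "assembling" concrete, here is a minimal sketch in
Python of what such a program does.  It is not a real assembler: it
handles only the three instruction forms whose machine code was listed
earlier, it reads numbers as hexadecimal (an assumption borrowed from
DEBUG-style listings), and the names REG16, le16 and assemble are
invented for this illustration.

```python
# Toy assembler covering only the three instruction forms shown in the
# machine-code listings.  All names here are illustrative, not a real API.

# x86 register numbers for four of the 16-bit registers.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3}


def le16(value):
    """Return a 16-bit value as two bytes, least-significant byte first."""
    return bytes([value & 0xFF, (value >> 8) & 0xFF])


def assemble(line):
    """Assemble one line such as 'MOV AX,0005' (numbers are read as hex)."""
    mnemonic, operands = line.split(None, 1)
    mnemonic = mnemonic.upper()
    dst, src = [part.strip().upper() for part in operands.split(",")]

    if mnemonic == "MOV" and dst in REG16 and src not in REG16:
        # MOV reg16,constant: opcode B8 plus register number, then the constant.
        return bytes([0xB8 + REG16[dst]]) + le16(int(src, 16))

    if mnemonic == "ADD" and dst in REG16 and src in REG16:
        # ADD reg16,reg16: opcode 01, then one byte naming both registers.
        return bytes([0x01, 0xC0 | (REG16[src] << 3) | REG16[dst]])

    if mnemonic == "MOV" and dst.startswith("[") and src in REG16:
        # MOV [address],reg16: opcode 89, one byte naming the register
        # and the addressing mode, then the 16-bit address.
        address = int(dst.strip("[]"), 16)
        return bytes([0x89, 0x06 | (REG16[src] << 3)]) + le16(address)

    raise ValueError("unsupported instruction: " + line)


for source in ("MOV AX,0005", "ADD BX,AX", "MOV [120],BX"):
    print(assemble(source).hex().upper().ljust(10), source)
```

Running the sketch prints the hexadecimal bytes B80500, 01C3 and
891E2001 beside their mnemonics.  A real assembler works on the same
principle but covers the whole instruction set, plus labels, symbols
and error reporting.
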
Here is the Assembly Language version of the three machine code
instructions given above:

   HexBitPat    Assembly Language mnemonic equivalent
   ---------    ------------------------------------------
   1. B80500    MOV AX,0005
   1. 01C3      ADD BX,AX
   1. 891E2001  MOV [120],BX

The bit patterns in the left column are exactly those coded above;
these are the patterns that an Assembler program would output after
reading the Assembly Language mnemonics on the right.

Note how each machine instruction becomes a single line of Assembly
Language, but each line may expand to one or more bytes depending on
the complexity of the instruction being coded.

The mnemonic for a particular operation does not always turn into the
same machine code: in the example above, the mnemonic MOV is assembled
first as B8 and later as 891E, because the two instructions take
different kinds of operands.

Numbers and addresses are stored in the format native to the hardware,
which may mean that they look "backward" when listed as bytes in the
bit pattern of the instruction in memory.  On the x86 the least
significant byte comes first, so the constant 0005 appears as the bytes
05 00 and the address 0120 appears as 20 01.

----------
Conclusion
----------

Assembly Language is a symbolic, mnemonic language that is translated
by an Assembler into the bit patterns of machine code.  Programmers
working at the level of machine instructions write almost exclusively
in Assembly Language.  A few very experienced programmers have
memorized much of their machine's instruction set and can write machine
code directly, without using an Assembler program and mnemonics.