=================================================================
Assignment #07 - Character Encoding and Decoding
=================================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Sources for these answers (thank you!):
- Ian Allen
- Terence Christie
- Alex Thomson

1.  Under what operating system was the following text file created?
    How do you know?

    4C 69 6E 75 78 0D 0A 52 6F 63 6B 73 5C 21 0D 0A 0D 0A

    line ends = CR/LF = Microsoft DOS or Windows

2.  Under what operating system was the following text file created?
    How do you know?

    4C 69 6E 75 78 0D 52 6F 63 6B 73 5C 21 0D 0D

    line ends = CR = Apple Macintosh (classic Mac OS, before Mac OS X)

3.  Under what operating system was the following text file created?
    How do you know?

    4C 69 6E 75 78 0A 52 6F 63 6B 73 5C 21 0A 0A

    line ends = LF (or NL) = Unix/Linux/BSD/Solaris/AIX/etc.

4.  What advantage does UTF-8 have over Unicode for English text?

    UTF-8 encodes English exactly the same way that ASCII does - one
    byte per character.  16-bit Unicode (UCS-2 or UTF-16) takes two
    bytes for every English character.

5.  If you sort a file containing lines of mixed-case ASCII text, what
    is the resulting relationship of lines that begin with upper-case
    letters and lines that begin with lower-case letters?

    ASCII upper-case sorts before ASCII lower-case.

6.  If you sort a file containing lines of mixed-case EBCDIC text, what
    is the resulting relationship of lines that begin with upper-case
    letters and lines that begin with lower-case letters?

    EBCDIC upper-case sorts after EBCDIC lower-case.

7.  Why can't a single text file contain both French, encoded as 8-bit
    Latin-1, and Polish, encoded as 8-bit Latin-2?

    If all 256 8-bit codes are used for French, there are no codes left
    over for Polish.  A file can only be interpreted as either Latin-1
    or Latin-2, not both at the same time.

8.
    Encode the following eight ASCII characters in hexadecimal using
    8-bit Even Parity:

    a) A   b) a   c) B   d) b   e) C   f) c   g) D   h) d

    8-bit even parity makes the total number of "one" bits in the byte
    an even number by either turning on or leaving off the most
    significant bit (the eighth bit) above the seven ASCII bits.

    A = 0x41 = 0100 0001, MSB off for even parity = 0100 0001 = 0x41
    a = 0x61 = 0110 0001, MSB on  for even parity = 1110 0001 = 0xE1
    B = 0x42 = 0100 0010, MSB off for even parity = 0100 0010 = 0x42
    b = 0x62 = 0110 0010, MSB on  for even parity = 1110 0010 = 0xE2
    C = 0x43 = 0100 0011, MSB on  for even parity = 1100 0011 = 0xC3
    c = 0x63 = 0110 0011, MSB off for even parity = 0110 0011 = 0x63
    D = 0x44 = 0100 0100, MSB off for even parity = 0100 0100 = 0x44
    d = 0x64 = 0110 0100, MSB on  for even parity = 1110 0100 = 0xE4

9.  Encode the following eight ASCII characters in hexadecimal using
    8-bit Odd Parity:

    a) A   b) a   c) B   d) b   e) C   f) c   g) D   h) d

    8-bit odd parity makes the total number of "one" bits in the byte
    an odd number by either turning on or leaving off the most
    significant bit (the eighth bit) above the seven ASCII bits.
    The parity bit is the exact opposite of the previous answer:

    A = 0x41 = 0100 0001, MSB on  for odd parity = 1100 0001 = 0xC1
    a = 0x61 = 0110 0001, MSB off for odd parity = 0110 0001 = 0x61
    B = 0x42 = 0100 0010, MSB on  for odd parity = 1100 0010 = 0xC2
    b = 0x62 = 0110 0010, MSB off for odd parity = 0110 0010 = 0x62
    C = 0x43 = 0100 0011, MSB off for odd parity = 0100 0011 = 0x43
    c = 0x63 = 0110 0011, MSB on  for odd parity = 1110 0011 = 0xE3
    D = 0x44 = 0100 0100, MSB on  for odd parity = 1100 0100 = 0xC4
    d = 0x64 = 0110 0100, MSB off for odd parity = 0110 0100 = 0x64

10. The following ASCII byte is received from a system that generates
    8-bit Even Parity:  0xA7
    Is there an error in the byte?  How do you know?

    Error, because 0xA7 = 1010 0111 -> which is odd parity, not even.
    Since even parity expects the sum of the 1 bits to be an even
    number, this byte must be an error because the sum of 1 bits is an
    odd number (5 bits are "1").

11. The following ASCII byte is received from a system that generates
    8-bit Odd Parity:  0xA5
    Is there an error in the byte?  How do you know?

    Error, because 0xA5 = 1010 0101 -> which is even parity, not odd.
    Since odd parity expects the sum of the 1 bits to be an odd number,
    this byte must be an error because the sum of 1 bits is an even
    number (4 bits are "1").

12. The following hexadecimal memory dump contains two big-endian
    four-byte two's-complement integers, starting at address 101.
    What decimal values do these two big-endian four-byte integers have?

    ADDRESS:  ---------- MEMORY BYTES ----------
       100:   FF 01 DF 5E 86 FE 20 A1 79 61 62 63 64 FF C4 F4 A3 1F ...

    Integer 1: 01 DF 5E 86 = 01DF5E86 -> positive number
               = +31415942 decimal
    Integer 2: FE 20 A1 79 = FE20A179 -> negative number (PANIC)
               -> use hex flip table -> 01DF5E86 -> add 1 -> 01DF5E87
               = 31415943 decimal -> put a minus sign in front
               -> -31415943

13. The following hexadecimal memory dump contains two little-endian
    four-byte two's-complement integers, starting at address 102.
    What decimal values do these two little-endian four-byte integers
    have?

    ADDRESS:  ---------- MEMORY BYTES ----------
       100:   FF 01 DF 5E 86 FE 20 A1 79 61 62 63 64 FF C4 F4 A3 1F ...

    Integer 1: DF 5E 86 FE -> reverse -> FE 86 5E DF = FE865EDF
               -> negative -> use hex flip table -> 0179A120
               -> add 1 -> 0179A121 = 24748321 decimal
               -> put a minus sign in front -> -24748321
    Integer 2: 20 A1 79 61 -> reverse -> 61 79 A1 20 = 6179A120
               -> positive -> 6179A120 = 1635361056 decimal

14. What is the byte-ordering (big- or little-endian) of a GIF graphics
    image?

    GIFs are Little-Endian.

15. What is the byte-ordering (big- or little-endian) of a JPEG graphics
    image?

    JPEGs are Big-Endian.

16. What is the byte ordering of Intel-based computers?

    Intel x86 and x86_64 computers are all Little-Endian.

17.
    Given the following partial hexadecimal memory dump of the boot
    sector of an MS-DOS disk (MS-DOS means an Intel-based PC):

    0000:  EB 3C 90 4D 53 44 4F 53 35 2E 30 00 02 04 01 00
    0010:  02 00 03 00 00 F8 F7 00 12 00 20 00 11 00 00 00

    Read the above dump and give the hexadecimal and decimal values of
    these unsigned integer items of different widths (sizes in bytes).
    Intel is little-endian, so the bytes of each two-byte item are
    reversed before being read as a number:

    Offset/Size: Value
    -----------  ------------------------------------------------
    a) 000Bh/2:  bytes per sector: 00 02 -> 0200h = 512 decimal
    b) 000Dh/1:  sectors per allocation unit (cluster): 04h = 4 decimal
    c) 0010h/1:  number of copies of FAT: 02h = 2 decimal
    d) 0011h/2:  number of root directory entries: 00 03 -> 0300h = 768 decimal
    e) 0016h/2:  number of sectors per FAT: F7 00 -> 00F7h = 247 decimal
    f) 0018h/2:  number of sectors per track: 12 00 -> 0012h = 18 decimal
    g) 001Ah/2:  number of heads: 20 00 -> 0020h = 32 decimal

18. Looking at the previous question:

    a) What is the maximum number of bytes per sector possible?

       The field is two bytes (16 bits) wide, so the
       max is (2**16)-1 = FFFFh = 65535 decimal

    b) What is the maximum number of copies of the FAT possible?

       The field is one byte (8 bits) wide, so the
       max is (2**8)-1 = FFh = 255 decimal

19. The eight-character ASCII text string "abcdefgh" is stored in memory
    starting at location zero.
    See http://easycalculation.com/hex-converter.php

    a) Give the eight hexadecimal bytes stored in memory:

       61h 62h 63h 64h 65h 66h 67h 68h

    b) Interpret these eight bytes as two 32-bit two's complement
       integers in little-endian form and give their two decimal values:

       Integer 1: 61 62 63 64 -> 64636261h = 1684234849 decimal
       Integer 2: 65 66 67 68 -> 68676665h = 1751606885 decimal

    c) Interpret these eight bytes as four 16-bit two's complement
       integers in big-endian form and give their four decimal values:

       Integer 1: 6162h = (positive) 24930 decimal
       Integer 2: 6364h = (positive) 25444 decimal
       Integer 3: 6566h = (positive) 25958 decimal
       Integer 4: 6768h = (positive) 26472 decimal

    d) Add one to each of the four big-endian integers from (c) and give
       the resulting changed ASCII text string (eight ASCII characters):

       6162h + 1 = 6163h
       6364h + 1 = 6365h
       6566h + 1 = 6567h
       6768h + 1 = 6769h

       ASCII: 61 63 63 65 65 67 67 69 -> "acceeggi"

20. True/False: Plain English text, encoded as ASCII, is identical to
    the same Plain English text encoded as UTF-8.

    True.  That's why UTF-8 is so popular in North America.

21. True/False: Plain English text, encoded as ASCII, is identical to
    the same Plain English text encoded as Latin-1.

    True.  The first 128 characters of almost all the 8-bit character
    encodings are just plain ASCII.  Only the upper 128 codes (the
    bytes with the top bit set) are used for the other languages.

22. True/False: Plain English text, encoded as ASCII, is identical to
    the same Plain English text encoded as Unicode.

    False.  ASCII is one byte per character; Unicode stored as 16-bit
    values (UCS-2 or UTF-16) is two bytes per character for this text.
    For example, ASCII 'A' = 0x41, Unicode 'A' = 0x0041.  ASCII and
    Unicode have the same *values* for the first 128 characters, but
    ASCII stores the value in one byte and 16-bit Unicode needs two.
    If you encode an ASCII file as 16-bit Unicode, the file size
    doubles.

-- 
| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/
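
Cross-check of the parity and byte-order arithmetic (questions 8 to 13):
the following is a minimal Python sketch, not part of the original
assignment.  The helper names with_parity, parity_ok and read_int are
hypothetical.  It assumes the parity bit is the most significant bit of
the byte and that the first byte of the memory dump sits at address 100,
as the questions state.

```python
def with_parity(ch, even=True):
    """Return the 8-bit code for a 7-bit ASCII character, parity bit in the MSB."""
    code = ord(ch)
    ones = bin(code).count("1")
    # Even parity sets the MSB when the 7-bit count is odd;
    # odd parity sets it when the 7-bit count is even.
    need_bit = (ones % 2 == 1) if even else (ones % 2 == 0)
    return code | 0x80 if need_bit else code


def parity_ok(byte, even=True):
    """True if the count of 1 bits in the whole byte matches the expected parity."""
    ones = bin(byte).count("1")
    return (ones % 2 == 0) == even


# The memory dump from questions 12 and 13; the first byte is address 100.
DUMP = bytes.fromhex("FF01DF5E86FE20A17961626364FFC4F4A31F")


def read_int(data, base, address, size, byteorder):
    """Read a two's-complement integer of `size` bytes stored at `address`."""
    start = address - base
    return int.from_bytes(data[start:start + size], byteorder, signed=True)


if __name__ == "__main__":
    print(hex(with_parity("a")))               # question 8:  0xe1
    print(hex(with_parity("A", even=False)))   # question 9:  0xc1
    print(parity_ok(0xA7, even=True))          # question 10: False (error)
    print(parity_ok(0xA5, even=False))         # question 11: False (error)
    print(read_int(DUMP, 100, 101, 4, "big"))     # question 12: 31415942
    print(read_int(DUMP, 100, 105, 4, "big"))     # question 12: -31415943
    print(read_int(DUMP, 100, 102, 4, "little"))  # question 13: -24748321
    print(read_int(DUMP, 100, 106, 4, "little"))  # question 13: 1635361056
```

Letting int.from_bytes do the sign handling replaces the manual
"flip the bits and add 1" step; the results agree with the hand-worked
answers above.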