=================================================================
Assignment #07 - Character Encoding and Decoding
=================================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Sources for these answers (thank you!):
    - Ian Allen
    - Terence Christie
    - Alex Thomson

1.  Under what operating system was the following text file created?
    How do you know?  4C 69 6E 75 78 0D 0A 52 6F 63 6B 73 5C 21 0D 0A

    0D 0A line ends = CR/LF = Microsoft DOS or Windows

2.  Under what operating system was the following text file created?
    How do you know?  4C 69 6E 75 78 0D 52 6F 63 6B 73 5C 21 0D

    0D line ends = CR = classic Apple Macintosh (Mac OS 9 and earlier)

3.  Under what operating system was the following text file created?
    How do you know?  4C 69 6E 75 78 0A 52 6F 63 6B 73 5C 21 0A

    0A line ends = LF (or NL) = Unix/Linux/BSD/Solaris/AIX/etc.
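
    The three answers above can be checked mechanically.  The following
    is a minimal Python sketch, not part of the original assignment; the
    function name classify_line_endings is made up for illustration.  It
    counts CR/LF pairs, bare CR bytes, and bare LF bytes in a hex dump.

```python
def classify_line_endings(hex_dump: str) -> str:
    """Guess the originating OS family from the line endings in a hex dump."""
    data = bytes.fromhex(hex_dump)
    crlf = data.count(b"\r\n")          # 0D 0A pairs
    cr = data.count(b"\r") - crlf       # 0D bytes not followed by 0A
    lf = data.count(b"\n") - crlf       # 0A bytes not preceded by 0D
    if crlf and not cr and not lf:
        return "CR/LF: DOS or Windows"
    if cr and not crlf and not lf:
        return "CR: classic Macintosh"
    if lf and not crlf and not cr:
        return "LF: Unix/Linux"
    return "mixed or no line endings"


print(classify_line_endings("4C 69 6E 75 78 0D 0A 52 6F 63 6B 73 5C 21 0D 0A"))
print(classify_line_endings("4C 69 6E 75 78 0D 52 6F 63 6B 73 5C 21 0D"))
print(classify_line_endings("4C 69 6E 75 78 0A 52 6F 63 6B 73 5C 21 0A"))
```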

4.  What advantage does UTF-8 have over Unicode for English text?

    UTF-8 encodes English exactly the same way that ASCII does: one
    byte per character.  Unicode stored as 16-bit characters (UCS-2 or
    UTF-16) takes two bytes for every English character.

5.  If you sort a file containing lines of mixed-case ASCII text,
    what is the resulting relationship of lines that begin with upper-case
    letters and lines that begin with lower-case letters?

    ASCII upper-case sorts before ASCII lower-case.

6.  If you sort a file containing lines of mixed-case EBCDIC text,
    what is the resulting relationship of lines that begin with upper-case
    letters and lines that begin with lower-case letters?

    EBCDIC upper-case sorts after EBCDIC lower-case.
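
    The two sort orders can be compared with a short Python sketch.
    This is an illustration only: the word list is invented, and it
    assumes code page cp037 (one common EBCDIC variant that ships with
    Python) stands in for "EBCDIC".  Each word is sorted by the bytes
    it has in the chosen encoding.

```python
words = ["apple", "Banana", "cherry", "Apple"]

# Sort by the raw byte values each word has in the given encoding.
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print("ASCII :", ascii_order)    # upper-case words come first
print("EBCDIC:", ebcdic_order)   # lower-case words come first
```

    In ASCII 'A' is 0x41 and 'a' is 0x61; in cp037 'a' is 0x81 and 'A'
    is 0xC1, which is why the order reverses.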

7.  Why can't a single text file contain both French, encoded as 8-bit
    Latin-1, and Polish, encoded as 8-bit Latin-2?

    If all 256 8-bit codes are used for French, there are no codes left
    over for Polish.  A file can only be interpreted as either Latin-1
    or Latin-2, not both at the same time.
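
    A small Python sketch makes the conflict concrete.  The byte 0xEA
    is chosen here only as an example; it decodes one byte under both
    character sets and prints the Unicode name of each result.

```python
import unicodedata

raw = bytes([0xEA])                     # one byte with the top bit set

as_latin1 = raw.decode("iso-8859-1")    # interpreted as Latin-1
as_latin2 = raw.decode("iso-8859-2")    # interpreted as Latin-2

print("Latin-1:", unicodedata.name(as_latin1))
print("Latin-2:", unicodedata.name(as_latin2))
```

    The same byte is a French letter (e with circumflex) in Latin-1
    and a Polish letter (e with ogonek) in Latin-2, and nothing in the
    file says which reading is intended.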

8.  Encode the following eight ASCII characters in hexadecimal using
    8-bit Even Parity:
    a) A     b) a     c) B     d) b     e) C     f) c     g) D     h) d

    8-bit even parity makes the total number of "one" bits in the
    whole byte (seven data bits plus the parity bit) an even number,
    by turning the most significant bit (eighth bit) on or leaving
    it off.

    A = 0x41 = 0100 0001, MSB off for even parity = 0100 0001 = 0x41
    a = 0x61 = 0110 0001, MSB on  for even parity = 1110 0001 = 0xE1
    B = 0x42 = 0100 0010, MSB off for even parity = 0100 0010 = 0x42
    b = 0x62 = 0110 0010, MSB on  for even parity = 1110 0010 = 0xE2
    C = 0x43 = 0100 0011, MSB on  for even parity = 1100 0011 = 0xC3
    c = 0x63 = 0110 0011, MSB off for even parity = 0110 0011 = 0x63
    D = 0x44 = 0100 0100, MSB off for even parity = 0100 0100 = 0x44
    d = 0x64 = 0110 0100, MSB on  for even parity = 1110 0100 = 0xE4

9.  Encode the following eight ASCII characters in hexadecimal using
    8-bit Odd Parity:
    
    a) A     b) a     c) B     d) b     e) C     f) c     g) D     h) d

    8-bit odd parity makes the total number of "one" bits in the
    whole byte (seven data bits plus the parity bit) an odd number,
    by turning the most significant bit (eighth bit) on or leaving
    it off.

    The parity bit is the exact opposite of the previous answer:

    A = 0x41 = 0100 0001, MSB on  for odd parity = 1100 0001 = 0xC1
    a = 0x61 = 0110 0001, MSB off for odd parity = 0110 0001 = 0x61
    B = 0x42 = 0100 0010, MSB on  for odd parity = 1100 0010 = 0xC2
    b = 0x62 = 0110 0010, MSB off for odd parity = 0110 0010 = 0x62
    C = 0x43 = 0100 0011, MSB off for odd parity = 0100 0011 = 0x43
    c = 0x63 = 0110 0011, MSB on  for odd parity = 1110 0011 = 0xE3
    D = 0x44 = 0100 0100, MSB on  for odd parity = 1100 0100 = 0xC4
    d = 0x64 = 0110 0100, MSB off for odd parity = 0110 0100 = 0x64
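
    Both parity tables can be regenerated with a few lines of Python.
    This is a sketch for checking the hand-worked answers; the helper
    name add_parity is an assumption, not something from the course
    notes.

```python
def add_parity(ch: str, even: bool = True) -> int:
    """Return the 8-bit code for a 7-bit ASCII character with a parity MSB."""
    code = ord(ch) & 0x7F                 # keep the seven data bits
    ones = bin(code).count("1")           # count the "one" bits
    if even:
        needs_bit = ones % 2 == 1         # odd count -> set MSB to make it even
    else:
        needs_bit = ones % 2 == 0         # even count -> set MSB to make it odd
    return code | 0x80 if needs_bit else code


for ch in "AaBbCcDd":
    print(ch, f"even=0x{add_parity(ch):02X}",
          f"odd=0x{add_parity(ch, even=False):02X}")
```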


10. The following ASCII byte is received from a system that generates
    8-bit Even Parity: 0xA7
    Is there an error in the byte?  How do you know?

    Error: 0xA7 = 1010 0111 has five "1" bits, which is an odd count.
    Even parity requires the number of 1 bits in the byte to be even,
    so this byte must contain an error.

11. The following ASCII byte is received from a system that generates
    8-bit Odd Parity: 0xA5
    Is there an error in the byte?  How do you know?

    Error: 0xA5 = 1010 0101 has four "1" bits, which is an even count.
    Odd parity requires the number of 1 bits in the byte to be odd,
    so this byte must contain an error.
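
    The receiver's check in the last two questions amounts to counting
    the 1 bits in the received byte.  A minimal Python sketch, with a
    hypothetical helper named parity_ok:

```python
def parity_ok(byte: int, even: bool = True) -> bool:
    """True if the byte's count of 1 bits matches the expected parity."""
    ones = bin(byte & 0xFF).count("1")
    return (ones % 2 == 0) == even


print(parity_ok(0xA7, even=True))    # five 1 bits under even parity
print(parity_ok(0xA5, even=False))   # four 1 bits under odd parity
```

    Both calls report False, meaning each byte fails its parity check.
    Note that parity only detects an odd number of flipped bits; two
    flipped bits would pass unnoticed.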

12. The following hexadecimal memory dump contains two big-endian four-byte
    two's-complement integers, starting at address 101.  What decimal values
    do these two big-endian four-byte integers have?
   ADDRESS: ---------- MEMORY BYTES ----------
       100: FF 01 DF 5E 86 FE 20 A1 79 61 62 63 64 FF C4 F4 A3 1F ... 

   Integer 1: 01 DF 5E 86 = 01DF5E86 -> positive number = +31415942 decimal
   Integer 2: FE 20 A1 79 = FE20A179 -> negative number (PANIC)
   -> use hex flip table -> 01DF5E86 -> add 1 -> 01DF5E87 = 31415943 decimal
   -> put a minus sign in front -> -31415943
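
    The flip-and-add-one procedure can be written out in Python as a
    check on the arithmetic.  This is a sketch under the assumption
    that the bytes are given as a hex string; the function name
    twos_complement_be is made up for illustration.

```python
def twos_complement_be(hex_bytes: str) -> int:
    """Decode big-endian two's-complement bytes by flipping and adding one."""
    raw = bytes.fromhex(hex_bytes)
    bits = 8 * len(raw)
    value = int.from_bytes(raw, "big")        # read as an unsigned number
    if value & (1 << (bits - 1)):             # sign bit set -> negative
        flipped = value ^ ((1 << bits) - 1)   # flip every bit
        return -(flipped + 1)                 # add one, then negate
    return value


print(twos_complement_be("01 DF 5E 86"))
print(twos_complement_be("FE 20 A1 79"))
```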

13. The following hexadecimal memory dump contains two little-endian four-byte
    two's-complement integers, starting at address 102.  What decimal values
    do these two little-endian four-byte integers have?
   ADDRESS: ---------- MEMORY BYTES ----------
       100: FF 01 DF 5E 86 FE 20 A1 79 61 62 63 64 FF C4 F4 A3 1F ... 

   Integer 1: DF 5E 86 FE -> reverse -> FE 86 5E DF = FE865EDF -> negative
   -> use hex flip table -> 0179A120 -> add 1 -> 0179A121 = 24748321 decimal
   -> put a minus sign in front -> -24748321
   Integer 2: 20 A1 79 61 -> reverse -> 61 79 A1 20 = 6179A120 -> positive
   -> 6179A120 = 1635361056 decimal
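
    Python's built-in int.from_bytes can do the byte reversal and sign
    handling directly.  The sketch below loads the dump shown in the
    question; BASE and read_int32 are names invented for this example.

```python
BASE = 100   # address of the first byte in the dump
memory = bytes.fromhex("FF 01 DF 5E 86 FE 20 A1 79 61 62 63 64 FF C4 F4 A3 1F")


def read_int32(address: int, byteorder: str) -> int:
    """Read a signed four-byte integer at the given dump address."""
    start = address - BASE
    return int.from_bytes(memory[start:start + 4], byteorder, signed=True)


print(read_int32(102, "little"), read_int32(106, "little"))
```

    Passing "big" instead of "little" reads the same four bytes in
    big-endian order.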

14. What is the byte-ordering (big- or little-endian) of a GIF graphics image?

    GIFs are Little-Endian.

15. What is the byte-ordering (big- or little-endian) of a JPEG graphics image?

    JPEGs are Big-Endian.

16. What is the byte ordering of Intel-based computers?

    Intel x86 and x86_64 computers are all Little-Endian.

17. Given the following partial hexadecimal memory dump of the boot sector of
    an MS-DOS disk (MS-DOS means an Intel-based PC):
     0000: EB 3C 90 4D 53 44 4F 53 35 2E 30 00 02 04 01 00
     0010: 02 00 03 00 00 F8 F7 00 12 00 20 00 11 00 00 00
    Read the above dump and give the hexadecimal and decimal values of
    these unsigned integer items of different widths (sizes in bytes):
       Offset/Size: Value
       -----------  ------------------------------------------------
    a) 000Bh/2:     bytes per sector: 00 02 -> 0200h = 512 decimal
    b) 000Dh/1:     sectors per allocation unit (cluster): 04h = 4 decimal
    c) 0010h/1:     number of copies of FAT: 02h = 2 decimal
    d) 0011h/2:     number of root directory entries: 00 03 -> 0300h = 768 decimal
    e) 0016h/2:     number of sectors per FAT: F7 00 -> 00F7h = 247 decimal
    f) 0018h/2:     number of sectors per track: 12 00 -> 0012h = 18 decimal
    g) 001Ah/2:     number of heads: 20 00 -> 0020h = 32 decimal
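
    The same fields can be pulled out of the dump with Python's struct
    module, where "<" selects little-endian, "H" is an unsigned 16-bit
    value and "B" is an unsigned byte.  This is a sketch; the field
    labels in the list are shortened versions of the ones above.

```python
import struct

boot = bytes.fromhex(
    "EB 3C 90 4D 53 44 4F 53 35 2E 30 00 02 04 01 00 "
    "02 00 03 00 00 F8 F7 00 12 00 20 00 11 00 00 00"
)

# (label, offset, struct format) for each item in the question
fields = [
    ("bytes per sector",       0x0B, "<H"),
    ("sectors per cluster",    0x0D, "<B"),
    ("copies of FAT",          0x10, "<B"),
    ("root directory entries", 0x11, "<H"),
    ("sectors per FAT",        0x16, "<H"),
    ("sectors per track",      0x18, "<H"),
    ("heads",                  0x1A, "<H"),
]

values = {name: struct.unpack_from(fmt, boot, offset)[0]
          for name, offset, fmt in fields}

for name, value in values.items():
    print(f"{name}: {value:04X}h = {value} decimal")
```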

18. Looking at the previous question:

    a) What is the maximum number of bytes per sector possible?
       The field is two bytes wide: max is (2**16)-1 = FFFFh = 65535 decimal
    b) What is the maximum number of copies of the FAT possible?
       The field is one byte wide: max is (2**8)-1 = FFh = 255 decimal

19. The eight-character ASCII text string "abcdefgh" is stored in memory
    starting at location zero.

    See http://easycalculation.com/hex-converter.php

    a) Give the eight hexadecimal bytes stored in memory:
       61h 62h 63h 64h 65h 66h 67h 68h
    b) Interpret these eight bytes as two 32-bit two's complement integers
       in little-endian form and give their two decimal values:
       Integer 1: 61 62 63 64 -> 64636261h = 1684234849 decimal
       Integer 2: 65 66 67 68 -> 68676665h = 1751606885 decimal
    c) Interpret these eight bytes as four 16-bit two's complement integers
       in big-endian form and give their four decimal values:
       Integer 1: 6162h = (positive) 24930 decimal
       Integer 2: 6364h = (positive) 25444 decimal
       Integer 3: 6566h = (positive) 25958 decimal
       Integer 4: 6768h = (positive) 26472 decimal
    d) Add one to each of the four big-endian integers from (c) and give
       the resulting changed ASCII text string (eight ASCII characters):
        6162h + 1 = 6163h
        6364h + 1 = 6365h
        6566h + 1 = 6567h
        6768h + 1 = 6769h
        ASCII: 61 63 63 65 65 67 67 69 -> "acceeggi"
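
    All four parts can be reproduced with struct as a cross-check.  In
    this sketch "<2i" means two little-endian signed 32-bit integers
    and ">4h" means four big-endian signed 16-bit integers; the
    variable names are chosen only for this example.

```python
import struct

text = b"abcdefgh"

print(text.hex(" "))                      # (a) the eight bytes in hex

little32 = struct.unpack("<2i", text)     # (b) two little-endian 32-bit ints
big16 = struct.unpack(">4h", text)        # (c) four big-endian 16-bit ints

# (d) add one to each 16-bit integer and pack back into eight bytes
bumped = struct.pack(">4h", *(n + 1 for n in big16))

print(little32)
print(big16)
print(bumped.decode("ascii"))
```

    Adding one changes only the low-order byte of each big-endian
    pair, which is the second character of each pair in memory.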

20. True/False: Plain English text, encoded as ASCII, is identical to the
    same Plain English text encoded as UTF-8.

    True.  That's why UTF-8 is so popular in North America.

21. True/False: Plain English text, encoded as ASCII, is identical to the
    same Plain English text encoded as Latin-1.

    True.  The first 128 codes of almost all the 8-bit character
    encodings are just plain ASCII.  Only the last 128 codes (the byte
    values with the top bit set) are used for the other languages.

22. True/False: Plain English text, encoded as ASCII, is identical to the
    same Plain English text encoded as Unicode.

    False.  ASCII is one byte per character; Unicode stored as 16-bit
    characters (UCS-2 or UTF-16) is two bytes per character.  For
    example, ASCII 'A' = 0x41, Unicode 'A' = 0x0041.  ASCII and Unicode
    have the same *values* for the first 128 characters, but ASCII
    stores the value in one byte and 16-bit Unicode needs two.  If you
    encode an ASCII file as 16-bit Unicode, the file size doubles.
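
    The last three True/False answers can be verified by encoding the
    same string several ways in Python.  This sketch assumes that
    "Unicode" in question 22 means a two-byte encoding, and uses
    UTF-16 big-endian without a byte-order mark to represent it.

```python
text = "Plain English"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
utf16_bytes = text.encode("utf-16-be")   # two bytes per character, no BOM

print(ascii_bytes == utf8_bytes)         # question 20
print(ascii_bytes == latin1_bytes)       # question 21
print(ascii_bytes == utf16_bytes)        # question 22
print(len(ascii_bytes), len(utf16_bytes))
print("A".encode("utf-16-be").hex())     # 'A' as a two-byte value
```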

-- 
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/