-----------------------
Exercise #3 for DAT2330 due March 9
-----------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Global weight: 3% of your total mark this term
Due date: 10h00 (10am) Tuesday, March 9, 2004.

The deliverables for this exercise are to be submitted online on the
Course Linux Server using the "datsubmit" method described in the exercise
description, below.  No paper; no email; no FTP.

Late-submission date:  I will accept without penalty exercises that are
submitted late but before 14h30 (2:30pm) on Friday, March 12.  After that
late-submission date, the exercise is worth zero marks; but, it must still
be completed and submitted successfully to earn credit in the course.

Exercises submitted by the *due date* will be marked online and your
marks will be sent to you by email after the late-submission date.
A sample answer will be posted online after the late-submission date.
This exercise is due on or before 10h00 (10am) Tuesday, March 9, 2004.

Exercise Synopsis:

  This week you will use "tar" format file archives and Unix file
  compression tools.

  Part I - "Tarcheology"

    First, you will copy a tar archive and examine and extract its
    contents.  The archive contains a JPEG image broken up into small
    files and hidden in various places in the archive.  Your task is
    to locate all the small files (2,725 files), decompress any files
    that need it (*.gz or *.bz2), and then concatenate all the small
    data files together into one JPEG image.

  Part II - Automatic Shell Script

    You will make note of the commands that you use during Part I so
    that you can write a Unix bash shell script that will do these same
    command-line operations automatically.  The script you write will
    start with the original tar archive.  When your script has finished
    executing, the assembled JPEG image will be in the current directory.

    Re-running the script will remake the JPEG image from scratch.

Where to work:
    Do this work on the Course Linux Server.  Do not use ACADUNIX.

References and Readings:

    Running Linux: Chapter 4, Chapter 7 (tar, gzip, bzip2), Lectures,
    online Notes.

    See also: Exercise #6 from DAT2330, Fall 2003.

Exercise Details (on the Course Linux Server):

0.  Did you read the References?  Read first; type second!

--------------------
Part I - Tarcheology
--------------------

Examine a tar archive and locate all 2,725 image fragment files:

1.  Create a new, empty directory somewhere in your account on the
    Course Linux Server (NOT ON ACADUNIX).  Make this empty directory
    your current directory.  All output should be into this new directory.

    We will call this your "base" directory.  Return to this directory
    whenever you are instructed to "return to your base directory".

2.  Under the course notes directory is a hidden tar archive file named
    ".exercise03".  Find out what kind of tar archive it is.  (Is this
    an ordinary tar file, or a compressed tar file?)  Hard link (do not
    copy) this hidden file into your new empty directory.  (You might
    want to make your name for this file a short name that isn't hidden.)

3.  Verify that you linked the tar file correctly by displaying the
    inode numbers of the two file names.  (They should be the same.)

4.  Verify that the checksum of your link to the tar archive is 59240.
    (This should match the "sum" of the original ".exercise03" file.)
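
    Steps 2 through 4 can be rehearsed on throwaway files before you
    touch the real archive.  The sketch below is only an illustration:
    the sandbox directory and the names ".hidden_demo" and "demo.tar"
    are made up for the example and are not the course files.

```shell
# Rehearsal sandbox; every name here is hypothetical, not a course file.
work=$(mktemp -d)
mkdir "$work/notes" "$work/base"
printf 'demo data\n' > "$work/notes/.hidden_demo"

# Hard link (do not copy) the hidden file under a short, visible name.
ln "$work/notes/.hidden_demo" "$work/base/demo.tar"

# Both names should show the same inode number in the first column.
ls -i "$work/notes/.hidden_demo" "$work/base/demo.tar"
inode_orig=$(ls -i "$work/notes/.hidden_demo" | awk '{print $1}')
inode_link=$(ls -i "$work/base/demo.tar" | awk '{print $1}')

# Both names should also produce the same checksum.
sum_orig=$(sum < "$work/notes/.hidden_demo")
sum_link=$(sum < "$work/base/demo.tar")
echo "$sum_orig"
echo "$sum_link"
```

    A hard link is a second name for the same file, so matching inode
    numbers and matching checksums are what you would expect to see.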

5.  Expand the tar archive into the current (almost empty) base directory.
    After you have expanded the tar archive, delete it.  You will now
    have several pathnames in your directory.  You should see this:

       $ ls -a | wc
       8      11      69

    Do not proceed until you have exactly 8 lines of output from "ls -a".

    One file in the current directory has some hints that you may find
    useful in decoding the tar archive.  Read the hint file.

6.  Use a Unix command to determine what kind of information is in
    each of the files that you extracted from the tar archive.  (The
    command that does this is in the top half of your list of commands
    from a previous exercise.)  Do not use "cat" or "less" on these files!

    Three of the files in the current directory are themselves tar
    archives (or compressed tar archives).  If you run "wc" on the verbose
    table of contents listings from each of the three tar archives,
    this is what you should see as output from wc:

	929    5586   72266
	929    8319   76843
	 25     165    1852

    You can see that two of the archives contain many files.
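
    Counting the lines of a verbose table of contents might look like
    the following sketch.  It builds its own tiny gzip-compressed
    archive in a temporary directory; the name "demo.tar.gz" and its
    contents are assumptions made for the example.

```shell
# Build a small compressed archive to practise on (hypothetical names).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir demo
printf 'one\n' > demo/a
printf 'two\n' > demo/b
tar -czf demo.tar.gz demo
rm -r demo

# -t lists the contents without extracting; -v makes the listing verbose;
# -z says the archive is gzip-compressed.  Pipe the listing into wc.
tar -tzvf demo.tar.gz | wc

# One listing line per archive member: the directory plus its two files.
entries=$(tar -tzf demo.tar.gz | wc -l)
echo "$entries"
```

    Listing an archive does not extract anything, so it is a safe way
    to see what is inside before you expand it.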

7.  Expand each of the three archive files into the current (base)
    directory and delete the archive file after you have expanded it.
    This will now be true:

       $ ls -a | wc
       26      44     290

       $ find | wc
       1887    4662   57730

    Remember to delete each tar file after you have expanded it.
    Do not proceed until you have exactly 26 lines of output from "ls -a".
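
    One possible way to expand several archives and remove each one
    only after it has expanded successfully is sketched below.  The two
    archives and their names are invented for the example, and both are
    assumed to be gzip-compressed; your archives may need different
    options.

```shell
# Build two small practice archives (hypothetical names and contents).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir alpha beta
printf 'one\n' > alpha/a
printf 'two\n' > beta/b
tar -czf alpha.tar.gz alpha
tar -czf beta.tar.gz beta
rm -r alpha beta

# Expand each archive; "&&" removes it only if the extraction succeeded.
for archive in *.tar.gz; do
    tar -xzf "$archive" && rm "$archive"
done

ls -a | wc
```

    Using "&&" means a failed extraction leaves the archive in place so
    that you can try again.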

8.  Verify that of the 1,887 pathnames under the current base directory,
    there are only 1,816 pathnames containing the string "/x".
    (Hint: What command shows all pathnames in all directories [and
    sub-directories] under the current directory?  What command selects
    lines based on a pattern?  Combine them; count the results.)

    Make sure you count 1,816 pathnames before you continue.

9.  Verify that of the 1,816 pathnames containing the string "/x", 908
    of the names contain the string "part1".  (Hint: Start with the
    pipeline you used above; add to it to select another pattern.)
    Note that all 908 files are in the same directory.

10. Verify that of the 1,816 pathnames containing the string "/x", 908
    of the names do *not* contain the string "part1".  (Hint: Make a
    small change to the previous pipeline to select lines that do *not*
    match a pattern.)  Note that all 908 of these files are in a different
    directory from the previous set.
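
    The three pipelines hinted at in Steps 8 through 10 could be
    practised on a small tree first.  This is a sketch under
    assumptions: the directory names "part1" and "other" and the few
    file names are made up, and "find ." is written with an explicit
    starting directory for portability.

```shell
# Throwaway tree; the directory and file names are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir part1 other
touch part1/xaaa part1/xaab other/xaaa other/xaab other/xaac notes.txt

# Show all pathnames, select those containing "/x", count the results.
find . | grep '/x' | wc
all_x=$(find . | grep '/x' | wc -l)

# Add a second filter to keep only names that also contain "part1".
in_part1=$(find . | grep '/x' | grep 'part1' | wc -l)

# Invert the second filter with -v to keep names without "part1".
not_part1=$(find . | grep '/x' | grep -v 'part1' | wc -l)

echo "$all_x" "$in_part1" "$not_part1"
```

    The second and third counts should always add up to the first,
    which is a quick sanity check on your own numbers.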

11. If either of the directories in the above two steps contains any
    compressed files, uncompress them all.  (Hint: Do *not* do this one
    file at a time - there are 908 files to process!  Use the power of
    the shell to supply all the file names to the uncompression program.
    Change to the directory containing the files.  Uncompress them all.
    Return to the base directory.  [See Step 1 for what "base" means.])
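
    Letting the shell supply all the file names at once might look like
    this sketch.  It assumes gzip compression and uses three made-up
    files in a made-up "frags" directory; files compressed with bzip2
    would need the matching bzip2 tool instead.

```shell
# Set up a few compressed practice files (hypothetical names).
work=$(mktemp -d)
base="$work"
mkdir "$base/frags"
for name in xaaa xaab xaac; do
    printf '%s\n' "$name" > "$base/frags/$name"
done
gzip "$base"/frags/x???

# Change to the directory, uncompress everything with one command,
# then return to the base directory.
cd "$base/frags" || exit 1
gunzip ./*.gz
cd "$base" || exit 1

left=$(find frags -name '*.gz' | wc -l)
restored=$(ls frags | wc -l)
echo "$left" "$restored"
```

    The shell expands the GLOB pattern into the full list of names
    before the uncompression program ever runs.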

12. Verify the following output from "ls | wc" run in each of the
    above two directories:    908     908    4540

    (908 file names, each 4 characters long, plus newlines = 4,540.)

    Make sure you see 908 4-character file names in each directory.

13. Run "ls | sum" in each of the two directories.  The output should
    be 07344 for one directory and 39165 for the other.

    You have found the first 66% of the files necessary to assemble
    your JPEG image.  Only 909 more files left to find!

14. Return to the base directory.  (See Step 1 for what "base" means.)
    Find the file named "mystery", buried in some sub-directory somewhere
    under the current directory.  (There are fast and slow ways to
    locate this pathname.  Please use a fast way.  The file contains
    168,134 bytes.)

    The other 909 files needed for your JPEG image are hidden inside the
    "mystery" file.  Find out what kind of file "mystery" is.  Extract
    the 909 files from inside "mystery" into a new empty directory.
    (This will be the third directory containing image fragment files.)
    If the files are compressed, uncompress them.

    When you are done with "mystery", you will have extracted 909
    4-character file names into the third directory:

	$ ls | wc
	909     909    4545

	$ ls | sum
	62695     5

    The last file name ("xeau") from "mystery" will contain only 45 bytes
    of data.  All the other 2,724 image fragment files you have found
    will contain 99 bytes.
    
    You now have three directories containing a total of 2,725 image
    fragment files.
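
    Locating a file by name somewhere under the current directory, as
    the start of this step asks, could be sketched as follows.  The
    nested directories and the seven-byte "mystery" file here are
    invented for practice; they say nothing about where the real file
    is or what it contains.

```shell
# Bury a practice file a few levels down (hypothetical layout).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p a/b/c d
printf 'payload' > a/b/c/mystery

# Ask find for regular files with exactly this name.
found=$(find . -type f -name mystery)
echo "$found"

# Confirm the size in bytes of the file that was found.
size=$(wc -c < "$found")
echo "$size"
```

    Letting one command search by name is usually quicker than listing
    every directory and reading the output yourself.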

15. Concatenate together into a single new output file these input files
    from the three directories:

    a) The 908 files from Part 1 (all 99 bytes)
    b) The 908 files from Part 2 (all 99 bytes)
    c) The 909 files from Part 3 "mystery" (all but "xeau" are 99 bytes)

    Concatenate all 2,725 files together into one output file named
    "tux-kong.jpg".  You may wish to link or move all the files into
    one directory first before you concatenate; but, this is optional
    and not necessary.

    Your output file will contain the concatenated contents of 2,725 files.

    Make sure you don't accidentally include file names that don't belong.
    All the image fragment file names start with "x" and are exactly
    four characters long.  All the 2,725 image fragment files are 99
    bytes long except the last one.
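
    Concatenating fragments from several directories in a fixed order,
    while skipping names that do not belong, might be sketched like
    this.  The directories "one", "two", and "three", the single-letter
    file contents, and the output name "joined.bin" are all assumptions
    made for the example.

```shell
# Three practice directories of tiny fragments (hypothetical names).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir one two three
printf 'A' > one/xaaa;   printf 'B' > one/xaab
printf 'C' > two/xaaa;   printf 'D' > two/xaab
printf 'E' > three/xaaa; printf 'F' > three/xaab
printf 'junk' > three/readme        # a name that does not belong

# "x???" matches only four-character names that start with "x".
# The shell sorts each expansion, so fragments stay in name order.
cat one/x??? two/x??? three/x??? > joined.bin

result=$(cat joined.bin)
echo "$result"
```

    Because the pattern matches exactly four characters, the "readme"
    file is left out without any extra work.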

16. Verify the output file size and checksum of your new tux-kong.jpg file:

       $ wc tux-kong.jpg
       1341    6310  269723 tux-kong.jpg

       $ sum tux-kong.jpg
       42401   264

17. Move tux-kong.jpg out of the current base directory into the parent
    directory.

18. Change to the parent directory of the base directory.  Remove
    recursively the entire base directory (the one you created in Step 1).
    Your tux-kong.jpg image will be left in the current directory.

--------------------------------
Part II - Automatic Shell Script
--------------------------------

Write a shell script on the Course Linux Server to do what you did in Part I.

Using VI/VIM, create an executable shell script file named exercise03.sh
on the Course Linux Server that will automatically build the tux-kong.jpg
file from the given .exercise03 archive file.

Follow the 9-part script format described in Notes file: script_style.txt

Build this script ONE LINE AT A TIME and test it after each line.
You will not be able to make it work if you write a dozen lines and
then try to debug it.  You will find the "-u", "-v", and "-x" options to
bash helpful to debug your script:  $ bash -u -v -x ./my.sh
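
What the debugging options do can be seen on a two-line practice script.
This is a sketch only; the file "demo.sh" and the variable names in it
are made up for the example.

```shell
# Write a tiny practice script (hypothetical name and contents).
work=$(mktemp -d)
cat > "$work/demo.sh" <<'EOF'
#!/bin/bash
greeting="hello"
echo "$greeting"
EOF

# -x prints each command after expansion, prefixed with "+", on stderr.
trace=$(bash -x "$work/demo.sh" 2>&1 >/dev/null)
echo "$trace"

# -u turns a reference to an unset variable into an error.
unset_status=0
bash -u -c 'echo "$no_such_variable"' 2>/dev/null || unset_status=$?
echo "$unset_status"
```

The "-v" option is similar to "-x" but shows each line as the shell
reads it, before any expansion is done.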

Your script will do a selection (not all) of the steps you did in Part I.
The starting file is the .exercise03 file; the result after the script
executes should be a tux-kong.jpg file in the current directory.

Here are the actions and outputs required for your script.  These actions
and outputs are a strict subset of those in Part I.  Do not produce
output that is not shown below (each output is flagged with "*"):

 a) * Display "--- Part (a) ---" on standard output.

    Complete the actions required by Step 1 (omit output actions).

    (You may wish to recursively remove a previously existing directory
    before you start your script.  The "-f" option to the remove command
    suppresses error messages if the directory you are trying to remove
    does not exist: rm -rf )

    * Display the pathname of the current working directory.

 b) * Display "--- Part (b) ---" on standard output.
 
    Complete the actions required by Steps 2-5 (omit output actions).

    * Display the output of "ls -a | wc" (showing 8 lines).

 c) * Display "--- Part (c) ---" on standard output.

    Complete the actions required by Step 6:

    * Display the output of running "wc" on the verbose table of contents
      listings from each of the three tar archives.

 d) * Display "--- Part (d) ---" on standard output.

    Complete the actions required by Step 7:

    * Display the output of "ls -a | wc" and "find | wc".

 e) * Display "--- Part (e) ---" on standard output.

    Complete the actions required by Steps 8-10:

    Write a pipeline that selects and counts pathnames containing the
    string "/x".

    * Display the counted output (showing 1816 lines).

    Write a pipeline that counts pathnames containing the string "/x"
    and the string "part1".

    * Display the counted output (showing 908 lines).

    Write a pipeline that counts pathnames containing the string "/x"
    and *not* containing the string "part1".

    * Display the counted output (showing 908 lines).

 f) * Display "--- Part (f) ---" on standard output.

    Complete the actions required by Steps 11-13:

    * Change to the first directory and run "ls | wc" and "ls | sum".

    * Change to the second directory and run "ls | wc" and "ls | sum".

 g) * Display "--- Part (g) ---" on standard output.

    Complete the actions required by Step 14:   

    * Change to the third directory and run "ls | wc" and "ls | sum".
    
 h) * Display "--- Part (h) ---" on standard output.

    Complete the actions required by Steps 15-18 (omit the output actions).

    * Display the output of "wc tux-kong.jpg" and "sum tux-kong.jpg".
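
The pattern used by every part above (a banner, some silent actions,
then only the flagged output) could be sketched for part (a) as follows.
This is one possible shape, not the required answer: the directory name
"ex03_base" and the function name "build_base" are hypothetical, and a
temporary directory stands in for wherever you keep your work.

```shell
parent=$(mktemp -d)            # stand-in for wherever you keep your work
base="$parent/ex03_base"       # hypothetical directory name

build_base () {
    echo "--- Part (a) ---"
    rm -rf "$base"             # -f: no complaint if it does not exist yet
    mkdir "$base"
    cd "$base" || exit 1
    pwd
}

build_base                     # first run: the directory did not exist
touch "$base/leftover"         # simulate debris from an earlier run
build_base                     # second run: starts from scratch again

entries=$(ls -A "$base" | wc -l)
echo "$entries"
```

Removing and re-creating the directory at the start is what lets a
script be re-run and still begin with an empty base directory.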

The only output from your Part II script should be the items marked
above with "*".  Do not produce the extra outputs you produced in Part
I.  Your script will execute like this:

    $ ./exercise03.sh
    --- Part (a) ---
    /home/idallen/some/directory
    --- Part (b) ---
	  8      11      69
    --- Part (c) ---
	929    5586   72266
	929    8319   76843
	 25     165    1852
    --- Part (d) ---
	 26      44     290
       1887    4662   57730
    --- Part (e) ---
       1816    4540   56296
	908     908   25424
	908    3632   30872
    --- Part (f) ---
	908     908    4540
    07344     5
	908     908    4540
    39165     5
    --- Part (g) ---
	909     909    4545
    62695     5
    --- Part (h) ---
       1341    6310  269723 tux-kong.jpg
    42401   264

Documentation:

    Add comments and blank lines before each group of commands in the
    script, explaining *why* you are doing these things in the script.
    (What is the purpose of the group of commands?  Why is it there?)

    Scripts without comments are unsatisfactory; they are worth zero
    marks.  You must submit comments with your scripts.

    One comment may serve to explain several Unix commands; you do not
    need a comment in front of every single command line in the script.
    See the comment style described in Notes file "script_style.txt".

-----------------------
Part III - More Weather
-----------------------

    Create a script file named "myweather.sh".  Follow the full
    9-part script style described in Notes file "script_style.txt".

    Copy the code from your working "weather.sh" script.  Modify the
    code to use the first argument to the script as the airport city
    code from which weather should be obtained.  Test that this works.

       $ ./myweather.sh YPH
              Temp.:               -21°C

    Now, rework your code so that the output looks like this (similar
    to what I demonstrated in class):

       $ ./myweather.sh YPH
       The temperature in YPH is   -21°C right now.

    The output must be one line, not several lines.  (Hint: Use Command
    Substitution and a variable to fetch and hold the temperature.)
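
    The Command Substitution hint might be sketched like this.  The
    function "fetch_weather" is a hypothetical stand-in that simply
    prints one line shaped like the sample output above; in your script
    it would be replaced by the commands from your own "weather.sh".
    The function "report_temperature" is also a made-up name.

```shell
# Hypothetical stand-in for the fetch-and-filter commands in weather.sh.
fetch_weather () {
    printf '       Temp.:               -21°C\n'
}

report_temperature () {
    local city="$1"
    local temp
    # Command substitution captures the command's output in a variable;
    # awk keeps only the second blank-separated field of the line.
    temp=$(fetch_weather "$city" | awk '{print $2}')
    echo "The temperature in $city is $temp right now."
}

line=$(report_temperature YPH)
echo "$line"
```

    Holding the temperature in a variable lets a single "echo" put the
    whole sentence on one line.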

    BONUS:

    Find a way to extract the city name from the data fetched.
    Produce the following format output from "myweather.sh":

    $ ./myweather.sh YPH
    The temperature in YPH (Inukjuak) is   -21°C right now.

    (Hint: Fetch the web page into a temporary file in /tmp and scan it
    once to get the city name into a variable and a second time to get the
    temperature into a variable.  Remove the temporary file when done.)
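
    The temporary-file approach in the hint could be sketched as shown
    below.  The two-line "page" and its "Station:" label are invented
    for the example; the real page has a different layout, so the
    patterns you scan for will differ.

```shell
# Create a uniquely named temporary file under /tmp.
tmpfile=$(mktemp /tmp/myweather.XXXXXX)

# Stand-in for fetching the web page (invented layout, not the real one).
printf 'Station: Inukjuak\nTemp.: -21°C\n' > "$tmpfile"

# First scan gets the city name; second scan gets the temperature.
city_name=$(awk '/^Station:/ {print $2}' "$tmpfile")
temp=$(awk '/^Temp/ {print $2}' "$tmpfile")

# Remove the temporary file when done.
rm -f "$tmpfile"

echo "The temperature in YPH ($city_name) is $temp right now."
```

    Fetching once and scanning the saved copy twice avoids asking the
    web server for the same page two times.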

==========
Submission
==========

Submit both the finished exercise03.sh file and myweather.sh file for
marking as Exercise 03 on the Course Linux Server, using the following
*single* datsubmit command line:

       $ datsubmit 03 exercise03.sh myweather.sh

This "datsubmit" program will copy the selected files to me for marking.
Always submit all your files at the same time.  Do not delete your
copies; keep them.  Verify that you submitted all your files, using this
command line:

       $ datsubmit 03 -list

Note that the digit "1" and the letter "l" (lower-case "L") are different.
Do not confuse the two.

You may redo this exercise and re-submit your results as many times as
you like; but, you must always submit *all* your exercise files every time.

The "-delete" option of datsubmit will delete the most recent submission
you have made.  I will mark only the most recent submission that is
submitted before the final hand-in cutoff date.

For Exercise 03, always use "03" as the first argument to "datsubmit".
Always submit *all* the files each time you submit an exercise.

A correct exercise03.sh is worth 85% of the mark.
A correct myweather.sh is worth the remaining 15%.