-----------------------
Exercise #3 for DAT2330 due March 9
-----------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

Global weight: 3% of your total mark this term

Due date: 10h00 (10am) Tuesday, March 9, 2004.

The deliverables for this exercise are to be submitted online on the
Course Linux Server using the "datsubmit" method described in the
exercise description, below. No paper; no email; no FTP.

Late-submission date: I will accept without penalty exercises that are
submitted late but before 14h30 (2:30pm) on Friday, March 12. After that
late-submission date, the exercise is worth zero marks; but, it must still
be completed and submitted successfully to earn credit in the course.

Exercises submitted by the *due date* will be marked online and your marks
will be sent to you by email after the late-submission date.

A sample answer will be posted online after the late-submission date.

This exercise is due on or before 10h00 (10am) Tuesday, March 9, 2004.

Exercise Synopsis:

This week you will use "tar" format file archives and Unix file
compression tools.

Part I - "Tarcheology"

First, you will copy a tar archive and examine and extract its contents.
The archive contains a JPEG image broken up into small files and hidden in
various places in the archive. Your task is to locate all the small files
(2,725 files), decompress any files that need it (*.gz or *.bz2), and then
concatenate all the small data files together into one JPEG image.

Part II - Automatic Shell Script

You will make note of the commands that you use during Part I so that you
can write a Unix bash shell script that will do these same command-line
operations automatically. The script you write will start with the
original tar archive. When your script has finished executing, the
assembled JPEG image will be in the current directory.
Re-running the script will remake the JPEG image from scratch.

Where to work:

Do this work on the Course Linux Server. Do not use ACADUNIX.

References and Readings:

Running Linux: Chapter 4, Chapter 7 (tar, gzip, bzip2), Lectures, online
Notes. See also: Exercise #6 from DAT2330, Fall 2003.

Exercise Details (on the Course Linux Server):

0. Did you read the References? Read first; type second!

--------------------
Part I - Tarcheology
--------------------

Examine a tar archive and locate all 2,725 image fragment files:

1. Create a new, empty directory somewhere in your account on the Course
   Linux Server (NOT ON ACADUNIX). Make this empty directory your current
   directory. All output should be into this new directory. We will call
   this your "base" directory. Return to this directory whenever you are
   instructed to "return to your base directory".

2. Under the course notes directory is a hidden tar archive file named
   ".exercise03". Find out what kind of tar archive it is. (Is this an
   ordinary tar file, or a compressed tar file?) Hard link (do not copy)
   this hidden file into your new empty directory. (You might want to
   make your name for this file a short name that isn't hidden.)

3. Verify that you linked the tar file correctly by displaying the inode
   numbers of the two file names. (They should be the same.)

4. Verify that the checksum of your link to the tar archive is 59240.
   (This should match the "sum" of the original ".exercise03" file.)

5. Expand the tar archive into the current (almost empty) base directory.
   After you have expanded the tar archive, delete it. You will now have
   several pathnames in your directory. You should see this:

      $ ls -a | wc
            8      11      69

   Do not proceed until you have exactly 8 lines of output from "ls -a".

   One file in the current directory has some hints that you may find
   useful in decoding the tar archive. Read the hint file.

6.
   Use a Unix command to determine what kind of information is in each of
   the files that you extracted from the tar archive. (The command that
   does this is in the top half of your list of commands from a previous
   exercise.) Do not use "cat" or "less" on these files!

   Three of the files in the current directory are themselves tar archives
   (or compressed tar archives). If you run "wc" on the verbose table of
   contents listings from each of the three tar archives, this is what you
   should see as output from wc:

          929    5586   72266
          929    8319   76843
           25     165    1852

   You can see that two of the archives contain many files.

7. Expand each of the three archive files into the current (base)
   directory and delete the archive file after you have expanded it.
   This will now be true:

      $ ls -a | wc
           26      44     290
      $ find | wc
         1887    4662   57730

   Remember to delete each tar file after you have expanded it.
   Do not proceed until you have exactly 26 lines of output from "ls -a".

8. Verify that of the 1,887 pathnames under the current base directory,
   there are only 1,816 pathnames containing the string "/x".
   (Hint: What command shows all pathnames in all directories [and
   sub-directories] under the current directory? What command selects
   lines based on a pattern? Combine them; count the results.)
   Make sure you count 1,816 pathnames before you continue.

9. Verify that of the 1,816 pathnames containing the string "/x", 908 of
   the names contain the string "part1". (Hint: Start with the pipeline
   you used above; add to it to select another pattern.)
   Note that all 908 files are in the same directory.

10. Verify that of the 1,816 pathnames containing the string "/x", 908 of
    the names do *not* contain the string "part1". (Hint: Make a small
    change to the previous pipeline to select lines that do *not* match a
    pattern.) Note that all 908 of these files are in a different
    directory from the previous set.

11. If either of the directories in the above two steps contains any
    compressed files, uncompress them all.
    (Hint: Do *not* do this one file at a time - there are 908 files to
    process! Use the power of the shell to supply all the file names to
    the uncompression program. Change to the directory containing the
    files. Uncompress them all. Return back to the base directory.
    [See Step 1 for what "base" means.])

12. Verify the following output from "ls | wc" run in each of the above
    two directories:

          908     908    4540

    (908 file names, each 4 characters long, plus newlines = 4,540.)
    Make sure you see 908 4-character file names in each directory.

13. Run "ls | sum" in each of the two directories. The output should be
    07344 for one directory and 39165 for the other.

You have found the first 66% of the files necessary to assemble your JPEG
image. Only 909 more files left to find!

14. Return to the base directory. (See Step 1 for what "base" means.)
    Find the file named "mystery", buried in some sub-directory somewhere
    under the current directory. (There are fast and slow ways to locate
    this pathname. Please use a fast way. The file contains 168,134
    bytes.)

    The other 909 files needed for your JPEG image are hidden inside the
    "mystery" file. Find out what kind of file "mystery" is. Extract the
    909 files from inside "mystery" into a new empty directory. (This
    will be the third directory containing image fragment files.) If the
    files are compressed, uncompress them.

    When you are done with "mystery", you will have extracted 909
    4-character file names into the third directory:

       $ ls | wc
           909     909    4545
       $ ls | sum
       62695     5

    The last file name ("xeau") from "mystery" will contain only 45 bytes
    of data. All the other 2,724 image fragment files you have found will
    contain 99 bytes.

You now have three directories containing a total of 2,725 image fragment
files.

15.
    Concatenate together into a single new output file these input files
    from the three directories:

       a) the 908 files from Part 1 (all 99 bytes)
       b) the 908 files from Part 2 (all 99 bytes)
       c) the 909 files from Part 3 "mystery" (all but "xeau" are 99 bytes)

    Concatenate all 2,725 files together into one output file named
    "tux-kong.jpg". You may wish to link or move all the files into one
    directory first before you concatenate; but, this is optional and not
    necessary.

    Your output file will contain the concatenated contents of 2,725
    files. Make sure you don't accidentally include file names that don't
    belong. All the image fragment file names start with "x" and are
    exactly four characters long. All the 2,725 image fragment files are
    99 bytes long except the last one.

16. Verify the output file size and checksum of your new tux-kong.jpg
    file:

       $ wc tux-kong.jpg
          1341    6310  269723 tux-kong.jpg
       $ sum tux-kong.jpg
       42401   264

17. Move tux-kong.jpg out of the current base directory into the parent
    directory.

18. Change to the parent directory of the base directory. Remove
    recursively the entire base directory (the one you created in
    Step 1). Your tux-kong.jpg image will be left in the current
    directory.

--------------------------------
Part II - Automatic Shell Script
--------------------------------

Write a shell script on the Course Linux Server to do what you did in
Part I.

Using VI/VIM, create an executable shell script file named exercise03.sh
on the Course Linux Server that will automatically build the tux-kong.jpg
file from the given .exercise03 archive file.

Follow the 9-part script format described in Notes file: script_style.txt

Build this script ONE LINE AT A TIME and test it after each line. You will
not be able to make it work if you write a dozen lines and then try to
debug it. You will find the "-v" and "-x" options to bash helpful to debug
your script:

   $ bash -u -v ./my.sh

Your script will do a selection (not all) of the steps you did in Part I.
The starting file is the .exercise03 file; the result after the script
executes should be a tux-kong.jpg file in the current directory.

Here are the actions and outputs required for your script. These actions
and outputs are a strict subset of those in Part I. Do not produce output
that is not shown below (each output is flagged with "*"):

a) * Display "--- Part (a) ---" on standard output.
   Complete the actions required by Step 1 (omit output actions).
   (You may wish to recursively remove a previously existing directory
   before you start your script. The "-f" option to the remove command
   suppresses error messages if the directory you are trying to remove
   does not exist: rm -rf )
   * Display the pathname of the current working directory.

b) * Display "--- Part (b) ---" on standard output.
   Complete the actions required by Steps 2-5 (omit output actions).
   * Display the output of "ls -a | wc" (showing 8 lines).

c) * Display "--- Part (c) ---" on standard output.
   Complete the actions required by Step 6:
   * Display the output of running "wc" on the verbose table of contents
     listings from each of the three tar archives.

d) * Display "--- Part (d) ---" on standard output.
   Complete the actions required by Step 7:
   * Display the output of "ls -a | wc" and "find | wc".

e) * Display "--- Part (e) ---" on standard output.
   Complete the actions required by Steps 8-10:
   Write a pipeline that selects and counts pathnames containing the
   string "/x".
   * Display the counted output (showing 1816 lines).
   Write a pipeline that counts pathnames containing the string "/x" and
   the string "part1".
   * Display the counted output (showing 908 lines).
   Write a pipeline that counts pathnames containing the string "/x" and
   *not* containing the string "part1".
   * Display the counted output (showing 908 lines).

f) * Display "--- Part (f) ---" on standard output.
   Complete the actions required by Steps 11-13:
   * Change to the first directory and run "ls | wc" and "ls | sum".
   * Change to the second directory and run "ls | wc" and "ls | sum".

g) * Display "--- Part (g) ---" on standard output.
   Complete the actions required by Step 14:
   * Change to the third directory and run "ls | wc" and "ls | sum".

h) * Display "--- Part (h) ---" on standard output.
   Complete the actions required by Steps 15-18 (omit the output actions).
   * Display the output of "wc tux-kong.jpg" and "sum tux-kong.jpg".

The only output from your Part II script should be the items marked above
with "*". Do not produce the extra outputs you produced in Part I.

Your script will execute like this:

   $ ./exercise03.sh
   --- Part (a) ---
   /home/idallen/some/directory
   --- Part (b) ---
         8      11      69
   --- Part (c) ---
       929    5586   72266
       929    8319   76843
        25     165    1852
   --- Part (d) ---
        26      44     290
      1887    4662   57730
   --- Part (e) ---
      1816    4540   56296
       908     908   25424
       908    3632   30872
   --- Part (f) ---
       908     908    4540
   07344     5
       908     908    4540
   39165     5
   --- Part (g) ---
       909     909    4545
   62695     5
   --- Part (h) ---
      1341    6310  269723 tux-kong.jpg
   42401   264

Documentation:

Add comments and blank lines before each group of commands in the script,
explaining *why* you are doing these things in the script. (What is the
purpose of the group of commands? Why is it there?)

Scripts without comments are unsatisfactory; they are worth zero marks.
You must submit comments with your scripts. One comment may serve to
explain several Unix commands; you do not need a comment in front of every
single command line in the script. See the comment style described in
Notes file "script_style.txt".

-----------------------
Part III - More Weather
-----------------------

Create a script file named "myweather.sh". Follow the full 9-part script
style described in Notes file "script_style.txt". Copy the code from your
working "weather.sh" script.

Modify the code to use the first argument to the script as the airport
city code from which weather should be obtained. Test that this works.

   $ ./myweather.sh YPH
   Temp.: -21°C

Now, rework your code so that the output looks like this (similar to what
I demonstrated in class):

   $ ./myweather.sh YPH
   The temperature in YPH is -21°C right now.

The output must be one line, not several lines.
(Hint: Use Command Substitution and a variable to fetch and hold the
temperature.)

BONUS: Find a way to extract the city name from the data fetched.
Produce the following format output from "myweather.sh":

   $ ./myweather.sh YPH
   The temperature in YPH (Inukjuak) is -21°C right now.

(Hint: Fetch the web page into a temporary file in /tmp and scan it once
to get the city name into a variable and a second time to get the
temperature into a variable. Remove the temporary file when done.)

==========
Submission
==========

Submit both the finished exercise03.sh file and myweather.sh file for
marking as Exercise 03 on the Course Linux Server, using the following
*single* datsubmit command line:

   $ datsubmit 03 exercise03.sh myweather.sh

This "datsubmit" program will copy the selected files to me for marking.
Always submit all your files at the same time. Do not delete your copies;
keep them.

Verify that you submitted all your files, using this command line:

   $ datsubmit 03 -list

Note that the digit "1" and the letter "l" (lower-case "L") are different.
Do not confuse the two.

You may redo this exercise and re-submit your results as many times as you
like; but, you must always submit *all* your exercise files every time.
The "-delete" option of datsubmit will delete the most recent submission
you have made. I will mark only the most recent submission that is
submitted before the final hand-in cutoff date.

For Exercise 03, always use "03" as the first argument to "datsubmit".
Always submit *all* the files each time you submit an exercise.

A correct exercise03.sh is worth 85% of the mark. A correct myweather.sh
is worth the remaining 15%.
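
Appendix sketch for the Part III hint: the hint about Command Substitution
and a variable can be illustrated roughly as follows. This is only a
minimal sketch, not a sample answer. The function name "fetch_weather_page"
is hypothetical and prints canned text; it stands in for whatever your own
working "weather.sh" does to fetch the data. The function
"report_temperature" and the default city code are also assumptions made
for the demonstration, and the sketch does not follow the 9-part format
from "script_style.txt".

```shell
#!/bin/bash
# Sketch only: illustrates Command Substitution, not the real weather fetch.

# HYPOTHETICAL stand-in for the fetch done by your own weather.sh.
# It prints canned text so the sketch runs without network access.
fetch_weather_page () {
    printf 'Conditions for airport %s\n' "$1"
    printf 'Temp.: -21°C\n'
}

# Capture the temperature in a variable, then print one line of output.
report_temperature () {
    local city=$1
    local temp
    # $( ... ) is Command Substitution: the pipeline's standard output
    # becomes the value assigned to the variable.
    # sed -n with the p flag prints only the line where the substitution
    # succeeded, with the leading "Temp.: " label removed.
    temp=$( fetch_weather_page "$city" | sed -n 's/^Temp\.: *//p' )
    echo "The temperature in $city is $temp right now."
}

# Use the first script argument; the YPH default is only for this demo.
report_temperature "${1:-YPH}"
```

Holding the value in a variable is what lets the final "echo" place it in
the middle of a sentence on a single line, instead of printing the fetched
lines as they arrive.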