-----------------------
Exercise #5 for NET2003 due February 15, 2005
-----------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

Global weight: 4% of your total mark this term

Due date: 10h00 Tuesday February 15, 2005.

The deliverables for this exercise are to be submitted online on the
Course Linux Server using the "datsubmit" method described in the
exercise description, below. No paper; no email; no FTP.

Late-submission date: I will accept without penalty exercises that are
submitted late but before 10h00 on Thursday, February 17. After that
late-submission date, the exercise is worth zero marks.

Exercises submitted by the *due date* will be marked online and your
marks will be sent to you by email after the late-submission date.

This exercise is due 10h00 Tuesday February 15, 2005.

Exercise Synopsis:

    Marks: 4%

    Data mining using scripts: On the Course Linux Server, create an
    executable bash shell script that does "data mining" on the HTML
    weather pages from the Canadian weather office. Locate the weather
    page for a given city and display information from the page.

Where to work:

    Do your Unix command line work on the Course Linux Server. The files
    you work on will remain on the server even after you log off. Do not
    erase your files after submission; always keep a spare copy of your
    exercises.

    WARNING: Do not attempt this exercise on a Windows machine - the text
    file format is different. You must connect to and work on Unix/Linux.
    Note that you may connect to the Course Linux Server *from* a Windows
    machine (using PuTTY); however, you may not use the Windows machine
    itself to do your work. Use the vim editor on the Course Linux Server.
Location of the course notes on the Course Linux Server:

    You can find a copy of all the course Notes files on the Linux Server
    under directory: ~idallen/public_html/teaching/net2003/05w/notes/
    You can copy files from this directory to your own account for
    modification or study, if you like. (To avoid plagiarism charges, you
    must credit any material that you copy and submit unchanged.)

Exercise Preparation:

    See notes and readings in week04notes.txt, week05notes.txt

---------------------------------------------
Exercise Details (on the Course Linux Server)
---------------------------------------------

Have you done all the preparation steps? If not, go back and do them.
Finish your Notes Readings (see the weekly Notes files).

Any questions? See me in a lab or post questions to the Discussion news
group (on the top left of the Course Home Page).

You may find it useful to create separate directories in your account to
store the files for each exercise.

Part I - weather5.sh
---------------------

 1. Using VI/VIM, edit a new executable script file named weather5.sh on
    the Course Linux Server. The spelling of the file name must be exact;
    otherwise, it won't be marked. The spelling must be exact. Exact!

    The contents of the file will be modelled on the argv.sh script:

    a) The first line of the file must be a valid shell interpreter
       ("#!") line.
    b) The second line of the file is a one-line (less than 80 characters)
       description of what this script does.
    c) The "Syntax" section for this script should indicate that this
       script takes one argument: an optional city name.
    d) The Purpose is a few lines that describe what the script does.
       (You will know what that is when you have finished the exercise!)
    e) Next is your Assignment Label for this exercise.
    f) Next are the three lines setting PATH, umask, and LC_COLLATE.

    The above are standard parts of every NET2003 shell script.
    Make sure the script file is executable by you.

 2. At the bottom of your script file, add two test command lines.
    One command should be found by the shell in the /bin directory, the
    other in the /usr/bin directory. (Hint: check out the locations of
    the "date" and "wc" commands.) Execute your script with its two test
    command lines and verify that the script can find and run both
    commands without any errors. When this is true, insert shell comment
    characters (octothorpe: "#") in front of the two lines. Do not remove
    the lines.

 3. At the bottom of your script file, add a line that uses an
    *undefined* shell variable name (you pick the name). Run your script
    and verify that the shell gives you an error for using an undefined
    name. When this is true, comment out the line. Do not remove it.

 4. If the user supplies a command line argument, use it as the city
    name; otherwise (no argument), prompt for and read the city name.
    Quick-exit with status 1 if the user types EOF when prompted for the
    city name. (No message is necessary.) If more than one argument is
    given, issue a good error message (containing all four parts of a good
    error message), and exit the script with status 2.

 5. If the city name is the empty string (if the user entered no
    characters for the name), issue a good error message and exit the
    script with status 3.

 6. Here is the main text-mode weather page URL for Canada:

        http://text.weatheroffice.ec.gc.ca/

    Put this URL into a variable for use throughout your script.

 7. Choose a file name in the current directory that will hold the raw
    HTML from the main weather page. If it is not true that a file by
    that name exists and has a size greater than zero, use the wget
    command to fetch the main URL into that file name. (Don't use wget to
    fetch the page if the page is already in the file. Check first.)
    After the wget, check the size of the fetched file. If the file is
    empty (no data - zero size), print a good error message containing the
    exit status of wget, remove the empty file, and exit the script with
    status 4.

 8. Look for the word "english" in the main weather page that wget stored
    in the file. Pick off the partial English URL from inside the double
    quotes on the href= line that precedes the line containing "english".
    (The partial URL you are seeking looks similar to this:

        canada_e.html

    Extract whatever is between the double quotes.) If you can't find
    this partial URL (i.e. your search produces an empty string), print a
    good error message, rename the file to have a ".bak" extension, tell
    the user you have done this, and exit the script with status 5.

 9. Choose a different file name in the current directory that will hold
    the raw HTML from the "English" weather page. If it is not true that
    a file by that name exists and has a size greater than zero, use the
    wget command to fetch the "English" URL into that file name. (Don't
    use wget to fetch the page if the page is already in the file.) The
    URL you assemble for wget will start with the variable containing the
    main text-mode weather URL and end with the variable containing the
    partial English URL from the previous step, e.g. it might expand to
    look similar to:

        http://text.weatheroffice.ec.gc.ca/canada_e.html

    After the wget, check the size of the fetched file. If the file is
    empty (zero size - no data), print a good error message that includes
    the exit status of wget, remove the empty file, and exit the script
    with status 6.

10. Look for the city name (the name already entered by the user) in the
    "English" weather page stored in the file by wget. Only select lines
    containing the string /forecast/. From these lines, pick off the
    partial URL from inside the single quotes after the href= keyword.
    (The partial URL you are seeking looks similar to this:

        /forecast/city_e.html?ab-50

    The actual /forecast/ URL is different for each city name. Pick off
    the string inside single quotes.) If you can't find this partial
    /forecast/ URL (i.e.
    your search for the user's city name produces an empty string), print
    a good error message and exit the script with status 7.

11. Count the number of /forecast/ lines you found in the previous step.
    If the number of lines (the number of /forecast/ URLs found) is not
    exactly one, issue a good error message and exit the script with
    status 8. (We can only load one city forecast web page - tell the
    user to be more specific about the city name.)

12. Use lynx to dump the formatted web page for the given city name into
    a file in the current directory. The URL you assemble for lynx will
    start with the variable containing the main text-mode weather URL and
    end with the variable containing the one partial city /forecast/ URL
    from the previous step, e.g. it might expand to look similar to:

        http://text.weatheroffice.ec.gc.ca/forecast/city_e.html?ab-50

13. If the output file from lynx is empty, print a good error message that
    includes the exit status of lynx, remove the empty file, and exit the
    script with status 9.

14. If the output file from lynx contains the upper-case word ERROR
    (probably indicating an unknown city), issue a good error message,
    remove the file, and exit the script with status 10.

15. Look for and extract the full city name from the lynx output file.
    Save this value for use later. (For example, the full city name of
    "Ottawa" or "Ott" or "Ot", extracted from the weather page, is
    "Ottawa (Kanata - Orléans)".)

16. Extract the current city temperature from the lynx output file. Save
    this value for use later. (For example, the temperature reading might
    be "-2°C".)

17. Produce an output line similar to the following:

        $ ./weather5.sh user's_city_name
        The temperature in XXX is YYY

    where XXX is replaced by the full city name and YYY by the temperature.
    For example, it might look like this:

        $ ./weather5.sh Ott
        The temperature in Ottawa (Kanata - Orléans) is -2°C

    Suppress extra blanks in the output line. Spelling counts.
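Steps 10 and 11 involve the trickiest text processing in Part I. Below is a minimal sketch of one possible approach. It is not the required solution. The helper name find_forecast_url, the file name english_sample.html, and the sample page contents (including the "?on-118" style city codes) are hypothetical stand-ins for the page that wget saves in step 9, and the real page may be laid out differently. The sketch uses grep -F so the city name is matched literally, filters on /forecast/ to skip province links, and uses sed to keep only the text between the single quotes after href=.

```shell
#!/bin/bash
# Sketch of steps 10 and 11.  ASSUMPTIONS: find_forecast_url and
# english_sample.html are hypothetical names, and the sample page below
# is made up; the real "English" weather page may be laid out differently.

# Step 10: keep only lines that contain the city name AND /forecast/,
# then pick off the text between the single quotes after href=.
# Step 11: exactly one such line must remain; otherwise exit with 8.
find_forecast_url () {
    local city="$1" file="$2" urls count
    # grep -F matches the city name literally (no regex characters).
    urls=$( grep -F -e "$city" "$file" \
            | grep -F -e '/forecast/' \
            | sed -e "s/.*href='\([^']*\)'.*/\1/" )
    if [ -z "$urls" ]; then
        echo 1>&2 "$0: No /forecast/ URL found for city '$city' in $file"
        exit 7
    fi
    count=$( printf '%s\n' "$urls" | wc -l )
    if [ "$count" -ne 1 ]; then
        echo 1>&2 "$0: City '$city' matches $count cities; be more specific"
        exit 8
    fi
    echo "$urls"
}

# Hypothetical stand-in for the page that wget saved in step 9.
sample=english_sample.html
cat >"$sample" <<'EOF'
<li><a href='/forecast/city_e.html?on-118'>Ottawa (Kanata - Orléans)</a></li>
<li><a href='/forecast/city_e.html?sk-40'>Saskatoon</a></li>
<li><a href='/forecast/city_e.html?bc-79'>Prince George</a></li>
<li><a href='/province_e.html?pe'>Prince Edward Island</a></li>
EOF

# Prints /forecast/city_e.html?bc-79 for this sample page.
find_forecast_url Pr "$sample"
```

With this sample page, the prefix "Pr" selects only the Prince George line because the Prince Edward Island line is not a /forecast/ link. A one-letter name like "a" matches several /forecast/ lines and exits with status 8. The error messages are only placeholders, so check your own against the course's four-part rule for a good error message.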
Part II - weatherout5.txt
------------------------

18. Testing: Experiment with the weather in other cities to make sure
    your script works for other cities. Note that your script will work
    on minimal prefixes of city names. You don't have to type out the
    whole name as long as the prefix matches a unique city:

        $ ./weather5.sh Mont
        The temperature in Montréal is -3°C
        $ ./weather5.sh Iq
        The temperature in Iqaluit is -28°C
        $ ./weather5.sh Sa
        The temperature in Saskatoon is -25°C
        $ ./weather5.sh Pr
        The temperature in Prince George is -17°C
        ...etc...

    (Make sure your script is only selecting city partial URL lines that
    contain the /forecast/ pattern, or else you will accidentally match
    province names contained in the same file.)

19. Run your script for city names "Sa", "Qu", and "Pr" and save the
    output in file weatherout5.txt (three lines total).

20. Run your script for the city name "a - O". (This string is five
    characters long and contains two blanks. It matches the Ottawa
    weather page.) Append this output to the weatherout5.txt file.
    No label is needed in the weatherout5.txt file; it will be only four
    lines.

Part III - internal documentation
---------------------------------

21. On the lines above each of the steps in your script, add a few lines
    of comments (each line less than 80 characters!) explaining in your
    own words what the step that follows the comment does and what the
    expected output is. Follow the block comment conventions outlined in
    the script_style.txt file.

    Shell script comments start with "#" and extend to the end of the
    line. For readability, the lines must not be longer than 80
    characters. (Avoid putting comments to the right of commands.)

    Scripts without added comment lines will not be marked.
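Here is a hedged illustration of Part III: a sketch of steps 4 and 5 with a block comment above each step. The required comment layout is the one in script_style.txt, which is not reproduced here, so treat this layout as an assumption. The steps are wrapped in a hypothetical function, get_city_name, only so they can be tried on their own. In weather5.sh the same code would run at the top level on the script's own arguments.

```shell
#!/bin/bash
# Sketch of steps 4 and 5 with block comments above each step.
# ASSUMPTION: get_city_name is a hypothetical wrapper used only so the
# steps can be tried in isolation; weather5.sh would run this code at
# the top level. Follow script_style.txt for the real comment layout.

get_city_name () {
    # Step 4: Take the city name from the one command-line argument, or
    # prompt for it and read it if there is no argument. EOF at the
    # prompt quietly exits with status 1. More than one argument is an
    # error: print a message and exit with status 2.
    if [ $# -gt 1 ]; then
        echo 1>&2 "$0: Expecting at most one city name, found $#: $*"
        echo 1>&2 "$0: Usage: $0 [city_name]"
        exit 2
    fi
    if [ $# -eq 1 ]; then
        city=$1
    else
        echo "Enter a city name:"
        read -r city || exit 1
    fi

    # Step 5: An empty name would make grep match every line, so reject
    # it: print a message and exit with status 3.
    if [ -z "$city" ]; then
        echo 1>&2 "$0: Empty city name; expecting one or more characters"
        exit 3
    fi

    # Expected output: the accepted city name, on one line.
    echo "$city"
}
```

For example, ( get_city_name a b ) exits with status 2 and ( get_city_name "" ) exits with status 3. Typing EOF (^D) at the prompt exits with status 1.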
Submission
----------

Reference: datsubmit.txt - Using the datsubmit command

Submit the finished and labelled script file, along with weatherout5.txt,
for marking as Exercise 05 on the Course Linux Server, using the
following datsubmit command line:

    $ datsubmit 05 weather5.sh weatherout5.txt

This "datsubmit" program will copy the files to me for marking.
Always submit both files at the same time.