==========================
CST8129 Term Assignment #2-B (Make-Up)
==========================
-IAN! idallen@ncf.ca

Due: 12:00 (12 noon) Monday February 10, 2003
Marks: 8%
Late penalty: 100% per day

Purpose:
- practice writing real shell scripts using Regular Expressions
  (material in Chapters 2, 3, 8, 9)

Hand in format: online submission only - no paper, no diskettes

All scripts described below must be written to conform to the script
writing checklist: script_checklist.txt and to the script style given
in: script_style.txt.

All user input (command line arguments or input via "read") must be
fully validated before being used in expressions. Do not process bad
input!

Echo user input (including command line arguments given) back to the
user. This is usually a good idea both for debugging your script and
giving the user feedback on what data the script is actually processing.

Avoid Linux-only commands and command options. The same script should
work without modification on both Linux and ACADUNIX, where possible.
(In particular, do *not* use Bash 2.x shell syntax!)

Test your scripts. The sample inputs and output shown below are not a
complete test suite. I will try to find test cases using glob patterns
and blanks that will make your scripts abort or misbehave.

Scripts without *useful* block comments will be severely penalized.
(See the file script_style.txt for a description of good comment style.)

------------------
Hand in directory:
------------------

Completed scripts must have permissions "read-write-execute-only" for
you, "read-only" permissions for group, no permissions for other people.

The following directory is ready to receive your completed scripts:

    ~alleni/cst/assignment02-B/xxxxnnnn/

where xxxxnnnn is your Algonquin userid (e.g. abcd0001).

When you have completed a script, copy it into the above directory:

    cp myscript.sh ~alleni/cst/assignment02-B/xxxxnnnn/myscript.sh

Replace "myscript.sh" with the actual script name (given below).
Files with the wrong name or wrong Unix permissions will be penalized.

--------------------
Write these scripts:
--------------------

--) Write this executable script named "3B_header_validator.sh" (7 marks)

Syntax: $0 [ pathname ]

Purpose: Write a script to validate some aspects of a properly-written
email message.

The script you write will expect exactly one command-line pathname
argument that will be a mail message file to process. Prompt for and
read the argument if it is missing. Print an error message and exit
with status 2 if there is more than one argument given on the command
line.

Check (validate) the pathname argument before trying to read and
process it; if the pathname is not a non-empty, readable,
not-executable, plain file, issue an error message and exit the script
with status 1. Do not process a bad pathname.

Perform a series of tests on this file, as specified below. Write shell
functions to perform each test. (An example is given below.) Each shell
function should take an argument that is the file pathname of the mail
message that is to be tested. If the test fails, the function will
print an error message and return a non-zero status. (Do not exit the
script!) Each function must return status 0 if its particular test
succeeds and non-zero otherwise.

For example, you will write the following one-argument testing function
based on this description:

1) This testing function looks for the "To:" header line that should be
   in the top 30 lines of the file whose name is given as the first
   argument to the function. If the header line is found, the function
   validates it further.
   To select the "To:" line, we select the first 30 lines of the mail
   message file (first argument) and run the lines through the
   following egrep extended regexp:

   - start at the beginning of the line
   - match "To:"
   - match at least one blank (space)
   - match at least three non-blank characters

   If egrep finds the line based on the above regular expression, the
   function will do the following further tests on the line found:

   - start at the beginning of the line
   - match "To:"
   - allow any number of any characters except "@"
   - match a letter, digit, "_" or "-"
   - match a single "@"
   - match a letter, digit, "_" or "-"
   - allow any number of any characters except "@"
   - match the end of line

   Return status 0 if all the tests pass, non-zero otherwise.

Here is the code that you would write to implement the above testing
function (this code comes directly from the above description):

    TestToHeaderLine () {
        # Select the first 30 lines of the file and use egrep on them.
        # Put the egrep regexp into a variable so we can use it twice
        # without writing it twice - do not duplicate code.
        # Save the output of the egrep in variable $line for later use.
        #
        regexp1='^To:'
        regexp2=" +[^ ][^ ][^ ]"    # three non-blank chars
        line=$( head -30 "$1" | egrep "$regexp1$regexp2" )

        # See if the egrep found the line (test for zero size string).
        #
        if [ -z "$line" ] ; then
            echo 1>&2 "$0: '$1': Missing To: header line"
            return 1    # line not found - return bad status (not exit)
        fi

        # We get here if the egrep pattern did find the header line.
        # The line we found is stored in shell variable "$line".
        # Do further tests on the line:
        # See if the line we found has at least two characters
        # surrounding the expected "@" character.
        # This re-uses the same regexp pattern from above and adds to it.
        # Note the correct use of double/single quoting in the pattern.
        # Note that we echo the $line as part of the error message.
        # You must put "-" first in a character class to match "-".
        #
        goodch='[-a-zA-Z0-9_]'
        try=$( echo "$line" | egrep "$regexp1[^@]*${goodch}@${goodch}[^@]*"'$' )
        if [ -z "$try" ] ; then
            echo 1>&2 "$0: '$1': Unrecognized format To: line: $line"
            return 2    # return bad status (do not exit)
        fi

        return 0    # no errors - must be a valid line - return good status
    }

All the testing functions will follow the above order of operation:
First, the function must try to find the basic keywords used in the
line for which it is looking. Second, perform some validations on the
text that is supposed to follow the keywords. Each testing function
prints an error message and returns a non-zero status when it detects
an error. (Do not exit the script on error!)

You would write the above function and then use it in an IF statement
in your script as follows (assuming the argument pathname is in the
variable $file):

    if ! TestToHeaderLine "$file" ; then
        ... insert code to count the error here ...
    fi

Your finished program will start with a set of function definitions,
each preceded by a block comment describing the purpose of the
function. After all the function definitions will come the same number
of IF statements (similar to the one above). Each IF statement will
execute one of the functions and check its return status.

Below are descriptions of more testing functions that you must write.
Each function should search the mail message file for a line containing
the given pattern and, if found, validate the rest of the line.

Be precise - "To:*" would be an incorrect regexp for use in the above
function because it matches "To::::", which is not in the specification
for the function. Match exactly what is described in the specifications
for each function.

Define each testing function before you use it in the script. The top
part of your script will be all your function definitions; the bottom
part will be IF statements, similar to the one shown above, each using
one of the function definitions on the existing pathname argument.
(Do not process nonexistent arguments!)

Every time a testing function returns a non-zero status, count it as an
error. (Do not exit the script after a testing function - just count
each error and move on to the next IF statement that uses the next
testing function.)

You are to write the following testing functions, one at a time, and
add them to the script. Write each function, make sure it works, and
then add the next one. Start by using the TestToHeaderLine function
code and IF statement given above, then add more functions, one at a
time. Test your script after each new function.

Copy the TestToHeaderLine function code given above and modify it to
work for each new function that you write. Follow the same two-part
order of operation (described earlier) in each function that you write.

Here are the descriptions of the testing and validation functions.
Write one function per numbered test below; invent and use your own
good function names:

1) A function to test for and validate the "To:" header line.
   (See the description and TestToHeaderLine code already given above.)

   Good line: To: (Ian! D. Allen) idallen@ncf.ca
   Good line: To: Ian! D. Allen
   Bad line:  To: idallen @ncf.ca
   Bad line:  To: idallen@ncf.ca@ncf.ca
   Bad line:  To : idallen@ncf.ca

2) A function to test for and validate the "From:" header line.
   Apply the same tests as for the "To:" line. Also make sure that the
   email address comes from Algonquin College. (Do not accept an email
   address from any non-Algonquin addresses.) Note that all header
   lines must begin at the start of the line.

3) A function to test for the "Date:" header line.
   If the header line "Date:" is found, match the rest of the line
   (after "Date:") against the following extended regexp:

   - any number of blanks (spaces)
   - an optional day of the week field (three parts):
     + a 3 letter day of the week
       (Mon or Tue or Wed or Thu or Fri or Sat or Sun)
     + an optional comma
     + one or more spaces
   - one or two digits (day of month)
   - one or more spaces
   - a 3 letter month name (Jan or Feb or Mar or ... or Nov or Dec)
   - one or more spaces
   - a digit 1 or 2
   - three more digits
   - one or more spaces
   - a digit 0, 1, or 2
   - another digit
   - a colon (":")
   - a digit 0 through 5
   - another digit
   - a colon (":")
   - a digit 0 through 5
   - another digit

   Good line: Date: Fri, 19 Dec 2003 15:22:37 -0500 (EST)
   Good line: Date: Fri 19 Dec 2003 15:22:37 -0500 (EST)
   Good line: Date: 19 Dec 2003 15:22:37 -0500 (EST)
   Bad line:  Date: Friday, 19 Dec 2003 15:22:37 -0500 (EST)
   Bad line:  Date: Fri, 19 December 2003 15:22:37 -0500 (EST)
   Bad line:  Date: Fri, 19 Dec 3102 15:22:37 -0500 (EST)
   Bad line:  Date: Fri, 19 Dec 2003 95:22:37 -0500 (EST)

4) A function to test for the "Message-Id:" header line.
   If you find "Message-Id:" at the beginning, match the following:

   - any number of blanks (spaces)
   - the character "<"
   - one or more non-blank, non-"<", non-">" characters
   - the character ">"
   - any number of blanks (spaces)
   - the end of the line

   Good line: Message-Id: <199610232228.PAA13136@haus.efn.org>
   Bad line:  Message-Id: 199610232228.PAA13136@haus.efn.org
   Bad line:  Message-Id: <199610232228.PAA13136 haus.efn.org>
   Bad line:  Message-Id: <199610232228<PAA13136>haus.efn.org>

5) A function to test for the "Subject:" header line.
   If you find "Subject:", make sure there is at least one non-blank
   character somewhere on the rest of the line.

   Good line: Subject: hi
   Bad line:  Subject:
   Bad line:  Subject : hi

6) A function to validate the first line of the file as being a correct
   "From " (note the trailing blank) line.
   Match the following at the start of the first line of the file:

   - "From " at the beginning (5 chars - note the trailing blank)
   - zero or more additional blanks
   - one or more non-blank characters (the sending userid)
   - one or more spaces
   - a 3-letter day of the week
     (use the same matching as you did for "Date:")
   - one or more spaces
   - a 3-letter month name
     (use the same matching as you did for "Date:")
   - one or more spaces
   - one or two digits (day of month)
   - one or more spaces
   - a digit 0, 1, or 2
   - another digit
   - a colon (":")
   - a digit 0 through 5
   - another digit
   - a colon (":")
   - a digit 0 through 5
   - another digit
   - one or more spaces
   - a digit 1 or 2
   - three more digits
   - zero or more blanks
   - end of line

   Make sure you are testing only the first line of the input file!
   If the first line is not valid, also echo it in your error message.

   Good line: From alumlist@alumni.uwaterloo.ca Thu Jan 9 14:21:10 2003
   Good line: From idallen Thu Jan 9 14:21:10 2003
   Bad line:  From: idallen Thu Jan 9 14:21:10 2003
   Bad line:  From idallen Jan Thu 9 14:21:10 2003
   Bad line:  From idallen Thu Jan 9 14:21 2003
   Bad line:  From idallen Thu Jan 9 14:21:10 3333
   Bad line:  From idallen Jan Thu 9 34:21:10 2003

Using all the testing functions:

Write a series of IF statements that uses each of your functions and
counts the error if the function returns a non-zero status. (See my
example IF statement, above.)

At the end of the script, after you have made all of the above tests
and counted all the errors that might exist, exit the script with the
following exit status:

- exit 0 if the argument file passed all of the tests without error
- exit 1 if the argument was not a non-empty, readable, non-executable
  file
- exit 2 if more than one argument was given to the script
- if the argument file had errors, exit with a value that is the
  negative of the number of errors
  e.g. exit -5 for a count of 5 errors.
  (Note that the bash shell will not report this exit status as a
  negative number - it will report the exit status as 256 minus the
  number of errors.)

Here is a sample valid input email file named "input.txt" (the real
file has no leading blanks on any line):

    From Maiser@algonquincollege.com Thu Feb 27 10:00:06 2003
    To: "Computer Topics List"
    From: Xavier
    Date: Thu 27 Feb 2003 09:45:54 -0500
    Message-ID:
    Subject: Linux test message

Example runs of this script might look like this (the comment lines
beside the exit codes were added to explain the return value):

    $ ./3B_header_validator.sh input.txt
    $ echo $?
    0       # no errors in sample input

    $ ./3B_header_validator.sh a b c
    ./3B_header_validator.sh: only 1 path argument allowed, you entered 3 (a b c)
    $ echo $?
    2       # invalid calling syntax

    $ ./3B_header_validator.sh /dev/null
    ./3B_header_validator.sh: /dev/null is not a file
    $ echo $?
    1       # non-file argument supplied

    $ ./3B_header_validator.sh badmsgid.sh
    ./3B_header_validator.sh: 'badmsgid.sh': unrecognized format Message-Id in: MessageId: <09809.09287304@ncf.ca
    $ echo $?
    255     # one error means exit status -1 -> 256-1 = 255

    $ ./3B_header_validator.sh badmsgidfirst.sh
    ./3B_header_validator.sh: 'badmsgidfirst.sh': unrecognized format Message-Id in: MessageId: <09809.09287304@ncf.ca
    ./3B_header_validator.sh: 'badmsgidfirst.sh': incorrect first line of file: From: idallen Jan Thu 9 14:21:10 2003
    $ echo $?
    254     # 2 errors means exit status -2 -> 256-2 = 254

    $ echo not much >j
    $ ./3B_header_validator.sh j
    ./3B_header_validator.sh: 'j': Missing To: header line
    ./3B_header_validator.sh: 'j': Missing From: header line
    ./3B_header_validator.sh: 'j': Missing Date: header line
    ./3B_header_validator.sh: 'j': Missing Message-Id: header line
    ./3B_header_validator.sh: 'j': Missing Subject: header line
    ./3B_header_validator.sh: 'j': incorrect first line of file: not much
    $ echo $?
    250     # 6 errors means exit status -6 -> 256-6 = 250

The above sample test runs are not exhaustive.
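
The exit statuses in the sample runs follow the "256 minus the number of
errors" rule. A minimal sketch of that arithmetic is shown here; the
function name EncodeErrorCount is hypothetical (it is not part of the
assignment), and the sketch assumes a shell that supports POSIX
arithmetic expansion and an error count between 0 and 255:

```shell
# Hypothetical helper: print the exit status that the parent shell
# would report for a given number of errors (assumed range 0..255).
EncodeErrorCount () {
    # A count of 0 maps to status 0; a count of N maps to 256 - N.
    echo $(( (256 - $1) % 256 ))
}

EncodeErrorCount 1      # prints 255
EncodeErrorCount 6      # prints 250
```

Going the other way, a reported status between 250 and 255 corresponds to
256 minus that status errors.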
Test all of your functions to make sure they work with various input
files containing various errors.

The script file "validate.sh" (under Notes) may also be useful to you
as an example of writing and using these testing functions.

--) Write this executable script named "3B_multi_validator.sh" (1 mark)

Loop for all command line arguments and call the above
3B_header_validator.sh script with each argument. (Do not bother
validating the pathnames before passing them to the
3B_header_validator.sh script, since that script already does argument
validation for you.) Process and total up the exit statuses of the
validator script after each execution to use for the statistics, below:

After having processed all the command line arguments with the
validator script, print your collected statistics on how the arguments
were processed by the script:

- a count of how many files were processed with no errors found
  (how many times the validator script exited with code 0)
- a count of how many pathnames were found invalid (unreadable, etc.)
  (how many times the validator script exited with code 1)
- a count of how many files had any errors found
  (how many times the validator script exited with a code between
  250 and 255)
- a count of the total number of errors in all the files processed
  (recall that the 3B_header_validator.sh script exits with the
  negative of the number of errors [if there are errors], and that to
  the bash shell this negative number appears to be 256 minus the
  number of errors - calculate and add up the total number of errors)

Note that you can write and test this script even if you don't have a
working 3B_header_validator.sh script. You can create a "dummy" script
that does nothing but exit with the desired return status, and call
that instead of 3B_header_validator.sh.
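
The four statistics described above can be collected by classifying each
validator exit status as it comes back. The following is only a sketch
under stated assumptions, not a complete solution: the function name
TallyStatus and the four counter variable names are hypothetical, and the
sketch assumes a shell that supports POSIX arithmetic expansion.

```shell
# Hypothetical counters (names are illustrative assumptions).
ok_count=0          # validator exited with code 0
invalid_count=0     # validator exited with code 1 (bad pathname)
bad_count=0         # validator exited with a code from 250 to 255
error_total=0       # sum of (256 - status) over the files with errors

# Hypothetical helper: classify one exit status ($1) and update
# the running totals above.
TallyStatus () {
    case "$1" in
    0)  ok_count=$(( ok_count + 1 )) ;;
    1)  invalid_count=$(( invalid_count + 1 )) ;;
    25[0-5])
        bad_count=$(( bad_count + 1 ))
        # Undo the "256 minus the number of errors" wrap-around.
        error_total=$(( error_total + 256 - $1 ))
        ;;
    *)  echo 1>&2 "$0: unexpected validator exit status: $1" ;;
    esac
}

# Usage sketch (commented out so the block has no side effects):
# for arg in "$@" ; do
#     ./3B_header_validator.sh "$arg"
#     TallyStatus "$?"
# done
```

Keeping the classification in one function means the loop body stays two
lines long, and the same function can be exercised with a "dummy"
validator that simply exits with a chosen status.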