==========================
CST8129 Term Assignment #2-B (Make-Up)
==========================
-IAN! idallen@ncf.ca

Due: 12:00 (12 noon) Monday February 10, 2003

Marks: 8%   Late penalty: 100% per day
Purpose:
 - practice writing real shell scripts using Regular Expressions
   (material in Chapters 2, 3, 8, 9)
Hand in format: online submission only - no paper, no diskettes

All scripts described below must be written to conform to the script
writing checklist: script_checklist.txt and to the script style given
in: script_style.txt.  All user input (command line arguments or input
via "read") must be fully validated before being used in expressions.
Do not process bad input!

Echo user input (including command line arguments given) back to the user.
This is usually a good idea both for debugging your script and giving
the user feedback on what data the script is actually processing.

Avoid Linux-only commands and command options.  The same script should
work without modification on both Linux and ACADUNIX, where possible.
(In particular, do *not* use Bash 2.x shell syntax!)

Test your scripts.  The sample inputs and output shown below are not a
complete test suite.  I will try to find test cases using glob patterns
and blanks that will make your scripts abort or misbehave.
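
As a hedged illustration of why blanks and glob characters cause trouble,
the sketch below counts how many arguments a function receives when a
variable is expanded with and without double quotes.  The function name
CountArgs and the sample value are invented for this example only.

```shell
# CountArgs is a hypothetical helper: it prints how many arguments it got.
CountArgs () {
    echo $#
}

name='my file *.txt'                # a value containing blanks and a glob

quoted=$( CountArgs "$name" )       # quoted: passed as one argument
unquoted=$( CountArgs $name )       # unquoted: split on blanks, "*" may expand

echo "quoted: $quoted argument(s), unquoted: $unquoted argument(s)"
```

The quoted expansion always arrives as a single argument; the unquoted
count depends on the files in the current directory.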

Scripts without *useful* block comments will be severely penalized.
(See the file script_style.txt for a description of good comment style.)

------------------
Hand in directory:
------------------

Completed scripts must have "read-write-execute" permissions for you,
"read-only" permissions for group, and no permissions for other people.

The following directory is ready to receive your completed scripts:

    ~alleni/cst/assignment02-B/xxxxnnnn/

where xxxxnnnn is your Algonquin userid (e.g. abcd0001).

When you have completed a script, copy it into the above directory:

    cp myscript.sh ~alleni/cst/assignment02-B/xxxxnnnn/myscript.sh

Replace "myscript.sh" with the actual script name (given below).
Files with the wrong name or wrong Unix permissions will be penalized.
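
One possible way to set the permissions described in this section is
sketched below, using the placeholder name "myscript.sh" from the cp
example.  The touch command is there only so that the sketch runs on
its own; a real script file would already exist.

```shell
script=myscript.sh              # placeholder name, as used in this handout
touch "$script"                 # only so the sketch runs on its own
chmod u=rwx,g=r,o= "$script"    # owner rwx, group r, other none (octal 740)
ls -l "$script"                 # the mode field should begin: -rwxr-----
```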

--------------------
Write these scripts:
--------------------

--) Write this executable script named "3B_header_validator.sh" (7 marks)

    Syntax:    $0 [ pathname ]

    Purpose: Write a script to validate some aspects of a properly-written
    email message.  The script you write will expect exactly one
    command-line pathname argument that will be a mail message file
    to process.  Prompt for and read the argument if it is missing.
    Print an error message and exit with status 2 if there is more than
    one argument given on the command line.

    Check (validate) the pathname argument before trying to read
    and process it; if the pathname is not a non-empty, readable,
    not-executable, plain file, issue an error message and exit the
    script with status 1.  Do not process a bad pathname.
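
    The pathname checks described above could be collected into one
    function, as in the following minimal sketch.  The function name
    IsGoodMessageFile and the wording of the messages are assumptions,
    not part of the specification.

```shell
# IsGoodMessageFile is a hypothetical name.  It returns 0 only if its
# argument is a non-empty, readable, non-executable plain file.
IsGoodMessageFile () {
    if [ ! -f "$1" ] ; then
        echo 1>&2 "$0: '$1' is not a plain file"
        return 1
    fi
    if [ ! -s "$1" ] ; then
        echo 1>&2 "$0: '$1' is empty"
        return 1
    fi
    if [ ! -r "$1" ] ; then
        echo 1>&2 "$0: '$1' is not readable"
        return 1
    fi
    if [ -x "$1" ] ; then
        echo 1>&2 "$0: '$1' is executable"
        return 1
    fi
    return 0
}
```

    The main part of a script might then exit with status 1 whenever
    this function returns a non-zero status for the pathname argument.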

    Perform a series of tests on this file, as specified below.  Write
    shell functions to perform each test.  (An example is given below.)

    Each shell function should take an argument that is the file pathname
    of the mail message that is to be tested.  If the test fails, the
    function will print an error message and return a non-zero status.
    (Do not exit the script!)  Each function must return status 0 if
    its particular test succeeds and non-zero otherwise.

    For example, you will write the following one-argument testing
    function based on this description:

    1)  This testing function looks for the "To:" header line that
        should be in the top 30 lines of the file whose name is given
        as the first argument to the function.  If the header line is
        found, the function validates it further.
        
        To select the "To:" line, we select the first 30 lines of the
        mail message file (first argument) and run the lines through
        the following egrep extended regexp:

           - start at the beginning of the line
           - match "To:"
           - match at least one blank (space)
           - match at least three non-blank characters

        If egrep finds the line based on the above regular expression,
        the function will do the following further tests on the line found:

           - start at the beginning of the line
           - match "To:"
           - allow any number of any characters except "@"
           - match a letter, digit, "_" or "-"
           - match a single "@"
           - match a letter, digit, "_" or "-"
           - allow any number of any characters except "@"
           - match the end of line

        Return status 0 if all the tests pass, non-zero otherwise.

    Here is the code that you would write to implement the above testing
    function (this code comes directly from the above description):

        TestToHeaderLine () {
            # Select the first 30 lines of the file and use egrep on them.
            # Put the egrep regexp into a variable so we can use it twice
            # without writing it twice - do not duplicate code.
            # Save the output of the egrep in variable $line for later use.
            #
            regexp1='^To:'
            regexp2=" +[^ ][^ ][^ ]"  # three non-blank chars
            line=$( head -30 "$1" | egrep "$regexp1$regexp2" )

            # See if the egrep found the line (test for zero size string).
            #
            if [ -z "$line" ] ; then
                echo 1>&2 "$0: '$1': Missing To: header line"
                return 1        # line not found - return bad status (not exit)
            fi

            # We get here if the egrep pattern did find the header line.
            # The line we found is stored in shell variable "$line".
            # Do further tests on the line:
            
            # See if the line we found has at least two characters
            # surrounding the expected "@" character.
            # This re-uses the same regexp pattern from above and adds to it.
            # Note the correct use of double/single quoting in the pattern.
            # Note that we echo the $line as part of the error message.
            # A "-" must be first (or last) in a character class to match "-".
            #
            goodch='[-a-zA-Z0-9_]'
            try=$( echo "$line" | egrep "$regexp1[^@]*${goodch}@${goodch}[^@]*"'$' )
            if [ -z "$try" ] ; then
                echo 1>&2 "$0: '$1': Unrecognized format To: line: $line"
                return 2        # return bad status (do not exit)
            fi

            return 0  # no errors - must be a valid line - return good status
        }

    All the testing functions will follow the above order of operation:
    First, the function must try to find the basic keywords used in
    the line for which it is looking.  Second, perform some validations
    on the text that is supposed to follow the keywords.  Each testing
    function prints an error message and returns a non-zero status when
    it detects an error.  (Do not exit the script on error!)

    You would write the above function and then use it in an IF statement
    in your script as follows (assuming the argument pathname is in
    the variable $file):

        if ! TestToHeaderLine "$file" ; then
            ... insert code to count the error here ...
        fi

    Your finished program will start with a set of function definitions,
    each preceded by a block comment describing the purpose of the
    function.  After all the function definitions will come the same
    number of IF statements (similar to the one above).  Each IF statement
    will execute one of the functions and check its return status.

    Below are descriptions of more testing functions that you must
    write.  Each function should search the mail message file for a line
    containing the given pattern and, if found, validate the rest
    of the line.  Be precise - "To:*" would be an incorrect regexp for
    use in the above function because it matches "To::::", which is
    not in the specification for the function.  Match exactly what is
    described in the specifications for each function.

    Define each testing function before you use it in the script.  The top
    part of your script will be all your function definitions; the bottom
    part will be IF statements, similar to the one shown above, each using
    one of the function definitions on the existing pathname argument.
    (Do not process nonexistent arguments!)  Every time a testing function
    returns a non-zero status, count it as an error.  (Do not exit the
    script after a testing function - just count each error and move on
    to the next IF statement that uses the next testing function.)
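
    The count-and-continue pattern described above might look like the
    following sketch.  The two functions are hypothetical stand-ins (one
    always passes, one always fails) so that the sketch runs on its own;
    the variable name "errors" is also an assumption.

```shell
# Hypothetical stand-ins for two real testing functions.
AlwaysPasses () {
    return 0
}
AlwaysFails () {
    echo 1>&2 "$0: '$1': simulated failure"
    return 1
}

file=input.txt      # assumed to have been validated already
errors=0            # running count of failed tests

if ! AlwaysPasses "$file" ; then
    errors=$( expr "$errors" + 1 )
fi
if ! AlwaysFails "$file" ; then
    errors=$( expr "$errors" + 1 )      # count the error and keep going
fi

echo "$0: '$file': $errors error(s) counted"
```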

    You are to write the following testing functions, one at a time, and
    add them to the script.  Write each function, make sure it works,
    and then add the next one.  Start by using the TestToHeaderLine
    function code and IF statement given above, then add more functions,
    one at a time.  Test your script after each new function.

    Copy the TestToHeaderLine function code given above and modify
    it to work for each new function that you write.  Follow the same
    two-part order of operation (described earlier) in each function
    that you write.

    Here are the descriptions of the testing and validation functions.
    Write one function per numbered test below; invent and use your own
    good function names:

    1)  A function to test for and validate the "To:" header line.
        (See the description and TestToHeaderLine code already given above.)

        Good line: To: (Ian! D. Allen) idallen@ncf.ca
        Good line: To: Ian! D. Allen <idallen@ncf.ca>
        Bad line:  To: idallen  @ncf.ca
        Bad line:  To: idallen@ncf.ca@ncf.ca
        Bad line:  To : idallen@ncf.ca

    2)  A function to test for and validate the "From:" header line.
        Apply the same tests as for the "To:" line.  Also make sure
        that the email address comes from Algonquin College.  (Do not
        accept any non-Algonquin email address.)
        Note that all header lines must begin at the start of the line.

    3)  A function to test for the "Date:" header line.
        If the header line "Date:" is found, match the rest of the line
        (after "Date:") against the following extended regexp:

          - any number of blanks (spaces)
          - an optional day of the week field (three parts):
            + a 3 letter day of the week
              (Mon or Tue or Wed or Thu or Fri or Sat or Sun)
            + an optional comma
            + one or more spaces
          - one or two digits (day of month)
          - one or more spaces
          - a 3 letter month name
            (Jan or Feb or Mar or ... or Nov or Dec)
          - one or more spaces
          - a digit 1 or 2
          - three more digits
          - one or more spaces
          - a digit 0, 1, or 2
          - another digit
          - a colon (":")
          - a digit 0 through 5
          - another digit
          - a colon (":")
          - a digit 0 through 5
          - another digit

        Good line: Date: Fri, 19 Dec 2003 15:22:37 -0500 (EST)
        Good line: Date: Fri 19 Dec 2003 15:22:37 -0500 (EST)
        Good line: Date: 19 Dec 2003 15:22:37 -0500 (EST)
        Bad line:  Date: Friday, 19 Dec 2003 15:22:37 -0500 (EST)
        Bad line:  Date: Fri, 19 December 2003 15:22:37 -0500 (EST)
        Bad line:  Date: Fri, 19 Dec 3102 15:22:37 -0500 (EST)
        Bad line:  Date: Fri, 19 Dec 2003 95:22:37 -0500 (EST)

    4)  A function to test for the "Message-Id:" header line.
        If you find "Message-Id:" at the beginning, match the following:

          - any number of blanks (spaces)
          - the character "<"
          - one or more non-blank, non-"<", non-">" characters
          - the character ">"
          - any number of blanks (spaces)
          - the end of the line

        Good line: Message-Id: <199610232228.PAA13136@haus.efn.org>
        Bad line:  Message-Id: 199610232228.PAA13136@haus.efn.org
        Bad line:  Message-Id: <199610232228.PAA13136  haus.efn.org>
        Bad line:  Message-Id: <199610232228<PAA13136>haus.efn.org>

    5)  A function to test for the "Subject:" header line.
        If you find "Subject:", make sure there is at least one
        non-blank character somewhere on the rest of the line.

        Good line: Subject:           hi
        Bad line:  Subject:
        Bad line:  Subject : hi

    6)  A function to validate the first line of the file as being a
        correct "From " (note the trailing blank) line.  Match the
        following at the start of the first line of the file:

          - "From " at the beginning (5 chars - note the trailing blank)
          - zero or more additional blanks
          - one or more non-blank characters (the sending userid)
          - one or more spaces
          - a 3-letter day of the week
            (use the same matching as you did for "Date:")
          - one or more spaces
          - a 3-letter month name
            (use the same matching as you did for "Date:")
          - one or more spaces
          - one or two digits (day of month)
          - one or more spaces
          - a digit 0, 1, or 2
          - another digit
          - a colon (":")
          - a digit 0 through 5
          - another digit
          - a colon (":")
          - a digit 0 through 5
          - another digit
          - one or more spaces
          - a digit 1 or 2
          - three more digits
          - zero or more blanks
          - end of line

        Make sure you are testing only the first line of the input file!
        If the first line is not valid, also echo it in your error message.

        Good line: From alumlist@alumni.uwaterloo.ca  Thu Jan  9 14:21:10 2003
        Good line: From idallen Thu Jan 9 14:21:10 2003
        Bad line:  From: idallen Thu Jan 9 14:21:10 2003
        Bad line:  From idallen Jan Thu 9 14:21:10 2003
        Bad line:  From idallen Thu Jan 9 14:21 2003
        Bad line:  From idallen Thu Jan 9 14:21:10 3333
        Bad line:  From idallen Jan Thu 9 34:21:10 2003
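
    Several of the tests above share the same day, month, year, and time
    pieces.  One way to avoid duplicating them is to keep each piece in
    its own variable and assemble the full regexp from those pieces.
    The sketch below does this for the "Date:" pattern only, and applies
    it to a line of text rather than to a file.  All variable and
    function names are assumptions, and it uses "grep -E", which accepts
    the same extended regexps as egrep.

```shell
# All variable and function names here are hypothetical.
day='(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'
month='(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
year='[12][0-9][0-9][0-9]'
clock='[0-2][0-9]:[0-5][0-9]:[0-5][0-9]'

# "Date:", blanks, optional day-of-week field, day of month, month,
# year, and time, each separated by one or more spaces.
dateregexp="^Date: *($day,? +)?[0-9][0-9]? +$month +$year +$clock"

# Succeed (status 0) if the text in $1 matches the Date: pattern.
TextMatchesDate () {
    echo "$1" | grep -E "$dateregexp" >/dev/null
}

for text in 'Date: Fri, 19 Dec 2003 15:22:37 -0500 (EST)' \
            'Date: Friday, 19 Dec 2003 15:22:37 -0500 (EST)' ; do
    if TextMatchesDate "$text" ; then
        echo "good: $text"
    else
        echo "bad:  $text"
    fi
done
```

    The same $day, $month, $year, and $clock variables could be reused,
    in a different order, when building the regexp for the first line.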

    Using all the testing functions:

    Write a series of IF statements that uses each of your functions
    and counts the error if the function returns a non-zero status.
    (See my example IF statement, above.)

    At the end of the script, after you have made all of the above tests
    and counted all the errors that might exist, exit the script with
    the following exit status:

       - exit 0 if the argument file passed all of the tests without error
       - exit 1 if the argument was not a non-empty, readable,
         non-executable file
       - exit 2 if more than one argument was given to the script
       - if the argument file had errors, exit with a value that is the
         negative of the number of errors e.g. exit -5 for a count of
         5 errors.  (Note that the bash shell will not report this exit
         status as a negative number - it will report the exit status
         as 256 minus the number of errors.)
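
    The arithmetic in the note above can be checked with a small sketch.
    It assumes a hypothetical error count of 5 and exits a subshell with
    the positive equivalent (256 minus the count), since a shell may not
    accept a negative argument to "exit".  The variable names are
    invented for this example.

```shell
errors=5                                # hypothetical error count
status=$( expr 256 - "$errors" )        # the value the parent shell reports

seen=0
( exit "$status" ) || seen=$?           # exit a subshell, capture its status

echo "errors=$errors  status seen by shell=$seen"
```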

    Here is a sample valid input email file named "input.txt" (the real
    file has no leading blanks on any line):

    From Maiser@algonquincollege.com  Thu Feb 27 10:00:06 2003
    To: "Computer Topics List" <COMPUTERS@hotmail.com>
    From: Xavier <downx@algonquincollege.com>
    Date: Thu  27 Feb 2003 09:45:54 -0500
    Message-Id: <AA1256846F04@algonquincollege.com>
    Subject: Linux test message

    Example runs of this script might look like this (the comment lines
    beside the exit codes were added to explain the return value):

       $ ./3B_header_validator.sh input.txt
       $ echo $?
       0                   # no errors in sample input

       $ ./3B_header_validator.sh a b c
       ./3B_header_validator.sh: only 1 path argument allowed,
          you entered 3 (a b c)
       $ echo $?
       2                   # invalid calling syntax

       $ ./3B_header_validator.sh /dev/null
       ./3B_header_validator.sh: /dev/null is not a file
       $ echo $?
       1                   # non-file argument supplied

       $ ./3B_header_validator.sh badmsgid.sh
       ./3B_header_validator.sh: 'badmsgid.sh': unrecognized format
          Message-Id in: Message-Id: <09809.09287304@ncf.ca
       $ echo $?
       255                 # one error means exit status -1 -> 256-1 = 255

       $ ./3B_header_validator.sh badmsgidfirst.sh
       ./3B_header_validator.sh: 'badmsgidfirst.sh': unrecognized format
          Message-Id in: Message-Id: <09809.09287304@ncf.ca
       ./3B_header_validator.sh: 'badmsgidfirst.sh': incorrect first line
          of file: From: idallen Jan Thu 9 14:21:10 2003
       $ echo $?
       254                 # 2 errors means exit status -2 -> 256-2 = 254

       $ echo not much >j
       $ ./3B_header_validator.sh j
       ./3B_header_validator.sh: 'j': Missing To: header line
       ./3B_header_validator.sh: 'j': Missing From: header line
       ./3B_header_validator.sh: 'j': Missing Date: header line
       ./3B_header_validator.sh: 'j': Missing Message-Id: header line
       ./3B_header_validator.sh: 'j': Missing Subject: header line
       ./3B_header_validator.sh: 'j': incorrect first line of file:
          not much
       $ echo $?
       250                 # 6 errors means exit status -6 -> 256-6 = 250

    The above sample test runs are not exhaustive.  Test all of your
    functions to make sure they work with various input files containing
    various errors.

    The script file "validate.sh" (under Notes) may also be useful to
    you as an example of writing and using these testing functions.

--) Write this executable script named "3B_multi_validator.sh" (1 mark)

    Loop for all command line arguments and call the above
    3B_header_validator.sh script with each argument.  (Do not
    bother validating the pathnames before passing them to the
    3B_header_validator.sh script, since that script already does argument
    validation for you.)  Process and total up the exit statuses of the
    validator script after each execution to use for the statistics below.

    After having processed all the command line arguments with the
    validator script, print your collected statistics on how the arguments
    were processed by the script:

    - a count of how many files were processed with no errors found
      (how many times the validator script exited with code 0)
    - a count of how many pathnames were found invalid (unreadable, etc.)
      (how many times the validator script exited with code 1)
    - a count of how many files had any errors found (how many times the
      validator script exited with a code between 250 and 255)
    - a count of the total number of errors in all the files processed
      (recall that the 3B_header_validator.sh script exits with the
      negative of the number of errors [if there are errors], and that
      to the bash shell this negative number appears to be 256 minus the
      number of errors - calculate and add up the total number of errors)
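
    The classification and decoding of exit statuses listed above might
    be sketched as follows.  The function name Tally, the counter names,
    and the list of simulated exit statuses are all assumptions; a real
    script would obtain each status by running the validator script.

```shell
good=0          # times the validator exited with 0
invalid=0       # times the validator exited with 1
bad=0           # times the validator exited with 250 through 255
total=0         # sum of the decoded error counts

# Tally is a hypothetical helper that classifies one exit status.
Tally () {
    case "$1" in
    0)       good=$( expr "$good" + 1 ) ;;
    1)       invalid=$( expr "$invalid" + 1 ) ;;
    25[0-5]) bad=$( expr "$bad" + 1 )
             total=$( expr "$total" + 256 - "$1" ) ;;
    esac
}

# Simulated exit statuses stand in for real runs of the validator.
for status in 0 1 255 254 0 250 ; do
    Tally "$status"
done

echo "no errors: $good  invalid: $invalid  with errors: $bad  total errors: $total"
```

    With the simulated statuses shown, the three files with errors
    contribute 1, 2, and 6 errors, for a total of 9.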

    Note that you can write and test this script even if you don't have
    a working 3B_header_validator.sh script.  You can create a "dummy"
    script that does nothing but exit with the desired return status,
    and call that instead of 3B_header_validator.sh.