Winter 2014 - January to April 2014 - Updated 2014-03-20 20:22 EDT
Do not print this assignment on paper!
- On paper, you will miss updates, corrections, and hints added to the online version.
- On paper, you cannot follow any of the hyperlink URLs that lead you to hints and course notes relevant to answering a question.
- On paper, scrolling text boxes will be cut off and not print properly.
23h59 (11:59pm) Saturday February 8, 2014 (end of Week 5)
This is an overview of how you are expected to complete this assignment. Read all the words before you start working.
You are given a file of somewhat random text, and a set of descriptions
of sets of lines in that file. For each description, you are to produce
a regular expression that will match the described set of lines. You will
initially test your regular expressions on the command line, and when you
are satisfied with each one, you will put the grep command in a shell
script. A Checking Program is available to check your work as you go.
The following tasks (except the first three, which should be done once) are to be repeated for each description.
When you have finished the tasks, leave the files and directories in place as part of your deliverables. Do not delete any assignment work until after the term is over! Assignments may be re-marked at any time; you must have your term work available right until term end.
The previous term’s course notes are available on the Internet here: CST8207 GNU/Linux Operating Systems I. All the notes files are also on the CLS. You can learn about how to read and search these files using the command line on the CLS under the heading Copies of the CST8207 course notes near the bottom of the page Course Linux Server.
Since I also do manual marking of student assignments, your final mark may not be the same as the mark submitted using the current version of the Checking Program. I do not guarantee that any version of the Checking Program will find all the errors in your work. Complete your assignments according to the specifications, not according to the incomplete set of mistakes detected by the Checking Program.
All references to the “Source Directory” below are to the CLS directory ~idallen/cst8177/14w/assignment03/ and that name starts with a tilde character ~ followed by a userid with no intervening slash. The leading tilde indicates to the shell that the pathname starts with the HOME directory of the account idallen (seven letters).
Do a Remote Login to the Course Linux Server (CLS) from any existing computer, using the name appropriate for whether you are on-campus or off-campus. All work in this assignment must be done on the CLS.
Make an assignment03 directory in the same directory as you made assignment02 in a previous assignment. This directory is the base directory for most pathnames in this assignment. Store your files and answers here.
The file foo.txt in the Source Directory contains many lines of text. Put a soft link to this file in your new assignment03 directory. Use the same name for the link.
In your new assignment03 directory create a soft link named check to the checking program assignment03check from the Source Directory.
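The directory and soft-link tasks above can be sketched as follows. This is only an illustration under stated assumptions: the temporary directories `src` and `base` are hypothetical stand-ins so the commands can be tried anywhere; on the CLS the link targets are the real files in the Source Directory, and the new directory goes beside your existing assignment02 directory.

```shell
# Stand-in Source Directory so this sketch runs anywhere (assumption).
# On the CLS the real one is ~idallen/cst8177/14w/assignment03/
src=$(mktemp -d)
printf 'barbar\nplain\n' > "$src/foo.txt"
printf '#!/bin/sh\necho checked\n' > "$src/assignment03check"
chmod +x "$src/assignment03check"

# Stand-in for the directory that already holds assignment02 (assumption).
base=$(mktemp -d)
mkdir "$base/assignment03"
cd "$base/assignment03" || exit 1

# Soft (symbolic) links: same name for the data file, "check" for the program.
ln -s "$src/foo.txt" foo.txt
ln -s "$src/assignment03check" check

# Each output line should show the link name, an arrow, and the target.
ls -l foo.txt check
```

A link made this way is used exactly like the file it points to, so later commands can name foo.txt without any directory prefix.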
Below, in the Labelled Descriptions section, you are given labelled descriptions of lines to find in the file foo.txt. For each labelled description you will repeat these two steps (described in detail below):
- Write a grep command using a single basic regular expression that will match the described lines of text (and nothing more). Do not use any options to grep (except possibly for the last question). You do not need multiple expressions or any extended regular expressions or special expressions except possibly for the last question. Use basic regular expressions.
- Put the grep command into its own shell script.
Each set of lines to be found is labelled below with a label. The label is the first word in the section, followed by a colon. For example, the following example description is labelled bar:
bar: lines that contain the word barbar
Repeat the following steps for each of the labelled descriptions:
Make your current working directory the base directory (the directory containing the new link you made to the foo.txt file) if it is not already so.
You must find lines in the foo.txt file using a single grep command.
Type directly at the command line your initial attempt at a grep command that finds the lines, and view the result on your screen. The correct answer in all cases will result in fewer than 50 lines of text on your screen. No pipes are allowed. Use only a single grep command.
For the example given above with the label bar, a grep command you might try could be:
$ grep 'barbar' foo.txt
If you’re not satisfied with your initial attempt, use up-arrow to retrieve the previous command, make changes to the regular expression, then re-run the new command. Repeat this step until you’re satisfied with the output on your screen and want to check your answer.
To check your answer, use up-arrow to retrieve the command, and modify it to pipe the output of your command into the wc program, then do the same, changing wc to sum. Compare the output of wc and sum with the values output by the Checking Program.
For the example given above with the label bar, the checking pipelines would be done like this, in this order:
$ grep 'barbar' foo.txt
$ grep 'barbar' foo.txt | wc
$ grep 'barbar' foo.txt | sum
The 'barbar' string is the quoted regular expression.
If the word count or checksum values differ, you need to change your regular expression. Use up-arrow to retrieve the command, make your changes to the regular expression, and re-run the command.
When you are satisfied with your answer as typed on the command line, use a text editor to create in your assignment03 directory an executable shell script whose name is the label name followed by .sh that simply runs your grep command without piping its output into wc or sum. Just copy the grep command from the command line into the last line of the new shell script.
For the example given above with the label bar, the script name must be bar.sh in the assignment03 directory.
The first few lines of every shell script must correspond exactly to the Script Header described in class.
The last line of every script will be your grep command. Do not redirect or pipe the output of your command into anything – the script should produce the correct lines of output from foo.txt on your screen so that it can be checked.
Do not put any lines into your script other than the Script Header, the single grep command line, and optional blank or comment lines.
You can also check the output of your script using the wc and sum commands, similar to the way you checked the original grep command. The script must output exactly the same lines as the original grep command that you put into it. The results should be identical:
$ grep 'barbar' foo.txt | wc
$ ./bar.sh | wc
$ grep 'barbar' foo.txt | sum
$ ./bar.sh | sum
Repeat the 8 steps in this section for each of the Labelled Descriptions below.
NOTE: When it comes time to create your second and subsequent scripts, copy the previous script to the new label name rather than starting from scratch every time. Run the Checking Program to make sure you have copied the Script Header correctly.
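The script-creation and comparison steps above can be sketched for the example label bar as follows. This is a sketch under assumptions, not the required procedure: the temporary directory and the three-line foo.txt are hypothetical stand-ins, the here-document only replaces the text editor so the example is self-contained, and the `#!/bin/sh -u` line is a placeholder for the exact Script Header described in class.

```shell
cd "$(mktemp -d)" || exit 1
printf 'barbar here\nnothing\nfoo barbar\n' > foo.txt   # stand-in test data

# Create bar.sh: header first, the single grep command as the last line.
cat > bar.sh <<'EOF'
#!/bin/sh -u
# (placeholder: the exact Script Header from class goes here)
grep 'barbar' foo.txt
EOF
chmod +x bar.sh    # the script must be executable

# The script and the command-line grep must give the same checksum.
direct=$(grep 'barbar' foo.txt | sum)
script=$(./bar.sh | sum)
[ "$direct" = "$script" ] && echo "bar.sh matches the command-line grep"
```

Copying bar.sh to the next label name (for example with cp) and changing only the last line keeps the header identical across all the scripts.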
Do not put any lines into your script other than the Script Header, the single grep command line, and optional blank or comment lines.
Repeat the 8 steps of the above section for each of these labelled items below. None of these expressions except the very last one require any options to grep, nor multiple expressions, nor do they require any extended regular expressions.
All except the last must be solved with no options and only one single basic regular expression.
label: description of desired grep output from the file foo.txt
upper: lines containing at least one upper case alphabetic character.
control: lines containing at least one control character. (When checking your output, you can make control characters visible using the -vT options to the cat command, otherwise they won’t show on your screen. Do not put the cat command in your script.)
punct: lines containing at least one punctuation character.
blank: blank lines. (A blank line contains only zero or more Whitespace characters and no other kinds of characters.)
only_alpha: non-empty lines containing only alphabetic characters. (“Non-empty” means there has to be at least one character.)
only_digit: non-empty lines containing only digits.
only_alnum: non-empty lines containing only alphanumeric characters.
only_upper: non-empty lines containing only upper case characters.
no_white: lines containing no Whitespace characters. Another way of saying this is: lines containing zero or more only non-Whitespace characters.
no_num_white: lines containing no Whitespace or digit characters. Another way of saying this is: lines containing zero or more only non-Whitespace non-digit characters.
empty: empty lines. (An empty line means nothing on the line, not even Whitespace characters. The line contains no characters.)
plus: lines containing at least one plus + character.
question: lines containing at least one question mark ? character.
backslash: lines containing at least one backslash \ character.
caret: lines containing at least one circumflex/caret ^ character.
star: lines containing at least one asterisk * character.
dot: lines containing at least one period . character.
square: lines containing at least one square bracket [ or ] character.
begin_end: lines that start with the exact five characters begin and that end with the exact three characters end. (Any other characters might appear between the begin and the end.)
AB: lines containing A and B, capitalized and in that order but not necessarily right next to each other. Another way of saying this is: lines containing a B following an A.
first: lines that start with optional Whitespace, then the string first.
capital: lines that contain the string Capital where the initial letter C must be upper-case but the rest of the letters could be either case, e.g. CAPITAL, CaPiTaL, etc.
first_last: lines that start with the exact five characters first preceded by any amount of Whitespace and that end with the exact four characters last followed by any amount of Whitespace. (Any other characters might appear between the first and the last, but only optional Whitespace is allowed before first and after last.) (Hint: Another way of saying this: The line starts with optional Whitespace, followed by first, followed by anything, followed by last, followed by optional Whitespace, and then the end of the line.)
phone: lines that contain a seven-(or more)-digit number with one or more dashes between the group of three (or more) digits and the group of four (or more) digits. These should match: 555-1212, 555555-----121212121212, x555-1212x, x555---1212x, x999555-1212x, x555-1212999x, x999555-1212999x, but these would not match: 555-121x, x55-1212, 5551212
better_phone: lines that contain a seven-digit number, surrounded before and after with non-digit characters, with one or more underscores, dashes, or periods between the third and fourth digits. These should match: x555-1212x, x555.1212x, x555_-.1212x, x555--__..-_.1212x, but these would not match: 555555-----121212121212, x999555-1212x, x555-1212999x, x999555-1212999x, 555-121x, x55-1212, 5551212
password: lines containing password or passwd, with the p optionally capitalized. These would match: Password, password, Passwd, but these would not match: Pass, passwD, paSsword, passw, or passd. (Hint: There is a solution to this that permits grep to use multiple search patterns, or you can use a single extended regular expression. This is the only question in which you may use an option or extended regexp.)
I’ve added a second test file bar.txt to the Source Directory containing additional test material designed to find more problems in your regular expressions that weren’t detected by the foo.txt file. Error messages from the checking program will tell you which file is being read to detect the errors in your script.
Your scripts must also give the correct output word count and checksum results when searching in this bar.txt test file. If the output is incorrect, you will be told what the correct values should be in the error message. Do not save this message - the bar.txt file may change at any time and your scripts must still match the correct lines.
Write the basic regular expressions to match the given pattern specifications, not to match the particular set of lines in the given test files. I may come up with other test cases even after the due date of the assignment; your script loses marks if it fails these tests because it doesn’t do what the specification says it must do. You may have to write your own test cases, to be sure you got it right.
I’ve also set up the checking program to detect failure to protect special characters from shell GLOB expansion. If your expression works in your account but not when the checking script runs it, this may be your problem. You may also see “Permission denied” errors if this is the problem. Fix your script.
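The effect of leaving a pattern unquoted can be sketched as follows. The file name bazaar, the three-line foo.txt, and the pattern ba*r are hypothetical and chosen only for illustration; they are not taken from the assignment's test files or answers.

```shell
cd "$(mktemp -d)" || exit 1
printf 'br\nbaar\nbazaar\n' > foo.txt   # stand-in test data
touch bazaar                            # a file name the GLOB pattern ba*r matches

# Unquoted: the shell expands ba*r to the matching file name before grep
# runs, so grep actually searches for the literal text "bazaar".
grep ba*r foo.txt

# Quoted: grep receives the regular expression ba*r unchanged
# (a "b", then zero or more "a", then an "r").
grep 'ba*r' foo.txt
```

The unquoted command prints only the bazaar line, while the quoted command prints the br and baar lines. The same expression can therefore behave differently depending on which file names exist in the directory where it runs, which is why it may work in your account and fail under the checking program.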
That is all the tasks you need to do.
Check your work a final time using the Checking Program and save the output as described below. Submit your mark following the directions below.
Summary: Do some tasks, then run the checking program to verify your work as you go. You can run the checking program as often as you want. When you have the best mark, upload the marks file to Blackboard.
There is a Checking Program named assignment03check in the Source Directory on the CLS. Create a Symbolic Link to this program named check under your new assignment03 directory so that you can easily run the program to check your work and assign your work a mark. Note: You can create a symbolic link to this executable program but you do not have permission to read or copy the program file.
Execute the above “check” program using its new symbolic link. (Review the Search Path notes if you forget how to run a program by pathname from the command line.) This program will check your work, assign you a mark, and display the output on your screen. (You may want to paginate the long output so you can read all of it.)
You may run the “check” program as many times as you wish, to correct mistakes and get the best mark. Some task sections require you to finish the whole section before running the checking program at the end; you may not always be able to run the checking program successfully after every single task step.
When you are done with checking this assignment, and you like what you see on your screen, redirect the output of the Checking Program into the text file assignment03.txt under your assignment03 directory. Use the exact name assignment03.txt in your assignment03 directory. Case (upper/lower case letters) matters. Be absolutely accurate, as if your marks depended on it. Do not edit the file. Make sure the file actually contains the output of the checking program!
Transfer the above assignment03.txt file from the CLS to your local computer and verify that the file still contains all the output from the checking program. Do not edit this file! No empty files, please! Edited or damaged files will not be marked. You may want to refer to your File Transfer notes.
Submit the assignment03.txt file under the correct Assignment area on Blackboard (with the exact name) before the due date. Upload the file via the assignment03 “Upload Assignment” facility in Blackboard: click on the underlined assignment03 link in Blackboard. Use “Attach File” and “Submit” to upload your plain text file.
No word-processor documents. Do not send email. Use only “Attach File”. Do not enter any text into the Submission or Comments boxes on Blackboard; I do not read them. Use only the “Attach File” section followed by the Submit button. (If you want to send me comments about your assignment, use email.)
Your instructor may also mark the assignment03 directory in your CLS account after the due date. Leave everything there on the CLS. Do not delete any assignment work from the CLS until after the term is over!
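The earlier step of saving the Checking Program output into assignment03.txt can be sketched as follows. This is an illustration only: the check script created here is a hypothetical stand-in that prints one line, so that the commands can be tried outside the CLS; in your assignment03 directory, check is the symbolic link to the real program.

```shell
cd "$(mktemp -d)" || exit 1
# Stand-in for the real "check" symbolic link (assumption, illustration only).
printf '#!/bin/sh\necho "sample checking output"\n' > check
chmod +x check

# Redirect the output into the exact file name; do not edit the file afterwards.
./check > assignment03.txt

# Confirm the file exists and is not empty before transferring it.
[ -s assignment03.txt ] && wc assignment03.txt
```

Running the same size check again on your local computer after the transfer is a quick way to confirm that the file is not empty.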
Use the exact file name given above. Upload only one single file of plain text, not HTML, not MSWord. No fonts, no word-processing. Plain text only.
Did I mention that the format is plain text (suitable for VIM/Nano/Pico/Gedit or Notepad)?
NO EMAIL, WORD PROCESSOR, PDF, RTF, or HTML DOCUMENTS ACCEPTED.
No marks are awarded for submitting under the wrong assignment number or for using the wrong file name. Use the exact name given above.
WARNING: Some inattentive students don’t read all these words. Don’t make that mistake! Be exact.
READ ALL THE WORDS. OH PLEASE, PLEASE, PLEASE READ ALL THE WORDS!