% CST8177 Assignment 03 - Regular Expressions and Simple Shell Scripts
% Wenjuan Jiang, Ian! D. Allen -- [www.idallen.com]
% Fall 2014 - September to December 2014 - Updated 2015-09-06 00:38 EDT

-   [Course Home Page]
-   [Course Outline]
-   [All Weeks]
-   [Plain Text]

Due Date and Deliverables
=========================

> **Do not print this assignment on paper!**
>
> -   On paper, you will miss updates, corrections, and hints added to the
>     online version.
> -   On paper, you cannot follow any of the [hyperlink URLs] that lead you
>     to hints and course notes relevant to answering a question.
> -   On paper, scrolling text boxes will be cut off and not print properly.

-   **Due Date**: `23h59 (11:59pm) Friday October 3, 2014 (end of Week 5)`
    -   You have two weeks to do this assignment, but your next assignment
        will be available in one week and will overlap this assignment.
        Start work on this now! Don't delay!
    -   Late assignments or wrong file names may not be marked. Please be
        accurate and punctual.

-   **Available online**
    -   Version 1 -- 10:20 Thursday September 18, 2014

-   **Prerequisites**
    -   [CST8207 GNU/Linux Operating Systems I]
    -   All Class Notes since the beginning of term.
    -   All your previous [Assignments].
    -   An ability to **READ ALL THE WORDS** to work effectively.

-   **Deliverables**
    1.  One plain text file uploaded to Blackboard according to the steps
        in the [Checking Program] section below.
    2.  Directory structure and files created and left for marking on the
        [Course Linux Server] (**CLS**).\
        **Do not delete any assignment work from the CLS until after the
        term is over!**

Purpose of this Assignment
==========================

> **Do not print this assignment on paper!** On paper, you cannot follow any
> of the hyperlink URLs that lead you to hints and course notes relevant to
> answering a question.

1.  Practise with regular expressions of varying complexity
2.  Create simple shell scripts
3.
    Practise with a text editor

Introduction and Overview
=========================

This is an overview of how you are expected to complete this assignment.
Read all the words before you start working.

For full marks, follow these directions exactly.

1.  Complete the **Tasks** listed below.
2.  Verify your own work before running the **Checking Program**.
3.  Run the **Checking Program** to help you find errors.
4.  Submit the output of the **Checking Program** to Blackboard before the
    due date.
5.  **READ ALL THE WORDS** to work effectively and not waste time.

You are given a file of somewhat random text, and a set of descriptions of
sets of lines in that file. For each description, you are to produce a
command with a regular expression that will select the described set of
lines. You will initially test your regular expressions on the interactive
shell command line, and when you are satisfied with each one, you will put
the command you used into a shell script.

You can use a [Checking Program] to check your work as you do the tasks.
You can check your work with the checking program as often as you like
before you submit your final mark. (Some task sections below require you to
finish the whole section before running the checking program; you may not
always be able to run the checking program successfully after every single
task step.)

> Since I also do manual marking of student assignments, your final mark may
> not be the same as the mark submitted using the current version of the
> **Checking Program**. I do not guarantee that any version of the **Checking
> Program** will find all the errors in your work. Complete your assignments
> according to the specifications, not according to the incomplete set of the
> mistakes detected by the **Checking Program**.

Save your work
--------------

You will create a file system structure in your HOME directory on the CLS,
with various directories, files, and links.
When you are finished the tasks, leave these files, directories, and links
in place as part of your deliverables on the CLS. **Do not delete any
assignment work until after the term is over!** Assignments may be
re-marked at any time; you must have your term work available right until
term end.

The Source Directory
--------------------

All references to the "Source Directory" below are to the CLS directory
`~idallen/cst8177/14f/assignment03/` and that name starts with a *tilde*
character `~` followed by a userid with no intervening slash. The leading
tilde indicates to the shell that the pathname starts with the HOME
directory of the account `idallen` (seven letters).

Searching the course notes
--------------------------

The previous term's course notes are available on the Internet here:
[CST8207 GNU/Linux Operating Systems I]. All the notes files are also on
the CLS. You can learn about how to read and search these files using the
command line on the CLS under the heading *Copies of the CST8207 course
notes* near the bottom of the page [Course Linux Server].

Tasks
=====

-   Do the following tasks in order, from top to bottom.
-   These tasks must be done in your account on the [Course Linux Server].
-   **READ ALL THE WORDS!** and do not skip steps.
-   Run the [Checking Program] to grade your work, then upload the file to
    Blackboard.
-   Your instructor will also mark on the due date the work you do in your
    account on the CLS. Leave all your work on the CLS and do not modify it.
-   **Do not delete any assignment work from the CLS until after the course
    is over.**

Set Up -- The Base Directory on the CLS
---------------------------------------

1.  Do a [Remote Login] to the [Course Linux Server] (**CLS**) from any
    existing computer, using the host name appropriate for whether you are
    on-campus or off-campus.

    **All work in this assignment must be done on the CLS.**

2.  Make an `assignment03` directory in the same directory as you made
    `assignment02` in a [previous assignment].

    > **This `assignment03` directory is the [Base Directory] for most pathnames
    > in this assignment. Store your files and answers in this Base Directory.**

3.  Follow the instructions in the first two steps at the start of
    [Checking Program] below to create a working symbolic link to the
    executable **Checking Program**.

4.  The input text file `test_input.txt` in the [Source Directory] contains
    many lines of text. Put a soft link to this input file in your
    [Base Directory]. Use the same name for the link.

Check your work so far using the checking program symlink.

Write Regular Expression Commands
---------------------------------

Below, in the [Labelled Descriptions] section, you are given labelled
descriptions of lines to find in the input text file `test_input.txt`. For
each labelled description you will repeat these two steps (described in
detail below):

1.  On the command line, invent a `grep` command using a single **basic**
    (not extended) regular expression that will select and display only the
    described lines of text, and nothing more, from the input file. Do not
    use any options to `grep`, except possibly for the last question; see
    below. You do not need multiple expressions or any extended regular
    expressions or special expressions, except possibly for the last
    question. Use basic regular expressions.
2.  Put the working `grep` command into its own shell script.

Each set of lines to be found is labelled below with a **label**. The label
is the first word in the section, followed by a colon. For example, the
following example description is labelled `bar:`

    bar: lines that contain the word barbar

Repeat the following steps for each of the labelled descriptions:

### Repeat these steps for each label

1.  Make your current working directory the [Base Directory] (the directory
    containing the new symlink you made to the `test_input.txt` file) if it
    is not already so.

2.
    You must find lines in the `test_input.txt` file using a single `grep`
    command with a regular expression pattern. Type directly at the command
    line your initial attempt at a `grep` command that finds the lines, and
    view the result on your screen. For the example given above with the
    label `bar`, a `grep` command you might try to match lines containing
    `barbar` could be:

        $ grep 'barbar' test_input.txt

    The correct answer output on your screen for each problem below will
    vary between a few and a few dozen lines, depending on the problem. No
    pipes are allowed. Use only a single `grep` command, imitating the above
    command format. No `grep` options or extended regular expressions are
    allowed except as explained in the last problem.

3.  If you're not satisfied with the output you see, use up-arrow to
    retrieve the previous command, and make changes to the regular
    expression, then re-run the new command. Repeat this step over and over
    on the interactive command line until you're satisfied with the output
    on your screen and want to check your answer.

4.  To check your answer, use up-arrow to retrieve the command, and modify
    it to pipe the output of your command into the `wc` program, then do the
    same, changing `wc` to `sum`. Compare the output of `wc` and `sum` with
    the expected values output by the [Checking Program]. For the example
    given above with the label `bar`, the checking pipelines would be done
    like this, in this order:

        $ grep 'barbar' test_input.txt
        $ grep 'barbar' test_input.txt | wc
        $ grep 'barbar' test_input.txt | sum

    The `'barbar'` string is the quoted basic regular expression.

5.  If the word count or checksum values differ from those expected values
    output by the [Checking Program], you need to fix your regular
    expression. Use up-arrow to retrieve the command, make your changes to
    the regular expression, and re-run the command until you get it right.

    > Do not save the output of the [Checking Program]; the test file may
    > change at any time to include new test cases, so the word count and
    > checksums may change at any time.

6.  When you are satisfied with your answer as typed on the command line,
    use a text editor to create in your [Base Directory] an executable
    shell script whose name is the **label** name followed by an `.sh`
    extension, e.g. `bar.sh`. Copy the working `grep` command from the
    command line into the last line of the new shell script. Only put the
    `grep` command into the script, not any pipelines or checking. This
    executable script must run only your `grep` command.

    For the example given above with the label `bar`, the script name must
    be `bar.sh` in the [Base Directory].

    The first few lines of every shell script must correspond exactly to the
    **Script Header** described in class. The last line of every script will
    be your `grep` command. Do not redirect or pipe the output of your
    command into anything inside the script -- the script should produce the
    correct lines of output from `test_input.txt` on your screen so that it
    can be checked. Do not put any lines into your script other than the
    **Script Header**, the single `grep` command line, and optional blank or
    comment lines.

7.  You can also check the output of your script using the `wc` and `sum`
    commands, similar to the way you checked the original `grep` command.
    The script must output exactly the same lines as the original `grep`
    command that you put into it. The results should be identical:

        $ grep 'barbar' test_input.txt | wc
        $ ./bar.sh | wc
        $ grep 'barbar' test_input.txt | sum
        $ ./bar.sh | sum

8.  Repeat the 8 steps in this section for each of the
    [Labelled Descriptions] below.

NOTE: When it comes time to create your second and subsequent scripts, copy
the previous script to the new label name rather than starting from scratch
every time. Run the [Checking Program] to make sure you have copied the
**Script Header** correctly.
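
The steps above can be sketched end to end for the example label `bar`.
This is only an illustration under stated assumptions: it works in a
scratch directory with a made-up three-line stand-in for `test_input.txt`,
and the header lines inside `bar.sh` are placeholders, not the real
**Script Header** described in class.

```shell
#!/bin/sh
# Sketch only: use a scratch directory and a stand-in input file.
# (Your real test_input.txt is the symlink in your Base Directory.)
demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' 'a barbar line' 'no match here' 'barbarbar' > test_input.txt

# Create bar.sh: header lines first, the single grep command last.
# ASSUMPTION: the header below is a placeholder; use the Script Header
# given in class in your real scripts.
cat > bar.sh <<'EOF'
#!/bin/sh
# (placeholder for the Script Header lines described in class)

grep 'barbar' test_input.txt
EOF
chmod u+x bar.sh

# The script must produce exactly what the interactive command produced,
# so the wc and sum results of the two must be identical.
direct_wc=$(grep 'barbar' test_input.txt | wc)
script_wc=$(./bar.sh | wc)
direct_sum=$(grep 'barbar' test_input.txt | sum)
script_sum=$(./bar.sh | sum)
echo "direct: $direct_wc / $direct_sum"
echo "script: $script_wc / $script_sum"
```

If the two lines printed at the end differ, the script is not running the
same command that you tested on the command line.
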
Do not put any lines into your script other than the **Script Header**, the
single `grep` command line, and optional blank or comment lines.

Your scripts must give the correct output word count and checksum results
when searching in this `test_input.txt` test file. If the output is
incorrect, you will be told what the correct values should be in the error
message. Do not save this message -- the testing file may change at any
time during the assignment and your scripts must still match the correct
lines.

Write the basic regular expressions to match the given pattern
specifications, not to match the particular set of lines in the given test
file(s). I may come up with other test cases even after the due date of the
assignment; your script loses marks if it fails these tests because it
doesn't do what the specification says it must do. You may have to write
your own test cases, to be sure you got it right.

I've also set up the checking program to detect failure to
[protect special characters] from shell GLOB expansion. If your expression
works in your account but not when the checking script runs it, this may be
your problem. You may also see "Permission denied" errors if this is the
problem. Fix your script.

Labelled Descriptions
---------------------

Repeat the 8 steps of the [above section] for each of these labelled items
below. None of these expressions except the very last one require any
options to `grep`, nor multiple expressions, nor do they require any
extended regular expressions. All except the last must be solved with no
options and only one single basic regular expression.

Definition:

[Whitespace]
:   Spaces or space-like characters such as TABs, newlines,
    carriage-returns, form-feeds, etc. This is a distinct POSIX character
    class from **blanks**, which are only space and TAB. This assignment
    uses Whitespace, not blanks.
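
The difference between the two POSIX classes can be seen with a small
sketch. It assumes a made-up four-line sample file (not the assignment
input) and uses the standard bracket expressions `[[:space:]]` and
`[[:blank:]]`; the line containing a carriage-return counts as containing
Whitespace but not as containing a blank.

```shell
#!/bin/sh
# Made-up sample: a line with a space, a line with a TAB, a line with a
# carriage return (space-like, but not a "blank"), and a line with none.
sample=$(mktemp)
printf 'a b\nc\td\ne\rf\nghi\n' > "$sample"

# Count lines containing any Whitespace character (space, TAB, CR, ...).
space_lines=$(grep '[[:space:]]' "$sample" | wc -l)

# Count lines containing a blank (space or TAB only).
blank_lines=$(grep '[[:blank:]]' "$sample" | wc -l)

echo "space: $space_lines  blank: $blank_lines"
rm -f "$sample"
```

The single quotes around each expression keep the shell from treating the
square brackets as a GLOB pattern before `grep` sees them.
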

All the points below have the following format:

-   *label*: *description of desired `grep` output from file
    `test_input.txt`*

Here are the names of the patterns (and scripts) you must create:

1.  `upper`: lines containing at least one upper case alphabetic character.
2.  `control`: lines containing at least one control character. (When
    checking your output, you can make control characters visible using the
    `-vT` options to the `cat` command, otherwise they won't show on your
    screen. Do *not* put the `cat` command in your script.)
3.  `punct`: lines containing at least one punctuation character.
4.  `blank`: blank lines. (A blank line contains only zero or more
    [Whitespace] characters and no other kinds of characters.)
5.  `only_alpha`: non-empty lines containing only alphabetic characters.
    ("Non-empty" means there has to be at least one character.)
6.  `only_digit`: non-empty lines containing only digits.
7.  `only_alnum`: non-empty lines containing only alphanumeric characters.
8.  `only_upper`: non-empty lines containing only upper case characters.
9.  `no_white`: lines containing no [Whitespace] characters. Another way of
    saying this is: lines containing zero or more only non-Whitespace
    characters.
10. `no_num_white`: lines containing no [Whitespace] or digit characters.
    Another way of saying this is: lines containing zero or more only
    non-Whitespace non-digit characters.
11. `empty`: empty lines. (An empty line means nothing on the line, not even
    Whitespace characters. The line contains *no* characters.)
12. `plus`: lines containing at least one plus `+` character.
13. `question`: lines containing at least one question mark `?` character.
14. `backslash`: lines containing at least one backslash `\` character.
15. `caret`: lines containing at least one circumflex/caret `^` character.
16. `star`: lines containing at least one asterisk `*` character.
17. `dot`: lines containing at least one period `.` character.
18.
    `square`: lines containing at least one square bracket `[` or `]`
    character.
19. `begin_end`: lines that start with the exact five characters `begin` and
    that end with the exact three characters `end`. (Any other characters
    might appear between the `begin` and the `end`.)
20. `AB`: lines containing `A` and `B`, capitalized and in that order but
    not necessarily right next to each other. Another way of saying this
    is: lines containing a `B` following an `A`.
21. `first`: lines that start with optional [Whitespace], then the string
    `first`.
22. `capital`: lines that contain the string `Capital` where the initial
    letter `C` must be upper-case but the rest of the letters could be
    either case, e.g. `CAPITAL`, `CaPiTaL`, etc.
23. `first_last`: lines that start with the exact five characters `first`
    preceded by any amount of Whitespace and that end with the exact four
    characters `last` followed by any amount of Whitespace. (Any other
    characters might appear between the `first` and the `last`, but only
    optional Whitespace is allowed before `first` and after `last`.)
    (**Hint:** Another way of saying this: The line starts with optional
    Whitespace, followed by `first`, followed by anything, followed by
    `last`, followed by optional Whitespace, and then the end of the line.)
24. `phone`: lines that contain a seven-(or more)-digit number with one or
    more dashes between the group of three (or more) digits and the group
    of four (or more) digits. These should match: `555-1212`,
    `555555-----121212121212`, `x555-1212x`, `x555---1212x`,
    `x999555-1212x`, `x555-1212999x`, `x999555-1212999x`, but these would
    not match: `555-121x`, `x55-1212`, `5551212`
25. `better_phone`: lines that contain a seven-digit number, surrounded
    before and after with non-digit characters, with one or more
    underscores, dashes, or periods between the third and fourth digits.
    These should match: `x555-1212x`, `x555.1212x`, `x555_-.1212x`,
    `x555--__..-_.1212x` but these would not match:
    `555555-----121212121212`, `x999555-1212x`, `x555-1212999x`,
    `x999555-1212999x`, `555-121x`, `x55-1212`, `5551212`
26. `password`: lines containing `password` or `passwd`, with the `p`
    optionally capitalized. These would match: `Password`, `password`,
    `Passwd`, but these would not match `Pass`, `passwD`, `paSsword`,
    `passw`, or `passd`.

    **Hint:** There is a solution to this that uses an option to permit
    `grep` to use multiple search patterns, or you can use a single
    *extended* regular expression. This is the only question in which you
    may use an option or extended regexp.

Check your work so far using the checking program symlink.

> Do not save the output of the [Checking Program]; the test file may change
> at any time to include new test cases, so the word count and checksums may
> change at any time.

When you are done
-----------------

That is all the tasks you need to do. Check your work a final time using
the [Checking Program] and save the output as described below. Submit your
mark following the directions below.

Checking, Marking, and Submitting your Work
===========================================

**Summary:** Do some tasks, then run the checking program to verify your
work as you go. You can run the checking program as often as you want. When
you have the best mark, upload the marks file to Blackboard.

> Since I also do manual marking of student assignments, your final mark may
> not be the same as the mark submitted using the current version of the
> [Checking Program]. I do not guarantee that any version of the [Checking
> Program] will find all the errors in your work. Complete your assignments
> according to the specifications, not according to the incomplete set of the
> mistakes detected by the [Checking Program].

1.  There is a [Checking Program] named `assignment03check` in the
    [Source Directory] on the CLS.
    Create a [Symbolic Link] to this program named `check` under your new
    [Base Directory] on the CLS so that you can easily run the program to
    check your work and assign your work a mark on the CLS. Note: You can
    create a symbolic link to this executable program but you do not have
    permission to read or copy the program file.

2.  Execute the above `check` program on the CLS using its symbolic link.
    (Review the [Search Path] notes if you forget how to run a program by
    pathname from the command line.) This program will check your work,
    assign you a mark, and display the output on your screen. (You may want
    to paginate the long output so you can read all of it.)

    You may run the `check` program as many times as you wish, to correct
    mistakes and get the best mark. **Some task sections require you to
    finish the whole section before running the checking program at the
    end; you may not always be able to run the checking program
    successfully after every single task step.**

3.  When you are done with checking this assignment, and you like what you
    see on your screen, **redirect** the output of the [Checking Program]
    into the text file `assignment03.txt` under your [Base Directory] on
    the CLS. Use that *exact* name. Case (upper/lower case letters)
    matters. Be absolutely accurate, as if your marks depended on it.
    -   Do not edit the output file. Submit it exactly as given.
    -   Make sure the file actually contains the output of the checking
        program!
    -   The file should contain near the bottom a line starting with:
        `YOUR MARK for`
    -   Really! **MAKE SURE THE FILE HAS YOUR MARKS IN IT!**

4.  Transfer the above `assignment03.txt` file from the CLS to your local
    computer and verify that the file still contains all the output from
    the checking program. Do not edit this file! No empty files, please!
    Edited or damaged files will not be marked. You may want to refer to
    your [File Transfer] notes.
    -   Do not edit the output file. Submit it exactly as given.
    -   Make sure the file actually contains the output of the checking
        program!
    -   The file should contain near the bottom a line starting with:
        `YOUR MARK for`
    -   Really! **MAKE SURE THE FILE HAS YOUR MARKS IN IT!**

5.  Upload the `assignment03.txt` file from your local computer to the
    correct Assignment area on Blackboard (with the exact name) before the
    due date:
    1.  On your local computer use a web browser to log in to Blackboard
        and go to the Blackboard page for this course.
    2.  Go to the Blackboard *Assignments* area for the course, in the left
        side-bar menu, and find the current assignment.
    3.  Under *Assignments*, click on the underlined **assignment03** link
        for this assignment.
        a)  If this is your first upload, the *Upload Assignment* page will
            open directly; skip the next sentence.
        b)  If you have already uploaded previously, the *Review Submission
            History* page will be open and you must use the *Start New*
            button at the bottom of the page to get to the *Upload
            Assignment* page.
    4.  On the *Upload Assignment* page, scroll down and beside *Attach
        File* use *Browse My Computer* to find and attach your assignment
        file from your local computer. Make sure the assignment file has
        the correct name on your local computer before you attach it.
    5.  After you have attached the file on the *Upload Assignment* page,
        scroll down to the bottom of the page and use the *Submit* button
        to actually upload your attached assignment file to Blackboard.

    Use only *Attach File* on the *Upload Assignment* page. Do not enter
    any text into the *Text Submission* or *Comments* boxes on Blackboard;
    I do not read them. Use only the *Attach File* section followed by the
    *Submit* button. If you need to comment on any assignment submission,
    send me [email].

    You can revise and upload the file more than once using the *Start
    New* button on the *Review Submission History* page to open a new
    *Upload Assignment* page. I only look at the most recent submission.
    You must upload the file with the correct name from your local
    computer; you cannot correct the name as you upload it to Blackboard.

6.  **Verify that Blackboard has received your submission**: After using
    the *Submit* button, you will see a page titled *Review Submission
    History* that will show all your uploaded submissions for this
    assignment. Each of your submissions is called an *Attempt* on this
    page. A drop-down list of all your attempts is available.
    a)  Verify that your latest *Attempt* has the correct 16-character,
        lower-case file name under the *SUBMISSION* heading.
    b)  The one file name must be the *only* thing under the *SUBMISSION*
        heading. Only the one file name is allowed.
    c)  No *COMMENTS* heading should be visible on the page. Do not enter
        any comments when you upload an assignment.
    d)  **Save a screen capture** of the *Review Submission History* page
        on your local computer, showing the single uploaded file name
        listed under *SUBMISSION*. If you want to claim that you uploaded
        the file and Blackboard lost it, you will need this screen capture
        to prove that you actually uploaded the file. (To date, Blackboard
        has never lost an uploaded file.)

    You will also see the *Review Submission History* page any time you
    already have an assignment attempt uploaded and you click on the
    underlined **assignment03** link. You can use the *Start New* button on
    this page to re-upload your assignment as many times as you like.

    You cannot delete an assignment attempt, but you can always upload a
    new version. I only mark the latest version.

7.  Your instructor may also mark files in your directory in your CLS
    account after the due date. Leave everything there on the CLS. **Do
    not delete any assignment work from the CLS until after the term is
    over!**

-   I do not accept any assignment submissions by email. Use only the
    Blackboard *Attach File*. No word processor documents. Plain Text only.
-   Use the *exact* file name given above.
    Upload only one single file of Linux-format plain text, not HTML, not
    RTF, not MSWord. No fonts, no word-processing. Linux plain text only.
-   **NO EMAIL, WORD PROCESSOR, PDF, RTF, or HTML DOCUMENTS ACCEPTED.**
-   No marks are awarded for submitting under the wrong assignment number
    or for using the wrong file name. Use the exact 16-character,
    lower-case name given above.
-   WARNING: Some inattentive students don't read all these words. Don't
    make that mistake! Be exact.

**READ ALL THE WORDS. OH PLEASE, PLEASE, PLEASE READ ALL THE WORDS!**

--

| Wenjuan Jiang, Todd Kelley, and
| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

  [www.idallen.com]: http://www.idallen.com/
  [Course Home Page]: ..
  [Course Outline]: course_outline.pdf
  [All Weeks]: indexcgi.cgi
  [Plain Text]: assignment03.txt
  [hyperlink URLs]: indexcgi.cgi#XImportant_Notes__alphabetical_order_
  [CST8207 GNU/Linux Operating Systems I]: ../../../cst8207/14w
  [Assignments]: indexcgi.cgi#XAssignments
  [Checking Program]: #checking-marking-and-submitting-your-work
  [Course Linux Server]: ../../../cst8207/14f/notes/070_course_linux_server.html
  [Remote Login]: ../../../cst8207/14f/notes/110_remote_login.html
  [previous assignment]: assignment02.html#set-up-the-base-directory-on-the-cls
  [Base Directory]: #set-up-the-base-directory-on-the-cls
  [Source Directory]: #the-source-directory
  [Labelled Descriptions]: #labelled-descriptions
  [protect special characters]: ../../../cst8207/14w/notes/440_quotes.html
  [above section]: #repeat-these-steps-for-each-label
  [Whitespace]: https://en.wikipedia.org/wiki/Whitespace_character
  [Symbolic Link]: ../../../cst8207/14w/notes/460_symbolic_links.html
  [Search Path]: ../../../cst8207/14w/notes/400_search_path.html
  [File Transfer]: ../../../cst8207/14f/notes/015_file_transfer.html
  [email]: mailto:idallen@idallen.ca
  [Pandoc Markdown]: http://johnmacfarlane.net/pandoc/