% CST8177 Assignment 03 - Regular Expressions and Simple Shell Scripts
% Todd Kelley; Richard Donnelly; Ian! D. Allen - idallen@idallen.ca - www.idallen.com
% Winter 2013 - January to April 2013 - Updated Sun Feb 10 02:45:32 EST 2013

Due Date and Deliverables
=========================

-   **Due Date**: `23h59 (11:59pm) Saturday February 9, 2013 (end of Week 5)`
    -   Late assignments or wrong file names may not be marked. Be accurate.
-   **Available online**:
    -   Version 01 - 04:00 Monday January 28, 2013
    -   Version 02 - 09:55 Monday January 28, 2013
    -   Version 03 - 16:45 Monday January 28, 2013
    -   Version 04 - 16:30 Wednesday January 30, 2013 (Added: Do not use options to `grep`.)
-   **Prerequisites**:
    -   [CST8207 GNU/Linux Operating Systems I]
    -   an ability to **READ ALL THE WORDS** to work effectively
-   **Deliverables**:
    1.  One text file uploaded to Blackboard according to the steps in the [Checking Program] section below.
    2.  Directory structure and files created and left for marking on the [Course Linux Server] (**CLS**).\
        **Do not delete any assignment work from the CLS until after the term is over!**

Purpose of this Assignment
==========================

1.  Practise with regular expressions of varying complexity
2.  Create simple shell scripts
3.  Practise with a text editor

Remember to **READ ALL THE WORDS** to work effectively and not waste time.

Introduction and Overview
=========================

This is an overview of how you are expected to complete this assignment. Read all the words before you start working.

Complete the [Tasks] listed below on the [Course Linux Server] (**CLS**). Run a [Checking Program] to verify your work as you go. Submit your marks.

You are given a file of somewhat random text, and a set of descriptions of sets of lines in that file. For each description, you are to produce a regular expression that will match the described set of lines.
You will initially test your regular expressions on the command line, and when you are satisfied with each one, you will put the `grep` command in a shell script. A [Checking Program] is available to check your work as you go. The following tasks (except the first three, which should be done once) are to be repeated for each description.

When you have finished the tasks, leave the files and directories in place as part of your deliverables. **Do not delete any assignment work until after the term is over!** Assignments may be re-marked at any time; you must have your term work available right until term end.

The previous term’s course notes are available on the Internet here: [CST8207 GNU/Linux Operating Systems I]. All the notes files are also on the CLS. You can learn how to read and search these files using the command line on the CLS under the heading *Copies of the CST8207 course notes* near the bottom of the page [Course Linux Server].

Remember to **READ ALL THE WORDS** to work effectively and not waste time.

Tasks
=====

-   Do the following tasks in order, from top to bottom.
-   These tasks must be done in your account on the [Course Linux Server].
-   **READ ALL THE WORDS!** and do not skip steps.

The Source Directory
--------------------

All references to the “Source Directory” below are to the directory `~idallen/cst8177/13w/assignment03/`. That name starts with a *tilde* character followed by a userid, with no intervening slash.

Set Up
------

1.  Make an `Assignments/assignment03` directory in your HOME.
2.  The file `foo.txt` in the [Source Directory] contains many lines of text. Put a soft link to this file in your new `assignment03` directory. Use the same name for the link.
3.  In your new `assignment03` directory create a soft link named `check` to the checking program `assignment03check` from the [Source Directory].
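
The three Set Up steps might look roughly like the sketch below. It is only an illustration: the variables `TOP` and `SRC` are hypothetical stand-ins for your HOME and for the Source Directory, and the two files created in `SRC` are dummy placeholders, so that the commands can be tried anywhere without touching real files. On the CLS you would use your real HOME and the real Source Directory instead.

```shell
# Sketch only. TOP and SRC are throw-away stand-ins (assumptions):
# on the CLS, TOP would be "$HOME" and SRC would be the real
# Source Directory, ~idallen/cst8177/13w/assignment03
TOP=$(mktemp -d)
SRC=$(mktemp -d)
printf 'sample line\n' > "$SRC/foo.txt"                 # dummy data file
printf '#!/bin/sh\necho checked\n' > "$SRC/assignment03check"
chmod +x "$SRC/assignment03check"                       # dummy checker

# Step 1: make the directory (-p also creates the parent Assignments)
mkdir -p "$TOP/Assignments/assignment03"
cd "$TOP/Assignments/assignment03" || exit 1

# Step 2: soft link to foo.txt, keeping the same name
ln -s "$SRC/foo.txt" foo.txt

# Step 3: soft link named "check" that points at the checking program
ln -s "$SRC/assignment03check" check

ls -l      # each soft link is listed as "name -> target"
```

Listing the directory afterwards is a quick way to confirm that both links exist and point where you expect.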
Regular Expressions
-------------------

Below, in the [Labelled Descriptions] section, you are given labelled descriptions of lines to find in the file `foo.txt`. For each labelled description you will repeat these two steps (described in detail below):

1.  On the command line, invent a `grep` command using a single regular expression that will match the described lines of text (and nothing more). Do not use any options to `grep` (except possibly for the last question). You do not need multiple expressions or any extended regular expressions except possibly for the last question.
2.  Put the working `grep` command into its own shell script.

Each set of lines to be found is labelled below with a **label**. The label is the first word in the section, followed by a colon. For example, the following example description is labelled `bar:`

> bar: lines that contain the word barbar

Repeat the following steps for each of the labelled descriptions:

### Repeat these steps for each label

1.  Make your current working directory the one containing the link to the `foo.txt` file (if it is not already so).
2.  Type directly at the command line your initial attempt at a `grep` command that finds the lines, and view the result on your screen. The correct answer in all cases will result in fewer than 50 lines of text on your screen. For the example given above with the label `bar`, a `grep` command you might try could be:

        $ grep 'barbar' foo.txt

3.  If you’re not satisfied with your initial attempt, use up-arrow to retrieve the previous command, and make changes to the regular expression, then re-run the new command. Repeat this step until you’re satisfied with the output and want to check your answer.
4.  To check your answer, use up-arrow to retrieve the command, and modify it to pipe the output of your command into the `wc` program, then do the same, changing `wc` to `sum`. Compare the output of `wc` and `sum` with the values output by the [Checking Program]. For the example given above with the label `bar`, the checking pipelines would be done like this:

        $ grep 'barbar' foo.txt | wc
        $ grep 'barbar' foo.txt | sum

    The `'barbar'` string is the quoted regular expression.

5.  If the word count or checksum values differ, you need to change your regular expression. Use up-arrow to retrieve the command, make your changes to the regular expression, and re-run the command.
6.  When you are satisfied with your answer, use a text editor to create in your `assignment03` directory a shell script whose name is the **label** name followed by `.sh`, that simply runs your `grep` command without piping its output into `wc` or `sum`. Just copy the `grep` command into the last line of the script. For the example given above with the label `bar`, the script name must be `bar.sh` in the `assignment03` directory.

    The first few lines of every shell script must correspond to the **International Script Header** described in class and available in the [Internationalization] notes. The last line of every script will be your `grep` command. Do not redirect or pipe the output of your command into anything - the script should produce output on your screen so that it can be checked.

7.  You can also check the output of your script using the `wc` and `sum` commands, similar to the way you checked the original `grep` command. The results should be identical:

        $ grep 'barbar' foo.txt | wc
        $ grep 'barbar' foo.txt | sum
        $ ./bar.sh | wc
        $ ./bar.sh | sum

8.  Repeat the 8 steps in this section for each of the [Labelled Descriptions] below.

    NOTE: when it comes time to create your second and subsequent scripts, copy the previous script to the new label name rather than starting from scratch every time. Run the [Checking Program] to make sure you have copied the **International Script Header** correctly.

### When you are done

When you are finished and all of the shell scripts have been created, run the [Checking Program] to create an overall mark.
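
The script-creation and comparison steps for the example label `bar` can be sketched as follows. This is a hedged illustration, not a model answer: the temporary directory and the three-line `foo.txt` are made-up stand-ins, the here-document replaces the text editor only so the example is self-contained, and the `#!/bin/sh` line is an assumed placeholder for the real International Script Header, which you must copy from the [Internationalization] notes.

```shell
# Sketch only: a throw-away directory and a tiny stand-in foo.txt
# (assumptions) take the place of the real assignment03 directory.
cd "$(mktemp -d)" || exit 1
printf 'one barbar here\nno match\nbarbar again\n' > foo.txt

# Normally you would create bar.sh with a text editor; a here-document
# is used here only to keep the example self-contained.
cat > bar.sh <<'EOF'
#!/bin/sh
# Placeholder: the real International Script Header lines go here.
# The "#!/bin/sh" line above is an assumption, not the course header.
grep 'barbar' foo.txt
EOF
chmod u+x bar.sh       # make the script executable

# The bare command and the script should give identical results
grep 'barbar' foo.txt | wc
./bar.sh | wc
grep 'barbar' foo.txt | sum
./bar.sh | sum
```

If the two `wc` lines or the two `sum` lines differ, the command inside the script is not the same as the one you tested on the command line.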
Labelled Descriptions
---------------------

Repeat the 8 steps of the [above section] for each of these labelled items below. None of these expressions except the very last one requires any options to `grep`, multiple expressions, or extended regular expressions. All except the last can be done with no options and basic regular expressions.

1.  `upper`: lines containing at least one upper case alphabetic character.
2.  `control`: lines containing at least one control character. (When checking your output, you can make control characters visible using the `-vT` options to the `cat` command, otherwise they won’t show on your screen. Do *not* put the `cat` command in your script.)
3.  `punct`: lines containing at least one punctuation character.
4.  `no_white`: lines containing no whitespace characters. (Whitespace means spaces or space-like characters such as TABs.)
5.  `no_num_white`: lines containing no whitespace or digit characters.
6.  `empty`: empty lines (nothing on the line, not even whitespace characters).
7.  `blank`: blank lines. (A blank line contains only whitespace characters, zero or more of them.)
8.  `only_alnum`: non-empty lines containing only alphanumeric characters. (“Non-empty” means there has to be at least one character.)
9.  `only_alpha`: non-empty lines containing only alphabetic characters.
10. `only_upper`: non-empty lines containing only upper case alphabetic characters.
11. `only_digit`: non-empty lines containing only digits.
12. `backslash`: lines containing at least one `\` character.
13. `plus`: lines containing at least one `+` character.
14. `square`: lines containing at least one `[` or `]` character.
15. `question`: lines containing at least one `?` character.
16. `star`: lines containing at least one `*` character.
17. `dot`: lines containing at least one `.` character.
18. `caret`: lines containing at least one `^` character.
19. `begin_end`: lines that start with the exact five characters `begin` and that end with the exact three characters `end`.
20. `AB`: lines containing `A` and `B`, capitalized and in that order but not necessarily right next to each other.
21. `first`: lines that start with optional whitespace, then the string `first`.
22. `capital`: lines that contain the string `Capital` where the initial letter `C` must be upper-case but the rest of the letters could be either case, e.g. `CAPITAL`, `CaPiTaL`, etc.
23. `first_last`: lines that start with the exact five characters `first` preceded by any amount of whitespace and that end with the exact four characters `last` followed by any amount of whitespace.
24. `phone`: lines that contain a seven-(or more)-digit number with one or more dashes between the group of three (or more) digits and the group of four (or more) digits. These should match: `555-1212`, `555555-----121212121212`, `x555-1212x`, `x555---1212x`, `x999555-1212x`, `x555-1212999x`, `x999555-1212999x`, but these would not match: `555-121x`, `x55-1212`, `5551212`
25. `better_phone`: lines that contain a seven-digit number, surrounded before and after with non-digit characters, with one or more underscores, dashes, or periods between the third and fourth digits. These should match: `x555-1212x`, `x555.1212x`, `x555_-.1212x`, `x555--__..-_.1212x` but these would not match: `555555-----121212121212`, `x999555-1212x`, `x555-1212999x`, `x999555-1212999x`, `555-121x`, `x55-1212`, `5551212`
26. `password`: lines containing `password` or `passwd`, with the `p` optionally capitalized. These would match: `Password`, `password`, `Passwd`, but these would not match `Pass`, `passwD`, `paSsword`, `passw`, or `passd`. (Hint: There is a solution to this that permits `grep` to use multiple search patterns, or you can use a single *extended* regular expression.)
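
The hint in the last item mentions two mechanisms: giving `grep` several patterns, or using one extended regular expression. The sketch below shows the general shape of each on unrelated, made-up data; the file `pets.txt` and the words `cat` and `dog` are hypothetical and are not part of the assignment. It relies only on the standard `-e` (supply a pattern) and `-E` (extended syntax) options of `grep`.

```shell
# Sketch only: made-up data in a throw-away directory (assumptions).
cd "$(mktemp -d)" || exit 1
printf 'a cat sat\na dog ran\na bird flew\n' > pets.txt

# Several basic patterns: each -e option supplies one pattern, and a
# line is printed if it matches any of them
grep -e 'cat' -e 'dog' pets.txt

# One extended pattern: -E turns on extended syntax, where | means "or"
grep -E 'cat|dog' pets.txt
```

Both commands select the same lines here; which form is easier to read depends on how much the alternatives have in common.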
Checking, Marking, and Submitting your Work
===========================================

Check your work a final time using the `assignment03check` program symlink and save the output as described below. Submit your final mark following the directions below.

**Summary:** Do some tasks, then run the checking program to verify your work as you go. You can run the checking program as often as you want. When you have the best mark, upload the marks file to Blackboard.

1.  There is a [Checking Program] named `assignment03check` in the [Source Directory] on the CLS. Create a symbolic link to this program named `check` under your new `assignment03` directory so that you can easily run the program to check your work and assign your work a mark.

    Note: You can create a symbolic link to this executable program but you do not have permission to read or copy the program file. To verify the symbolic link, try executing it.

2.  Execute the above “check” program using its symbolic link. (Review the [CST8207 Search Path] notes if you forget how to run a program by pathname from the command line.) This program will check your work, assign you a mark, and display the output on your screen. (You may want to paginate the long output so you can read all of it.) You may run the “check” program as many times as you wish, to correct mistakes and get the best mark.
3.  When you are done with checking this assignment, and you like what you see on your screen, redirect the output of the [Checking Program] into the text file `assignment03.txt` under your `assignment03` directory. Use the *exact* name `assignment03.txt` in your `assignment03` directory. You only get *one* chance to get the name correct. Case (upper/lower case letters) matters. Be absolutely accurate, as if your marks depended on it. Do not edit the file.
4.  Transfer the above `assignment03.txt` file from the CLS to your local computer and verify its contents. Do not edit this file! No empty files, please! Edited or damaged files will not be marked. You may want to refer to this term’s updated [File Transfer] notes.
5.  Submit the `assignment03.txt` file under the correct Assignment area on Blackboard (with the exact name) before the due date.

    Upload the file via the **assignment03** “Upload Assignment” facility in Blackboard: click on the underlined **assignment03** link in Blackboard. Use “**Attach File**” and “**Submit**” to upload your plain text file. No word-processor documents. Do not send email. Use only “Attach File”. Do not enter any text into the **Submission** or **Comments** boxes on Blackboard; I do not read them. Use only the “**Attach File**” section followed by the **Submit** button. (If you want to send me comments about your assignment, use email.)

6.  Your instructor may also mark the `assignment03` directory in your CLS account after the due date. Leave everything there on the CLS. **Do not delete any assignment work from the CLS until after the term is over!**

Use the *exact* file name given above. Upload only one single file of plain text, not HTML, not MSWord. No fonts, no word-processing. Plain text only.

Did I mention that the format is plain text (suitable for VIM/Nano/Pico/Gedit or Notepad)?

**NO EMAIL, WORD PROCESSOR, PDF, RTF, or HTML DOCUMENTS ACCEPTED.**

No marks are awarded for submitting under the wrong assignment number or for using the wrong file name. Use the exact name given above.

WARNING: Some inattentive students don’t read all these words. Don’t make that mistake! Be exact.

**READ ALL THE WORDS. OH PLEASE, PLEASE, PLEASE READ ALL THE WORDS!**

--

| Todd Kelley / Richard Donnelly and
| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

[CST8207 GNU/Linux Operating Systems I]: ../../../cst8207/12f
[Checking Program]: #checking-marking-and-submitting-your-work
[Course Linux Server]: 000_course_linux_server.html
[Tasks]: #tasks
[Source Directory]: #the-source-directory
[Labelled Descriptions]: #labelled-descriptions
[Internationalization]: 000_character_sets.html#international-script-header
[above section]: #repeat-these-steps-for-each-label
[CST8207 Search Path]: ../../../cst8207/12f/notes/400_search_path.html
[File Transfer]: 220_file_transfer.html
[Plain Text]: assignment03.txt
[Pandoc Markdown]: http://johnmacfarlane.net/pandoc/