% CST8177 Assignment 04 - Extended Regular Expressions
% Todd Kelley, Ian! D. Allen – – [www.idallen.com]
% Winter 2014 - January to April 2014 - Updated Thu Mar 20 20:17:44 EDT 2014

Due Date and Deliverables
=========================

> **Do not print this assignment on paper!**
>
> -   On paper, you will miss updates, corrections, and hints added to the
>     online version.
> -   On paper, you cannot follow any of the [hyperlink URLs] that lead you
>     to hints and course notes relevant to answering a question.
> -   On paper, scrolling text boxes will be cut off and not print properly.

-   **Due Date**: `23h59 (11:59pm) Saturday February 15, 2014 (end of Week 6)`
    -   Late assignments or wrong file names may not be marked. Please be
        punctual.
-   **Available online**
    -   Version 1 – 07:00 Tuesday February 4, 2014
    -   Version 2 – 11:00 Tuesday February 4, 2014 – fixed hour12; new due
        date
-   **Prerequisites**
    -   [CST8207 GNU/Linux Operating Systems I]
    -   All Class Notes since the beginning of term.
    -   All your previous [Assignments].
    -   An ability to **READ ALL THE WORDS** to work effectively.
-   **Deliverables**
    1.  One text file uploaded to Blackboard according to the steps in the
        [Checking Program] section below.
    2.  Directory structure and files created and left for marking on the
        [Course Linux Server] (**CLS**).\
        **Do not delete any assignment work from the CLS until after the
        term is over!**

Purpose of this Assignment
==========================

> **Do not print this assignment on paper!** On paper, you cannot follow any
> of the hyperlink URLs that lead you to hints and course notes relevant to
> answering a question.

1.  Practise with **extended** regular expressions of varying complexity
2.  Create simple shell scripts
3.  Practise with a text editor

Introduction and Overview
=========================

This is an overview of how you are expected to complete this assignment.
Read all the words before you start working.

1.  Complete the **Tasks** listed below.
2.
    Verify your own work before running the [Checking Program].
3.  Run the [Checking Program] to help you find errors.
4.  Submit the output of the [Checking Program] to Blackboard before the due
    date.
5.  **READ ALL THE WORDS** to work effectively and not waste time.

You are given a file of somewhat random text, and a set of descriptions of
sets of lines in that file. For each description, you are to produce a
`grep -E` command with one single extended regular expression that will
match the described set of lines. You will initially test your regular
expressions on the command line, and when you are satisfied with each one,
you will put the command in a shell script. A [Checking Program] is
available to check your work as you go.

> Do not save the output of the [Checking Program]; the test file may
> change at any time to include new test cases, so the word count and
> checksums may change at any time.

The following tasks (except the first three, which should be done once) are
to be repeated for each description.

When you have finished the tasks, leave the files and directories in place
as part of your deliverables. **Do not delete any assignment work until
after the term is over!** Assignments may be re-marked at any time; you must
have your term work available right until term end.

The previous term’s course notes are available on the Internet here:
[CST8207 GNU/Linux Operating Systems I]. All the notes files are also on the
CLS. You can learn about how to read and search these files using the
command line on the CLS under the heading *Copies of the CST8207 course
notes* near the bottom of the page [Course Linux Server].

> Since I also do manual marking of student assignments, your final mark may
> not be the same as the mark submitted using the current version of the
> [Checking Program]. I do not guarantee that any version of the [Checking
> Program] will find all the errors in your work.
> Complete your assignments
> according to the specifications, not according to the incomplete set of the
> mistakes detected by the [Checking Program].

The Source Directory
--------------------

All references to the “Source Directory” below are to the CLS directory
`~idallen/cst8177/14w/assignment04/` and that name starts with a *tilde*
character `~` followed by a userid with no intervening slash. The leading
tilde indicates to the shell that the pathname starts with the HOME
directory of the account `idallen` (seven letters).

Tasks
=====

-   Do the following tasks in order, from top to bottom.
-   These tasks must be done in your account on the [Course Linux Server].
-   **READ ALL THE WORDS!** and do not skip steps.
-   Your instructor will mark on the due date the work you do in your
    account on the CLS. Leave all your work on the CLS and do not modify it.
-   **Do not delete any assignment work from the CLS until after the course
    is over.**

Set Up
------

1.  Do a [Remote Login] to the [Course Linux Server] (**CLS**) from any
    existing computer, using the name appropriate for whether you are
    on-campus or off-campus. **All work in this assignment must be done on
    the CLS.**
2.  Make an `assignment04` directory in the same directory as you made
    `assignment02` in a previous assignment. **This directory is the base
    directory for most pathnames in this assignment. Store your files and
    answers here.**
3.  The file `foo.txt` in the [Source Directory] contains many lines of
    text. Put a soft link to this file in your new `assignment04` directory.
    Use the same name for the link.
4.  In your new `assignment04` directory create a soft link named `check` to
    the checking program `assignment04check` from the [Source Directory].

Regular Expressions
-------------------

Below, in the [Labelled Descriptions] section, you are given labelled
descriptions of lines to find in the file `foo.txt`. For each labelled
description you will repeat these two steps (described in detail below):

1.
    On the command line, invent a `grep -E` command using a single extended
    regular expression that will match lines consisting of one instance of
    the described item (and nothing more). For example, if you’re looking
    for phone numbers, your regular expression will look for lines that
    contain a single phone number and nothing else. Do not use any other
    options to `grep`. Use extended regular expression syntax appropriate
    for `-E`.
2.  Put the working `grep -E` command into its own shell script.

Each set of lines to be found is labelled below with a **label**. The label
is the first word in the section, followed by a colon. For example, the
following example description is labelled `bar:`

    bar: lines that consist of (only) the word barbar and nothing else

Repeat the following steps for each of the labelled descriptions:

### Repeat these steps for each label

1.  Make your current working directory the base directory (the directory
    containing the new link you made to the `foo.txt` file) if it is not
    already so.
2.  You must find lines in the `foo.txt` file using a single `grep -E`
    command with no other options. Type directly at the command line your
    initial attempt at a `grep -E` command that finds the lines, and view
    the result on your screen. (Some correct answers may produce hundreds of
    lines of output.) No pipes or other options are allowed. Use only a
    single `grep -E` command with a single extended regular expression.

    For the example given above with the label `bar`, a `grep -E` command
    that would work would be:

        $ grep -E '^barbar$' foo.txt

    The following would all be **incorrect** solutions:

        $ grep -E 'barbar$' foo.txt         # WRONG
        $ grep -E '^barbar' foo.txt         # WRONG
        $ grep -E 'barbar' foo.txt          # WRONG
        $ grep -E '^.*barbar.*$' foo.txt    # WRONG
        $ grep -E '^ *barbar *$' foo.txt    # WRONG
        $ grep -E '^Barbar$' foo.txt        # WRONG

3.
    If you’re not satisfied with your initial attempt, use up-arrow to
    retrieve the previous command, and make changes to the regular
    expression, then re-run the new command. Repeat this step until you’re
    satisfied with the output on your screen and want to check your answer.
4.  To check your answer, use up-arrow to retrieve the command, and modify
    it to pipe the output of your command into the `wc` program, then do the
    same, changing `wc` to `sum`. Compare the output of `wc` and `sum` with
    the values output by the [Checking Program].

    For the example given above with the label `bar`, the checking pipelines
    would be done like this, in this order:

        $ grep -E '^barbar$' foo.txt
        $ grep -E '^barbar$' foo.txt | wc
        $ grep -E '^barbar$' foo.txt | sum

    The `'^barbar$'` string is the quoted extended regular expression.
5.  If the word count or checksum values differ, you need to change your
    regular expression. Use up-arrow to retrieve the command, make your
    changes to the regular expression, and re-run the command.

    > Do not save the output of the [Checking Program]; the test file may
    > change at any time to include new test cases, so the word count and
    > checksums may change at any time.

6.  When you are satisfied with your answer as typed on the command line,
    use a text editor to create in your `assignment04` directory an
    executable shell script whose name is the **label** name followed by
    `.sh` that simply runs your `grep` command without piping its output
    into `wc` or `sum`. Just copy the `grep` command from the command line
    into the last line of the new shell script.

    For the example given above with the label `bar`, the script name must
    be `bar.sh` in the `assignment04` directory.

    The first few lines of every shell script must correspond exactly to the
    **Script Header** described in class. The last line of every script will
    be your `grep -E` command.
    Do not redirect or pipe the output of your command into anything – the
    script should produce the correct lines of output from `foo.txt` on your
    screen so that it can be checked. Do not put any lines into your script
    other than the **Script Header**, the single `grep -E` command line, and
    optional blank or comment lines.
7.  You can also check the output of your script using the `wc` and `sum`
    commands, similar to the way you checked the original `grep` command.
    The script must output exactly the same lines as the original `grep`
    command that you put into it. The results should be identical:

        $ grep -E '^barbar$' foo.txt | wc
        $ ./bar.sh | wc
        $ grep -E '^barbar$' foo.txt | sum
        $ ./bar.sh | sum

8.  Repeat the 8 steps in this section for each of the [Labelled
    Descriptions] below.

    NOTE: When it comes time to create your second and subsequent scripts,
    copy the previous script to the new label name rather than starting from
    scratch every time. Run the [Checking Program] to make sure you have
    copied the **Script Header** correctly. Do not put any lines into your
    script other than the **Script Header**, the single `grep -E` command
    line, and optional blank or comment lines.

Checking your Answers
---------------------

> Do not save the output of the [Checking Program]; the test file may
> change at any time to include new test cases, so the word count and
> checksums may change at any time.

Write the extended regular expressions to match the given pattern
specifications, not to match the particular set of test cases in the test
file. The test file may change at any time to include new test cases. I may
come up with other test cases even after the due date of the assignment;
your script loses marks if it fails these new tests because it doesn’t do
what the specification says it must do. You may have to write your own test
cases to be sure you got it right.

I’ve also set up the checking program to detect failure to [protect special
characters] from shell GLOB expansion.
If your expression works in your account but not when the checking script
runs it, this may be your problem. You may also see “Permission denied”
errors if this is the problem. Fix your script.

Labelled Descriptions
---------------------

-   Repeat the 8 steps of the [above section] for each of these labelled
    items below.
-   All must be solved with only one single extended regular expression.
-   You are not allowed to use character ranges in character classes (e.g.
    `[a-z]` or `[0-9]`) due to problems with [Internationalization].

Definition: [Whitespace]
:   Spaces or space-like characters such as TABs, newlines,
    carriage-returns, form-feeds, etc. This is a distinct POSIX character
    class from **blanks**, which are only space and TAB. This assignment
    uses Whitespace, not blanks.

0.  *label*: *description of desired `grep -E` output from file `foo.txt`*

1.  `names`: Lines consisting of a single name of a person, capitalized,
    with capitalized optional middle name, separated by a single space. Any
    alphabetic string is acceptable as a name.

    These should match John Smith John Yeardly Smith A B A B C Aabc Cdef Z Abc Def Ghi Abc Def

    These should not match john Smith John Smith John YeardlySmith a B A b C A B c A b abc Def Abc Def ghi A B C A B C D Abc Def G hi

2.  `zipcode`: Lines consisting of a single numeric USA zip code of the form
    `99999` or `99999-9999`, with zeros allowed everywhere.

    These should match 12345 00000 01234 23456-0000 99999-0001 00000-0000

    These should not match 123456 1233-4444 000-00000 2345-34568 23456-34568

3.  `hour12`: Lines consisting of a single one or two digit integer between
    1 and 12 (inclusive), for valid 12-hour times. *Hint*: create one
    regular expression that matches numbers between 1 and 9, and another
    regular expression that matches numbers between 10 and 12, and combine
    those with alternation. *(Note: `0` is not a valid 12-hour time.)*

    These should match 1 01 2 02 9 09 10 11 12

    These should not match 0 00 20 13 90 001 011 120 012

4.
    `daynum`: Lines consisting of a single integer between 1 and 31, for a
    valid day of a month. *Hint*: create one regular expression that matches
    numbers between 1 and 9 with optional leading zero, and another that
    matches numbers between 10 and 29, and one that matches numbers between
    30 and 31, then use alternation to combine those three.

    These should match: 1 01 9 09 10 20 30 31

    These should not match 0 00 001 100 32 031 310 56 33 100

5.  `hour24`: Lines consisting of a single non-negative integer less than
    24, for a valid hour in 24-hour times. *Hint*: See the hints for the
    previous questions.

    These should match: 0 1 2 3 00 01 05 09 10 15 20 23

    These should not match: 000 100 023 009 012 24 25 045

6.  `minutes`: Lines consisting of a single two digit integer less than 60,
    for valid minutes or seconds in a time.

    These should match 00 01 09 10 11 20 30 59

    These should not match 000 010 200 60 070 0 9 90 1

7.  `decimal`: Lines consisting of a single unsigned decimal or floating
    point number.

    These should match 000 0000.0000 8.45 2.768 0.320 .320 96

    These should not match . 45. 1..2

8.  `currency`: Lines consisting of a single dollar amount, starting with
    the leading dollar sign and optional two-digit cents.

    These should match $0 $1 $12 $.12 $0.12 $0000.12 $1234.56

    These should not match 1 12 .12 0.12 1234.56 1. 1.2 1.23 $ $. $.1 $1.2 $1.234

9.  `date`: Lines consisting of a single date with syntax *YYYY-MM-DD* where
    the year (*YYYY*) is exactly 4 digits, the month (*MM*) is between 1 and
    12, two digits maximum, and the day (*DD*) is between 1 and 31, two
    digits maximum. The day does not have to be accurate for February, June,
    leap years, etc.; it only has to be a number between 1 and 31, two
    digits maximum. *Hint*: Combine and re-use your work and hints from
    earlier questions!
    These should match 0000-01-01 2014-1-1 2014-1-15 2014-01-31 2014-02-31 0000-6-31 2014-12-31

    These should not match 0000-00-00 0000-00-01 2000-13-01 2000-12-00 20000-12-01 2000-012-12 2000-12-012 2000-12-120 2014-01-32

10. `time24hr`: Lines consisting of a single 24-hour time with optional
    seconds with syntax *HH:MM[:SS]* where minutes and seconds must have
    exactly two digits.

    These should match 02:23 2:23 2:23:59 12:23:59 23:23 00:00:00 00:00 00:00:59 01:01:01

    These should not match 24:00 12:60:00 12:34:56:00 12:15:60 012:14:00 11:59:001 11:059 11:059:1 10:1:10

11. `time12hr`: Lines consisting of a single 12-hour based time with
    optional seconds and AM/PM using syntax *HH:MM[:SS][am|AM|pm|PM]* where
    minutes and seconds must have two digits, followed by an optional `am`,
    `pm`, `AM`, or `PM`. *Hint*: Re-use parts of your `hour12` and `minutes`
    regular expressions from above in your answer. *(Note: `00:00am` is not
    a valid 12-hour time.)*

    These should match 2:24pm 2:24 2:24PM 2:24AM 2:24am 02:34 12:59 12:56:59 4:56:56

    These should not match 0:00 00:00 00:01 00:00am 00:59:59PM 99:99 002:23 13:01am 23:01PM 2:3pm 1:2:3pm 2:24pmpm 2:24amPM 2:23Pm 2:23pM 2:23aM 2:23Am 23:23 12:23:76 12:60:34 1:1 1:1:1 10:1:10

12. `ipaddr`: Lines consisting of a single IPv4 address of four integers
    from 0 to 255 separated by dots. Each integer should be three digits or
    less, and leading zeros are OK. *Hint*: break each of the four integers
    into an alternation between the following ranges: integers greater than
    or equal to 200 and less than or equal to 255, integers from 100 to 199,
    integers from 10 to 99, integers from 0 to 9.

    These should match 255.255.255.255 1.1.1.1 02.089.89.001 0.0.0.0 00.01.002.000 23.234.123.123 12.12.12.12 012.012.012.012

    These should not match 1.1.1. 1.
    1.1 1.1.1.1.1 0234.1.1.1 234.0234.166.23 1.1.1 345.2.2.2 299.2.2.2

> Do not save the output of the [Checking Program]; the test file may
> change at any time to include new test cases, so the word count and
> checksums may change at any time.

When you are done
-----------------

That is all the tasks you need to do.

Check your work a final time using the [Checking Program] and save the
output as described below. Submit your mark following the directions below.

Checking, Marking, and Submitting your Work
===========================================

**Summary:** Do some tasks, then run the checking program to verify your
work as you go. You can run the checking program as often as you want. When
you have the best mark, upload the marks file to Blackboard.

> Do not save the output of the [Checking Program]; the test file may
> change at any time to include new test cases, so the word count and
> checksums may change at any time.

1.  There is a [Checking Program] named `assignment04check` in the [Source
    Directory] on the CLS. Create a [Symbolic Link] to this program named
    `check` under your new `assignment04` directory so that you can easily
    run the program to check your work and assign your work a mark. Note:
    You can create a symbolic link to this executable program but you do not
    have permission to read or copy the program file.
2.  Execute the above “check” program using its new symbolic link. (Review
    the [Search Path] notes if you forget how to run a program by pathname
    from the command line.) This program will check your work, assign you a
    mark, and display the output on your screen. (You may want to paginate
    the long output so you can read all of it.)

    You may run the “check” program as many times as you wish, to correct
    mistakes and get the best mark. **Some task sections require you to
    finish the whole section before running the checking program at the end;
    you may not always be able to run the checking program successfully
    after every single task step.**

3.
    When you are done with checking this assignment, and you like what you
    see on your screen, redirect the output of the [Checking Program] into
    the text file `assignment04.txt` under your `assignment04` directory.
    Use the *exact* name `assignment04.txt` in your `assignment04`
    directory. Case (upper/lower case letters) matters. Be absolutely
    accurate, as if your marks depended on it. Do not edit the file. Make
    sure the file actually contains the output of the checking program!
4.  Transfer the above `assignment04.txt` file from the CLS to your local
    computer and verify that the file still contains all the output from the
    checking program. Do not edit this file! No empty files, please! Edited
    or damaged files will not be marked. You may want to refer to your [File
    Transfer] notes.
5.  Submit the `assignment04.txt` file under the correct Assignment area on
    Blackboard (with the exact name) before the due date. Upload the file
    via the **assignment04** “Upload Assignment” facility in Blackboard:
    click on the underlined **assignment04** link in Blackboard. Use
    “**Attach File**” and “**Submit**” to upload your plain text file. No
    word-processor documents. Do not send email. Use only “Attach File”. Do
    not enter any text into the **Submission** or **Comments** boxes on
    Blackboard; I do not read them. Use only the “**Attach File**” section
    followed by the **Submit** button. (If you want to send me comments
    about your assignment, use email.)
6.  Your instructor may also mark the `assignment04` directory in your CLS
    account after the due date. Leave everything there on the CLS. **Do not
    delete any assignment work from the CLS until after the term is over!**

Use the *exact* file name given above. Upload only one single file of plain
text, not HTML, not MSWord. No fonts, no word-processing. Plain text only.

Did I mention that the format is plain text (suitable for
VIM/Nano/Pico/Gedit or Notepad)?
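
As a hedged sketch of steps 3 and 4 above: the small shell function below runs a command, saves its standard output in a named file, and then reports whether that file ended up empty. The function name `save_output` is invented for this illustration and is not part of the course materials; on the CLS you would give it your `./check` symbolic link and the exact file name `assignment04.txt`.

```shell
# save_output COMMAND FILE
# Run COMMAND with its standard output redirected into FILE, then
# confirm that FILE is not empty.  (Hypothetical helper, for illustration.)
save_output () {
    "$1" >"$2"
    if [ -s "$2" ] ; then
        echo "$2: output saved"
    else
        echo "$2: file is empty - do not submit it" 1>&2
        return 1
    fi
}

# On the CLS, from inside your assignment04 directory, you might run:
#   save_output ./check assignment04.txt
```

Typing `./check >assignment04.txt` and then `ls -l assignment04.txt` at the command line does the same job; the function only adds the empty-file test.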
**NO EMAIL, WORD PROCESSOR, PDF, RTF, or HTML DOCUMENTS ACCEPTED.**

No marks are awarded for submitting under the wrong assignment number or for
using the wrong file name. Use the exact name given above.

WARNING: Some inattentive students don’t read all these words. Don’t make
that mistake! Be exact.

**READ ALL THE WORDS. OH PLEASE, PLEASE, PLEASE READ ALL THE WORDS!**

    --
    | Todd Kelley and
    | Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
    | Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
    | College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
    | Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

  [www.idallen.com]: http://www.idallen.com/
  [hyperlink URLs]: indexcgi.cgi#XImportant_Notes__alphabetical_order_
  [CST8207 GNU/Linux Operating Systems I]: ../../../cst8207/13f
  [Assignments]: indexcgi.cgi#XAssignments
  [Checking Program]: #checking-marking-and-submitting-your-work
  [Course Linux Server]: ../../../cst8207/14w/notes/070_course_linux_server.html
  [Remote Login]: ../../../cst8207/14w/notes/110_remote_login.html
  [Source Directory]: #the-source-directory
  [Labelled Descriptions]: #labelled-descriptions
  [protect special characters]: ../../../cst8207/13f/notes/440_quotes.html
  [above section]: #repeat-these-steps-for-each-label
  [Internationalization]: 000_character_sets.html
  [Whitespace]: https://en.wikipedia.org/wiki/Whitespace_character
  [Search Path]: ../../../cst8207/13f/notes/400_search_path.html
  [File Transfer]: ../../../cst8207/14w/notes/015_file_transfer.html
  [Plain Text]: assignment04.txt
  [Pandoc Markdown]: http://johnmacfarlane.net/pandoc/