% CST8207 Assignment 05 – GLOB wildcard patterns, finding files using GLOB, redirection and pipes
% Ian! D. Allen – – [www.idallen.com]
% Winter 2015 - January to April 2015 - Updated Mon Mar 2 22:29:28 EST 2015

Due Date and Deliverables
=========================

> **Do not print this assignment on paper!**
>
> - On paper, you will miss updates, corrections, and hints added to the
>   online version.
> - On paper, you cannot follow any of the [hyperlink URLs] that lead you
>   to hints and course notes relevant to answering a question.
> - On paper, scrolling text boxes will be cut off and not print properly.

- **Due Date**: `15h00 (3pm) Tuesday February 24, 2015 (start of Week 7)`
    - Your next assignment will be available in a few days and will
      overlap this assignment. Start work on this now! Don’t delay!
    - Late assignments or wrong file names may not be marked. Please be
      accurate and punctual.
- **Available online**
    - Version 1 – 14:00 January 27, 2015 – last task not finished yet
    - Version 2 – 02:00 February 2, 2015 – finished last task
    - Version 3 – 04:00 February 9, 2015 – updated due date above
- **Prerequisites**
    - All [Class Notes][hyperlink URLs] since the beginning of term.
    - All your previous [Assignments] and [Worksheets].
    - An ability to **READ ALL THE WORDS** to work effectively.
- **Deliverables**
    1.  One plain text file uploaded to Blackboard according to the steps
        in the [Checking Program] section below.
    2.  Directory structure created and left for marking on the
        [Course Linux Server] (**CLS**).\
        **Do not delete any assignment work from the CLS until after the
        term is over!**

**WARNING:** Some inattentive students upload Assignment #5 into the
Assignment #4 upload area. Don’t make that mistake! Be exact.

Purpose of this Assignment
==========================

> **Do not print this assignment on paper!** On paper, you cannot follow any
> of the hyperlink URLs that lead you to hints and course notes relevant to
> answering a question.

This assignment is based on your weekly [Class Notes].

1.  Select names using GLOB patterns.
2.  Search the course notes for keywords using GLOB patterns.
3.  Copy 100 files based on a complex GLOB pattern.
4.  Identify Unix, Windows, and Macintosh text file types.
5.  Find file names in a directory using a GLOB pattern.
6.  Find basenames recursively in a maze using a GLOB pattern.
7.  Use redirection to append to a file.
8.  Search some system log files and produce summary information.

Remember to **READ ALL THE WORDS** to work effectively and not waste time.

Introduction and Overview
=========================

This is an overview of how you are expected to complete this assignment.
Read all the words before you start working.

For full marks, follow these directions exactly.

1.  Complete the **Tasks** listed below.
2.  Verify your own work before running the **Checking Program**.
3.  Run the **Checking Program** to help you find errors.
4.  Submit the output of the **Checking Program** to Blackboard before the
    due date.
5.  **READ ALL THE WORDS** to work effectively and not waste time.

You will create a file system structure in your CLS home directory
containing various directories and files. You can use the **Checking
Program** to check your work as you do the tasks. You can check your work
with the **Checking Program** as often as you like before you submit your
final mark. **Some task sections below require you to finish the whole
section before running the Checking Program; you may not always be able to
run the Checking Program successfully after every single task step.**

When you have finished the tasks, leave the files and directories in place
on the CLS as part of your deliverables. **Do not delete any assignment
work until after the term is over!** Assignments may be re-marked at any
time on the CLS; you must have your term work available on the CLS right
until term end.

> Since I also do manual marking of student assignments, your final mark may
> not be the same as the mark submitted using the current version of the
> **Checking Program**. I do not guarantee that any version of the **Checking
> Program** will find all the errors in your work. Complete your assignments
> according to the specifications, not according to the incomplete set of
> mistakes detected by the **Checking Program**.

The Source Directory
--------------------

All references to the “Source Directory” below are to the CLS directory
`~idallen/cst8207/15w/assignment05/` and that name starts with a *tilde*
character `~` followed by a user name with no intervening slash. The
leading tilde indicates to the shell that the pathname starts with the
HOME directory of the account `idallen` (seven letters).

You do not have permission to list the names of all the files in the
Source Directory, but you can access any files whose names you already
know.

Tasks
=====

- Do the following tasks in order, from top to bottom.
- These tasks must be done in your account on the [Course Linux Server].
- **READ ALL THE WORDS!** and do not skip steps.
- Run the **Checking Program** to grade your work, then upload the file
  containing the output of the **Checking Program** to Blackboard.
- On the due date, your instructor will also mark the work you do in your
  account on the CLS. Leave all your work on the CLS and do not modify it.
- **Do not delete any assignment work from the CLS until after the course
  is over.**

Do Worksheet 4 and Worksheet 5
------------------------------

See your previous assignments for how best to fill in the worksheets.
These worksheets prepare you to do the rest of the tasks listed below.
Failure to complete the worksheets will make the rest of this assignment
very difficult. Do the worksheets first! Record and save all your
worksheet answers for study and quizzes!

1.  Do a [Remote Login] to the [Course Linux Server] (**CLS**).
    **All work in this assignment must be done on the CLS.**

2.  Set your `PS1` shell prompt.

3.  Use LibreOffice or OpenOffice to complete [Worksheet #04 ODT]. (View
    online: [Worksheet #04 HTML].) Record and save all your worksheet
    answers for study and quizzes!

4.  Use LibreOffice or OpenOffice to complete [Worksheet #05 ODT]. (View
    online: [Worksheet #05 HTML].) Record and save all your worksheet
    answers for study and quizzes!

Failure to complete the worksheets will make the rest of this assignment
very difficult. Do the worksheets first!

Set Up – The Base Directory on the CLS
--------------------------------------

> You must keep a list of command names used each week and write down what
> each command does, as described in the [List of Commands You Should Know].
> Without that list to remind you what command names to use, you will find
> assignments very difficult.

1.  Create the following directory structure in your CLS personal HOME
    directory and record (for study purposes) the series of Unix commands
    you used to create it. (You do not have to create any directories that
    you have already created in a previous assignment.) Spelling and
    capitalization must be exactly as shown:

    a.  Create the `CST8207-15W` directory in your CLS HOME directory.
    b.  Create the `Assignments` directory in the `CST8207-15W` directory.
    c.  Create the `assignment05` directory in the `Assignments` directory.

    > **Hint:** You can create the entire directory tree above using *one*
    > single command.

**This `assignment05` directory is called the [Base Directory] for most
pathnames in this assignment. Store your files and answers in this
[Base Directory], not in your HOME directory or anywhere else.**

Run the [Checking Program] to verify your work so far.

Using shell GLOB patterns to select names
-----------------------------------------

You need to understand [Shell GLOB Patterns] to do this task.

1.  Make your HOME directory your current directory.

2.  In your HOME directory, create two symbolic links to the old and new
    course notes for CST8207 using the `ln -s` command and option and the
    method described in [Copies of the CST8207 Course Notes]. (The old
    notes must be term `14f` and the new notes must be term `15w` in the
    pathnames you use.)

3.  Do a long listing of the new `oldnotes` symlink and verify that it
    looks similar to this (but the userid and time will differ):

        lrwxrwxrwx 1 abcd0001 abcd0001 52 Jan 27 07:37 oldnotes -> /home/idallen/public_html/teaching/cst8207/14f/notes

    You should be able to do `ls oldnotes` and see all the course notes
    file names from last term (14f). If not, remove and redo the symlink.

4.  In your HOME directory, use the `ls` command with a single shell GLOB
    pattern to match all pathnames under the symbolic link `oldnotes/`
    that end in `.txt` and display all the names on your screen. The shell
    will find 87 pathnames ending in `.txt`, and the `ls` command will
    display those 87 names on your screen as 87 lines. One of the last
    names on your screen should look exactly like this:

        oldnotes/worksheet08.txt

    Make sure you see 87 pathnames. (You can use a command pipeline to
    count the lines to be sure you have 87.)

    **Hints:** No pipeline is required to generate the 87 pathnames; just
    use the `ls` command and one GLOB pattern argument containing the
    symlink `oldnotes/`. This use of a GLOB pattern on a command line is
    illustrated in [Copies of the CST8207 Course Notes]. The example in the
    notes uses the given GLOB pattern to generate pathnames for the `ls`
    command and count them. Follow the example and display the 87
    pathnames on your screen instead of counting them (don’t use any
    pipes).

### `textfound.txt` {#textfound.txt .floatright .unnumbered}

5.  When the `ls` output on your screen is correct (87 names), redirect
    the output (87 names) into file `textfound.txt` under your
    [Base Directory]. The file must contain 87 names, one per line.

6.  Again in your HOME directory, use the `echo` command with a shell GLOB
    pattern to match all pathnames under `oldnotes/` that contain the word
    `symbolic` *anywhere* in the file name and display the names on your
    screen. The shell will find two pathnames, one ending in `.html` and
    the other in `.txt`, and the `echo` command will display those two
    names on your screen on one line.

### `symfound.txt` {#symfound.txt .floatright .unnumbered}

7.  When the `echo` output on your screen is correct (two names on one
    line), redirect the output into file `symfound.txt` under your
    [Base Directory]. The file must contain two names on one line.

8.  Again in your HOME directory, use the `echo` command with a shell GLOB
    pattern to match pathnames under `oldnotes/` that contain the word
    `vi` anywhere in the file name and end in the extension `.pdf`. The
    shell will find two pathnames, each ending in `.pdf`, and the `echo`
    command will display those two names on your screen on one line.

9.  When the `echo` output on your screen is correct (two names on one
    line), change the command name from `echo` to `ls` and add an option
    to show the full, long information about the pathnames. You should see
    two long lines on your screen, showing the full file information for
    each of the two files.

### `vifound.txt` {#vifound.txt .floatright .unnumbered}

10. Now redirect the two lines of long output on your screen into file
    `vifound.txt` under your [Base Directory]. The file must contain two
    lines and approximately 18 words.

Run the [Checking Program] to verify your work so far.

Searching for text inside files (e.g. course notes)
---------------------------------------------------

As mentioned in [Worksheet #03 HTML], choose which text search command you
use depending on whether special characters are being used in the search
string. We almost always use the fixed-string `fgrep` command in this
introductory course.
You will learn regular expressions and the `grep` command in later terms.

Always verify that the correct output appears on your screen *before* you
redirect the output into a file.
[**You can only redirect what you can see.**]

### `mypasswd.txt` {#mypasswd.txt .floatright .unnumbered}

1.  Search for lines containing your login userid in the password file and
    redirect the output into file `mypasswd.txt` in your [Base Directory].
    You should find exactly one line.

2.  Search for lines containing a period (dot) character (`.`) in the file
    `special.txt` in the [Source Directory].

    **Hint:** A period is a special character. Choose the right command.

    The word count of the correct output should be: `6 32 167`

### `periods.txt` {#periods.txt .floatright .unnumbered}

3.  When you have the correct output on your screen, redirect that output
    into file `periods.txt` under your [Base Directory]. The word count of
    the file should be the same as above.

4.  Search for lines containing two asterisk characters (`**`) in the file
    `special.txt` in the [Source Directory].

    **Hint:** An asterisk is a special character. Choose the right command.

    The word count of the correct output should be: `3 28 159`

### `asterisks.txt` {#asterisks.txt .floatright .unnumbered}

5.  When you have the correct output on your screen, redirect that output
    into file `asterisks.txt` under your [Base Directory]. The word count
    of the file should be the same as above.

6.  In your HOME directory, create two symbolic links to the old and new
    course notes for CST8207 using the method described in
    [Copies of the CST8207 Course Notes], unless you have already created
    these links earlier in this assignment. In the same notes section, see
    the example use of `fgrep` with shell GLOB patterns to match `*.txt`
    files in these `oldnotes` and `newnotes` directories. The GLOB pattern
    easily generates a huge list of file names for `fgrep` to search
    inside.

7.  In the course notes from last year, search inside all the `.txt` files
    for the word `Filezilla` (spelled exactly as shown, case-sensitive).
    Only three lines of text should display, from three files.

    **Hint:** You will need to use the same GLOB pattern you used earlier
    to match all the `.txt` files under `oldnotes`. This time, use the GLOB
    pattern to give file names to the command that searches inside all
    those files.

    If you see more than three lines of output, you are likely using
    options that make the search case-insensitive. Don’t do that.

8.  Repeat the above on all the `*.txt` files, but add the searching
    option that ignores case distinctions when matching lines in the files
    (RTFM). Now, 11 lines are found in five different files.

    **Hint:** These text-searching commands are case-sensitive by default –
    searching inside files for lines containing `abc` won’t find any lines
    containing `ABC` unless you use an option to *ignore case distinctions*
    during the search. (What option? RTFM)

### `filezilla.txt` {#filezilla.txt .floatright .unnumbered}

9.  Redirect the 11 lines of output into a file named `filezilla.txt`
    under your [Base Directory].

Run the [Checking Program] to verify your work so far.

The cracker WAREZ 100 files
---------------------------

You need to understand [Shell GLOB Patterns] to do this task.

The “story” here is that a malicious cracker has dumped a bunch of WAREZ
files in a directory on the server and has hidden them among thousands of
other files. (See .) Your job is to take a copy of the WAREZ files, and
only the WAREZ files, for use in a court case. You must not touch or copy
any other files, only the WAREZ files.

1.  There is a directory named `start` under the [Source Directory].
    Hidden (really hidden) deeper under this directory is one single
    directory containing over **136,000** pathnames.

    Be careful about typing `ls` in this directory without using any
    output pagination pipe – the amount of output may flood your terminal
    window for some time and even a `^C` interrupt may take a minute or two
    to interrupt the command! One way to avoid flooding your screen is by
    using `ls | wc` to count how many pathnames would be output on your
    screen before you do just `ls`.

    Find this huge hidden directory and make this huge directory your
    current directory, so that you can experiment with the GLOB pattern
    you will need in the following questions.

    **Hints:** This isn’t a maze. There is only *one* path down to the huge
    hidden directory inside the `start` directory, though the way is
    hidden. Remember not to type `ls` in this large directory, when you
    find it, because the output is very large!

2.  Exactly 100 files in this one (huge) directory have names that contain
    your userid (which must be matched lower-case) followed somewhere
    later by the string *warez*, where *warez* is case-insensitive and may
    appear in any combination of upper- and lower-case letters, e.g.
    `warez`, `Warez`, `wArez`, `waREz`, etc. Any amount of text may appear
    before your userid, between your userid and the *warez*, and after the
    *warez*.

    Some sample file names for userid *abcd0001* might look like these
    (note that the *warez* word *must* always follow the userid in all the
    required file names):

    - `HhUtfgYtyGhjJADGekCAkgtZEKsTGKdYZZabcd0001ADGekCwaREZZaFSrXJnxGex`
    - `zynabcd0001uKVUFOsCXaGFWZPECbYWVFKzynuKWaREZv`

    Using one single copy command and a single shell GLOB expression, copy
    all 100 (exactly 100) of these cracker files (and no others) into a new
    directory named `warez` that you must create in your [Base Directory].
    Make sure you preserve the modify times of the copied files, as you did
    in a previous lab. (In this simulation, all the files are empty.)

    **Hints:** Use one shell GLOB pattern to match the 100 file names.
    The shell can do it all with one copy command using the right GLOB
    pattern for the source files, as you did in section 4.1 of
    [Worksheet #04 HTML]. Always pipe the output of `echo` with the GLOB
    pattern into a word count to [verify your GLOB patterns before using them]:
    check that your file names are correct before you try to use the GLOB
    expression in a copy command.

    **Do not use a pipe or `find` to select the file names. Use only the
    copy command with a GLOB pattern for the source files, as you did in
    section 4.1 of [Worksheet #04 HTML].**

    **Do not quote the shell GLOB pattern.** Quoting turns *off* shell GLOB
    patterns. You *want* the shell to expand the GLOB pattern for this
    task! (If you were passing a GLOB pattern as an expression in a `find`
    command, you would quote it so that the shell didn’t expand it. That is
    not what you are doing here.)

### `copycmd.txt` {#copycmd.txt .floatright .unnumbered}

3.  Put the copy command line that you used into file `copycmd.txt` in
    your [Base Directory]. You can use a text editor or you can use the
    `echo` command to do this, as you did in a previous assignment.

    **Hint:** If you use `echo` to echo and redirect the line into the
    file, make sure you quote all the shell metacharacters that might
    expand. Make sure that the content of the file is exactly the same as
    the command you typed, with no special characters expanded. The number
    of words in the file should be fewer than a dozen.

4.  You can check your work by doing a recursive listing of your `warez`
    directory and counting the number of names that were copied. All the
    files should have their original modify dates preserved – verify this.

Run the [Checking Program] to verify your work so far.

Finding files in a directory using a shell GLOB pattern
-------------------------------------------------------

You need to understand [Shell GLOB Patterns] to do this task. The shell
will expand the GLOB pattern to match file names in one directory: the
current directory, or the directory named at the start of the pattern.
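A throw-away sandbox is a safe place to watch this expansion and to practise the `echo`-into-word-count check recommended in the WAREZ task. The sketch below is an illustration only: the temporary directory comes from `mktemp -d`, and the file names (built around the document's fake userid *abcd0001*) are invented. Because the pattern starts with an absolute directory path, the shell expands it into absolute pathnames. A pattern that matches nothing is left unexpanded, so it still counts as one word.

```shell
#!/bin/sh
# Illustration only: a temporary sandbox with invented file names.
demo=$(mktemp -d)
touch "$demo/abcd0001" "$demo/abcd0001.txt" "$demo/abcd0001x" \
      "$demo/xabcd0001" "$demo/other.txt"

# The * is unquoted, so the shell expands it. Only the directory part is
# quoted; it is an absolute path, so every expanded name is absolute too.
echo "$demo"/abcd0001*

# Verify a pattern before using it: count the words it expands to.
# $(( )) strips any padding that some wc versions print around the number.
matches=$(( $(echo "$demo"/abcd0001* | wc -w) ))
echo "abcd0001* matched $matches names"   # 3: xabcd0001 does not start with abcd0001

# With no match, the pattern itself is passed through unchanged (one word),
# so a count of 1 deserves a second look.
nomatch=$(( $(echo "$demo"/zzzz* | wc -w) ))
echo "zzzz* produced $nomatch word"

rm -r "$demo"
```

The same check works with any GLOB pattern before it is handed to a copy command: if the count is not the number you expect, fix the pattern first.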

### *abcd0001.txt* {#abcd0001.txt .floatright .unnumbered}

1.  Under the [Source Directory] there is a name `maze` (four letters).
    What is the absolute path of this `maze` under that directory?

    Put this absolute pathname of `maze` into a file in your
    [Base Directory] with a *basename* similar to *abcd0001.txt*, but use
    the basename that starts with your *own* Blackboard userid, not the
    fake userid *abcd0001*. Use your own userid in the file name. The file
    name must be exactly 12 characters long. The absolute pathname of the
    maze itself is over 40 characters long.

2.  Use the GLOB feature to have the shell display on your screen, on six
    lines, the six absolute paths of the six file names under the above
    `maze` directory that begin with your userid. (One of the six absolute
    pathnames will end in *abcd0001.txt* where *abcd0001* is your own
    userid.)

    **Hint:** Use the `ls` command (no options) with an absolute path shell
    GLOB pattern as an argument, in a manner similar to how you displayed
    all the `tty` names in section 4.1 of [Worksheet #04 HTML].

### `firstmaze.txt` {#firstmaze.txt .floatright .unnumbered}

3.  Save (redirect) the six lines of output into file `firstmaze.txt` in
    your [Base Directory]. The file must contain six absolute pathnames,
    one on each line, each containing your userid.

### `firstmaze.sh` {#firstmaze.sh .floatright .unnumbered}

4.  Save the command line you used into file `firstmaze.sh` in the same
    directory. Pay attention to the file name extension.

    **Hint:** You can’t just stick `echo` on the front of a command line
    that contains shell metacharacters such as GLOB patterns; the shell
    will expand all those metacharacters before the `echo` command runs. If
    you want to use `echo` with redirection to save the command line, you
    have to hide all the metacharacters from the shell. An easier way to
    save the command line is to copy it and paste it into a text editor
    such as `vim`.
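The `echo` hint above can be checked in a scratch directory. In this sketch the temporary directory and both file names are invented for the example. The unquoted `echo` line saves the *expanded* file names, while the single-quoted version saves the command text exactly as typed.

```shell
#!/bin/sh
# Illustration only: scratch directory with two invented file names.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch abcd0001.txt abcd0001.log

# Unquoted: the shell expands abcd0001* before echo runs, so the saved
# "command line" lists the matching names instead of the pattern.
echo ls abcd0001* > expanded.sh

# Single quotes hide the * from the shell, so echo receives it unchanged.
echo 'ls abcd0001*' > saved.sh

expanded=$(cat expanded.sh)   # ls abcd0001.log abcd0001.txt
saved=$(cat saved.sh)         # ls abcd0001*
printf 'unquoted: %s\nquoted:   %s\n' "$expanded" "$saved"

cd / && rm -r "$demo"
```

Looking at the file with `cat` afterwards, as done here, is the quickest way to confirm that nothing was expanded.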

These six pathnames are only six of the many file names in the maze that
start with your userid. We need to find them all.

Finding files recursively in a maze using a `find` GLOB pattern
---------------------------------------------------------------

Shell GLOB patterns can only look in one directory; they don’t search the
entire maze. To find *all* the files in the maze that start with your
userid, we can’t use shell GLOB patterns directly. We need to use the
command that searches a directory recursively, and make it use the GLOB
pattern. (You have used this command many times already.)

You need to understand [Shell GLOB Patterns] to do this task. You must
know about [Finding Files]. The shell will not be expanding the GLOB
pattern in this task, since you will be quoting it and passing it to
another command for evaluation, but the GLOB pattern metacharacters work
the same way to match basenames, as shown in the examples in
[Finding Files].

We need to hide the GLOB patterns from the shell, since we want to pass
the GLOB patterns unchanged to the command we use. Here’s how:

1.  Using the search tools in your web browser (not on the CLS), look for
    the string `quote` in the course notes web page on
    [Searching for and finding files by name, size, use, modify time, etc.][Finding Files]

    Read *all* the paragraphs containing this word (search multiple times)
    and remember the importance of quoting. You will need to know how to do
    this quoting when you start the finding and searching work for this
    task on the CLS, below.

### `howtoquote.txt` {#howtoquote.txt .floatright .unnumbered}

2.  From the first paragraph you found above, copy the example command
    line (showing the use of quotes around the `*.txt` argument that
    contains a GLOB character) into file `howtoquote.txt` in your
    [Base Directory]. The file must contain just the example command line
    text after the `e.g.` and it will be one line, three words, 19
    characters, according to `wc`.

    **Hint:** If the count is wrong, *look in the file* to see what is
    wrong with the text. Does the file contain *exactly* the same text as
    the course notes? If not, edit the file and fix it.

### `mazeinfo.txt` {#mazeinfo.txt .floatright .unnumbered}

3.  Use the absolute pathname of the `maze` name in the [Source Directory]
    as an argument to `ls` along with an option that shows the *long
    information* about the pathname. When you see the correct one line of
    output, redirect and save the output (one line) into file
    `mazeinfo.txt` under your [Base Directory].

    **Hint:** You should see exactly one line of output. You have the right
    option to `ls` if the first word of the output is `lrwxrwxrwx`,
    indicating that `maze` is a symbolic link, not a directory.

    **Hint:** If you get a directory listing full of files instead of one
    line starting with `lrwxrwxrwx`, make sure you are using the right
    option to `ls` and the correct [Source Directory] path from this
    assignment and not any previous assignment.

### `mazelscommand.sh` {#mazelscommand.sh .floatright .unnumbered}

4.  Save the exact `ls` command line you used into file `mazelscommand.sh`
    in the same directory. The command should use one option and one
    absolute pathname.

We will learn more about symbolic links in a future assignment. For now,
note that the `maze` symbolic link has an arrow that leads to the same
directory maze used in [Assignment #03 HTML]. (See that assignment for
details on the size of this maze.)

### Finding names starting with *abcd0001*

Again, in a manner similar to your previous assignments, you must find
files in this maze, using the maze as the *starting directory*. The
symbolic link requires some special handling, because the command that
recursively finds files *does not follow symbolic link arguments on the
command line* without using an option.
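The effect of a symbolic link as a starting point can be seen in a small sandbox. In this sketch the directory layout, the link name `maze`, and the pattern `'report*'` are all invented. Given the link itself as its starting point with no option, `find` examines only the link and does not descend into it, so nothing matches. Changing into the link first (the current-directory approach) lets a search of `.` reach every file. The GLOB pattern is quoted in both searches so that `find`, not the shell, interprets it.

```shell
#!/bin/sh
# Illustration only: an invented directory tree reached through a symlink.
demo=$(mktemp -d)
mkdir -p "$demo/realmaze/a/b"
touch "$demo/realmaze/report1" "$demo/realmaze/a/b/report2" "$demo/realmaze/a/notes"
ln -s "$demo/realmaze" "$demo/maze"   # maze is a symbolic link to a directory

# Symlink as the starting point, no option: find does not follow it,
# so only the link itself (basename "maze") is tested against the pattern.
plain=$(( $(find "$demo/maze" -name 'report*' | wc -l) ))

# cd through the link first; the current directory is then the real
# directory, so searching "." descends into it normally.
viacd=$(( $(cd "$demo/maze" && find . -name 'report*' | wc -l) ))

echo "symlink as argument: $plain match(es); after cd: $viacd match(es)"   # 0 and 2
rm -r "$demo"
```

The `cd` runs inside a `$( )` subshell here, so the script's own current directory does not change.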

You must choose one of these methods to search this symbolic link maze:

a)  **Method 1:** Use an option to the finding command that makes it
    follow symbolic links only *while processing the command line
    arguments*.\
    **Hint:** RTFM, search for `while processing`, and do **not** use the
    `-L` option, **OR**

b)  **Method 2:** Make the `maze` your current directory and then
    recursively search the current directory. (A current directory can
    never be a symbolic link – it must be a real directory.)

You will choose *one* of the previous two *starting directory* methods to
reach the maze when you start searching, below.

5.  As you know from a previous assignment, this `maze` contains many
    hidden sub-directories. With this maze as a *starting directory*, and
    using one of the two above methods, use a single command (no pipes
    needed) to recursively find *all* pathnames with a **basename** that
    begins with your eight-character userid at the *start* of the name.

    For example, if your userid were *abcd0001* then you might match and
    output pathnames containing basenames such as `abcd0001` and
    `abcd0001YYY` but *not* `XXXabcd0001` or `XXXabcd0001YYY` or
    `abcdYYYY` where *XXX* and *YYY* can be any non-empty strings of
    characters. Your own userid must start every basename. Your single
    recursive command should find exactly 23 pathnames.

    **Hint:** You must use a single command (not a pipeline) that is good
    at [Finding Files] by a *basename* pattern to do this. Do not try to
    use `cd` and `ls` to find all the files; the maze is really, really
    big.

    **Hint:** You have previously used this recursive command many times
    without a pattern for a **basename**. This task requires you to use a
    quoted GLOB pattern that matches your userid followed by *zero or more
    characters*. The command you use should recursively find exactly 23
    pathnames, all containing your userid.

    **Hint:** If you don’t find any pathnames, re-read the section on
    Methods, above.
    If you only find a few pathnames, re-read the section on **quotes**,
    above.

6.  When you see all 23 pathnames on your screen, take the same single
    command you used to find the names above and modify it to use the
    expression that makes the command show the full detailed attribute
    information about the names (including permissions, owner, size, date,
    etc.) instead of just the pathname. Use the same command; just remove
    `-print` and add the right expression. You will know you have the
    right expression if the output of the command is 23 lines and
    approximately 256 words.

    **Hint:** You know which expression to use from your answers in
    [Worksheet #02 HTML] and [Worksheet #03 HTML] and from reading the
    **detailed attribute information** paragraph at the end of Section 2
    of the [Finding Files] notes.

You may want to review using pipes in [Worksheet #05 HTML] and
[Redirection and Pipes] to do this next item.

### `mazefound1.txt` {#mazefound1.txt .floatright .unnumbered}

7.  Pipe the 23 lines of output of the above command into a sorting
    program and put the *sorted* output into file `mazefound1.txt` under
    your [Base Directory]. The sorted file will still contain exactly the
    same number of lines and words as you counted, above.

### `findcmd1.txt` {#findcmd1.txt .floatright .unnumbered}

8.  Put the above two-command pipeline with redirection that you just used
    into file `findcmd1.txt` in your [Base Directory].

    **Hint:** You can use a text editor (easy) or you can use the `echo`
    command (tricky) to do this, as you did in a previous assignment. *Use
    a text editor!* If you try to use `echo` you will need to hide all the
    shell meta-characters in the command line from the shell; make sure
    the command line echoes correctly to the screen before you try to
    redirect it into the file. “You can only redirect what you can see!”

### Finding names containing *abcd0001* anywhere

9.  In this same maze, use a single command (not a pipeline) to
    recursively find all pathnames with a **basename** that contains your
    eight-character userid *anywhere* in the name.

    For example, if your userid were *abcd0001* then you might output
    pathnames containing basenames such as `abcd0001`, `abcd0001YYY`,
    `XXXabcd0001`, and `XXXabcd0001YYY` where *XXX* and *YYY* can be
    anything (zero or more characters). Your own userid will be somewhere
    in every basename. Your single recursive command should find exactly
    47 pathnames.

    **Hint:** See the hints for the previous section.

10. When you see all 47 pathnames on your screen, take the same single
    command you used to find the names above and modify it to again use
    the expression that makes the command show the detailed attribute
    information about the names, as you did above. You will know you have
    the right expression if the output of the command is 47 lines and
    approximately 535 words.

### `mazefound2.txt` {#mazefound2.txt .floatright .unnumbered}

11. Pipe the 47 lines of output of the above command into a sorting
    program and put the *reverse*-sorted output into file `mazefound2.txt`
    under your [Base Directory]. The *reverse-sorted* file will still
    contain exactly the same number of lines and words as you counted,
    above.

### `findcmd2.txt` {#findcmd2.txt .floatright .unnumbered}

12. Put the above two-command pipeline with redirection that you just used
    into file `findcmd2.txt` in your [Base Directory].

    **Hint:** See the hints for the previous section.

Run the [Checking Program] to verify your work so far.

Three different O/S Text Files – Unix, Windows, Macintosh
---------------------------------------------------------

1.  Somewhere under that same `start` directory you used earlier for the
    WAREZ problem are exactly three non-empty files whose names contain
    your userid (lower-case) somewhere (anywhere) in the name. (Most of the
    other files whose names also contain your userid are empty files.)
    Use a command to recursively find and display these three non-empty
    (size larger than zero) files with your userid anywhere in the name.
    When you know the three pathnames, manually copy each of them
    (preserving modify times) to a new directory named `OSfiles` that you
    must create in your [Base Directory]. Since there are only three file
    names, you can use your mouse to copy-and-paste the three file names
    you need to copy, once you know their names.

    **Hints:** What command finds files based on expressions that can
    include both **size** *and* a **basename** that can be a GLOB-style
    pattern? You have used this command many times this term. See the end
    of [Worksheet #02 HTML] and the notes on [Finding Files].

    **Hints:** You will also find your userid mentioned inside each file,
    but because the files are not all Unix/Linux text files, some of the
    text content may not display correctly on your terminal screen. The
    `less` command is better than `cat` when displaying files containing
    strange (e.g. unprintable) characters, but see also the
    “show-nonprinting” option to `cat`.

    *(Optional advanced use: You can also read this optional material on a
    better way to [use find -exec and xargs].)*

### `unix` {#unix .floatright .unnumbered}

### `windows` {#windows .floatright .unnumbered}

### `macintosh` {#macintosh .floatright .unnumbered}

2.  In your `OSfiles` directory, determine which operating system created
    each of the three non-empty files. Rename the Unix/Linux file to be
    `unix`, the Windows file to be `windows` and the Macintosh file to be
    `macintosh`.

    **Hints:** In [Assignment #02 HTML] you used a command that can
    determine file type to identify the text inside a `date.txt` file. You
    will also find this command listed under Week 01 in the
    [List of Commands][List of Commands You Should Know] in your notebook.
    Use this command and the notes on [Text File Line End Differences] to
    identify the special line endings of the Windows and Macintosh files.
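To see what the three line-ending conventions look like at the byte level, the sketch below writes three small invented sample files with `printf` and counts the carriage-return (CR, `\r`) and line-feed (LF, `\n`) bytes in each. It only illustrates the conventions described in the line-end notes; the file names are made up, and it does not replace identifying the real files in `OSfiles`.

```shell
#!/bin/sh
# Illustration only: three invented files that differ only in line endings.
demo=$(mktemp -d)
printf 'one\ntwo\n'     > "$demo/unix.txt"     # Unix/Linux: LF ends each line
printf 'one\r\ntwo\r\n' > "$demo/windows.txt"  # Windows: CR then LF
printf 'one\rtwo\r'     > "$demo/mac.txt"      # classic Macintosh: CR only

# count_char FILE CHAR: print how many times CHAR occurs in FILE.
# tr -cd deletes every byte except CHAR; wc -c counts what is left.
count_char() {
    tr -cd "$2" < "$1" | wc -c
}

summary=""
for f in unix windows mac; do
    cr=$(( $(count_char "$demo/$f.txt" '\r') ))
    lf=$(( $(count_char "$demo/$f.txt" '\n') ))
    echo "$f: CR=$cr LF=$lf"
    summary="$summary$f:$cr:$lf "
done

# od -c spells out every byte, showing the \r and \n that cat would hide.
od -c "$demo/windows.txt"
rm -r "$demo"
```

A Unix file ends lines with LF alone, a Windows file with CR LF pairs, and a classic Macintosh file with CR alone, which is the pattern the three counts show.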

Your instructor will also mark the [Base Directory] in your account on
the due date. Leave everything there on the CLS. Do not delete anything.

Run the [Checking Program] to verify your work so far.

Appending to files
------------------

You need to understand [Redirection and Pipes] to do this task.

### `wc` {#wc .floatright .unnumbered}

1.  Count the lines, words, and characters in the file `services` under
    the `/etc` directory and put the count in file `wc` under your
    [Base Directory]. (Use the absolute pathname of the `services` file
    when you count and do not use any pipes.) The file `wc` should
    contain one line containing three numbers and an absolute pathname
    at the end.

2.  Extract just the first line of the same `services` file and append
    this one line to the end of the `wc` file, so that the file `wc` now
    has two lines in it (the word count line and the first line of
    `services`).

    **Hint:** You know a command that shows lines at the start of a file.
    Review your work in [Worksheet #05 HTML] and the notes on
    [Redirection and Pipes].

3.  Append the count of the lines, words, and characters in the file
    `protocols` in the `/etc` directory to the end of file `wc`, so that
    the `wc` file now has three lines in it. (Use the absolute pathname
    of the `protocols` file when you count and do not use any pipes.)

4.  Extract just the last line of the same `protocols` file and append
    just this one line to the end of the `wc` file, so that the file `wc`
    now has four lines in it.

    **Hint:** You know a command that shows lines at the end of a file.
    Review your work in [Worksheet #05 HTML] and the notes on
    [Redirection and Pipes].

Confirm that the word count of the `wc` file gives `4 20 140`. If you see
the right number of lines but the other values differ, go back and
re-read all the words in the sentences above, especially the sentences
that start with the words “Use the”.

Run the [Checking Program] to verify your work so far.
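
The four steps above rely on one redirection rule: `>` creates (or truncates) the output file, and every later `>>` appends to it. The block below is a minimal sketch of that pattern. So that it runs anywhere, it uses two hypothetical scratch files in a temporary directory instead of `/etc/services` and `/etc/protocols`, so its counts will not match `4 20 140`.

```shell
#!/bin/sh
# Sketch only: hypothetical scratch files stand in for /etc/services
# and /etc/protocols.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf 'alpha one\nbeta two\ngamma three\n' >"$tmp/first"
printf '# header line\nlast line here\n' >"$tmp/second"

out="$tmp/wc"
wc "$tmp/first"         >"$out"    # > creates/truncates: line 1 (counts + pathname)
head -n 1 "$tmp/first"  >>"$out"   # >> appends: line 2 (first line of the file)
wc "$tmp/second"        >>"$out"   # >> appends: line 3 (counts + pathname)
tail -n 1 "$tmp/second" >>"$out"   # >> appends: line 4 (last line of the file)

cat "$out"
```

Because `wc` is given the pathname as an argument, it prints that pathname after the three counts; feeding the same file to `wc` through a pipe or `<` redirection leaves the name off. Using `>` instead of `>>` on any later step would silently replace everything written so far.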

Searching System Log Files
--------------------------

You need to understand [Redirection and Pipes] to do this task,
especially the section on [**Using successive filters in pipes**]. Your
[Week 05 Notes] explain the command that extracts fields from lines.

1.  Copy the six-command pipeline used in Example 2 given in
    [**Using successive filters in pipes**] and modify it to add on the
    end a seventh filter command that limits the output on the screen to
    the first seven lines.

    **Hints:** Do not change any of the existing six commands in the
    pipeline. All you need to do is add a seventh filter command. The
    first line of output will be `1581 (221.235.188.212)` and the last
    line (of seven lines) will be `147 (81.8.0.22)`.

### `refused7.sh` {#refused7.sh .floatright .unnumbered}

2.  When the output is correct, put the new seven-command pipeline you
    used into file `refused7.sh` in the [Base Directory]. Typing
    `sh -u refused7.sh` should print the seven most active attack IP
    addresses for January on your screen. If it doesn’t do this, you
    haven’t copied the command line correctly. Check it!

3.  Edit the `refused7.sh` file and add to the end of the file,
    underneath your seven-command pipeline, exactly seven numbered shell
    comments that explain briefly and **in your own words** the meaning
    of each of the seven commands used in the pipeline, using the format
    described below.

    Shell script comments start with the number-sign (or hash-tag)
    character `#` and extend to the end of the line. The seven numbered
    comment lines must have a syntax similar to this (though this is the
    wrong pipeline and wrong comments to use for this task):

        last idallen | awk '{ print $3 }' | grep '^[0-9]' | sort | uniq | wc -l
        # 1. last idallen: show last login lines only for user idallen
        # 2. awk '{ print $3 }': display only third field (IP address)
        # 3. grep '^[0-9]': select only lines starting with a digit
        # 4. sort: put IP addresses into sorted order
        # 5. uniq: throw away duplicate adjacent IP addresses, leaving only unique
        # 6. wc -l: count the number of unique IP addresses (number of lines)

    **Comment Format:** Since there are **seven** commands in your script
    pipeline, you will need to write exactly **seven** numbered comment
    lines to explain them. As you see in the above example, each comment
    line starts at the left margin with the `#` comment character (no
    spaces in front), followed by a space, a number, a period, a space,
    the pipeline command name and options to which the comment refers, a
    colon, and then your own comment text written **in your own words**.

    Each comment text explains what the command does in the pipeline. Do
    not copy words; follow the syntax shown in the above example, but use
    your **own** words (don’t copy mine).

    Including the seven comment lines, your `refused7.sh` file will be at
    least eight lines long.

4.  Write a command to count the number of lines containing the string
    `new denied hosts` in the `denyhosts-2014` log file on the CLS. (This
    log file is in the same directory as the `auth.log` file used in the
    previous item and in most of the [Weekly Class Notes].) You should
    find `1669` matching lines in the file.

    **Hint:** My solution used one command name with no pipes needed. I
    used an option that counted the number of matching lines, as shown in
    the weekly course notes.

### `denycmd1.sh` {#denycmd1.sh .floatright .unnumbered}

5.  When the output is correct, put the command line you used to generate
    the number `1669` into file `denycmd1.sh` in the [Base Directory].
    Typing `sh -u denycmd1.sh` should print the number `1669` on your
    screen. If it doesn’t do this, you haven’t copied the command line
    correctly. Check it!

6.  Write a command pipeline (using pipes) to count the number of lines
    containing the string `new denied hosts` in only **September 2014**
    in the `denyhosts-2014` log file on the CLS.
You should find 129 matching lines to count, and the output should be
the number 129.

**Hints:** Example 1 given in [**Using successive filters in pipes**]
explains how you might find some lines in the `auth.log` file that were
created in January. Apply what you learn there to solve this problem.
Before you try, look at the `denyhosts-2014` file and find out what
format it uses to represent the date “September 2014”. You can’t just
look for the text “September 2014” in the file; it’s not there. Look
into the file to see the actual date format, then create a filter
command to search for that date format and count the lines. My solution
used two command names with one pipe between them. The second command
used an option that counted the number of matching lines, as shown in
the weekly course notes.

### `denycmd2.sh` {#denycmd2.sh .floatright .unnumbered}

7.  When the command pipeline is correct, put the command pipeline you
    used to generate the number `129` into file `denycmd2.sh` in the
    [Base Directory]. Typing `sh -u denycmd2.sh` should print the number
    `129` on your screen. If it doesn’t do this, you haven’t copied the
    command line correctly. Check it!

8.  Using your shell history and the command you used in the previous
    item, modify and redo the command a few times to manually find the
    number of denied hosts in each month in 2014. Use this to determine
    the month with the largest number of denied hosts (383).

    **Hint:** It’s one of the months after September.

### `denyhosts3.txt` {#denyhosts3.txt .floatright .unnumbered}

9.  When you find the month with the largest number of denied hosts, put
    the first five lines and the last five lines of log entries for this
    month into file `denyhosts3.txt` in the [Base Directory].

**Hint:** Use a command pipeline to generate the first five lines of log
output for this month and save them, then modify the command pipeline to
generate the last five lines of log output for this month and append
them to the file containing the first five lines. That is your answer.
The word count of this ten-line file should be: `10 100 854`

Run the [Checking Program] to verify your work so far.

When you are done
-----------------

Those are all the tasks you need to do. Check your work using the
[Checking Program] below and save the standard output of that program
into a file as described below. Submit that file (and only that one
file) to Blackboard following the directions below.

When you are done, log out of the CLS before you close your laptop or
close the PuTTY window, by using the shell `exit` command:

    $ exit

Checking, Marking, and Submitting your Work
===========================================

**Summary:** Do some tasks, then run the **Checking Program** to verify
your work as you go. You can run the **Checking Program** as often as you
want. When you have the best mark, upload the single file that is the
output of the **Checking Program** to Blackboard.

> Since I also do manual marking of student assignments, your final mark
> may not be the same as the mark submitted using the current version of
> the **Checking Program**. I do not guarantee that any version of the
> **Checking Program** will find all the errors in your work. Complete
> your assignments according to the specifications, not according to the
> incomplete set of mistakes detected by the **Checking Program**.

1.  There is a **Checking Program** named `assignment05check` in the
    [Source Directory] on the CLS. You can execute this program by typing
    its (long) pathname into the shell as a command name:

        $ ~idallen/cst8207/15w/assignment05/assignment05check

    You will learn of ways to make this shorter in future assignments.

2.  Execute the above **Checking Program** as a command line on the CLS.
This program will check your work, assign you a mark, and display the
output on your screen.

If the **Checking Program** is not yet ready, it will say
`NOT FINISHED YET` and `DO NOT SUBMIT THIS FILE`. No mark is shown; do
not submit the file. Wait until the **Checking Program** is finished (it
gives you a mark) before you save and submit your marks.

You may run the **Checking Program** as many times as you wish, allowing
you to correct mistakes and get the best mark. **Some task sections
require you to finish the whole section before running the Checking
Program at the end; you may not always be able to run the Checking
Program successfully after every single task step.**

3.  When you are done with this assignment, and you like the mark
    displayed on your screen by the **Checking Program**, you must
    **redirect** only the standard output of the **Checking Program**
    into the text file `assignment05.txt` on the CLS, like this:

        $ ~idallen/cst8207/15w/assignment05/assignment05check >assignment05.txt
        $ cat assignment05.txt

    -   Use output redirection with that *exact* `assignment05.txt` file
        name.
    -   Be absolutely accurate, as if your marks depended on it.
    -   Case (upper/lower case letters) matters.
    -   Make sure the file actually contains the output of the
        **Checking Program**!
    -   The file should contain, near the bottom, a line starting with:
        `YOUR MARK for`
    -   Really! **MAKE SURE THE FILE HAS YOUR MARKS IN IT!**

4.  Transfer the above single file `assignment05.txt` (containing the
    output from the **Checking Program**) from the CLS to your local
    computer.

    -   You may want to refer to the [File Transfer] page for how to
        transfer the file.
    -   Verify that the file still contains all the output from the
        **Checking Program**.
    -   Do not edit this file! No empty files, please! Edited or damaged
        files will not be marked. Submit the file exactly as given.
    -   The file should contain near the bottom a line starting with:
        `YOUR MARK for`
    -   Really! **MAKE SURE THE FILE HAS YOUR MARKS IN IT!**

5.  Upload the `assignment05.txt` file from your local computer to the
    correct Assignment area on Blackboard (with the exact name) before
    the due date:

    1.  On your local computer use a web browser to log in to Blackboard
        and go to the Blackboard page for this course.
    2.  Go to the Blackboard *Assignments* area for the course, in the
        left side-bar menu, and find the current assignment.
    3.  Under *Assignments*, click on the underlined **assignment05**
        link for this assignment.
        a)  If this is your first upload, the *Upload Assignment* page
            will open directly; skip the next sentence.
        b)  If you have already uploaded previously, the
            *Review Submission History* page will be open and you must
            use the *Start New* button at the bottom of the page to get
            to the *Upload Assignment* page.
    4.  On the *Upload Assignment* page, scroll down and beside
        *Attach File* use *Browse My Computer* to find and attach your
        assignment file from your local computer. Make sure the
        assignment file has the correct name on your local computer
        before you attach it.
    5.  After you have attached the file on the *Upload Assignment* page,
        scroll down to the bottom of the page and use the *Submit* button
        to actually upload your attached assignment file to Blackboard.

    Use only *Attach File* on the *Upload Assignment* page. Do not enter
    any text into the *Text Submission* or *Comments* boxes on
    Blackboard; I do not read them. Use only the *Attach File* section
    followed by the *Submit* button. If you need to comment on any
    assignment submission, send me [EMail].

    You can revise and upload the file more than once using the
    *Start New* button on the *Review Submission History* page to open a
    new *Upload Assignment* page. I only look at the most recent
    submission.

    You must upload the file with the correct name from your local
    computer; you cannot correct the name as you upload it to Blackboard.

6.  **Verify that Blackboard has received your submission**: After using
    the *Submit* button, you will see a page titled
    *Review Submission History* that will show all your uploaded
    submissions for this assignment.

    Each of your submissions is called an *Attempt* on this page. A
    drop-down list of all your attempts is available.

    a)  Verify that your latest *Attempt* has the correct 16-character,
        lower-case file name under the *SUBMISSION* heading.
    b)  The one file name must be the *only* thing under the
        *SUBMISSION* heading. Only the one file name is allowed.
    c)  No *COMMENTS* heading should be visible on the page. Do not enter
        any comments when you upload an assignment.
    d)  **Save a screen capture** of the *Review Submission History* page
        on your local computer, showing the single uploaded file name
        listed under *SUBMISSION*. If you want to claim that you uploaded
        the file and Blackboard lost it, you will need this screen capture
        to prove that you actually uploaded the file. (To date, Blackboard
        has never lost an uploaded file.)

    You will also see the *Review Submission History* page any time you
    already have an assignment attempt uploaded and you click on the
    underlined **assignment05** link. You can use the *Start New* button
    on this page to re-upload your assignment as many times as you like.
    You cannot delete an assignment attempt, but you can always upload a
    new version. I only mark the latest version.

7.  Your instructor may also mark files in your directory in your CLS
    account after the due date. Leave everything there on the CLS.
    **Do not delete any assignment work from the CLS until after the term is over!**

-   I do not accept any assignment submissions by EMail. Use only the
    Blackboard *Attach File*. No word processor documents. Plain Text
    only.
-   Use the *exact* file name given above. Upload only one single file of
    Linux-format plain text, not HTML, not RTF, not MSWord. No fonts, no
    word-processing. Linux plain text only.
-   **NO EMAIL, WORD PROCESSOR, PDF, RTF, or HTML DOCUMENTS ACCEPTED.**
-   No marks are awarded for submitting under the wrong assignment number
    or for using the wrong file name. Use the exact 16-character,
    lower-case name given above.
-   **WARNING:** Some inattentive students don’t read all these words.
    Don’t make that mistake! Be exact.

**READ ALL THE WORDS. OH PLEASE, PLEASE, PLEASE READ ALL THE WORDS!**

-- 

| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

[www.idallen.com]: http://www.idallen.com/
[hyperlink URLs]: indexcgi.cgi#Important_Notes__alphabetical_order_
[Assignments]: indexcgi.cgi#Assignments
[Worksheets]: indexcgi.cgi#Worksheets__not_for_hand_in_
[Checking Program]: #checking-marking-and-submitting-your-work
[Course Linux Server]: 070_course_linux_server.html
[Remote Login]: 110_remote_login.html
[List of Commands You Should Know]: 900_unix_command_list.html
[Base Directory]: #set-up-the-base-directory-on-the-cls
[Shell GLOB Patterns]: 190_glob_patterns.html
[Copies of the CST8207 Course Notes]: 070_course_linux_server.html#copies-of-the-cst8207-course-notes
[**You can only redirect what you can see.**]: 200_redirection.html#you-can-only-redirect-what-you-can-see
[Source Directory]: #the-source-directory
[verify your GLOB patterns before using them]: 190_glob_patterns.html#verifying-glob-patterns-before-using-them
[Finding Files]: 180_finding_files.html
[Assignment #03 HTML]: assignment03.html#finding-files-in-a-maze
[Redirection and Pipes]: 200_redirection.html
[use find -exec and xargs]: 185_find_and_xargs.html
[Text File Line End Differences]: 015_file_transfer.html#text-file-line-end-differences
[**Using successive filters in pipes**]: 200_redirection.html#using-successive-filters-in-pipes
[Weekly Class Notes]: indexcgi.cgi#Weekly_Class_Notes
[File Transfer]: 015_file_transfer.html
[EMail]: mailto:idallen@idallen.ca
[Plain Text]: assignment05.txt
[Pandoc Markdown]: http://johnmacfarlane.net/pandoc/