------------------------------------------------
Linux Shells by Example: Chapter 3 Reading Guide
------------------------------------------------
-IAN! idallen@ncf.ca

Here is a reading guide and some review questions for Chapter 3
"The GREP Family".

Remember to read the text_errata.txt file (under Notes) and correct all
the mistakes in this Chapter before you read it.

Useful additional notes to read:

    regular_expressions.txt
    regular_expression_questions.txt

The data files for the examples in the textbook are under this directory:

    /home/alleni/cst/cdrom/chap03/

Note: The information in Table 3.2 is partially duplicated in Table 2.1
(p.44) and Table 4.3 (p.99).

Warning: Do not confuse the meaning of metacharacters used in regular
expressions and those used in shell GLOB patterns. The same characters
are used; but, they often mean different things.

*) What is the syntax of the "grep" command? (p.56)

*) Are forward slashes used in the pattern part of the grep command line?

*) Can you use two patterns as the first argument to grep? (p.56)

*) What happens if you don't give grep any file names? (p.56)

*) Ignore the first part of Section 3.1.3 and read the file
   "regular_expressions.txt" (under Notes) instead. (p.57)

*) Learn to use the Basic and Extended regular expression characters
   listed in the file "regular_expressions.txt". You will need to know
   how to use all the Basic and Extended metacharacters in this file.
   List the Basic regular expression characters and their meanings.
   List the Extended regular expression characters and their meanings.

*) POSIX named character classes are not supported by all programs that
   handle regular expressions. Experiment before you use them.

*) Why is the POSIX character class [:alnum:] not identical to the
   character class A-Za-z0-9 ? (p.59)

*) For North American ASCII, what is the one character difference
   between the POSIX character class [:alnum:] and the VI or Gnu Grep
   character class \w ? (See the text_errata.txt file, p.58.)
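
To see the GLOB-versus-regular-expression Warning above in action, here
is a minimal sketch contrasting "a*" used as a shell GLOB pattern (in a
case statement) with "a*" used as a regular expression given to grep.
The function name glob_match and the sample words are made up for this
example.

```shell
#!/bin/sh
# GLOB: in a case statement, a* means "starts with a, then anything".
# glob_match is a hypothetical helper name used only for this demo.
glob_match() {
    case "$1" in
        a*) echo yes ;;
        *)  echo no ;;
    esac
}

glob_match apple     # yes
glob_match banana    # no

# REGEXP: a* means "zero or more a characters". Every line contains
# zero a's somewhere, so grep selects both lines.
printf 'apple\nbanana\n' | grep -c 'a*'    # 2

# The regular expression that means "starts with a" is ^a.
printf 'apple\nbanana\n' | grep -c '^a'    # 1
```

Quote the pattern on the grep command line so the shell does not try to
expand it as a GLOB before grep ever sees it.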

*) What causes each of the three exit statuses to be returned from the
   grep command? (p.61)

*) Study well all the examples in this chapter (p.61-73). Try them!
   The data files for the examples in the textbook are on the CDROM and
   are also under this directory in the Linux Lab:

       /home/alleni/cst/cdrom/chap03/

*) Skip over section 3.2.2 (p.73) - don't try to memorize which versions
   of which commands do/don't handle the "oddball" regular expression
   metacharacters and back-references.

*) How does "fgrep" differ from both "grep" and "egrep"? (p.76)

*) True or False: because fgrep does not recognize any regular expression
   metacharacters, no quoting of metacharacters is necessary on the fgrep
   command line, e.g.

       $ fgrep *best* file

*) Does the pattern argument of "rgrep" (recursive grep) support the same
   features as "grep", "egrep", or "fgrep"? (p.77)

*) Know the meaning of these options to the grep family (from Table 3.6
   on p.78):  -c -i -l -n -v -w

*) Skim Section 3.6.1 (extended options for Gnu Grep). (p.82)

*) Study well sections 3.6.2 and 3.6.3. (p.88-89)

*) Do the exercise on p.90-91. The data files for the examples in the
   textbook are on the CDROM and are also under this directory in the
   Linux Lab:

       /home/alleni/cst/cdrom/chap03/

--------------------------------------
More questions on Regular Expressions:
--------------------------------------

*) In the expression "abc*", does the "*" repeat the entire word "abc"
   zero or more times, or does it only repeat the letter "c" zero or more
   times?

*) In the extended regular expression "(abc)+", does the "+" repeat the
   closing parenthesis one or more times, or does it repeat the entire
   parenthesized expression one or more times (e.g. abcabcabc)?

*) How do these two (extended) regular expressions differ?

       $ egrep '(b|B)(e|E)(e|E)(r|R)' file
       $ egrep '[bB][eE][eE][rR]' file

   Which is easier to understand?
   Do these following expressions match exactly the same lines as the
   above two expressions?

       $ egrep 'beer|BEER' file
       $ egrep '[beer][BEER]' file
       $ egrep '[beer]|[BEER]' file

*) Are these following extended regular expression lines exactly
   equivalent?

       $ egrep 'a(b|c)d' file
       $ egrep '(ab|ac)d' file
       $ egrep 'a(bd|cd)' file
       $ egrep 'abd|acd' file

   Hint: Yes. Concatenation and alternation of regular expressions obey
   rules similar to multiplication and addition of numbers in arithmetic:

       ARITHMETIC:  a*(b+c)*d = (a*b+a*c)*d = a*b*d+a*c*d
       REGEXP:      a(b|c)d   = (ab|ac)d    = abd|acd

   Think of concatenation as "multiply" and alternation as "add".

*) Are these following lines exactly equivalent?

       $ egrep 'labell?ed' file
       $ egrep 'label(l|)ed' file

   Can the "?" metacharacter always be replaced by a parenthesized
   expression using "|" with one empty alternative?
   Hint: Yes. You never need to use "?" in an extended regular
   expression - it just makes some extended regular expressions shorter.

*) Are these following lines exactly equivalent?

       $ egrep '0+' file
       $ egrep '00*' file

   Can the "+" metacharacter always be replaced by repeating the pattern
   and using "*" instead?
   Hint: Yes. You never need to use "+" - it just makes some extended
   regular expressions shorter (sometimes a *lot* shorter!).

*) Are these following lines exactly equivalent?

       $ egrep 'a*b*c*' file
       $ egrep '[abc]*' file
       $ egrep '(abc)*' file

   Hint: No. Give a line that is matched by one but not the other.

*) The following regular expressions give identical results when used by
   grep to select lines:

       $ grep '^a' /etc/passwd
       $ grep '^a.*' /etc/passwd
       $ grep '^a.*$' /etc/passwd

   Why do they give the same results? Which one is fastest?
   Don't write complex regular expressions when simple ones will do.

*) The following regular expressions give identical results when used by
   grep to select lines:

       $ grep 'a$' /etc/passwd
       $ grep '.*a$' /etc/passwd
       $ grep '^.*a$' /etc/passwd

   Why do they give the same results? Which one is fastest?
   Don't write complex regular expressions when simple ones will do.
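
One way to test the equivalences claimed in the "Hint: Yes" answers above
(distributing alternation over concatenation, replacing "?" with an empty
alternative, and replacing "+" with "00*") is to run every pattern over
the same input and compare the lines selected. A minimal sketch, using
"grep -E" (which behaves like egrep) and a few made-up sample lines; the
helper names matches and same are hypothetical. Agreement on one sample
is evidence, not proof: one counter-example line shows two patterns
differ, but no finite sample proves they are equivalent. Also note that
POSIX leaves an empty alternative such as "(l|)" undefined; Gnu Grep
accepts it, but other versions of grep may not.

```shell
#!/bin/sh
# Made-up sample lines; add your own when hunting for counter-examples.
sample='abd
acd
aed
labeled
labelled
0
000
xyz'

# Print the sample lines selected by one extended regular expression.
matches() {
    printf '%s\n' "$sample" | grep -E -- "$1"
}

# Report whether every pattern selects the same lines as the first one.
same() {
    first=$(matches "$1")
    for p in "$@"; do
        if [ "$(matches "$p")" != "$first" ]; then
            printf "DIFFER: '%s' vs '%s'\n" "$1" "$p"
            return 1
        fi
    done
    printf 'same: %s\n' "$*"
}

same 'a(b|c)d' '(ab|ac)d' 'a(bd|cd)' 'abd|acd'
same 'labell?ed' 'label(l|)ed'
same '0+' '00*'
```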

*) Look for lines in the password file that contain four or more
   adjacent zeroes. Use an option to display just the count of lines,
   not the lines themselves. (Do not use "wc"; use an option to "grep".)

*) Use an option to display just the file names of the header files in
   the /usr/include/ directory that contain the string "stdin". (Header
   files end in the two characters ".h".) Don't display the matching
   lines, just the names of the files containing a match.
   (Answer: about 16 files, including /usr/include/stdio.h .)

*) Repeat the above question; but, use an option to grep that will do a
   case-insensitive match that will find "stdin", "STDIN", "sTdIn", etc.
   How does the list of files output differ from the previous question?
   (Hint: put both lists of files into temporary files and run "diff" to
   see the differences.)

*) Use an option to display the count of words in /usr/share/dict/words
   that both begin and end with the lower-case letter 'a'.
   (Answer: 39 words.)

*) Use an option to display the count of words in /usr/share/dict/words
   that both begin and end with the lower-case letter 'a' and also
   contain a third letter 'a' somewhere in the middle.
   (Answer: 10 words.)

*) Repeat the above question, but add an option to do a case-insensitive
   match. (Answer: 39 words.)

*) Use options to display the count of words in /usr/share/dict/words
   that both begin and end with the letter 'a' and also contain a third
   and a fourth letter 'a' somewhere in the middle.
   Do a case-sensitive match. (Answer: 0 words.)
   Do a case-insensitive match. (Answer: 5 words.)

*) Use grep to select words from the file /usr/share/dict/words that have
   all the vowels in ascending order, "a" before "e" before "i" before "o"
   before "u", with any number of other characters in between.
   (Answer: two words.)
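
The -c, -l, and -i options used in the questions above can be tried on a
few throw-away files before pointing them at /usr/include/ or
/usr/share/dict/words. A minimal sketch, assuming a POSIX shell with the
mktemp command; the file names and contents are made up for this
demonstration.

```shell
#!/bin/sh
# Build a few small made-up files in a scratch directory that is
# removed automatically when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'reads from stdin\nno match here\nstdin again\n' > "$dir/one.h"
printf 'STDIN in capitals\n' > "$dir/two.h"
printf 'nothing relevant\n' > "$dir/three.h"

# -c : print the number of matching lines instead of the lines.
grep -c 'stdin' "$dir/one.h"        # 2

# -l : print only the names of the files that contain a match.
grep -l 'stdin' "$dir"/*.h          # only .../one.h

# -i with -l : case-insensitive, so "STDIN" in two.h now matches too.
grep -il 'stdin' "$dir"/*.h         # .../one.h and .../two.h
```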

*) Use grep to select words from the file /usr/share/dict/words that have
   all the individual letters in the name "elvis" in the same order, "e"
   before "l" before "v" before "i" before "s", with any number of other
   characters in between the letters.
   (Answer: 13 words. The longest one is "relativistically".)

*) Find which header files in the /usr/include/ directory contain the
   string "FILE". (Header files end in the two characters ".h".) Don't
   display the matching lines, just the names of the files containing a
   match. (Answer: about 82 files, including /usr/include/stdio.h .)

*) Repeat the above question, but use an option to grep to match only the
   *word* "FILE", not the string FILE. (Answer: about 42 files.)

*) Repeat the above question, but match the word "printf".
   (Answer: about 11 files, including /usr/include/error.h .)

*) Do these give the same answer?

       1. How many lines contain a character that is not the letter 'a'?
       2. How many lines do not contain the letter 'a'?

   If they differ, give an example of a line that one matches but the
   other does not. How long is the shortest line output by each command?

*) Do these command lines always give the same output?

       1. grep '[^a]'
       2. grep -v 'a'

   If they differ, give an example of a line that one matches but the
   other does not. How long is the shortest line output by each command?

*) Do these command lines always give the same output?

       1. grep '[^d][^o][^g]'
       2. grep -v 'dog'

   If they differ, give an example of a line that one matches but the
   other does not. How long is the shortest line output by each command?

*) How many lines in /usr/include/stdio.h do *not* contain any
   characters? (Note: A line with "no characters" still ends in a
   newline!) You can answer this two ways:

       1. How many lines have the end of the line right after the start?
       2. If you exclude all lines that contain any single character, how
          many lines are left over (count the non-matching lines)?

   Derive grep expressions to produce both answers.
   One expression will probably use an option to grep to "invert" the
   match and select only non-matching lines. (Answer: 132 lines)

*) How many lines in /usr/include/stdio.h do *not* contain any blanks?
   You can answer this two ways:

       1. How many lines contain only zero or more non-blank characters?
       2. If you exclude all lines that contain a blank character, how
          many lines are left over (count the non-matching lines)?

   Derive grep expressions to produce both answers. One expression will
   probably use an option to grep to "invert" the match and select only
   non-matching lines. (Answer: 185 lines)

*) How many lines in /usr/include/stdio.h do *not* contain any upper- or
   lower-case letters? You can answer this two ways:

       1. How many lines contain only zero or more non-letter characters?
       2. If you exclude all lines that contain a letter, how many lines
          are left over (count the non-matching lines)?

   Derive grep expressions to produce both answers. One expression will
   probably use an option to grep to "invert" the match and select only
   non-matching lines. (Answer: 134 lines) (Time-saver: use a
   case-insensitive match.)

*) The directory /usr/include/ is where the C language keeps its standard
   header files on Unix, e.g. #include <stdio.h> refers to the file
   "/usr/include/stdio.h". The file errno.h in the above directory
   contains the #define statements for Unix errors. Find the #define
   statement that defines the Unix "EPERM" error ("Operation not
   permitted").

   Problem: Unfortunately, include files often contain other #include
   directives that include other files (that themselves contain #include
   directives of other files...), so you often can't find what you want
   by doing:

       grep -w EPERM /usr/include/errno.h      # no results!

   File errno.h includes other include files, and one of those other
   include files must contain the actual EPERM definition.

   Solution: Use grep to first find the "include" lines in
   /usr/include/errno.h, then use grep to look for EPERM in each of those
   included files.
   If you don't find the definition there, look for more "include" lines
   in each of those included files and repeat the process, until you
   finally find the actual file containing the EPERM definition.
   (Manually follow the chain of #include directives.)

   What actual file contains the definition of EPERM?
   What is the value of EPERM?
   Use a grep command line to count how many #define statements are in
   this file.
   Modify the grep expression to count *only* the define statements that
   define error numbers. (Count only lines that have #define followed by
   any number of any characters followed by a number preceded by a
   whitespace character [blank or tab]. You can use the POSIX bracketed
   [:space:] character class here [p.59-60].) (Answer: 122 lines)

*) Write a small script to display just the line number of the first line
   on which a pattern is found in a file. Use this syntax:

       $0 pattern filename

   Examples:

       $ ./myline 'struct' /usr/include/stdio.h
       45
       $ ./myline 'errlist' /usr/include/stdio.h
       554

   Hints: Use grep to find the pattern in the file and use a grep option
   to output the line number along with the line. Use a common Unix
   command to select just the *first* line of grep output. Split the
   line number off from the beginning of this line and display just the
   number. (See the data_mining.txt file under Notes for techniques of
   splitting lines to get at fields. [Hint hint: use awk with the '-F:'
   option!])

   Use pipes to connect all your commands - do not save output in
   temporary files! Your final script will probably contain three Unix
   commands in the pipeline, starting with grep.

   Validate your inputs before you use them in the script. (Check for
   missing arguments; make sure the filename is readable, etc.)
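
The input validation asked for in the "myline" question can be kept in a
small function that runs before the pipeline. A minimal sketch of only
the checks (the grep pipeline itself is left as the exercise); the
function name check_args is a hypothetical choice for this example.

```shell
#!/bin/sh
# Input checks for the "myline" exercise.
# check_args is a hypothetical helper name; it prints a message on
# stderr and returns 1 if the arguments are unusable, else returns 0.
check_args() {
    # Exactly two arguments: a pattern and a file name.
    if [ $# -ne 2 ]; then
        echo "Usage: $0 pattern filename" 1>&2
        return 1
    fi
    # The name must refer to a regular file (not a directory).
    if [ ! -f "$2" ]; then
        echo "$0: '$2' is not a file" 1>&2
        return 1
    fi
    # The file must be readable by this user.
    if [ ! -r "$2" ]; then
        echo "$0: '$2' is not readable" 1>&2
        return 1
    fi
    return 0
}

# In the finished script, call it before the grep pipeline:
#     check_args "$@" || exit 1
```

Doing the checks first means the pipeline that follows can assume it has
exactly one readable file to search.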