------------------------------------------------
Linux Shells by Example: Chapter 4 Reading Guide
------------------------------------------------
-IAN! idallen@ncf.ca

Here is a reading guide and some review questions for Chapter 4
"The Streamlined(sic) Editor".

Remember to read the text_errata.txt file (under Notes) and correct all
the mistakes in this Chapter before you read it.

Useful additional notes to read:
    regular_expressions.txt
    regular_expression_questions.txt

The data files for the examples in the textbook are under this directory:
    /home/alleni/cst/cdrom/chap04/

Note: The information in Table 4.3 (p.99) is partially duplicated in
       Table 3.2 and Table 2.1 (p.44).

Warning: Do not confuse the meaning of metacharacters used in regular
         expressions with those used in shell GLOB patterns.  The same
         characters are used, but they often mean different things.

*)  What to study in Chapter 4:

    Study well all of Sections 4.1 to 4.6.

    YES 4.7.1 p
    YES 4.7.2 d
    YES 4.7.3 s/re/string/
    YES 4.7.4 address ranges using comma /foo/,/bar/p
    YES 4.7.5 multiple -e
    NO  4.7.6 r - skip
    YES 4.7.7 w file
    NO  4.7.8 a - skip
    NO  4.7.9 i - skip
    NO  4.7.10 n - skip
    NO  4.7.11 y/chars/CHARS/ - skip
    YES 4.7.12 q
    NO  4.7.13 h and g - skip
    NO  4.7.14 h and x - skip
    NO  4.8 sed scripting - skip
    NO  4.8.1 sed scripting - skip
    YES 4.8.2 sed Review (all except last example using 'l')

*)  How does sed work?  (p.94)  

*)  Does sed operate on all the lines in a file at once, or only one
    line at a time?  (p.94)

*)  Can sed process a file from last line to first line, or only from
    first line to last line?  In other words, once sed has processed a
    line, can it "back up" and process the line that came before it,
    or must sed always move forward in the file, without backing up?  (p.94)

*)  If you leave the addressing numbers off of the front of a sed
    expression, does it only operate on one line or does it operate on
    every line read from the file? (p.94,95)

*)  In a sed line address, what does a dollar sign represent? (p.94)
    (It means the same thing in VI addresses, too!) 

*)  True or False: This sed command deletes only lines 5 and 9: -e '5,9d'

*)  For this course, know the following sed commands and skip over the
    others (Table 4.1):
    
      d
      p
      q 
      s/re/string/g
      s/re/string/p
      s/re/string/w file
      w file

    Commands that accept at most one preceding address (zero or one): q
      Example:  sed -e '/idallen/q' /etc/passwd

    Commands that accept a preceding address range: d p s w
      Example:  sed -n -e '1,/idallen/p' /etc/passwd
      Example:  sed -n -e '/idallen/,$w foo' /etc/passwd
      Example:  sed -n -e '1,10s/idallen/alleni/p' /etc/passwd
      Example:  sed -n -e '/:0:/w root' -e '/:[1-9][0-9]*:/w not' /etc/passwd

    Know these options (Table 4.2) and skip over the others:

     -e command      -n    -f

    The man page for sed will help you here.
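
    The commands and options above can be tried safely on a small sample
    file.  The sketch below is only an illustration: the passwd-style data,
    the temporary directory, and the file names "pw" and "script.sed" are
    made up for this example and are not the textbook data files.

```shell
# Hypothetical sample data in passwd-like format (not the real /etc/passwd).
tmpdir=$(mktemp -d)
cat >"$tmpdir/pw" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
idallen:x:500:500:Ian:/home/idallen:/bin/bash
guest:x:501:501:Guest:/home/guest:/bin/sh
EOF

# q takes at most one address: output lines up to and including the
# first line that matches, then quit.
upto=$(sed -e '/idallen/q' "$tmpdir/pw")

# p takes an address range: with -n, print only from the first match
# through the last line.
from=$(sed -n -e '/idallen/,$p' "$tmpdir/pw")

# -f reads the sed commands from a file instead of from an -e argument.
printf '%s\n' '1,2d' >"$tmpdir/script.sed"
scripted=$(sed -f "$tmpdir/script.sed" "$tmpdir/pw")

printf '%s\n---\n%s\n---\n%s\n' "$upto" "$from" "$scripted"
rm -r "$tmpdir"
```

    With this sample, the second and third commands should select the same
    two lines by different means (printing a range versus deleting the rest).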

*)  What command syntax do I issue to tell sed to delete lines that
    contain a regexp pattern? (p.97)

*)  What command syntax do I issue to tell sed to delete lines that do *not*
    contain a regexp pattern? (p.98)

*)  p.97  Curly braces aren't used that often in sed, but they are
    useful for selecting a range of lines on which you want to run several
    other sed commands, which might themselves have addresses.

    Below is an example of a sed command that uses curly braces.
    Note that the entire expression is a single-quoted string to the shell.
    No shell processing will happen on any of the characters in the string.

    Here is the sed command using newlines to separate commands:
    
    $ sed -n -e '1,10{
       /root/p
       /root/!s/x/*/pg
    }' /etc/passwd

    Here is the same sed command using semicolons to separate commands:

    $ sed -n -e '1,10{ /root/p;  /root/!s/x/*/pg; }' /etc/passwd

    Explanation:

     option -n: suppress default "copy through" output from sed
                - only lines that are explicitly printed will appear
                - if nothing is printed by sed "p" commands, no output

     option -e: the next argument will be the sed command expression
                (it is single quoted here to protect it from the shell)

     1,10{...}: select lines 1 to 10 and do the commands contained in
                the curly braces (which will only operate on lines 1-10).

     /root/p: find lines containing the regexp /root/ and print them
                (but only in lines 1-10, due to the curly braces)

     /root/!s/x/*/pg: find lines that do *not* contain the regexp /root/,
                change all "x" to "*" on the whole line, and then
                print the line only if the substitution succeeded
                (but only in lines 1-10, due to the curly braces)
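
    To watch the braces at work without touching /etc/passwd, the same
    kind of command can be run on a few made-up lines.  This is a sketch
    only: the four sample lines are hypothetical, and the range is
    shortened to 1,3 so that the fourth line falls outside the braces.

```shell
# Four hypothetical passwd-style lines fed to sed on standard input.
braces_out=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'nobody:*:99:99:Nobody:/:/bin/false' \
  'xfs:x:43:43:X Font Server:/etc/X11/fs:/bin/false' |
sed -n -e '1,3{
   /root/p
   /root/!s/x/*/pg
}')

# Line 1 matches /root/ and is printed unchanged.
# Line 2 has an "x" changed to "*" and is printed.
# Line 3 has no "x", so the substitution fails and nothing is printed.
# Line 4 is outside the range 1,3 and is never examined by the braces.
printf '%s\n' "$braces_out"
```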

*)  EXAMPLE 4.5,4.6: How would you select and display only lines
    containing words (a word is a string of non-blank characters) that
    start with the letters "n" or "s" and end with the letter "t"?
    (Make sure you suppress the default sed output in your answer.)
    Hint: [ ]* matches a string of blank characters.  You need to
    match a string of *non*-blank characters.  Invert the expression.

*)  EXAMPLE 4.7,4.8: Can you specify a list of lines to delete, e.g.

      sed -e '1,3,5,7d' datafile

    (No.  You can't.  The "d" command accepts at most one address range,
    i.e. a start address and an end address, not a list of addresses.
    How would you do the above using sed?  Hint: Multiple -e options.)
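
    One possible reading of the hint, assuming the intent is to delete
    exactly lines 1, 3, 5 and 7, is sketched below.  The ten numbered
    input lines are made up so that the surviving lines are easy to see.

```shell
# Ten hypothetical input lines, each containing its own line number.
kept=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
       sed -e '1d' -e '3d' -e '5d' -e '7d')

# Each -e supplies one command with its own single address.
printf '%s\n' "$kept"
```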

*)  The following command prints the last line twice, because the default
    action for sed is to output every line it reads in:

      sed -e '$p' datafile

    Why doesn't the following command print the last line just once?

      sed -e '$d' datafile
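
    A small experiment may help with this question.  The three-line input
    here is made up, and the variable names are only for illustration;
    compare how many times the last line "c" appears in each result.

```shell
# Default output plus an explicit "p" on the last line.
dup=$(printf 'a\nb\nc\n' | sed -e '$p')

# With -n, only the explicit "p" produces output.
once=$(printf 'a\nb\nc\n' | sed -n -e '$p')

# "d" on the last line, with default output left on.
gone=$(printf 'a\nb\nc\n' | sed -e '$d')

printf 'with $p:\n%s\nwith -n $p:\n%s\nwith $d:\n%s\n' "$dup" "$once" "$gone"
```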

*)  Linux Tools Lab 2 Questions (p.132)

    1. skip
    2. use "-n" - it works everywhere
    3. use the man page
    4. skip
    5. do this
    6. do this
    7. do this
    8. do this
    9. do this
   10. do this
   11. do this (what regexp matches an entire line of characters?)
   12. do this
   13. do this (a blank line contains only zero or more spaces)
   14. skip
   15. skip
   16. skip

========================
sed substitution summary  (p.103-105)
========================

    Substitution command formats:

        s/regexp/chars/         # just like in VI - first match only
        s/regexp/chars/g        # just like in VI - globally on whole line

        s/regexp/chars/p        # print line only if substitution works
        s/regexp/chars/w file   # write line only if substitution works

        s/regexp/chars/gpw file # combine all three only if substitution works!

    In sed, the substitution can be followed by a few flag letters.
    The "g" flag makes the substitution apply to every match on the line.
    The "p" and "w file" flags print or write the line, but only if the
    substitution succeeds.  If the substitution fails, these flags do nothing.
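
    The formats above can be compared side by side on a tiny input.  This
    is a sketch under assumptions: the three input lines, the temporary
    directory, and the output file name "changed" are all made up.

```shell
tmpdir=$(mktemp -d)
input=$(printf '%s\n' 'aaa' 'bbb' 'aba')

# No flag: only the first match on each line is replaced.
first=$(printf '%s\n' "$input" | sed -e 's/a/X/')

# g: every match on each line is replaced.
all=$(printf '%s\n' "$input" | sed -e 's/a/X/g')

# -n with p: only lines where the substitution succeeded are printed.
only=$(printf '%s\n' "$input" | sed -n -e 's/a/X/gp')

# w file: only lines where the substitution succeeded are written.
# (Double quotes are used here so the shell expands $tmpdir.)
printf '%s\n' "$input" | sed -n -e "s/a/X/gw $tmpdir/changed"
written=$(cat "$tmpdir/changed")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$first" "$all" "$only" "$written"
rm -r "$tmpdir"
```

    The line "bbb" never matches, so it appears only in the first two
    results, where the default output copies it through unchanged.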

==========================
Practice questions for sed
==========================

*)  Do all the practice questions that use VI, using "sed" instead.
    (See the chapter02guide.txt file for many VI practice questions.)
    Remember: Never use shell redirection to redirect output into any
    file used as input on a command line - the shell will erase the file.
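
    One common way to respect that rule is to send the output to a
    different file and rename it afterward.  The sketch below assumes a
    made-up file named "data" in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'hello world' >"$tmpdir/data"

# WRONG (do not do this): the shell empties "data" before sed reads it:
#   sed -e 's/hello/goodbye/' data >data

# Safer: write to a different file, then replace the original only if
# sed succeeded.
sed -e 's/hello/goodbye/' "$tmpdir/data" >"$tmpdir/data.new" &&
    mv "$tmpdir/data.new" "$tmpdir/data"

result=$(cat "$tmpdir/data")
printf '%s\n' "$result"
rm -r "$tmpdir"
```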

*)  Examine the password file and do the following:
      1. copy every line containing :0: into a file named "roots"
      2. copy every line containing allen into a file named "ians"
      3. copy every line containing 0000 into a file named "zeroes"
    Long way: Use three grep commands and redirect output three times.
    Short way: Use one sed command line with three -e commands and write
    all three files at once using "w" commands (see Section 4.7.7 p.109).

*)  Implement lab08exercise.txt using a series of "sed" command lines.

    You are to decode a file of text by writing a script that uses the
    "sed" editor to make a correct series of deletions, substitutions,
    and replacements, in a given order.

    Step 1: Run this "doright" program and save the output in a file named
    "right.txt" in your account somewhere:

        $ ~alleni/cst/lab08exercise/doright >right.txt

    The file "right.txt" should be 247 lines.

    Step 2: Construct a shell script containing a series of individual "sed"
    command lines to perform the following substitution edits on the resulting
    "right.txt" file.  Your file of command lines will look similar to this:

        #!/bin/bash -u
        ... shell script label and header goes here ...
        # start with a copy of the data file to be modified
        cp right.txt file1  || exit 1
        sed -e '...your command...' file1 >file2  || exit 1
        sed -e '...your command...' file2 >file3  || exit 1
        ... repeat similar lines until ...
        sed -e '...your command...' file8 >file9  || exit 1
        sed -e '...your command...' file9 >file10 || exit 1
        # display the final result on standard output
        cat file10

    Start with one sed command and gradually add the others until you have
    a working file that performs all of the editing of lab08exercise.txt
    correctly.

    Unless you are told otherwise, globally change *all* occurrences on each
    line, not just the first occurrence.  (Use the "g" global substitution
    suffix shown in Table 4.1 (p.97) and Examples 4.11 and 4.14.)

    Step 3: Verify your work using the "diff" command.  Compare your edited
    file with the following file and ensure that there are no differences:

        $ diff file10 ~alleni/cst/lab08exercise/right-to-read.txt

    If your file is correctly edited, there will be no output from "diff".
    Any differences will be sent to your screen.
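
    Inside a script, the exit status of "diff" can be tested instead of
    reading the screen.  This is a minimal sketch using two made-up files,
    "mine" and "expected", in place of the lab files.

```shell
tmpdir=$(mktemp -d)
printf 'same text\n' >"$tmpdir/mine"
printf 'same text\n' >"$tmpdir/expected"

# diff prints nothing and exits with status 0 when the files are
# identical; it exits with status 1 when they differ.
if diff "$tmpdir/mine" "$tmpdir/expected"; then
    verdict="no differences"
else
    verdict="files differ"
fi
printf '%s\n' "$verdict"
rm -r "$tmpdir"
```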

*)  Do these give the same answer?
      1. How many lines contain a character that is not the letter 'a'?
      2. How many lines do not contain the letter 'a'?
    If they differ, give an example of a line that one matches but the
    other does not.  How long is the shortest line counted by each question?

*)  Do these command lines always give the same output?
      1. grep '[^a]'
      2. grep -v 'a'
    If they differ, give an example of a line that one matches but the
    other does not.  How long is the shortest line output by each command?

*)  Do these command lines always give the same output?
      1. grep '[^d][^o][^g]'
      2. grep -v 'dog'
    If they differ, give an example of a line that one matches but the
    other does not.  How long is the shortest line output by each command?

*)  Do these command lines always give the same output?
      1. sed -n -e '/[^a]/p'
      2. sed -n -e '/a/!p'
    If they differ, give an example of a line that one matches but the
    other does not.  How long is the shortest line output by each command?