============================================
Unix Shell I/O Redirection (including Pipes)
============================================
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Contents of this file:

1. Output Redirection - Standard Output and Standard Error
   1.1 Output redirection into files
       1.1.1 Standard Output ("stdout") and Standard Error ("stderr")
   1.2 Throwing away output via /dev/null
   1.3 Output redirection mistakes to avoid
2. Input Redirection - Standard Input
   2.1 Not all commands read standard input
   2.2 shell redirection of standard input
3. Redirection into programs (Pipes)
   3.1. Rules for Pipes
   3.2. Using commands as Filters
   3.3 Examples of pipes
   3.4 Misuse of redirection into programs
4. Unique STDIN and STDOUT
5. tr - a command that does not accept pathnames
6. Do not redirect full-screen programs such as VIM
7. Redirect *only* stderr into a pipe (ADVANCED!)

In the examples below, I use the meta-character ";" to put multiple
commands on one shell command line:

$ date ; who ; echo hi ; pwd

These behave as if you had typed each of them on separate lines.

==========================================================
1. Output Redirection - Standard Output and Standard Error
==========================================================

In output redirection, the shell (not the command) diverts (redirects)
most command output that would normally appear on the screen to some
other place, either into the input of another command (using a pipe
meta-character '|') or into a file (using a file redirect meta-character '>').

* Redirection is done by the shell, first, before finding the command;
  the shell has no idea if the command exists or will produce any output.
* You can only redirect the output that you can see.  If there is no
  visible output without redirection, adding redirection won't create any.
* Redirection can only go to *one* place.  You can't use multiple
  redirections to send output to multiple places.  (See the "tee" command
  and the sketch below.)
* By default, error messages (called "standard error" or "stderr") are
  not redirected; only "normal output" (called "standard output" or
  "stdout") is redirected (but you can also redirect stderr with more
  syntax).
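
If you really do need the same output in more than one place, the usual
answer is the "tee" command mentioned above: it copies its standard input
into the file(s) you name and also onto its standard output.  A minimal
sketch (it uses a pipe - see section 3 - and the file names and date
shown are only examples):

$ date | tee copy1 copy2        # tee writes its input into copy1 and copy2...
Mon Feb 27 06:37:52 EST 2012    # ...and also copies it to the screen (stdout)
$ cat copy1
Mon Feb 27 06:37:52 EST 2012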

---------------------------------
1.1 Output redirection into files
---------------------------------

The shell meta-character '>' signals that the next word on the command
line is an output file (not a program) that should be created or
truncated (set to empty) and made ready to receive the standard output of
a command:

$ date >outfile

The spaces between '>' and the file name are optional:

$ date > outfile

The file is always created or truncated to empty *before* the shell finds
and runs the command, unless you double up the character like this:

$ date >> outfile       # output is *appended* to outfile; no truncation

An example of redirection of output into a file:

$ echo hello            # output goes to terminal (screen)
hello
$ echo hello >file      # erase file; send output to file
$ cat file              # display what is in the file
hello
$ echo there >>file     # append output to end of file
$ cat file              # display what is in the file
hello
there

It is the shell that creates or truncates the file and sets up the
redirection, not the command being redirected.  The command knows nothing
about the redirection - the redirection syntax is removed from the
command line before the command is found and executed:

$ echo one two three        # echo has three arguments
one two three
$ echo one two three >out   # echo still has three arguments
$ cat out
one two three

Shells handle redirection before they go looking for the command name to
run.  Indeed, you can have redirection even if the command is not found
or if there is no command at all:

$ nosuchcommandxxx >out     # file "out" is created empty
sh: nosuchcommandxxx: command not found
$ wc out
0 0 0 out                   # shell created an empty file
$ >out                      # file "out" is created empty
$ wc out
0 0 0 out                   # shell created an empty file

The shell creates or truncates the file "out" empty, and then it tries to
find and run the nonexistent command and fails.  The empty file remains.

Any existing file will have its contents removed:

$ echo hello >out ; cat out
hello
$ nosuchcommandxxx >out
sh: nosuchcommandxxx: command not found
$ wc out
0 0 0 out                   # shell truncated the file

Redirection is done by the shell *before* the command is run:

$ mkdir empty
$ cd empty
$ ls -l
total 0                     # no files found
$ ls -l >out                # shell creates "out" first
$ cat out                   # display output
total 0
-rw-r--r-- 1 idallen idallen 0 Sep 21 06:02 out
$ date >out
$ ls -l
total 4
-rw-r--r-- 1 idallen idallen 29 Sep 21 06:04 out
$ ls -l >out                # shell empties "out" first
$ cat out                   # display output
total 0
-rw-r--r-- 1 idallen idallen 0 Sep 21 06:06 out

The shell creates or empties the file "out" before it runs the "ls"
command.

Explain this sequence of commands:

$ mkdir empty
$ cd empty
$ cp a b
cp: cannot stat `a': No such file or directory
$ cp a b >a
$ # why is there no error message from cp this time?

Explain this sequence of commands:

$ date
Wed Feb 8 03:01:11 EST 2012
$ date >a
$ cat a
Wed Feb 8 03:01:21 EST 2012
$ cp a b
$ cat b
Wed Feb 8 03:01:21 EST 2012
$ cp a b >a
$ cat b
$ # why is file b empty?

Shells don't care where on or in the command line you do the file
redirection.  The file redirection is done by the shell, then the
redirection syntax is removed from the command line before the command is
called.  The command actually being run doesn't see any part of the
redirection syntax; the number of arguments is not affected.

All the command lines below are equivalent to the shell; in every case
the echo command sees only three arguments and the three command line
arguments "hi", "there", and "mom" are all redirected into "file":

$ echo hi there mom >file   # echo has three arguments
$ echo hi there >file mom   # echo has three arguments
$ echo hi >file there mom   # echo has three arguments
$ echo >file hi there mom   # echo has three arguments
$ >file echo hi there mom   # echo has three arguments

The redirection syntax is removed by the shell before the command runs;
so, redirection syntax is never counted as arguments to a command.
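
One way to convince yourself of this is a tiny throw-away script that
only reports how many arguments it was given (the script name "args.sh"
is made up for this sketch):

$ echo 'echo "I got $# arguments"' >args.sh     # create a one-line script
$ sh args.sh hi there mom                       # run it with three arguments
I got 3 arguments
$ sh args.sh hi there mom >out                  # now add output redirection
$ cat out
I got 3 arguments                               # still three arguments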

Examples:

$ echo hello there
- shell calls "echo" with two arguments ==> echo(hello,there)
- "echo" echoes two arguments on standard output
- output appears in default location (standard output is your screen)

$ echo hello there >file
- shell creates "file" and diverts standard output into it
- shell removes the syntax ">file" from the command line
- shell calls "echo" with two arguments ==> echo(hello,there)
  (note NO CHANGE in arguments to "echo" from the previous example)
- "echo" echoes two arguments on standard output
- standard output is captured in output "file", NOT on your screen

$ >file echo hello there
- this is identical to the above example (the shell does not care where
  in the command line you put the redirection)
- standard output is captured in output "file", NOT on your screen
- you can put the redirection anywhere in the command line!

Redirection is done by the shell, first, even before finding the command:

- the shell creates a new empty file or truncates (empties) an existing file
- after doing the redirection, and removing the syntax from the command
  line, the shell finds and executes the command (if any)

Explain this sequence of commands:

$ rm
rm: missing operand
$ touch file
$ rm >file
rm: missing operand         # why doesn't rm remove "file"?
$ rm nosuchfile
rm: cannot remove `nosuchfile': No such file or directory
$ rm nosuchfile >nosuchfile
$ # why is there no rm error message here?

You can only redirect the output that you can see!  *Only* what you see!

- Redirection does not invent new output!  *ONLY WHAT YOU SEE!*
- If you don't see any output from a command, adding redirection will
  simply have the shell create an empty file (no output):

Example:

$ cp /etc/passwd x          # no output on standard output
$ cp /etc/passwd x >out     # file "out" is created empty
$ cd /tmp                   # no output on standard output
$ cd /tmp >out              # file "out" is created empty
$ touch x ; rm x            # no output from rm on standard output
$ touch x ; rm x >out       # file "out" is created empty

Redirection can only go to *one* place:

- the right-most file redirection wins (others create empty files)

Example:

$ date >a >b >c     # output goes into file c; a and b are empty

Redirection to a file wins over redirection into a pipe:

- see the following section on redirection into programs using "|" pipes
- if you redirect into a file and a pipe, the pipe gets nothing

Example:

$ date >a | cat     # output goes into file "a"; cat shows nothing

The redirection output file is emptied (truncated) unless you append via >>

- the file is emptied *before* the shell looks for and runs the command
- don't use output redirection files as input to the same command

Bad Example:

$ sort a >a         # WRONG! file "a" is truncated to be empty

1.1.1 Standard Output ("stdout") and Standard Error ("stderr")
--------------------------------------------------------------

Most commands have two separate output "streams", numbered 1 and 2:

1. stdout - unit 1 - Standard Output (normal output)
2. stderr - unit 2 - Standard Error Output (error and warning messages)

The normal (non-error) "unit 1" outputs on your screen come from the
"standard output" ("stdout") of the command.  Stdout is the output from
"printf" and "cout" statements in C and C++ programs, and from
"System.out.print" and "System.out.println" in Java.  This is the
expected, usual output of a command.

The error message "unit 2" outputs on your screen come from the "standard
error output" ("stderr") of the command.
Stderr is the output from "fprintf(stderr" and "cerr" statements in C and
C++ programs, and from "System.err.print" and "System.err.println" in
Java.  Programs print on this output only for error messages.

The stdout and stderr mix together on your terminal screen.  They look
the same on the screen, so you can't tell by looking at your screen what
comes out of a program on stdout and what comes out of a program on stderr.

To show a simple example of stdout and stderr both appearing on your
screen, use the "ls" command and give it one file name that exists and
one name that does not exist (and thus causes an error message to be
displayed):

$ ls -l /etc/passwd nosuchfile
ls: nosuchfile: No such file or directory               # standard error
-rw-r--r-- 1 root root 2209 Jan 19 20:39 /etc/passwd    # standard output

The stderr (error messages) output often appears first, before stdout,
due to internal I/O buffers used by commands for stdout.

Normally, both stdout and stderr appear together on your terminal.  The
shell can redirect the two outputs individually or together into files or
into other programs.

The default type of output redirection (whether redirecting to files or
to programs using pipes) redirects *only* standard output and lets
standard error go, untouched, to your terminal.  Below are some examples
all using the shell file redirect meta-character '>':

$ ls /etc/passwd nosuchfile                     # no redirection used
ls: nosuchfile: No such file or directory       # this on screen from stderr
/etc/passwd                                     # this on screen from stdout

$ ls /etc/passwd nosuchfile >out                # shell redirects only stdout
ls: nosuchfile: No such file or directory       # only stderr appears on screen
$ cat out
/etc/passwd

You can redirect stdout and stderr separately into files using unit
numbers before the '>' meta-character:

- stdout is always unit 1 and stderr is always unit 2 (stdin is unit 0)
- put the unit number immediately (no blank) before the '>' meta-character
- ">foo" (no preceding unit number) is a shell shorthand for "1>foo"
  ">foo" redirects the default unit 1 (stdout) only, not stderr
  ">foo" and "1>foo" are identical

You can also tell the shell to redirect standard error, unit 2, to a file:

$ ls /etc/passwd nosuchfile 2>errors    # shell redirects only stderr
/etc/passwd                             # only stdout appears on screen
$ cat errors
ls: nosuchfile: No such file or directory

You can redirect stdout into one file and stderr into another file:

$ ls /etc/passwd nosuchfile >out 2>errors   # shell redirects each one
$                                           # nothing appears on screen
$ cat out
/etc/passwd
$ cat errors
ls: nosuchfile: No such file or directory

You needed a special syntax "2>&1" to redirect both stdout and stderr
safely together into a single file in the Bourne shells.  Read the syntax
"2>&1" as "send unit 2 to the same place as unit 1":

$ ls /etc/passwd nosuchfile >both 2>&1      # redirect both into same file
$                                           # nothing appears on screen
$ cat both
ls: nosuchfile: No such file or directory
/etc/passwd

The order of ">both" and "2>&1" on the command line matters!  The ">both"
stdout redirect must come first (to the left of) the stderr "2>&1",
because you must set where stdout (unit 1) goes *before* you send stderr
(unit 2) to go "to the same place as unit 1".  Don't reverse these!
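
Here is a sketch of what happens if you do reverse them (the file name
"wrongorder" is only an example).  The "2>&1" is processed first, sending
stderr to where stdout points at that moment - your terminal - and only
then is stdout redirected into the file:

$ ls /etc/passwd nosuchfile 2>&1 >wrongorder    # WRONG ORDER
ls: nosuchfile: No such file or directory       # stderr still on your screen
$ cat wrongorder
/etc/passwd                                     # only stdout went into the file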

You must use the special syntax ">both 2>&1" to put both stdout and
stderr into the same file.  Don't use the following, which is not the
same:

$ ls /etc/passwd nosuchfile >wrong 2>wrong      # WRONG! DO NOT DO THIS!
$ cat wrong
/etc/passwd
ccess nosuchfile: No such file or directory

This above WRONG example will cause stderr and stdout to overwrite each
other and the result is a mangled output file; don't do this.

The modern Bourne shells now have a special shorter syntax for
redirecting both stdout and stderr into the same output file:

$ ls /etc/passwd nosuchfile &>both      # redirect both into same file
$                                       # nothing appears on screen
$ cat both
ls: nosuchfile: No such file or directory
/etc/passwd

You can now use either "&>both" or ">both 2>&1", but only the latter
works in every version of the Bourne shell (back to the 1970s!).  When
writing shell scripts, use the ">both 2>&1" version for maximum
portability.

Output Redirection Summary:
--------------------------

Redirection is done by the shell.  Things happen in this order:

1. First: All redirection (and file truncation) is done by the shell.
   The shell removes all the redirection syntax from the command line.
   This redirection and truncation happens even if no command executes.
   The command will have no idea that its output is being redirected.

2. Second: The command (if any) executes and may produce output.
   The shell executes the command *after* doing all the redirection.
   (If the redirection fails, the shell does not run any command.)

3. Third: The output from the command (if any) happens, and it goes into
   the indicated redirection output file.  This happens last.  If the
   command produces no output, the output file will be empty.  Adding
   redirection never creates output.

-----------------------------------------
1.2 Throwing away output using /dev/null
-----------------------------------------

There is a special file on every Unix system, into which you can redirect
output that you don't want to keep or see:  /dev/null

The following command generates some error output we don't like to see:

$ cat * >/tmp/out
cat: course_outlines: Is a directory    # errors print on STDERR
cat: jclnotes: Is a directory           # errors print on STDERR
cat: labs: Is a directory               # errors print on STDERR
cat: notes: Is a directory              # errors print on STDERR

We can throw away the errors (stderr, unit 2) into /dev/null:

$ cat * >/tmp/out 2>/dev/null

The file /dev/null never fills up; it just eats output.  When used as an
input pathname, it always appears to be empty:

$ wc /dev/null
0 0 0 /dev/null
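
Using the unit numbers from section 1.1.1, you can choose which output
stream to throw away, or discard both; a sketch using the same "cat *"
example:

$ cat * >/dev/null              # discard stdout; errors still appear on screen
$ cat * >/dev/null 2>/dev/null  # discard both stdout and stderr
$ cat * >/dev/null 2>&1         # same thing, using the "2>&1" syntax from above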

System Administrators: Do not get in the habit of throwing away all the
error output of commands!  You will also throw away legitimate error
messages and nobody will know that these commands are failing.

----------------------------------------
1.3 Output redirection mistakes to avoid
----------------------------------------

First, here is a summary of how correct use of redirection works:

$ date >out

1. shell first truncates the file "out" - file is now empty
2. shell redirects standard output of command "date" into file "out"
3. shell removes the ">out" syntax from the command line
4. shell finds and runs the "date" command
5. standard output of the date command goes to standard output (1 line)
   - standard output has been redirected by the shell to appear in file "out"

Result: file "out" contains one line of output from "date"

Unix Big Redirection Mistake #1
-------------------------------

Do not use a redirection file as both output and input to a program or a
pipeline!  The sort command is used as the example program below -
anything that reads files and produces output is at risk:

$ sort a >a     # WRONG! Redirection output file is used as sort input file!

1. shell first truncates the file "a" - file is now empty
   - original contents of "a" are lost - truncated - GONE!
   - before the shell even goes looking for the "sort" command to run!
2. shell redirects standard output of sort into the empty file "a"
3. shell finds and runs the "sort" command with one file name argument "a"
   ==> i.e. sort(a)
4. sort command opens the empty argument file "a" for reading
5. standard output has been redirected by the shell to appear in file "a"
   - sorting an empty file produces no output; file "a" remains empty

Result: File "a" is always empty, no matter what was in it before.

RIGHT WAY (use two commands):                 $ sort a >tmp && mv tmp a
RIGHT WAY (use special sort output option):   $ sort -o a a

Here is another incorrect example using the same output file as input:

$ date >out
$ wc out >out   # WRONG! Redirection output file is used as wc input file!

1. shell first truncates the file "out" - file is now empty
   - original contents of "out" are lost - truncated - GONE!
   - before the shell even goes looking for the "wc" command to run!
2. shell redirects standard output of wc into the empty file "out"
3. shell finds and runs the "wc" command with one file name argument "out"
   ==> i.e. wc(out)
4. wc command opens the empty argument file "out" for reading
5. standard output has been redirected by the shell to appear in file "out"
   - counting an empty file produces 1 line "0 0 0 out" on standard output

Result: The one line of wc output "0 0 0 out" is placed into file "out".
File "out" now has one line, a word count of an empty file.  The original
contents of "out" were truncated away by the shell in step 1 and never
used.

RIGHT WAY (use two commands):   $ wc out >tmp && mv tmp out

Other incorrect redirection examples that DO NOT WORK:

$ head file >file           # ALWAYS creates an EMPTY FILE
$ tail file >file           # ALWAYS creates an EMPTY FILE
$ uniq file >file           # ALWAYS creates an EMPTY FILE
$ cat file >file            # ALWAYS creates an EMPTY FILE
$ grep 'foo' file >file     # ALWAYS creates an EMPTY FILE
$ sum file >file            # ALWAYS checksums an EMPTY FILE
...etc...

Never use the same file name for both input and output - the shell will
truncate the file before the command reads it.
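
If you want to see the data loss for yourself, here is a throw-away
demonstration (the file name "a" and its two lines are only examples):

$ printf 'banana\napple\n' >a   # create a small unsorted file
$ sort a >a                     # WRONG! the shell truncates "a" first
$ cat a
$                               # nothing left - the original data are gone
$ wc a
0 0 0 a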

Unix Big Redirection Mistake #2
-------------------------------

Do not use a wildcard/glob file pattern that picks up the name of the
output redirection file and causes it to become an unintended input file.

Bourne shells (e.g. BASH) will do the GLOB wildcard expansion *before*
the redirection file creation.  C Shells do the redirection file creation
first, which can be more of an unexpected problem.

The nl (number lines) program is used as the example program here -
anything that reads files and produces output is at risk:

$ cp /etc/passwd bar    # create a file larger than a disk block
$ touch foo
$ nl * >foo             # WRONG! GLOB * input files match redirection output file!
^C                      # interrupt this command immediately before your disk is full!
$ ls -l
-rw-rw-r-- 1 idallen idallen    194172 Feb 15 05:19 bar
-rw-r--r-- 1 idallen idallen 289808384 Feb 16 05:20 foo

Here is what happens to make the output file "foo" grow forever:

1. Shell expands "*" to match all the pathnames, that is "bar" and "foo".
2. Shell truncates >foo and gets it ready to receive stdout of command.
3. nl opens first file "bar" and sends the output to stdout (into foo).
4. nl opens next file "foo" and starts reading from the top of the file,
   writing output to the bottom of the file.  This never finishes, and
   the file "foo" grows until all the disk space is used.

Result: an infinite loop that fills up the disk drive as "foo" gets
bigger and bigger.

Fix #1: Use a hidden file name that GLOB doesn't match:

$ nl * >.z

- uses a hidden file name not matched by the shell "*" wildcard
- the nl command is not given ".z" as an argument, so no loop occurs

Fix #2 (two ways): Use a file in some other directory:

$ nl * >../z
$ nl * >/tmp/z

- redirect output into a file that is not in the current directory so
  that it is not read by the nl command and no loop occurs

=====================================
2. Input Redirection - Standard Input
=====================================

Many Unix commands read input from files, if file pathnames are given on
the command line.  If *no* file names are given, these commands usually
read from standard input ("stdin"), which is usually connected to your
keyboard.  (You can send EOF to get the command to stop reading.)

Example of the cat command reading from a file, then reading stdin when
no files are supplied:

$ cat /etc/passwd       # cat reads content from the file /etc/passwd
[...many lines print here...]
$
$ cat                   # no files; cat reads standard input (your keyboard)
you type lines here
^D                      # you signal keyboard EOF by typing ^D (CTRL-D)
you type lines here     # this is the output from cat
$

Other examples of commands that may read from pathnames or from standard
input:  less, more, cat, head, tail, sort, wc, grep, nl, uniq, etc.

Commands such as the above may read standard input.  They will read your
keyboard *only* if there are *no* pathnames to read on the command line,
and *no* input redirection is involved:

$ wc foo        # wc opens and reads file "foo"; wc completely ignores stdin
$ wc            # wc opens and reads standard input = your keyboard
$ cat foo       # cat opens and reads file "foo"; cat completely ignores stdin
$ cat           # cat opens and reads standard input = your keyboard
$ tail foo      # tail opens and reads "foo"; tail completely ignores stdin
$ tail          # tail opens and reads standard input = your keyboard
[...etc. for all commands that can read from stdin...]

To tell a command to stop reading your keyboard, send it an EOF
(End-Of-File) indication, usually by typing ^D (Control-D).  If you
interrupt the command (e.g. by typing ^C), you may kill the command and
the command may not produce any output at all.

2.1 Not all commands read standard input
----------------------------------------

Not all commands read from standard input, because not all commands read
data from files supplied on the command line.  Examples of common Unix
commands that don't read any data from files or standard input:

ls, date, who, pwd, echo, cd, hostname, ps, etc.    # NEVER READ STDIN

All the above commands have in common the fact that they *never* open any
files for reading on the command line.  If a command never reads any data
from any files, it will never read from your keyboard, and it will never
read any data from standard input.

The Unix copy command "cp" obviously reads content from files, but it
never reads file data from standard input because, as written, it always
has to have both a source and destination pathname argument.  The cp
command must always have an input file name.  It never reads stdin.

2.2 shell redirection of standard input
---------------------------------------

The shell meta-character '<' signals that the next word on the command
line is an *input* file (not a program) that should be made available to
a command on standard input.  Using the shell meta-character '<', you can
tell the shell to use input redirection to change from where standard
input comes, so that it doesn't come from your keyboard but instead comes
from an input file.

You can only use standard input redirection on a command that would
otherwise read your keyboard.  If the command doesn't read your keyboard
(standard input) *without* the redirection, adding the redirection does
nothing and is ignored.  The redirection only works if, without
redirection, the command *would* read your keyboard.

If (and only if!) a command reads from standard input, the redirected
standard input will cause the program to read from whatever the shell
attaches to standard input.  Here are examples using the shell to attach
files to commands that are all reading standard input:

$ cat food                  # reads from file "food" (no stdin involved)
$ cat <food                 # reads from stdin, attached by the shell to "food"
$ cat                       # reads from stdin (from your keyboard)

$ echo 30 >myfile           # first, put the number 30 into a file
$ sleep 10                  # sleep never reads stdin
$ sleep 10 <myfile          # WRONG! sleep still never reads stdin
$ sort <myfile >myfile      # WRONG! "myfile" is truncated before sort reads it
$ head <myfile >myfile      # WRONG! "myfile" is truncated before head reads it

Given the above, why is "myfile" not left empty in the following case?

$ wc <myfile >myfile        # WRONG!
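
A related aside: you can often tell which way a command received its
input by looking at its output, because when the shell attaches the file
to standard input the command never learns the file's name.  A sketch
using wc (the counts shown are made up - yours will differ):

$ wc /etc/passwd        # wc opens /etc/passwd itself and knows its name
 42  68 2209 /etc/passwd
$ wc </etc/passwd       # the shell opens the file; wc reads only its stdin
 42  68 2209            # same counts, but wc has no file name to print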

====================================
3. Redirection into programs (Pipes)
====================================

Since the shell can redirect both the output of programs and the input of
programs, it can connect (redirect) the output of one program into the
input of another program.  This is called "piping" and uses the "pipe"
meta-character '|' (shift-'\'), e.g.

$ date | wc

3.1. Rules for Pipes
--------------------

1. Pipe redirection is done by the shell, first, before file redirection.
2. The command on the left of the pipe must produce some standard output.
3. The command on the right of the pipe must want to read standard input.

The shell meta-character "|" ("pipe") signals the start of another
command on the command line.  The standard output (only stdout; not
stderr) of the command on the immediate left of the "|" is
attached/connected ("piped") to the standard input of the command on the
immediate right:

$ date
Mon Feb 27 06:37:52 EST 2012
$ date | wc
1 6 29

(Note that the newline character at the end of a line is counted by wc.)

You can approximate some of the behaviour of a pipe using a temporary
file for intermediate storage before using the second command:

$ date >out ; wc out ; less out
$ find / >out ; less out    # huge output of find has to finish first (slow)
$ find / | less             # huge output of find goes directly into "less"

The pipe requires no temporary file, and so as soon as the command on the
left of the pipe starts producing standard output, it goes directly into
the standard input of the command on the right.  If the command on the
left never finishes, the command on the right will continue to wait for
more input, processing it as it appears.

If the command on the left does finish, the command on the right sees an
EOF (end-of-file) on the pipe (its standard input).  As with EOF from a
file, EOF usually means that the command on the right will finish
processing, produce its last output, and exit.

Recognizing pipes and splitting a command line into piped commands is
done first, *before* doing file redirection.  File redirection happens
second (after pipe splitting), and if present, has precedence over pipe
redirection.  (The file redirection is done *after* pipe splitting, so it
always wins, leaving nothing for the pipe.)

$ ls -l | wc            # correct - output of ls goes into the pipe
2 11 57
$ ls -l >out | wc       # WRONG! - output of ls goes into the file
0 0 0                   # wc reads an empty pipe and outputs zeroes

- shell first splits the line on the pipe, redirecting the output of the
  command on the left into the input of the command on the right, but:
- then the shell processes the standard output file redirection on the
  "ls" on the left and changes the "ls" standard output into the file "out"
- finally, the shell finds and runs both commands simultaneously
- all the standard output from "ls" goes into the file "out"; nothing is
  available to go into the pipe
- wc counts an empty input from the pipe and outputs:  0 0 0
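
If what you actually want is the "ls" output both saved in a file and
counted by "wc", the "tee" command (mentioned in section 1) does exactly
that - it copies its standard input into the named file and also passes
it down the pipe.  A sketch (your counts will differ):

$ ls -l | tee out | wc      # ls output goes into file "out" AND into the pipe
2 11 57                     # (whatever wc counts on your system)
$ cat out                   # the same ls output is also saved in "out"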

As with file output redirection, you can only redirect into a pipe the
standard output that you can *see*; redirection never creates output:

$ cp /etc/passwd x          # no output visible on standard output
$ cp /etc/passwd x | cat    # no output is passed to "cat"
$ cd /tmp                   # no output visible on standard output
$ cd /tmp | head            # no output is passed to "head"
$ touch x ; rm x            # no output from rm on standard output
$ touch x ; rm x | wc       # no output is passed to "wc"
0 0 0                       # wc counts an empty input from the pipe

As with file redirection, you need the special syntax "2>&1" to redirect
both stdout and stderr into a pipe.  Recall that "2>&1" means "redirect
standard error to go to the same place as standard output", so if
standard output is already going into a pipe, "2>&1" will send standard
error there too:

$ ls /etc/passwd nosuchfile                 # no redirection used
ls: cannot access nosuchfile: No such file or directory     # STDERR unit 2
/etc/passwd                                                 # STDOUT unit 1

$ ls /etc/passwd nosuchfile | wc            # only stdout is piped to "wc"
ls: cannot access nosuchfile: No such file or directory     # STDERR unit 2
1 1 12                                      # stdout went into the pipe to "wc"

$ ls /etc/passwd nosuchfile 2>&1 | wc       # both stdout and stderr redirected
2 10 68                                     # wc counts both lines from pipe

Remember: Redirection can only go to *one* place, and file redirection
always wins over pipes, because it is done after pipe splitting.

$ ls /bin >out              # all output from ls goes into file "out"
$ ls /bin >out | wc         # WRONG! output goes into "out", not into pipe
0 0 0                       # wc counts an empty input from the pipe

3.2. Using commands as Filters
------------------------------

Note that many Unix commands can be made to act as "filters" - reading
from stdin and writing to stdout, all supplied by the shell, without
opening any pathnames themselves.  With no file names on the command
line, the commands read from standard input and write to standard output.
The shell provides redirection for both standard input and standard
output:

$ grep "/bin/sh" /etc/passwd | sort | head -5

The "grep" command above is reading from the filename argument
/etc/passwd given on the command line.  (When reading from files,
commands do not read from standard input.  File names take priority over
standard input.)

The "sort" and "head" commands have no file names to read; this means
they read from standard input, which is set up to be pipes by the shell.
Both "sort" and "head" are acting as filters; they are reading from stdin
and writing to stdout.  (The "grep" command is technically not a filter -
it is reading from the supplied argument pathname, not from stdin.)

Remember: if file names are given on the command line, the commands
ignore standard input and only operate on the file names.  Look at this
small change to the above pipeline:

$ grep "/bin/sh" /etc/passwd | sort | head -5 /etc/passwd   # WRONG!

Above is the same command line as the previous example, except the "head"
command is now ignoring standard input and is reading directly from its
/etc/passwd filename argument.  The "grep" and "sort" commands are doing
a lot of work for nothing, since "head" is not reading the output of sort
coming down the pipe.  The head command is reading from the supplied file
name argument /etc/passwd instead.  File names take precedence over
standard input.

*** Commands ignore standard input if they are given file names to read. ***

If a command does read from file names supplied on the command line, it
is more efficient to let it open its own file name than to use "cat" to
open the file and feed the data to the command on standard input.
(There is less data copying done!)

Do this:

$ head /etc/passwd
$ sort /etc/passwd

Do not do this (wasteful of processes and I/O):

$ cat /etc/passwd | head    # DO NOT DO THIS - INEFFICIENT
$ cat /etc/passwd | sort    # DO NOT DO THIS - INEFFICIENT

Advice: Let commands open their own files; don't feed them with "cat".

3.3 Examples of pipes
---------------------

Problem: Display only lines 6-10 of the password file:

$ head /etc/passwd | tail -n 5      # last five lines of first ten: lines 6-10

Problem: Display only the second-last line of the password file:

$ tail -n 2 /etc/passwd | head -n 1     # first line of last two lines

Problem: Which five files in current directory are largest:

$ ls -s | sort -nr | head -n 5
$ ls -la | sort -k 5,5nr | head -n 5
- the sort command is sorting by the fifth field, numerically, in reverse

Problem: "Count the number of each kind of shell in /etc/passwd."

$ cut -d : -f 7 /etc/passwd | sort | uniq -c
- the cut command picks out colon-delimited field 7 in the password file
- the sort command puts all the shell names in order
- the uniq command counts the adjacent names

Problem: "Count the number of each kind of shell in /etc/passwd and
display the results sorted in descending numeric order."

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr
- use the previous pipeline and add to it:
- sort the above output numerically and in reverse

Problem: "Count the number of each kind of shell in /etc/passwd and
display the top two results sorted in descending numeric order."

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -n 2
- use the previous pipeline and add to it:
- pick off only the top two lines of the above output

Problem: Which ten IP addresses are trying most often to break into my machine:

# grep 'refused connect' /var/log/auth.log \
    | awk '{print $NF}' \
    | sort | uniq -c | sort -nr | head

- the grep command picks off the sshd lines containing the IP address
- the awk command is displaying just the last field on each input line
- the first (leftmost) sort command puts all the IP addresses in order
- the uniq command is counting how many adjacent addresses are the same
- the second sort command is sorting the count numerically, in reverse
- the head picks off only the top ten addresses

Problem: Display practice test and weekly file dates from the Course Notes:

$ alias ee='elinks -dump -no-numbering -no-references'
$ ee 'http://teaching.idallen.com/cst8207/12w/notes/' | grep 'practice'
$ ee 'http://teaching.idallen.com/cst8207/12w/notes/' | grep 'week'

Problem: Display the dates of the Midterm tests from the Home Page:

$ ee 'http://teaching.idallen.com/cst8207/12w/' | grep 'Midterm'

Problem: Display current Ottawa weather temperature and forecast:

$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A1 'Temp'
$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A2 'Today:'
$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A2 'Tonight:'

Problem: Display Ottawa tomorrow weather forecast:

$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A8 'Tonight:' | tail -n 5

Problem: Display the first top story from the BBC:

$ ee 'http://www.bbc.co.uk/' | grep -A9 'Top stories'

Problem: Display first top story in each subject from the BBC:

$ ee 'http://www.bbc.co.uk/' | grep -A3 'Top Story'

Problem: Display the current BBC weather for Vancouver:

$ ee 'http://www.bbc.co.uk/weather/6173331' \
    | grep -A19 'Observations' | tail -n 20

Problem: Display the current Space Weather forecast for Canada:

$ ee 'http://www.spaceweather.gc.ca/index-eng.php' \
    | grep -A10 'ISES Regional Warning Centre'

Problem: Display the current phase of the Moon:

$ ee 'http://www.die.net/moon/' \
    | grep -A2 'Moon Phase' | head -n 3 | tail -n 1

3.4 Misuse of redirection into programs
---------------------------------------

People are often misled into thinking that adding redirection to a
command will create output that wasn't there before the redirection was
added.  It isn't so.

Rules for Pipes:

1. Pipe redirection is done by the shell, first, before file redirection.
2. The command on the left of the pipe must produce some standard output.
3. The command on the right of the pipe must want to read standard input.

If a Unix command that can open and read the contents of pathnames is not
given any pathnames to open, it usually reads input lines from standard
input (stdin) instead:

$ wc /etc/passwd    # wc reads /etc/passwd, ignores stdin and your keyboard
$ wc                # without a file name, wc reads stdin (your keyboard)

If the command is given a pathname, it reads from the pathname and
*always* ignores standard input, even if you try to send it something:

$ wc                # without a file name, wc reads standard input (keyboard)
$ date | wc         # wc opens and reads standard input, counts date output
$ wc foo            # wc reads foo; wc does not read stdin
$ date | wc foo     # WRONG! wc opens and reads foo; wc ignores stdin

The above applies to every command that reads file content, e.g.:

$ date | head foo   # WRONG! head opens and reads foo; head ignores stdin
$ date | less foo   # WRONG! less opens and reads foo; less ignores stdin

If you want a command to read stdin, you *cannot* give it any file name
arguments.  Commands with file name arguments *ignore* standard input;
they should not be used on the right side of a pipe.

Commands that are ignoring standard input (because they are opening and
reading from pathnames on the command line) will always ignore standard
input, no matter what silly things you try to send them on standard input:

$ echo hi | head /etc/passwd    # WRONG: head has a pathname and ignores stdin
$ echo hi | tail /etc/group     # WRONG: tail has a pathname and ignores stdin
$ echo hi | wc .vimrc           # WRONG: wc has a pathname and ignores stdin
$ sort a | cat b                # WRONG: cat has a pathname and ignores stdin
$ cat a | sort b                # WRONG: sort has a pathname and ignores stdin

Standard input is thrown away if it is sent to a command that ignores it.
The shell *cannot* make a command read stdin; it's up to the command.
The command must *want* to read standard input, and it will *only* want
to read standard input if you *leave off all the file names*.

Commands that do not open and process the *contents* of files usually
ignore standard input, no matter what silly things you try to send them
on standard input.  All these commands will never read standard input:

$ echo hi | ls          # NO: ls doesn't open files - always ignores stdin
$ echo hi | pwd         # NO: pwd doesn't open files - always ignores stdin
$ echo hi | cd          # NO: cd doesn't open files - always ignores stdin
$ echo hi | date        # NO: date doesn't open files - always ignores stdin
$ echo hi | chmod +x .  # NO: chmod doesn't open files - always ignores stdin
$ echo hi | rm foo      # NO: rm doesn't open files - always ignores stdin
$ echo hi | rmdir dir   # NO: rmdir doesn't open files - always ignores stdin
$ echo hi | echo me     # NO: echo doesn't open files - always ignores stdin
$ echo hi | mv a b      # NO: mv doesn't open files - always ignores stdin
$ echo hi | ln a b      # NO: ln doesn't open files - always ignores stdin

Some commands *only* operate on file name arguments and never read stdin:

$ echo hi | cp a b      # NO: cp opens arguments - always ignores stdin

Standard input is thrown away if it is sent to a command that ignores it.
The shell *cannot* make a command read stdin; it's up to the command.

Commands that might read standard input will do so only if *no* file name
arguments are given on the command line.  The presence of any file
arguments will cause the command to ignore standard input and process the
file(s) instead, and that means they cannot be used on the right side of
a pipe to read standard input.  File name arguments always win over
standard input.

Example of mis-used redirection:
--------------------------------

The very long sequence of pipes below is pointless - the last (rightmost)
command ("head") has a pathname and will open and read it, ignoring all
the standard input coming from all the pipes on the left:

$ head /etc/passwd | sort | tail | sort -r | head /etc/passwd

The above mal-formed pipeline is equivalent to this (same output):

$ head /etc/passwd

If you give a command a file to process, it will ignore standard input,
and so a command with a file name must not be used on the right side of
any pipe.

==========================
4. Unique STDIN and STDOUT
==========================

There is only one standard input and one standard output for each
command.  Each can only be redirected to *one* other place.  You cannot
redirect standard input from two different places, nor can you redirect
standard output into two different places.

The Bourne shells (including bash) do not warn you that you are trying to
redirect the input or output of a command from or to two or more
different places (and that only one of the redirections will work - the
others will be ignored):

bash$ date >a >b >c >d >e
- the "date" output goes into file "e"; the other four output files are
  each created and truncated by the shell but they are all left empty
  because only the final redirection into "e" wins

bash$ date >out | wc
0 0 0
- the "date" output goes into file "out"; nothing goes into the pipe
  (file redirection is done second and always wins over pipe redirection)

Some shells (including the "C" shells, but not the Bourne shells) will
try to warn you about silly shell redirection mistakes:

csh% date >a >b >c
Ambiguous output redirect.
csh% date >a | wc
Ambiguous output redirect.

The C shells tell you that you can't redirect stdin or stdout to/from
more than one place at the same time.  Bourne shells do not tell you -
they simply ignore the "extra" redirections and do only the last one of
each.

================================================
5. tr - a command that does not accept pathnames
================================================

The Unix "tr" command is one of the few (only?) Unix commands that reads
standard input but does *not* allow any pathnames on the command line -
you must *always* supply input to "tr" on standard input:

$ tr 'a-z' 'A-Z' file1 file2 >out           # *** WRONG - ERROR ***
tr: too many arguments

$ cat file1 file2 | tr 'a-z' 'A-Z' >out     # correct for multiple files
$ tr 'a-z' 'A-Z' <file1 >out                # correct for a single file

Note: System V versions of "tr" demand that character ranges appear
inside square brackets, e.g.:  tr '[a-z]' '[A-Z]'
Berkeley Unix and Linux do not use the brackets.

No version of "tr" accepts pathnames on the command line.  All versions
of "tr" *only* read standard input.

Here is an incorrect example using the same output file as input:

$ date >out
$ tr ' ' '_' <out >out      # WRONG! Redirection output file is used as input file!

1. shell first truncates the file "out" (due to ">out") - file is now empty
   - original contents of "out" are lost - truncated - GONE!
   - before the shell even goes looking for the "tr" command to run!
2. shell opens the (now empty) file "out" as standard input to "tr"
   (due to "<out")
3. shell finds and runs command "tr" with two string arguments
   ==> i.e. tr(' ','_')
4. command "tr" reads from standard input (tr always reads stdin)
   - standard input is attached to "out", now an empty file
5. standard output has been redirected by the shell to appear in file "out"
   - translating an empty input file produces no output in "out"

Result: File "out" gets a translated copy of an empty input file; the
file "out" is always left empty.

RIGHT WAY (use two commands):   $ tr ' ' '_' <out >tmp && mv tmp out

Problem: convert lower-case to upper-case from the "who" command:

$ who | tr 'a-z' 'A-Z'

Shell question: Are the single quotes required around the two arguments?
(Are there any special characters in the arguments that need protection?)

Using redirection, you can use a similar command to convert a lower-case
file of text into upper-case.

EXPERIMENT: Why doesn't this convert the file "myfile" to upper-case?

$ date >myfile
$ tr 'a-z' 'A-Z' <myfile >myfile    # WRONG!
$ wc myfile
0 0 0 myfile                        # what happened?

Why is the file "myfile" empty after this command is run?

The following command line doesn't work because the programmer doesn't
understand the "tr" command syntax:

$ tr 'a-z' 'A-Z' myfile >new    # WRONG!

Why does this generate an error message from "tr"?  (The "tr" command is
unusual in its handling of command line pathnames.  RTFM)

The following command line redirection is faulty (input file is also
output file); however, it sometimes works for small files:

$ cat foo bar | tr 'a' 'b' | grep "lala" | sort | head >foo     # WRONG!

There is a critical race between the first "cat" command trying to read
the data out of "foo" before the shell truncates it to zero when
launching the "head" command at the end of the pipeline.  Depending on
the system load and the size of the file, "cat" may or may not get out
all the data before the "foo" file is truncated or altered by the shell
in the redirection at the end of the pipeline.

Don't depend on long pipelines saving you from bad redirection!  Never
redirect output into a file that is being used as input in the same
command or anywhere in the command pipeline.

===================================================
6. Do not redirect full-screen programs such as VIM
===================================================

Full-screen keyboard interactive programs such as the VIM text editor do
not behave nicely if you redirect their input or output - they really
want to be talking to your keyboard and screen; don't redirect them or
try to run them in the background using "&".  You can hang your terminal
if you try.

=================================================
7. Redirect *only* stderr into a pipe (ADVANCED!)
=================================================

How do you redirect *only* stderr into the pipe, and let stdout go to the
terminal?  This is tricky; on the left of the pipe you have to swap
stdout (attached to the pipe) and stderr (attached to the terminal).

You need a temporary output unit (use "3") to record and remember where
the terminal is (redirect unit 3 to the same place as unit 2: "3>&2"),
then redirect stderr into the pipe (redirect unit 2 to the same place as
unit 1: "2>&1"), then redirect stdout to the terminal (redirect unit 1 to
the same place as unit 3: "1>&3"):

$ ls /etc/passwd nosuchfile 3>&2 2>&1 1>&3 | wc     # switch STDOUT and STDERR
/etc/passwd                                         # STDOUT appears on terminal
1 9 56                                              # STDERR goes into the pipe

You seldom need to do this advanced trickery, even inside scripts.

--
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at:  http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/