============================================
Unix Shell I/O Redirection (including Pipes)
============================================
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Contents of this file:

1. Output Redirection - Standard Output and Standard Error
   1.1 Output redirection into files
       1.1.1 Standard Output ("stdout") and Standard Error ("stderr")
   1.2 Throwing away output via /dev/null
   1.3 Output redirection mistakes to avoid
2. Input Redirection - Standard Input
   2.1 Not all commands read standard input
   2.2 shell redirection of standard input
3. Redirection into programs (Pipes)
   3.1. Rules for Pipes
   3.2. Using commands as Filters
   3.3 Examples of pipes
   3.4 Misuse of redirection into programs
4. Unique STDIN and STDOUT
5. tr - a command that does not accept pathnames
6. Do not redirect full-screen programs such as VIM
7. Redirect *only* stderr into a pipe (ADVANCED!)

In the examples below, I use the meta-character ";" to put multiple
commands on one shell command line:

$ date ; who ; echo hi ; pwd

These behave as if you had typed each of them on separate lines.

==========================================================
1. Output Redirection - Standard Output and Standard Error
==========================================================

In output redirection, the shell (not the command) diverts (redirects)
most command output that would normally appear on the screen to some
other place, either into the input of another command (using a pipe
meta-character '|') or into a file (using a file redirect meta-character '>').

* Redirection is done by the shell, first, before finding the command;
  the shell has no idea if the command exists or will produce any output.
* You can only redirect the output that you can see.  If there is no
  visible output without redirection, adding redirection won't create any.
* Redirection can only go to *one* place.  You can't use multiple
  redirections to send output to multiple places.  (See the "tee" command
  and the sketch below.)
* By default, error messages (called "standard error" or "stderr") are
  not redirected; only "normal output" (called "standard output" or
  "stdout") is redirected (but you can also redirect stderr with more
  syntax).
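
If you really do need the same output in more than one place, the usual
answer is the "tee" command mentioned above: it copies its standard input
into the file(s) you name and also onto its standard output.  A minimal
sketch (it uses a pipe - see section 3 - and the file names and date
shown are only examples):

$ date | tee copy1 copy2        # tee writes its input into copy1 and copy2...
Mon Feb 27 06:37:52 EST 2012    # ...and also copies it to the screen (stdout)
$ cat copy1
Mon Feb 27 06:37:52 EST 2012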

---------------------------------
1.1 Output redirection into files
---------------------------------

The shell meta-character '>' signals that the next word on the command
line is an output file (not a program) that should be created or
truncated (set to empty) and made ready to receive the standard output of
a command:

$ date >outfile

The spaces between '>' and the file name are optional:

$ date > outfile

The file is always created or truncated to empty *before* the shell finds
and runs the command, unless you double up the character like this:

$ date >> outfile       # output is *appended* to outfile; no truncation

An example of redirection of output into a file:

$ echo hello            # output goes to terminal (screen)
hello
$ echo hello >file      # erase file; send output to file
$ cat file              # display what is in the file
hello
$ echo there >>file     # append output to end of file
$ cat file              # display what is in the file
hello
there

It is the shell that creates or truncates the file and sets up the
redirection, not the command being redirected.  The command knows nothing
about the redirection - the redirection syntax is removed from the
command line before the command is found and executed:

$ echo one two three        # echo has three arguments
one two three
$ echo one two three >out   # echo still has three arguments
$ cat out
one two three

Shells handle redirection before they go looking for the command name to
run.  Indeed, you can have redirection even if the command is not found
or if there is no command at all:

$ nosuchcommandxxx >out     # file "out" is created empty
sh: nosuchcommandxxx: command not found
$ wc out
0 0 0 out                   # shell created an empty file
$ >out                      # file "out" is created empty
$ wc out
0 0 0 out                   # shell created an empty file

The shell creates or truncates the file "out" empty, and then it tries to
find and run the nonexistent command and fails.  The empty file remains.

Any existing file will have its contents removed:

$ echo hello >out ; cat out
hello
$ nosuchcommandxxx >out
sh: nosuchcommandxxx: command not found
$ wc out
0 0 0 out                   # shell truncated the file

Redirection is done by the shell *before* the command is run:

$ mkdir empty
$ cd empty
$ ls -l
total 0                     # no files found
$ ls -l >out                # shell creates "out" first
$ cat out                   # display output
total 0
-rw-r--r-- 1 idallen idallen 0 Sep 21 06:02 out
$ date >out
$ ls -l
total 4
-rw-r--r-- 1 idallen idallen 29 Sep 21 06:04 out
$ ls -l >out                # shell empties "out" first
$ cat out                   # display output
total 0
-rw-r--r-- 1 idallen idallen 0 Sep 21 06:06 out

The shell creates or empties the file "out" before it runs the "ls"
command.

Explain this sequence of commands:

$ mkdir empty
$ cd empty
$ cp a b
cp: cannot stat `a': No such file or directory
$ cp a b >a
$ # why is there no error message from cp this time?

Explain this sequence of commands:

$ date
Wed Feb 8 03:01:11 EST 2012
$ date >a
$ cat a
Wed Feb 8 03:01:21 EST 2012
$ cp a b
$ cat b
Wed Feb 8 03:01:21 EST 2012
$ cp a b >a
$ cat b
$ # why is file b empty?

Shells don't care where on or in the command line you do the file
redirection.  The file redirection is done by the shell, then the
redirection syntax is removed from the command line before the command is
called.  The command actually being run doesn't see any part of the
redirection syntax; the number of arguments is not affected.

All the command lines below are equivalent to the shell; in every case
the echo command sees only three arguments and the three command line
arguments "hi", "there", and "mom" are all redirected into "file":

$ echo hi there mom >file   # echo has three arguments
$ echo hi there >file mom   # echo has three arguments
$ echo hi >file there mom   # echo has three arguments
$ echo >file hi there mom   # echo has three arguments
$ >file echo hi there mom   # echo has three arguments

The redirection syntax is removed by the shell before the command runs;
so, redirection syntax is never counted as arguments to a command.
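
One way to convince yourself of this is a tiny throw-away script that
only reports how many arguments it was given (the script name "args.sh"
is made up for this sketch):

$ echo 'echo "I got $# arguments"' >args.sh     # create a one-line script
$ sh args.sh hi there mom                       # run it with three arguments
I got 3 arguments
$ sh args.sh hi there mom >out                  # now add output redirection
$ cat out
I got 3 arguments                               # still three arguments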

Examples:

$ echo hello there
- shell calls "echo" with two arguments ==> echo(hello,there)
- "echo" echoes two arguments on standard output
- output appears in default location (standard output is your screen)

$ echo hello there >file
- shell creates "file" and diverts standard output into it
- shell removes the syntax ">file" from the command line
- shell calls "echo" with two arguments ==> echo(hello,there)
  (note NO CHANGE in arguments to "echo" from the previous example)
- "echo" echoes two arguments on standard output
- standard output is captured in output "file", NOT on your screen

$ >file echo hello there
- this is identical to the above example (the shell does not care where
  in the command line you put the redirection)
- standard output is captured in output "file", NOT on your screen
- you can put the redirection anywhere in the command line!

Redirection is done by the shell, first, even before finding the command:

- the shell creates a new empty file or truncates (empties) an existing file
- after doing the redirection, and removing the syntax from the command
  line, the shell finds and executes the command (if any)

Explain this sequence of commands:

$ rm
rm: missing operand
$ touch file
$ rm >file
rm: missing operand         # why doesn't rm remove "file"?
$ rm nosuchfile
rm: cannot remove `nosuchfile': No such file or directory
$ rm nosuchfile >nosuchfile
$ # why is there no rm error message here?

You can only redirect the output that you can see!  *Only* what you see!

- Redirection does not invent new output!  *ONLY WHAT YOU SEE!*
- If you don't see any output from a command, adding redirection will
  simply have the shell create an empty file (no output):

Example:

$ cp /etc/passwd x          # no output on standard output
$ cp /etc/passwd x >out     # file "out" is created empty
$ cd /tmp                   # no output on standard output
$ cd /tmp >out              # file "out" is created empty
$ touch x ; rm x            # no output from rm on standard output
$ touch x ; rm x >out       # file "out" is created empty

Redirection can only go to *one* place:

- the right-most file redirection wins (others create empty files)

Example:

$ date >a >b >c     # output goes into file c; a and b are empty

Redirection to a file wins over redirection into a pipe:

- see the following section on redirection into programs using "|" pipes
- if you redirect into a file and a pipe, the pipe gets nothing

Example:

$ date >a | cat     # output goes into file "a"; cat shows nothing

The redirection output file is emptied (truncated) unless you append via >>

- the file is emptied *before* the shell looks for and runs the command
- don't use output redirection files as input to the same command

Bad Example:

$ sort a >a         # WRONG! file "a" is truncated to be empty

1.1.1 Standard Output ("stdout") and Standard Error ("stderr")
--------------------------------------------------------------

Most commands have two separate output "streams", numbered 1 and 2:

1. stdout - unit 1 - Standard Output (normal output)
2. stderr - unit 2 - Standard Error Output (error and warning messages)

The normal (non-error) "unit 1" outputs on your screen come from the
"standard output" ("stdout") of the command.  Stdout is the output from
"printf" and "cout" statements in C and C++ programs, and from
"System.out.print" and "System.out.println" in Java.  This is the
expected, usual output of a command.

The error message "unit 2" outputs on your screen come from the "standard
error output" ("stderr") of the command.
Stderr is the output from "fprintf(stderr" and "cerr" statements in C and
C++ programs, and from "System.err.print" and "System.err.println" in
Java.  Programs print on this output only for error messages.

The stdout and stderr mix together on your terminal screen.  They look
the same on the screen, so you can't tell by looking at your screen what
comes out of a program on stdout and what comes out of a program on stderr.

To show a simple example of stdout and stderr both appearing on your
screen, use the "ls" command and give it one file name that exists and
one name that does not exist (and thus causes an error message to be
displayed):

$ ls -l /etc/passwd nosuchfile
ls: nosuchfile: No such file or directory               # standard error
-rw-r--r-- 1 root root 2209 Jan 19 20:39 /etc/passwd    # standard output

The stderr (error messages) output often appears first, before stdout,
due to internal I/O buffers used by commands for stdout.

Normally, both stdout and stderr appear together on your terminal.  The
shell can redirect the two outputs individually or together into files or
into other programs.

The default type of output redirection (whether redirecting to files or
to programs using pipes) redirects *only* standard output and lets
standard error go, untouched, to your terminal.  Below are some examples
all using the shell file redirect meta-character '>':

$ ls /etc/passwd nosuchfile                     # no redirection used
ls: nosuchfile: No such file or directory       # this on screen from stderr
/etc/passwd                                     # this on screen from stdout

$ ls /etc/passwd nosuchfile >out                # shell redirects only stdout
ls: nosuchfile: No such file or directory       # only stderr appears on screen
$ cat out
/etc/passwd

You can redirect stdout and stderr separately into files using unit
numbers before the '>' meta-character:

- stdout is always unit 1 and stderr is always unit 2 (stdin is unit 0)
- put the unit number immediately (no blank) before the '>' meta-character
- ">foo" (no preceding unit number) is a shell shorthand for "1>foo"
  ">foo" redirects the default unit 1 (stdout) only, not stderr
  ">foo" and "1>foo" are identical

You can also tell the shell to redirect standard error, unit 2, to a file:

$ ls /etc/passwd nosuchfile 2>errors    # shell redirects only stderr
/etc/passwd                             # only stdout appears on screen
$ cat errors
ls: nosuchfile: No such file or directory

You can redirect stdout into one file and stderr into another file:

$ ls /etc/passwd nosuchfile >out 2>errors   # shell redirects each one
$                                           # nothing appears on screen
$ cat out
/etc/passwd
$ cat errors
ls: nosuchfile: No such file or directory

You needed a special syntax "2>&1" to redirect both stdout and stderr
safely together into a single file in the Bourne shells.  Read the syntax
"2>&1" as "send unit 2 to the same place as unit 1":

$ ls /etc/passwd nosuchfile >both 2>&1      # redirect both into same file
$                                           # nothing appears on screen
$ cat both
ls: nosuchfile: No such file or directory
/etc/passwd

The order of ">both" and "2>&1" on the command line matters!  The ">both"
stdout redirect must come first (to the left of) the stderr "2>&1",
because you must set where stdout (unit 1) goes *before* you send stderr
(unit 2) to go "to the same place as unit 1".  Don't reverse these!
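
Here is a sketch of what happens if you do reverse them (the file name
"wrongorder" is only an example).  The "2>&1" is processed first, sending
stderr to where stdout points at that moment - your terminal - and only
then is stdout redirected into the file:

$ ls /etc/passwd nosuchfile 2>&1 >wrongorder    # WRONG ORDER
ls: nosuchfile: No such file or directory       # stderr still on your screen
$ cat wrongorder
/etc/passwd                                     # only stdout went into the file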

You must use the special syntax ">both 2>&1" to put both stdout and
stderr into the same file.  Don't use the following, which is not the
same:

$ ls /etc/passwd nosuchfile >wrong 2>wrong      # WRONG! DO NOT DO THIS!
$ cat wrong
/etc/passwd
ccess nosuchfile: No such file or directory

This above WRONG example will cause stderr and stdout to overwrite each
other and the result is a mangled output file; don't do this.

The modern Bourne shells now have a special shorter syntax for
redirecting both stdout and stderr into the same output file:

$ ls /etc/passwd nosuchfile &>both      # redirect both into same file
$                                       # nothing appears on screen
$ cat both
ls: nosuchfile: No such file or directory
/etc/passwd

You can now use either "&>both" or ">both 2>&1", but only the latter
works in every version of the Bourne shell (back to the 1970s!).  When
writing shell scripts, use the ">both 2>&1" version for maximum
portability.

Output Redirection Summary:
--------------------------

Redirection is done by the shell.  Things happen in this order:

1. First: All redirection (and file truncation) is done by the shell.
   The shell removes all the redirection syntax from the command line.
   This redirection and truncation happens even if no command executes.
   The command will have no idea that its output is being redirected.

2. Second: The command (if any) executes and may produce output.
   The shell executes the command *after* doing all the redirection.
   (If the redirection fails, the shell does not run any command.)

3. Third: The output from the command (if any) happens, and it goes into
   the indicated redirection output file.  This happens last.  If the
   command produces no output, the output file will be empty.  Adding
   redirection never creates output.

-----------------------------------------
1.2 Throwing away output using /dev/null
-----------------------------------------

There is a special file on every Unix system, into which you can redirect
output that you don't want to keep or see:  /dev/null

The following command generates some error output we don't like to see:

$ cat * >/tmp/out
cat: course_outlines: Is a directory    # errors print on STDERR
cat: jclnotes: Is a directory           # errors print on STDERR
cat: labs: Is a directory               # errors print on STDERR
cat: notes: Is a directory              # errors print on STDERR

We can throw away the errors (stderr, unit 2) into /dev/null:

$ cat * >/tmp/out 2>/dev/null

The file /dev/null never fills up; it just eats output.  When used as an
input pathname, it always appears to be empty:

$ wc /dev/null
0 0 0 /dev/null
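
Using the unit numbers from section 1.1.1, you can choose which output
stream to throw away, or discard both; a sketch using the same "cat *"
example:

$ cat * >/dev/null              # discard stdout; errors still appear on screen
$ cat * >/dev/null 2>/dev/null  # discard both stdout and stderr
$ cat * >/dev/null 2>&1         # same thing, using the "2>&1" syntax from above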

System Administrators: Do not get in the habit of throwing away all the
error output of commands!  You will also throw away legitimate error
messages and nobody will know that these commands are failing.

----------------------------------------
1.3 Output redirection mistakes to avoid
----------------------------------------

First, here is a summary of how correct use of redirection works:

$ date >out

1. shell first truncates the file "out" - file is now empty
2. shell redirects standard output of command "date" into file "out"
3. shell removes the ">out" syntax from the command line
4. shell finds and runs the "date" command
5. standard output of the date command goes to standard output (1 line)
   - standard output has been redirected by the shell to appear in file "out"

Result: file "out" contains one line of output from "date"

Unix Big Redirection Mistake #1
-------------------------------

Do not use a redirection file as both output and input to a program or a
pipeline!  The sort command is used as the example program below -
anything that reads files and produces output is at risk:

$ sort a >a     # WRONG! Redirection output file is used as sort input file!

1. shell first truncates the file "a" - file is now empty
   - original contents of "a" are lost - truncated - GONE!
   - before the shell even goes looking for the "sort" command to run!
2. shell redirects standard output of sort into the empty file "a"
3. shell finds and runs the "sort" command with one file name argument "a"
   ==> i.e. sort(a)
4. sort command opens the empty argument file "a" for reading
5. standard output has been redirected by the shell to appear in file "a"
   - sorting an empty file produces no output; file "a" remains empty

Result: File "a" is always empty, no matter what was in it before.

RIGHT WAY (use two commands):                 $ sort a >tmp && mv tmp a
RIGHT WAY (use special sort output option):   $ sort -o a a

Here is another incorrect example using the same output file as input:

$ date >out
$ wc out >out   # WRONG! Redirection output file is used as wc input file!

1. shell first truncates the file "out" - file is now empty
   - original contents of "out" are lost - truncated - GONE!
   - before the shell even goes looking for the "wc" command to run!
2. shell redirects standard output of wc into the empty file "out"
3. shell finds and runs the "wc" command with one file name argument "out"
   ==> i.e. wc(out)
4. wc command opens the empty argument file "out" for reading
5. standard output has been redirected by the shell to appear in file "out"
   - counting an empty file produces 1 line "0 0 0 out" on standard output

Result: The one line of wc output "0 0 0 out" is placed into file "out".
File "out" now has one line, a word count of an empty file.  The original
contents of "out" were truncated away by the shell in step 1 and never
used.

RIGHT WAY (use two commands):   $ wc out >tmp && mv tmp out

Other incorrect redirection examples that DO NOT WORK:

$ head file >file           # ALWAYS creates an EMPTY FILE
$ tail file >file           # ALWAYS creates an EMPTY FILE
$ uniq file >file           # ALWAYS creates an EMPTY FILE
$ cat file >file            # ALWAYS creates an EMPTY FILE
$ grep 'foo' file >file     # ALWAYS creates an EMPTY FILE
$ sum file >file            # ALWAYS checksums an EMPTY FILE
...etc...

Never use the same file name for both input and output - the shell will
truncate the file before the command reads it.
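
If you want to see the data loss for yourself, here is a throw-away
demonstration (the file name "a" and its two lines are only examples):

$ printf 'banana\napple\n' >a   # create a small unsorted file
$ sort a >a                     # WRONG! the shell truncates "a" first
$ cat a
$                               # nothing left - the original data are gone
$ wc a
0 0 0 a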

Unix Big Redirection Mistake #2
-------------------------------

Do not use a wildcard/glob file pattern that picks up the name of the
output redirection file and causes it to become an unintended input file.

Bourne shells (e.g. BASH) will do the GLOB wildcard expansion *before*
the redirection file creation.  C Shells do the redirection file creation
first, which can be more of an unexpected problem.

The nl (number lines) program is used as the example program here -
anything that reads files and produces output is at risk:

$ cp /etc/passwd bar    # create a file larger than a disk block
$ touch foo
$ nl * >foo             # WRONG! GLOB * input files match redirection output file!
^C                      # interrupt this command immediately before your disk is full!
$ ls -l
-rw-rw-r-- 1 idallen idallen    194172 Feb 15 05:19 bar
-rw-r--r-- 1 idallen idallen 289808384 Feb 16 05:20 foo

Here is what happens to make the output file "foo" grow forever:

1. Shell expands "*" to match all the pathnames, that is "bar" and "foo".
2. Shell truncates >foo and gets it ready to receive stdout of command.
3. nl opens first file "bar" and sends the output to stdout (into foo).
4. nl opens next file "foo" and starts reading from the top of the file,
   writing output to the bottom of the file.  This never finishes, and
   the file "foo" grows until all the disk space is used.

Result: an infinite loop that fills up the disk drive as "foo" gets
bigger and bigger.

Fix #1: Use a hidden file name that GLOB doesn't match:

$ nl * >.z

- uses a hidden file name not matched by the shell "*" wildcard
- the nl command is not given ".z" as an argument, so no loop occurs

Fix #2 (two ways): Use a file in some other directory:

$ nl * >../z
$ nl * >/tmp/z

- redirect output into a file that is not in the current directory so
  that it is not read by the nl command and no loop occurs

=====================================
2. Input Redirection - Standard Input
=====================================

Many Unix commands read input from files, if file pathnames are given on
the command line.  If *no* file names are given, these commands usually
read from standard input ("stdin"), which is usually connected to your
keyboard.  (You can send EOF to get the command to stop reading.)

Example of the cat command reading from a file, then reading stdin when
no files are supplied:

$ cat /etc/passwd       # cat reads content from the file /etc/passwd
[...many lines print here...]
$
$ cat                   # no files; cat reads standard input (your keyboard)
you type lines here
^D                      # you signal keyboard EOF by typing ^D (CTRL-D)
you type lines here     # this is the output from cat
$

Other examples of commands that may read from pathnames or from standard
input:  less, more, cat, head, tail, sort, wc, grep, nl, uniq, etc.

Commands such as the above may read standard input.  They will read your
keyboard *only* if there are *no* pathnames to read on the command line,
and *no* input redirection is involved:

$ wc foo        # wc opens and reads file "foo"; wc completely ignores stdin
$ wc            # wc opens and reads standard input = your keyboard
$ cat foo       # cat opens and reads file "foo"; cat completely ignores stdin
$ cat           # cat opens and reads standard input = your keyboard
$ tail foo      # tail opens and reads "foo"; tail completely ignores stdin
$ tail          # tail opens and reads standard input = your keyboard
[...etc. for all commands that can read from stdin...]

To tell a command to stop reading your keyboard, send it an EOF
(End-Of-File) indication, usually by typing ^D (Control-D).  If you
interrupt the command (e.g. by typing ^C), you may kill the command and
the command may not produce any output at all.

2.1 Not all commands read standard input
----------------------------------------

Not all commands read from standard input, because not all commands read
data from files supplied on the command line.  Examples of common Unix
commands that don't read any data from files or standard input:

ls, date, who, pwd, echo, cd, hostname, ps, etc.    # NEVER READ STDIN

All the above commands have in common the fact that they *never* open any
files for reading on the command line.  If a command never reads any data
from any files, it will never read from your keyboard, and it will never
read any data from standard input.

The Unix copy command "cp" obviously reads content from files, but it
never reads file data from standard input because, as written, it always
has to have both a source and destination pathname argument.  The cp
command must always have an input file name.  It never reads stdin.

2.2 shell redirection of standard input
---------------------------------------

The shell meta-character '<' signals that the next word on the command
line is an *input* file (not a program) that should be made available to
a command on standard input.  Using the shell meta-character '<', you can
tell the shell to use input redirection to change from where standard
input comes, so that it doesn't come from your keyboard but instead comes
from an input file.

You can only use standard input redirection on a command that would
otherwise read your keyboard.  If the command doesn't read your keyboard
(standard input) *without* the redirection, adding the redirection does
nothing and is ignored.  The redirection only works if, without
redirection, the command *would* read your keyboard.

If (and only if!) a command reads from standard input, the redirected
standard input will cause the program to read from whatever the shell
attaches to standard input.  Here are examples using the shell to attach
files to commands that are all reading standard input:

$ cat food                  # reads from file "food" (no stdin involved)
$ cat <food                 # reads from stdin, attached by the shell to "food"
$ cat                       # reads from stdin (from your keyboard)

$ echo 30 >myfile           # first, put the number 30 into a file
$ sleep 10                  # sleep never reads stdin
$ sleep 10 <myfile          # WRONG! sleep still never reads stdin
$ sort <myfile >myfile      # WRONG! "myfile" is truncated before sort reads it
$ head <myfile >myfile      # WRONG! "myfile" is truncated before head reads it

Given the above, why is "myfile" not left empty in the following case?

$ wc <myfile >myfile        # WRONG!
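
A related aside: you can often tell which way a command received its
input by looking at its output, because when the shell attaches the file
to standard input the command never learns the file's name.  A sketch
using wc (the counts shown are made up - yours will differ):

$ wc /etc/passwd        # wc opens /etc/passwd itself and knows its name
 42  68 2209 /etc/passwd
$ wc </etc/passwd       # the shell opens the file; wc reads only its stdin
 42  68 2209            # same counts, but wc has no file name to print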

====================================
3. Redirection into programs (Pipes)
====================================

Since the shell can redirect both the output of programs and the input of
programs, it can connect (redirect) the output of one program into the
input of another program.  This is called "piping" and uses the "pipe"
meta-character '|' (shift-'\'), e.g.

$ date | wc

3.1. Rules for Pipes
--------------------

1. Pipe redirection is done by the shell, first, before file redirection.
2. The command on the left of the pipe must produce some standard output.
3. The command on the right of the pipe must want to read standard input.

The shell meta-character "|" ("pipe") signals the start of another
command on the command line.  The standard output (only stdout; not
stderr) of the command on the immediate left of the "|" is
attached/connected ("piped") to the standard input of the command on the
immediate right:

$ date
Mon Feb 27 06:37:52 EST 2012
$ date | wc
1 6 29

(Note that the newline character at the end of a line is counted by wc.)

You can approximate some of the behaviour of a pipe using a temporary
file for intermediate storage before using the second command:

$ date >out ; wc out ; less out
$ find / >out ; less out    # huge output of find has to finish first (slow)
$ find / | less             # huge output of find goes directly into "less"

The pipe requires no temporary file, and so as soon as the command on the
left of the pipe starts producing standard output, it goes directly into
the standard input of the command on the right.  If the command on the
left never finishes, the command on the right will continue to wait for
more input, processing it as it appears.

If the command on the left does finish, the command on the right sees an
EOF (end-of-file) on the pipe (its standard input).  As with EOF from a
file, EOF usually means that the command on the right will finish
processing, produce its last output, and exit.

Recognizing pipes and splitting a command line into piped commands is
done first, *before* doing file redirection.  File redirection happens
second (after pipe splitting), and if present, has precedence over pipe
redirection.  (The file redirection is done *after* pipe splitting, so it
always wins, leaving nothing for the pipe.)

$ ls -l | wc            # correct - output of ls goes into the pipe
2 11 57
$ ls -l >out | wc       # WRONG! - output of ls goes into the file
0 0 0                   # wc reads an empty pipe and outputs zeroes

- shell first splits the line on the pipe, redirecting the output of the
  command on the left into the input of the command on the right, but:
- then the shell processes the standard output file redirection on the
  "ls" on the left and changes the "ls" standard output into the file "out"
- finally, the shell finds and runs both commands simultaneously
- all the standard output from "ls" goes into the file "out"; nothing is
  available to go into the pipe
- wc counts an empty input from the pipe and outputs:  0 0 0
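
If what you actually want is the "ls" output both saved in a file and
counted by "wc", the "tee" command (mentioned in section 1) does exactly
that - it copies its standard input into the named file and also passes
it down the pipe.  A sketch (your counts will differ):

$ ls -l | tee out | wc      # ls output goes into file "out" AND into the pipe
2 11 57                     # (whatever wc counts on your system)
$ cat out                   # the same ls output is also saved in "out"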

As with file output redirection, you can only redirect into a pipe the
standard output that you can *see*; redirection never creates output:

$ cp /etc/passwd x          # no output visible on standard output
$ cp /etc/passwd x | cat    # no output is passed to "cat"
$ cd /tmp                   # no output visible on standard output
$ cd /tmp | head            # no output is passed to "head"
$ touch x ; rm x            # no output from rm on standard output
$ touch x ; rm x | wc       # no output is passed to "wc"
0 0 0                       # wc counts an empty input from the pipe

As with file redirection, you need the special syntax "2>&1" to redirect
both stdout and stderr into a pipe.  Recall that "2>&1" means "redirect
standard error to go to the same place as standard output", so if
standard output is already going into a pipe, "2>&1" will send standard
error there too:

$ ls /etc/passwd nosuchfile                 # no redirection used
ls: cannot access nosuchfile: No such file or directory     # STDERR unit 2
/etc/passwd                                                 # STDOUT unit 1

$ ls /etc/passwd nosuchfile | wc            # only stdout is piped to "wc"
ls: cannot access nosuchfile: No such file or directory     # STDERR unit 2
1 1 12                                      # stdout went into the pipe to "wc"

$ ls /etc/passwd nosuchfile 2>&1 | wc       # both stdout and stderr redirected
2 10 68                                     # wc counts both lines from pipe

Remember: Redirection can only go to *one* place, and file redirection
always wins over pipes, because it is done after pipe splitting.

$ ls /bin >out              # all output from ls goes into file "out"
$ ls /bin >out | wc         # WRONG! output goes into "out", not into pipe
0 0 0                       # wc counts an empty input from the pipe

3.2. Using commands as Filters
------------------------------

Note that many Unix commands can be made to act as "filters" - reading
from stdin and writing to stdout, all supplied by the shell, without
opening any pathnames themselves.  With no file names on the command
line, the commands read from standard input and write to standard output.
The shell provides redirection for both standard input and standard
output:

$ grep "/bin/sh" /etc/passwd | sort | head -5

The "grep" command above is reading from the filename argument
/etc/passwd given on the command line.  (When reading from files,
commands do not read from standard input.  File names take priority over
standard input.)

The "sort" and "head" commands have no file names to read; this means
they read from standard input, which is set up to be pipes by the shell.
Both "sort" and "head" are acting as filters; they are reading from stdin
and writing to stdout.  (The "grep" command is technically not a filter -
it is reading from the supplied argument pathname, not from stdin.)

Remember: if file names are given on the command line, the commands
ignore standard input and only operate on the file names.  Look at this
small change to the above pipeline:

$ grep "/bin/sh" /etc/passwd | sort | head -5 /etc/passwd   # WRONG!

Above is the same command line as the previous example, except the "head"
command is now ignoring standard input and is reading directly from its
/etc/passwd filename argument.  The "grep" and "sort" commands are doing
a lot of work for nothing, since "head" is not reading the output of sort
coming down the pipe.  The head command is reading from the supplied file
name argument /etc/passwd instead.  File names take precedence over
standard input.

*** Commands ignore standard input if they are given file names to read. ***

If a command does read from file names supplied on the command line, it
is more efficient to let it open its own file name than to use "cat" to
open the file and feed the data to the command on standard input.
(There is less data copying done!)

Do this:

$ head /etc/passwd
$ sort /etc/passwd

Do not do this (wasteful of processes and I/O):

$ cat /etc/passwd | head    # DO NOT DO THIS - INEFFICIENT
$ cat /etc/passwd | sort    # DO NOT DO THIS - INEFFICIENT

Advice: Let commands open their own files; don't feed them with "cat".

3.3 Examples of pipes
---------------------

Problem: Display only lines 6-10 of the password file:

$ head /etc/passwd | tail -n 5      # last five lines of first ten: lines 6-10

Problem: Display only the second-last line of the password file:

$ tail -n 2 /etc/passwd | head -n 1     # first line of last two lines

Problem: Which five files in current directory are largest:

$ ls -s | sort -nr | head -n 5
$ ls -la | sort -k 5,5nr | head -n 5
- the sort command is sorting by the fifth field, numerically, in reverse

Problem: "Count the number of each kind of shell in /etc/passwd."

$ cut -d : -f 7 /etc/passwd | sort | uniq -c
- the cut command picks out colon-delimited field 7 in the password file
- the sort command puts all the shell names in order
- the uniq command counts the adjacent names

Problem: "Count the number of each kind of shell in /etc/passwd and
display the results sorted in descending numeric order."

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr
- use the previous pipeline and add to it:
- sort the above output numerically and in reverse

Problem: "Count the number of each kind of shell in /etc/passwd and
display the top two results sorted in descending numeric order."

$ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -n 2
- use the previous pipeline and add to it:
- pick off only the top two lines of the above output

Problem: Which ten IP addresses are trying most often to break into my machine:

# grep 'refused connect' /var/log/auth.log \
    | awk '{print $NF}' \
    | sort | uniq -c | sort -nr | head

- the grep command picks off the sshd lines containing the IP address
- the awk command is displaying just the last field on each input line
- the first (leftmost) sort command puts all the IP addresses in order
- the uniq command is counting how many adjacent addresses are the same
- the second sort command is sorting the count numerically, in reverse
- the head picks off only the top ten addresses

Problem: Display practice test and weekly file dates from the Course Notes:

$ alias ee='elinks -dump -no-numbering -no-references'
$ ee 'http://teaching.idallen.com/cst8207/12w/notes/' | grep 'practice'
$ ee 'http://teaching.idallen.com/cst8207/12w/notes/' | grep 'week'

Problem: Display the dates of the Midterm tests from the Home Page:

$ ee 'http://teaching.idallen.com/cst8207/12w/' | grep 'Midterm'

Problem: Display current Ottawa weather temperature and forecast:

$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A1 'Temp'
$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A2 'Today:'
$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A2 'Tonight:'

Problem: Display Ottawa tomorrow weather forecast:

$ ee 'http://text.www.weatheroffice.gc.ca/forecast/city_e.html?on-118' \
    | grep -A8 'Tonight:' | tail -n 5

Problem: Display the first top story from the BBC:

$ ee 'http://www.bbc.co.uk/' | grep -A9 'Top stories'

Problem: Display first top story in each subject from the BBC:

$ ee 'http://www.bbc.co.uk/' | grep -A3 'Top Story'

Problem: Display the current BBC weather for Vancouver:

$ ee 'http://www.bbc.co.uk/weather/6173331' \
    | grep -A19 'Observations' | tail -n 20

Problem: Display the current Space Weather forecast for Canada:

$ ee 'http://www.spaceweather.gc.ca/index-eng.php' \
    | grep -A10 'ISES Regional Warning Centre'

Problem: Display the current phase of the Moon:

$ ee 'http://www.die.net/moon/' \
    | grep -A2 'Moon Phase' | head -n 3 | tail -n 1

3.4 Misuse of redirection into programs
---------------------------------------

People are often misled into thinking that adding redirection to a
command will create output that wasn't there before the redirection was
added.  It isn't so.

Rules for Pipes:

1. Pipe redirection is done by the shell, first, before file redirection.
2. The command on the left of the pipe must produce some standard output.
3. The command on the right of the pipe must want to read standard input.

If a Unix command that can open and read the contents of pathnames is not
given any pathnames to open, it usually reads input lines from standard
input (stdin) instead:

$ wc /etc/passwd    # wc reads /etc/passwd, ignores stdin and your keyboard
$ wc                # without a file name, wc reads stdin (your keyboard)

If the command is given a pathname, it reads from the pathname and
*always* ignores standard input, even if you try to send it something:

$ wc                # without a file name, wc reads standard input (keyboard)
$ date | wc         # wc opens and reads standard input, counts date output
$ wc foo            # wc reads foo; wc does not read stdin
$ date | wc foo     # WRONG! wc opens and reads foo; wc ignores stdin

The above applies to every command that reads file content, e.g.:

$ date | head foo   # WRONG! head opens and reads foo; head ignores stdin
$ date | less foo   # WRONG! less opens and reads foo; less ignores stdin

If you want a command to read stdin, you *cannot* give it any file name
arguments.  Commands with file name arguments *ignore* standard input;
they should not be used on the right side of a pipe.

Commands that are ignoring standard input (because they are opening and
reading from pathnames on the command line) will always ignore standard
input, no matter what silly things you try to send them on standard input:

$ echo hi | head /etc/passwd    # WRONG: head has a pathname and ignores stdin
$ echo hi | tail /etc/group     # WRONG: tail has a pathname and ignores stdin
$ echo hi | wc .vimrc           # WRONG: wc has a pathname and ignores stdin
$ sort a | cat b                # WRONG: cat has a pathname and ignores stdin
$ cat a | sort b                # WRONG: sort has a pathname and ignores stdin

Standard input is thrown away if it is sent to a command that ignores it.
The shell *cannot* make a command read stdin; it's up to the command.
The command must *want* to read standard input, and it will *only* want
to read standard input if you *leave off all the file names*.

Commands that do not open and process the *contents* of files usually
ignore standard input, no matter what silly things you try to send them
on standard input.  All these commands will never read standard input:

$ echo hi | ls          # NO: ls doesn't open files - always ignores stdin
$ echo hi | pwd         # NO: pwd doesn't open files - always ignores stdin
$ echo hi | cd          # NO: cd doesn't open files - always ignores stdin
$ echo hi | date        # NO: date doesn't open files - always ignores stdin
$ echo hi | chmod +x .  # NO: chmod doesn't open files - always ignores stdin
$ echo hi | rm foo      # NO: rm doesn't open files - always ignores stdin
$ echo hi | rmdir dir   # NO: rmdir doesn't open files - always ignores stdin
$ echo hi | echo me     # NO: echo doesn't open files - always ignores stdin
$ echo hi | mv a b      # NO: mv doesn't open files - always ignores stdin
$ echo hi | ln a b      # NO: ln doesn't open files - always ignores stdin

Some commands *only* operate on file name arguments and never read stdin:

$ echo hi | cp a b      # NO: cp opens arguments - always ignores stdin

Standard input is thrown away if it is sent to a command that ignores it.
The shell *cannot* make a command read stdin; it's up to the command.

Commands that might read standard input will do so only if *no* file name
arguments are given on the command line.  The presence of any file
arguments will cause the command to ignore standard input and process the
file(s) instead, and that means they cannot be used on the right side of
a pipe to read standard input.  File name arguments always win over
standard input.

Example of mis-used redirection:
--------------------------------

The very long sequence of pipes below is pointless - the last (rightmost)
command ("head") has a pathname and will open and read it, ignoring all
the standard input coming from all the pipes on the left:

$ head /etc/passwd | sort | tail | sort -r | head /etc/passwd

The above mal-formed pipeline is equivalent to this (same output):

$ head /etc/passwd

If you give a command a file to process, it will ignore standard input,
and so a command with a file name must not be used on the right side of
any pipe.

==========================
4. Unique STDIN and STDOUT
==========================

There is only one standard input and one standard output for each
command.  Each can only be redirected to *one* other place.  You cannot
redirect standard input from two different places, nor can you redirect
standard output into two different places.

The Bourne shells (including bash) do not warn you that you are trying to
redirect the input or output of a command from or to two or more
different places (and that only one of the redirections will work - the
others will be ignored):

bash$ date >a >b >c >d >e
- the "date" output goes into file "e"; the other four output files are
  each created and truncated by the shell but they are all left empty
  because only the final redirection into "e" wins

bash$ date >out | wc
0 0 0
- the "date" output goes into file "out"; nothing goes into the pipe
  (file redirection is done second and always wins over pipe redirection)

Some shells (including the "C" shells, but not the Bourne shells) will
try to warn you about silly shell redirection mistakes:

csh% date >a >b >c
Ambiguous output redirect.
csh% date >a | wc
Ambiguous output redirect.

The C shells tell you that you can't redirect stdin or stdout to/from
more than one place at the same time.  Bourne shells do not tell you -
they simply ignore the "extra" redirections and do only the last one of
each.

================================================
5. tr - a command that does not accept pathnames
================================================

The Unix "tr" command is one of the few (only?) Unix commands that reads
standard input but does *not* allow any pathnames on the command line -
you must *always* supply input to "tr" on standard input:

$ tr 'a-z' 'A-Z' file1 file2 >out           # *** WRONG - ERROR ***
tr: too many arguments

$ cat file1 file2 | tr 'a-z' 'A-Z' >out     # correct for multiple files
$ tr 'a-z' 'A-Z' <file1 >out                # correct for a single file

Note: System V versions of "tr" demand that character ranges appear
inside square brackets, e.g.:  tr '[a-z]' '[A-Z]'
Berkeley Unix and Linux do not use the brackets.

No version of "tr" accepts pathnames on the command line.  All versions
of "tr" *only* read standard input.

Here is an incorrect example using the same output file as input:

$ date >out
$ tr ' ' '_' <out >out      # WRONG! Redirection output file is used as input file!

1. shell first truncates the file "out" (due to ">out") - file is now empty
   - original contents of "out" are lost - truncated - GONE!
   - before the shell even goes looking for the "tr" command to run!
2. shell opens the (now empty) file "out" as standard input to "tr"
   (due to "<out")
3. shell finds and runs command "tr" with two string arguments
   ==> i.e. tr(' ','_')
4. command "tr" reads from standard input (tr always reads stdin)
   - standard input is attached to "out", now an empty file
5. standard output has been redirected by the shell to appear in file "out"
   - translating an empty input file produces no output in "out"

Result: File "out" gets a translated copy of an empty input file; the
file "out" is always left empty.

RIGHT WAY (use two commands):   $ tr ' ' '_' <out >tmp && mv tmp out

Problem: convert lower-case to upper-case from the "who" command:

$ who | tr 'a-z' 'A-Z'

Shell question: Are the single quotes required around the two arguments?
(Are there any special characters in the arguments that need protection?)

Using redirection, you can use a similar command to convert a lower-case
file of text into upper-case.

EXPERIMENT: Why doesn't this convert the file "myfile" to upper-case?

$ date >myfile
$ tr 'a-z' 'A-Z' <myfile >myfile    # WRONG!
$ wc myfile
0 0 0 myfile                        # what happened?

Why is the file "myfile" empty after this command is run?

The following command line doesn't work because the programmer doesn't
understand the "tr" command syntax:

$ tr 'a-z' 'A-Z' myfile >new    # WRONG!

Why does this generate an error message from "tr"?  (The "tr" command is
unusual in its handling of command line pathnames.  RTFM)

The following command line redirection is faulty (input file is also
output file); however, it sometimes works for small files:

$ cat foo bar | tr 'a' 'b' | grep "lala" | sort | head >foo     # WRONG!

There is a critical race between the first "cat" command trying to read
the data out of "foo" before the shell truncates it to zero when
launching the "head" command at the end of the pipeline.  Depending on
the system load and the size of the file, "cat" may or may not get out
all the data before the "foo" file is truncated or altered by the shell
in the redirection at the end of the pipeline.

Don't depend on long pipelines saving you from bad redirection!  Never
redirect output into a file that is being used as input in the same
command or anywhere in the command pipeline.

===================================================
6. Do not redirect full-screen programs such as VIM
===================================================

Full-screen keyboard interactive programs such as the VIM text editor do
not behave nicely if you redirect their input or output - they really
want to be talking to your keyboard and screen; don't redirect them or
try to run them in the background using "&".  You can hang your terminal
if you try.

=================================================
7. Redirect *only* stderr into a pipe (ADVANCED!)
=================================================

How do you redirect *only* stderr into the pipe, and let stdout go to the
terminal?  This is tricky; on the left of the pipe you have to swap
stdout (attached to the pipe) and stderr (attached to the terminal).

You need a temporary output unit (use "3") to record and remember where
the terminal is (redirect unit 3 to the same place as unit 2: "3>&2"),
then redirect stderr into the pipe (redirect unit 2 to the same place as
unit 1: "2>&1"), then redirect stdout to the terminal (redirect unit 1 to
the same place as unit 3: "1>&3"):

$ ls /etc/passwd nosuchfile 3>&2 2>&1 1>&3 | wc     # switch STDOUT and STDERR
/etc/passwd                                         # STDOUT appears on terminal
1 9 56                                              # STDERR goes into the pipe

You seldom need to do this advanced trickery, even inside scripts.

--
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at:  http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/