--------------------------------------
Order of Shell Command Line processing
--------------------------------------
-IAN! idallen@ncf.ca

If you do nothing else, memorize the Summary, below.

The shells try to make the command line interface easier to use by
modifying the command line you type in various ways.  The shells provide
command history, aliases and functions, variables, and pathname expansion.
After the shell has performed all of its expansion and substitution
processes, the command that is actually executed may differ quite a bit
from the text you typed on the keyboard.

The order in which the shell applies its various processes (word
splitting, quote processing, variable substitution, pathname expansion,
etc.) to each command line is partially outlined in "Order of Expansion"
(Chapter 10, p.358), except that the book is wrong.  Correct your textbook
(p.358) as follows:

Order of Expansion (p.358)
--------------------------

0. Brace expansions (OMIT this for this course)

1. First, the shell does all tilde, parameter, variable, and command
   substitutions.

   NOTE: A variable only expands *once*, even if it contains what looks
   like another variable:

       bash$ x=foo
       bash$ y='$x'
       bash$ echo $y
       $x

   Variables only expand *once*.

1B. If the shell expanded anything in #1, above, and the expansion is
    *not* quoted in any way, the shell word-splits the interpolated
    expansion text on white space into separate words again:

       bash$ x='one two'
       bash$ ./argv.sh $x
       Argument 0 is [./argv.sh]
       Argument 1 is [one]
       Argument 2 is [two]

    The unquoted variable $x has the interpolated text it contains
    re-split into separate words.  If you don't want this to happen,
    double-quote your variables to protect the blanks:

       bash$ y='one two'
       bash$ ./argv.sh "$y"
       Argument 0 is [./argv.sh]
       Argument 1 is [one two]

2. Last, the shell does pathname expansion (GLOBbing) on any words
   containing unquoted GLOB characters.
   This happens *after* variable expansion, so an unquoted variable
   containing a GLOB pattern will be GLOB expanded (this is usually bad!):

       bash$ touch foo1 foo2 foo3 foo4
       bash$ x='foo*'
       bash$ echo $x              <== shell does GLOB on unquoted $x
       foo1 foo2 foo3 foo4

   If you don't want GLOBbing to happen (and you almost never do!),
   double-quote your variables to protect the GLOB characters:

       bash$ echo "$x"            <== quoting protects GLOB
       foo*

   Double quotes permit the $-variables to expand, but they stop the
   shell from splitting the expanded text on blanks, and they stop the
   shell from seeing GLOB characters.

The important difference to note here, with respect to the textbook, is
that textbook items 2, 3, 4, 5, and 6 happen *all at the same time*, not
one after the other.  The shell does *not* first do tilde expansion and
then process the output of the tilde expansion to do parameter expansion.
It does *not* do parameter expansion and then process the output of the
parameter expansion to do variable expansion.  It does *all* of these,
only *once*, at the same time, across the whole command line.

The full details of how the shell processes the command line in all
possible circumstances take several pages of documentation.  To write
scripts correctly, you need to know at least the basic rules.
Fortunately, you can get by remembering just a few simple rules:

-------------------------------
Summary: What You Must Remember
-------------------------------

For the purposes of this script-writing course, you need to know the
shell processing order for the following features (this is a summary of
the rules spread through Chapter 10):

Order of Shell Command Line Processing (VERY IMPORTANT!)
--------------------------------------------------------

1. Initial word splitting into tokens (split on whitespace, semicolons,
   pipes, etc.) and identification of quoted strings.

2. Substitution of (unquoted) aliases and shell functions, followed by
   more word splitting and quote processing.
   (Yes, aliases may contain blanks, pipes, and quoted strings!)

   This is the last time that quotes are special to the shell.  The only
   quotes that behave like quotes are those that are in aliases or are
   typed and visible on the command line.  Quotes hidden inside variables
   and expansions (below) are ordinary characters; they are *not* special
   to the shell!

3. Identification and removal of (unquoted) I/O redirection (pipes too).

4. Parameter, command, variable, and arithmetic expansion, followed by
   more word splitting unless the item is double-quoted.  (But no quote
   processing is done on the interpolated text of the expansions; quotes
   are not special *inside* variables!)

5. Pathname expansion (GLOBbing, wildcards) (not when quoted).  (But no
   word splitting or other special character processing is done after
   GLOBbing; special characters in file names [including blanks] will not
   be treated as special by the shell!)

6. Passing of the arguments to the command to be executed.

Note that pathname expansion happens last, so GLOB (wildcard) characters
hidden inside parameters and variables will be expanded (often
unintentionally) unless the variable is double-quoted.  Many novice shell
programmers forget this.  Double-quote all your variables!

Also note that if a GLOB pattern matches a filename with a blank, the
blank is not treated as a special character (it doesn't generate multiple
file names) because word splitting on blanks happens *before* GLOBbing.
A GLOBbed filename containing one or more blanks (or any shell
meta-characters) is still treated as a single file name argument by the
shell, because the shell already did the word splitting and doesn't do
word splitting again on GLOB results, even if the GLOB results contain
blanks or other special characters.
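You can check this rule yourself.  Below is a minimal bash sketch,
assuming bash and a mktemp that supports -d (as on Linux and the BSDs);
count_args is a made-up helper for this demonstration, not a standard
command.  It creates one file whose name contains blanks, then counts the
arguments produced by a GLOB that matches that file and by a variable
holding the same text:

```bash
#!/bin/bash
# count_args: hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

# Work in a scratch directory so the GLOB can match only our own file.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'a b c'                     # one file whose name contains two blanks

glob_count=$(count_args *)        # GLOB result is never re-split: 1
x='a b c'
var_count=$(count_args $x)        # unquoted variable is re-split: 3
quoted_count=$(count_args "$x")   # double-quoted variable: 1

echo "GLOB: $glob_count, unquoted: $var_count, quoted: $quoted_count"
cd / && rm -rf "$dir"             # clean up the scratch directory
```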
Blanks inside unquoted variables and parameters are different: unlike
blanks in GLOB results, blanks coming from inside unquoted variables *do*
cause multiple arguments (because the shell re-scans them for blanks),
which is why you *must* put all your variables and parameters inside
double-quoted strings to protect them from unexpected blank and GLOB
expansion!

After the shell is finished processing the command line, it identifies
the first word (first token) on the command line as a command name, looks
for it in the search path $PATH (only if the word doesn't contain a
slash), and then runs that command (with its arguments) and waits for it
to finish.  (An "&" at the end of the line will cause the command to be
run "in the background"; the shell will not wait for it to finish before
prompting you for another command.)

--------------------------
Strict order of processing
--------------------------

Example: Because of the strict order of processing of the command line,
once the shell has looked for redirection on the command line, it will
not look for it again, even if redirection characters appear from inside
subsequent expansion of variables or GLOB patterns:

    $ echo This does go into >out        # shell sees redirection
    $ cat out
    This does go into
    $ x=">out"
    $ echo This does not go into $x      # shell does not see redirection
    This does not go into >out

Redirection is processed before variable expansion; a variable hides the
redirection from being seen as special by the shell.

Example: Once the shell has recognized quoted strings on the command line,
it will not treat quotes specially again, even if the quotes appear from
inside subsequent expansion of variables or GLOB patterns:

    $ echo one "two" three               # shell sees quotes on line
    one two three
    $ x='one "two" three'
    $ echo $x                            # shell does not see embedded quotes
    one "two" three

Quoted strings are recognized and processed before variable expansion; a
variable hides the quotes from being seen as special by the shell.
The quotes appear "too late" to be recognized as special by the shell.

Example: Once the shell has expanded a variable, it does not try to
expand it again, even if the variable contains what looks like another
variable:

    bash$ x=foo
    bash$ echo $x
    foo
    bash$ y='$x'
    bash$ echo $y
    $x

Variable expansion is only done once, left-to-right on the command line.
A variable containing dollar signs hides the internal dollar signs from
the shell; the internal dollar signs are not seen as special by the shell.

---------
Examples:
---------

Consider the following command sequence:

    $ mkdir empty
    $ cd empty
    $ ls -a
    . ..
    $ touch foobar fooforall
    $ ls -a
    . .. foobar fooforall
    $ x='foo*'               # x contains 4 characters: foo*
    $ echo $x
    foobar fooforall

How is this result "foobar fooforall" obtained from the given command?
The shell will evaluate this command line using the six steps given above:

    $ echo $x

1. Split the line into words; identify quoted pieces (there are no quotes
   visible in the given line):
   RESULT: echo $x

2. Process aliases (none found):
   RESULT: echo $x

3. Process redirection (none found):
   RESULT: echo $x

4. Expand variables (expand variable $x into foo*):
   RESULT: echo foo*

4B. Word-split any unquoted interpolated strings (unquoted foo* is
    re-scanned for blanks; none found):
    RESULT: echo foo*

5. Expand GLOBs (match foo* against all file names in current directory):
   RESULT: echo foobar fooforall

6. Execute the result:
   OUTPUT: foobar fooforall

Here are some more examples of how the shell's order of evaluation affects
what results you get with characters that would normally be special to
the shell:

    $ echo $(date)
    Sun Sep 22 21:37:37 EDT 2002     # visible command substitution works
    $ x='$(date)'                    # x contains 7 characters: $(date)
    $ echo Hello $x there.
    Hello $(date) there.             # embedded substitution doesn't happen

Above, the shell's order of evaluation means that a variable cannot
contain another expanding variable or a working command substitution.
It *looks* like a command substitution, but it isn't seen that way by the
shell.  The $(date) is treated as ordinary text; nothing special.

    $ echo "hello"
    hello                            # visible quotes are seen by shell
    $ x='"hello there"'              # x contains 13 characters: "hello there"
    $ echo Hello $x there.
    Hello "hello there" there.       # hidden, embedded quotes are not special

Above, the shell's order of evaluation means that a variable cannot
contain working quotes.  The shell looks for quotes *first*, before it
looks for variables.  Any quotes that appear in the command line from
inside variables or GLOB expansions are not special to the shell.  The
double quotes around "hello there" are treated as ordinary text; nothing
special.  The second echo command, above, receives four separate command
line arguments:

    arg1 - Hello
    arg2 - "hello
    arg3 - there"
    arg4 - there.

Quotes are only recognized as special *before* variable and GLOB
expansions.

Example:

    $ echo hi ; date
    hi                               # visible semicolon makes 2 commands
    Sun Sep 22 21:38:38 EDT 2002
    $ x=' ; date'                    # x contains 7 characters: ; date
    $ echo Hello $x there.
    Hello ; date there.              # embedded semicolon is not special

Above, the shell's order of evaluation means that a variable cannot
contain a working semicolon.  The shell tokenizes the command line,
looking for blanks, quotes, and semicolons, *first*, before it looks for
variables.  Any semicolons that appear in the command line from inside
variables are not special to the shell.  The semicolon is treated as
ordinary text; nothing special.

Things that the shell recognizes when typed in on the command line may
not be recognized when coming from inside shell variables.  The most
important things that might be inside variables, and that could cause
problems later in the command line processing, are spaces and GLOBs
(wildcards).  Both spaces and GLOB characters *are* treated as special
characters by the shell, even when they appear in the command line from
inside variables.
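Here is a minimal bash sketch that checks these claims, assuming bash and
a mktemp that supports -d; show_args is a hypothetical helper defined in
the script, not a standard command.  Each unquoted variable below holds
one kind of special character, and the bracketed output shows which ones
still act specially after coming out of a variable:

```bash
#!/bin/bash
# show_args: hypothetical helper that prints each argument it receives
# inside [brackets], so you can see exactly how the shell split the line.
show_args() { printf '[%s]' "$@"; echo; }

# Scratch directory holding two files for the GLOB to match.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch foo1 foo2

semi=' ; date'
redir='>out'
quotes='"hello there"'
glob='foo*'

r1=$(show_args echo $semi)    # semicolon is plain text:   [echo][;][date]
r2=$(show_args echo $redir)   # redirection is plain text: [echo][>out]
r3=$(show_args $quotes)       # quotes plain, blank splits: ["hello][there"]
r4=$(show_args $glob)         # GLOB still expands:        [foo1][foo2]
r5=$(show_args "$glob")       # double quotes stop GLOB:   [foo*]
printf '%s\n' "$r1" "$r2" "$r3" "$r4" "$r5"

# The hidden ">out" was never treated as redirection, so no file appeared.
if [ -e out ]; then out_made=yes; else out_made=no; fi
echo "file out created? $out_made"

cd / && rm -rf "$dir"         # clean up the scratch directory
```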
To stop the blanks and GLOB characters from being special, put the
variable in double quotes:

    $ x='this contains blanks'
    $ echo Five arguments $x
    $ echo Three arguments "$x"
    $ echo "One argument $x"

----------
Exercises:
----------

Put the following actions in their proper order:

    - shell expands $-variables
    - shell expands pathname GLOBs (wildcards, e.g. *)
    - shell looks for the command name in $PATH and runs it
    - shell identifies and removes redirection
    - shell splits the command line into words (tokens)
    - shell word-splits unquoted variable expansions
    - shell expands aliases

Explain:

    $ mkdir empty
    $ cd empty
    $ touch aa ab ac ad ae af
    $ x='a* b* c* d*'
    $ echo $x
    aa ab ac ad ae af b* c* d*
    $ echo "$x"
    a* b* c* d*
    $ echo '$x'
    $x

Review: Order of shell processing of command line.

Use the shell's order of evaluation information to answer the following
questions, in this order (pay careful attention to the use of spaces and
single and double quotes):

    $ mkdir empty
    $ cd empty
    $ ls -a
    (Q1) What output appears here?
    $ x="foo*"
    $ echo $x
    (Q2) What output prints here?
    $ echo "$x"
    (Q3) What output appears here?
    $ echo '$x'
    (Q4) What output appears here?
    $ touch foobar
    $ ls
    (Q5) What output appears here?
    $ echo $x
    (Q6) What output prints here?
    $ echo "$x"
    (Q7) What output appears here?
    $ echo '$x'
    (Q8) What output appears here?
    $ echo hi >$x
    $ ls
    (Q9) What output appears here?
    $ echo there >"$x"
    $ ls
    (Q10) What output appears here?
    $ echo mom >'$x'
    $ ls
    (Q11) What output appears here?
    $ rm $x
    $ ls
    (Q12) What output appears here?

Explain:

(Note: The ls command behaves differently when writing file names to a
pipe than when writing onto your terminal directly.  To a terminal, ls
displays several file names on one line.  To a pipe, it displays file
names only one per line.)

    $ mkdir empty
    $ cd empty
    $ x='a b c d'        # variable x contains seven characters: a b c d
    $ set | grep 'x='
    x=a b c d
    $ touch $x
    $ ls | wc
    4 4 8                # why are there four file names?  What names?
    $ rm *
    $ touch "$x"
    $ ls | wc
    1 4 8                # why is it only one file name now?  What name?
    $ rm *
    $ touch '$x'
    $ ls | wc
    1 1 3                # what is the name now?

Review: Order of shell processing of command line.

Explain:

    $ x='a b >out'
    $ echo $x
    a b >out

The shell expands the variable "$x" to be "a b >out" (containing what
looks like redirection).  Why isn't the echo output "a b" being redirected
into the file "out"?  Why is the redirection being ignored?

    $ x='a b >out'
    $ sort $x
    sort: open failed: a: No such file or directory
    sort: open failed: b: No such file or directory
    sort: open failed: >out: No such file or directory
    $ ls $x
    ls: a: No such file or directory
    ls: b: No such file or directory
    ls: >out: No such file or directory

Why isn't the output of the above commands being redirected into the file
"out"?

Review: Order of shell processing of command line.

Explain:

    $ rm *
    $ echo a b >out
    $ ls
    out

Above, all the output of echo goes into the file "out".

    $ rm *
    $ touch 'a b >out'
    $ ls
    a b >out
    $ echo *
    a b >out

The shell expands the GLOB pattern "*" to be "a b >out" (containing what
looks like redirection).  Why isn't the echo output "a b" being redirected
into the file "out", as it was above?  Why is the redirection being
ignored?

Review: Order of shell processing of command line.

What is the *exact* output of this command sequence:

    $ foo='bar haven'
    $ echo foo $foo "$foo" '$foo' bar

NOTE: blanks will be counted in the answer.

How many arguments are seen by the echo command, above?

Reminder: After an unquoted variable expands, the shell again performs
word-splitting (tokenizing) on any blanks contained in the text that was
inside the variable.

What is the *exact* output of this command sequence:

    $ alias foo='echo hi ; echo "a b"'
    $ foo bar

How many arguments are seen by the first echo command?
How many arguments are seen by the second echo command?
Reminder: After an alias expands, the shell re-tokenizes the entire
contents of the alias expansion, just as if you had typed it on the
command line yourself.

Explain:

    $ echo a ; date
    a
    Sun Sep 22 22:34:27 EDT 2002
    $ x=';'
    $ echo a $x date     # the semicolon is hidden from the shell
    a ; date

Reason: The shell splits up the line into tokens before it expands
variables.  By the time the semicolon comes out from inside the variable,
the shell has already decided that this is only one single command line,
not two; the semicolon hidden inside the variable is not treated
specially this late in the command line processing.

Most shell special characters can be safely hidden inside even unquoted
variables.  The major exceptions are GLOB characters and blanks, which
the shell still treats specially even when they come from inside unquoted
variables.  Remember to double-quote all your variables and command
substitutions, to prevent embedded blanks and GLOB characters from being
seen by the shell!
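The 1B examples above run a helper script, ./argv.sh, whose source is not
included in these notes.  A minimal stand-in, sketched here as a bash
function named argv (the name and its exact output are assumptions
modeled on the sample output in 1B), lets you check your answers to the
exercises by showing exactly what arguments a command receives:

```bash
#!/bin/bash
# argv: hypothetical stand-in for the course's argv.sh script.
# Prints a name as argument 0, then each argument in [brackets],
# so blanks inside an argument are easy to see.
argv() {
    echo "Argument 0 is [argv]"
    local n=1 arg
    for arg in "$@"; do           # "$@" passes each argument through intact
        echo "Argument $n is [$arg]"
        n=$((n + 1))
    done
}

x='one two'
argv $x        # unquoted: re-split into two arguments
argv "$x"      # double-quoted: one argument containing a blank
```

For example, typing argv foo $foo "$foo" '$foo' bar after setting foo
shows how many arguments echo receives in the exercise above.  To make a
standalone argv.sh script instead, put the function body in a file and
print "$0" for argument 0.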