--------------------------------------
Order of Shell Command Line processing
--------------------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

The shells try to make the command line interface easier to use by
modifying the command line you type in various ways.  The shells provide
command history, aliases and functions, variables, and pathname expansion.
The command that is actually executed may differ quite a bit from the
actual text you enter using the keyboard, after the shell has performed
all of its expansion and substitution processes.

The full details of how the shell does command line processing for all
possible circumstances take several pages of documentation.  To write
scripts correctly, you need to know the basic rules.

Here's just a taste of the complexity (not to be memorized), from the
man page for bash:

    EXPANSION
       Expansion is performed on the command line after it has been
       split into words.  There are seven kinds of expansion performed:
       brace expansion, tilde expansion, parameter and variable
       expansion, command substitution, arithmetic expansion, word
       splitting, and pathname expansion.

       The order of expansions is: brace expansion, tilde expansion,
       parameter, variable and arithmetic expansion and command
       substitution (done in a left-to-right fashion), word splitting,
       and pathname expansion.

       On systems that can support it, there is an additional expansion
       available: process substitution.

       Only brace expansion, word splitting, and pathname expansion can
       change the number of words of the expansion; other expansions
       expand a single word to a single word.  The only exceptions to
       this are the expansions of "$@" and "${name[@]}" as explained
       above (see PARAMETERS).
    [...]
    Word Splitting
       The shell scans the results of parameter expansion, command
       substitution, and arithmetic expansion that did not occur within
       double quotes for word splitting.

The above paragraphs aren't very clear, and they are only part of the
explanation.
Fortunately, you can get by remembering just a few simple rules:

--------
Summary:
--------

You cannot have a command substitution [e.g. $(date)] work from inside a
variable or parameter substitution.  Quotes and backslashes also don't
work when hidden inside variables.

For the purposes of script-writing, you need to know the following shell
processing order for command lines:

1. Initial word splitting into tokens (split on whitespace, semicolons,
   pipes, etc.) and identification of quoted strings.

2. Substitution of (unquoted) aliases and shell functions, followed by
   more word splitting and quote processing.
   (Yes, aliases may contain blanks, pipes, and quoted strings!)
   Quoted strings are only recognized during this first pass over the
   command line.  Quotes have no meaning if they appear later from inside
   variable or GLOB expansions or substitutions.

3. Identification and removal of (unquoted) I/O redirection (pipes too).

4. Parameter, command, variable, and arithmetic expansion, followed by
   more word splitting (not when quoted).
   (But no quote processing is done - quotes are not special inside
   variables!)

5. Pathname expansion (GLOBbing, wildcards) (not when quoted).
   (But no word splitting or other special character processing is done
   after GLOBbing - special characters in file names [including blanks]
   will not be treated as special by the shell when they appear as a
   result of a GLOB expansion.)

6. Passing of the arguments to the command to be executed.

Note: Some texts say parameter expansion happens *before* variable
expansion; this is wrong.  They happen at the same time, left-to-right.
You can't have a parameter that expands to contain a variable and then
have the variable also expand, or vice-versa.

Note that pathname expansion happens last, so that GLOB (wildcard)
characters hidden inside parameters and variables will be expanded (often
unintentionally) unless the variable is double-quoted.  Many novice shell
programmers forget this.
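
As a rough sketch of that last point, the script below stores a GLOB
pattern in a variable and counts the arguments produced with and without
double quotes.  The scratch directory, the file names notes.txt and
todo.txt, and the variable name "pat" are made up for illustration, and
the mktemp command is assumed to be available on your system.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory so the GLOB has known files
# to match.  Directory and file names are invented for this example.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch notes.txt todo.txt

pat='*.txt'              # single quotes: pat holds the 5 characters *.txt

set -- $pat              # unquoted: the variable expands, then the GLOB does
unquoted_count=$#
unquoted_args="$*"

set -- "$pat"            # double-quoted: the GLOB characters stay literal
quoted_count=$#
quoted_args="$*"

echo "unquoted: count=$unquoted_count args=$unquoted_args"
echo "quoted:   count=$quoted_count args=$quoted_args"

cd / && rm -rf "$dir"    # remove the scratch directory
```

Run in an otherwise empty directory like this one, the unquoted form
yields two arguments (the two matching file names) and the quoted form
yields one argument (the literal pattern).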
Always double-quote your variables!

Also note that if a GLOB pattern matches a filename with a blank, the
blank is not treated as a special character (it doesn't generate multiple
file names) because word splitting on blanks happens *before* GLOBbing.
After GLOBbing, the shell doesn't treat any characters as special, even
blanks; so, no further blank-splitting is done.  File names containing
blanks don't get split up after GLOB.

In other words: a GLOBbed filename containing one or more blanks (or any
shell meta-characters) is still treated as a single file name argument
by the shell, because the shell already did the word splitting on blanks
and doesn't do it again on GLOB results.

The same is *not* true for blanks and GLOB patterns inside unquoted
variables and parameters, which are seen by the shell and do cause the
shell to split them into multiple arguments, which is why you *must* put
all your variables and parameters inside double-quoted strings to protect
them from unexpected blank splitting and GLOB expansion!

After the shell is finished processing the command line, it identifies
the first word (first token) on the command line as a command name, looks
for it in the search path $PATH (only if the word doesn't contain a
slash), and then runs that command (with its arguments) and waits for it
to finish.  An "&" at the end of the line will cause the command to be
run "in the background"; the shell will not wait for it to finish before
prompting you for another command.

Remember that redirection is "removed" from a command line, no matter
where on the line it is found; so, the "first word" on a command line is
the first word *after* any redirection is removed from the line.
The following lines are all identical to the shell; echo has one argument:

    $ echo hi >out
    $ echo >out hi
    $ >out echo hi

--------------------------
Strict order of processing
--------------------------

Example: Because of the strict order of processing of the command line,
once the shell has looked for redirection on the command line, it will
not look for it again, even if redirection characters appear from inside
subsequent expansion of variables or GLOB patterns:

    $ echo This does go into >out           # shell sees redirection
    $ cat out
    This does go into
    $ x=">out"
    $ echo This does not go into $x         # shell does not see it
    This does not go into >out

Redirection is processed before variable expansion; a variable hides
the redirection from being seen as special by the shell.

Example: Once the shell has recognized quoted strings on the command
line, it will not treat quotes specially again, even if the quotes appear
from inside subsequent expansion of variables or GLOB patterns:

    $ echo one "two" three                  # shell sees quotes
    one two three
    $ x='one "two" three'
    $ echo $x                               # shell does not see quotes
    one "two" three
    $ echo " * "
     * 
    $ quote='"'
    $ echo $quote * $quote                  # shell does not see quotes
    " file1 file2 "

Quoted strings are recognized and processed before variable expansion;
a variable hides the quotes from being seen as special by the shell.
The quotes appear "too late" to be recognized as special by the shell.
Quotes hidden inside variables are not special characters.

Example: Once the shell has expanded a variable, it does not try to
expand it again, even if the variable contains what looks like another
variable:

    $ x=foo
    $ echo $x
    foo
    $ y='$x'
    $ echo $y
    $x

Variable expansion is only done once, left-to-right on the command line.
A variable containing dollar signs hides the internal dollar signs from
the shell; the internal dollar signs are not seen as special by the shell.

---------
Examples:
---------

Consider the following command sequence:

    $ mkdir empty
    $ cd empty
    $ ls -a
    .  ..
    $ touch foobar fooforall
    $ ls -a
    .  ..  foobar  fooforall
    $ x='foo*'                       # x contains 4 characters: foo*
    $ echo $x
    foobar fooforall

How is this result "foobar fooforall" obtained from the given command?
The shell will evaluate this command line using the six steps given above:

 1. split the line into words; identify quoted pieces (no quotes):
    RESULT: echo $x
 2. process aliases (none found)
    RESULT: echo $x
 3. process redirection (none found)
    RESULT: echo $x
 4. expand variables (expand variable $x into foo*)
    RESULT: echo foo*
 5. expand GLOBs (match foo* against file names in directory)
    RESULT: echo foobar fooforall
 6. execute the result
    OUTPUT: foobar fooforall

Here are some more examples of how the shell's order of evaluation
affects what results you get with characters that would normally be
special to the shell:

    $ echo $(date)
    Sun Sep 22 21:37:37 EDT 2002     # visible command substitution works
    $ x='$(date)'                    # x contains 7 characters: $(date)
    $ echo Hello $x there.
    Hello $(date) there.             # embedded substitution doesn't happen

Above, the shell's order of evaluation means that a variable cannot
contain another expanding variable or a working command substitution.
It *looks* like a command substitution; but, it isn't seen that way by
the shell.  The $(date) is treated as ordinary text; nothing special.

    $ echo "hello"
    hello                            # visible quotes are seen by shell
    $ x='"hello there"'              # x contains 13 characters: "hello there"
    $ echo Hello $x there.
    Hello "hello there" there.       # embedded quotes are not special

Above, the shell's order of evaluation means that a variable cannot
contain working quotes.  The shell looks for quotes *first*, before it
looks for variables.  Any quotes that appear in the command line from
inside variables or GLOB expansions are not special to the shell.
The double quotes around "hello there" are treated as ordinary text;
nothing special.
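
One way to check what a command really receives is to print each
argument on its own line.  The sketch below does that with a helper
function named show_args; the helper is hypothetical, written only for
this illustration, and is not a standard command.

```shell
#!/bin/sh
# show_args is a made-up helper: it prints each argument it receives
# on its own line, wrapped in square brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

x='"hello there"'        # the double quotes are stored inside x as text

# Unquoted: the quotes inside x are ordinary characters, so the blank
# between them still splits the value into two separate words.
unquoted=$(show_args Hello $x there.)

# Double-quoted on the command line: the whole value is one argument,
# and the quotes stored in x are passed along as literal characters.
quoted=$(show_args Hello "$x" there.)

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted form prints four bracketed lines, with the stored quote
characters stuck to the words "hello and there"; the quoted form prints
three lines, the middle one holding the whole value of x.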
The second echo command above (echo Hello $x there.) receives four
separate command line arguments:

    arg1 - Hello
    arg2 - "hello
    arg3 - there"
    arg4 - there.

Quotes are only recognized as special outside of variables and GLOB
expansions.  They don't work when hidden inside variables.

Example:

    $ echo hi ; date
    hi                               # visible semicolon makes 2 commands
    Sun Sep 22 21:38:38 EDT 2002     # visible semicolon makes 2 commands
    $ x=' ; date'                    # x contains 7 characters:  ; date
    $ echo Hello $x there.
    Hello ; date there.              # embedded semicolon is not special

Above, the shell's order of evaluation means that a variable cannot
contain a working semicolon.  The shell tokenizes the command line,
looking for blanks, quotes, and semicolons, *first*, before it looks for
variables.  Any semicolons that appear in the command line from inside
variables are not special to the shell.  The semicolon is treated as
ordinary text; nothing special.

Things that the shell recognizes when typed in on the command line may
not be recognized when coming from inside shell variables.

The most important things that might be inside variables, that could
cause problems later in the command line processing, are spaces and GLOBs
(wildcards).  Both spaces and GLOB characters *are* treated as special
characters by the shell, even when they appear in the command line from
inside variables.  To stop blanks and GLOB characters from being treated
as special, put the variable in double quotes:

    $ x='this contains blanks'
    $ echo Five arguments $x
    $ echo Three arguments "$x"
    $ echo "One argument $x"

----------
Exercises:
----------

Put the following actions in their proper order:

  - shell expands $-variables
  - shell expands pathname GLOBs (wildcards, e.g. *)
  - shell looks for the command name in $PATH and runs it
  - shell identifies and removes redirection
  - shell splits the command line into words (tokens)
  - shell word-splits unquoted variable expansions
  - shell expands aliases

Explain:

    $ mkdir empty
    $ cd empty
    $ touch aa ab ac ad ae af
    $ x='a* b* c* d*'
    $ echo $x
    aa ab ac ad ae af b* c* d*
    $ echo "$x"
    a* b* c* d*
    $ echo '$x'
    $x

Review: Order of shell processing of command line.

Use the shell's order of evaluation information to answer the following
questions, in this order (pay careful attention to the use of spaces and
single and double quotes):

    $ mkdir empty
    $ cd empty
    $ ls -a
    (Q1) What output appears here?
    $ x="foo*"
    $ echo $x
    (Q2) What output prints here?
    $ echo "$x"
    (Q3) What output appears here?
    $ echo '$x'
    (Q4) What output appears here?
    $ touch foobar
    $ ls
    (Q5) What output appears here?
    $ echo $x
    (Q6) What output prints here?
    $ echo "$x"
    (Q7) What output appears here?
    $ echo '$x'
    (Q8) What output appears here?
    $ echo hi >$x
    $ ls
    (Q9) What output appears here?
    $ echo there >"$x"
    $ ls
    (Q10) What output appears here?
    $ echo mom >'$x'
    $ ls
    (Q11) What output appears here?
    $ rm $x
    $ ls
    (Q12) What output appears here?

Explain:

(Note: The ls command behaves differently when writing file names to a
pipe than when writing onto your terminal directly.  To a terminal, ls
displays several file names on one line.  To a pipe, it displays file
names only one per line.)

    $ mkdir empty
    $ cd empty
    $ x='a b c d'
    $ set | grep 'x='
    x=a b c d
    $ touch $x
    $ ls | wc
    4 4 8       # why are there four file names?  What names?
    $ rm *
    $ touch "$x"
    $ ls | wc
    1 4 8       # why is it only one file name now?  What name?
    $ rm *
    $ touch '$x'
    $ ls | wc
    1 1 3       # what is the name now?

Review: Order of shell processing of command line.

Explain:

    $ x='a b >out'
    $ echo $x
    a b >out

The shell expands the variable "$x" to be "a b >out" (containing what
looks like redirection) - why isn't the echo output "a b" being
redirected into the file "out"?
Why is the redirection being ignored?

    $ sort $x
    sort: open failed: a: No such file or directory
    sort: open failed: b: No such file or directory
    sort: open failed: >out: No such file or directory
    $ ls $x
    ls: a: No such file or directory
    ls: b: No such file or directory
    ls: >out: No such file or directory

Why isn't the output of the above commands being redirected into file
"out"?

Review: Order of shell processing of command line.

Explain:

    $ rm *
    $ echo a b >out
    $ ls
    out

Above, all the output of echo goes into file "out".

    $ rm *
    $ touch 'a b >out'
    $ ls
    a b >out
    $ echo *
    a b >out

The shell expands the GLOB pattern "*" to be "a b >out" (containing what
looks like redirection) - why isn't the echo output "a b" being
redirected into the file "out", like it did above?  Why is the
redirection being ignored?

Review: Order of shell processing of command line.

What is the *exact* output of this command sequence:

    $ foo='bar haven'
    $ echo foo $foo "$foo" '$foo' bar

NOTE: blanks will be counted in the answer

How many arguments are seen by the echo command, above?

Reminder: After an unquoted variable expands, the shell again performs
word-splitting (tokenizing) on any blanks contained in the text that was
inside the variable.

What is the *exact* output of this command sequence:

    $ alias foo='echo hi ; echo "a b"'
    $ foo bar

How many arguments are seen by the first echo command?
How many arguments are seen by the second echo command?

Reminder: After an alias expands, the shell re-tokenizes the entire
contents of the alias expansion, just as if you had typed it on the
command line yourself.

Explain:

    $ echo a ; date
    a
    Sun Sep 22 22:34:27 EDT 2002
    $ x=';'
    $ echo a $x date
    a ; date

Reason: The shell splits up the line into tokens before it expands
variables.
By the time the semicolon comes out from inside the variable, the shell
has already decided that this is only one single command line, not two -
the semicolon hidden inside the variable is not treated specially this
late in the command line processing.

Most shell special characters can be safely hidden inside even unquoted
variables.  The major exceptions are GLOB characters and blanks, which
the shell still treats specially even when coming from inside unquoted
variables.

Remember to double-quote all your variables and command substitutions,
to prevent embedded blanks and GLOB characters from being seen by the
shell!
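
To see the effect of that advice on blanks, a short sketch follows.  It
counts the arguments a command receives from a variable and from a
command substitution, with and without double quotes.  The count_args
function is a hypothetical helper written for this illustration, not a
standard command.

```shell
#!/bin/sh
# count_args is a made-up helper that prints how many arguments
# it was given.
count_args() {
    echo $#
}

x='this contains blanks'

bare=$(count_args $x)                       # blanks in x split it into words
quoted=$(count_args "$x")                   # one argument, blanks preserved

# The same rule applies to the output of a command substitution.
sub_bare=$(count_args $(echo one two))      # output is split on the blank
sub_quoted=$(count_args "$(echo one two)")  # output stays one argument

echo "unquoted variable:     $bare"
echo "quoted variable:       $quoted"
echo "unquoted substitution: $sub_bare"
echo "quoted substitution:   $sub_quoted"
```

The unquoted variable produces three arguments and the unquoted command
substitution produces two; each double-quoted form produces exactly one.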