================================================
Unix/Linux Shell Command Line Quoting Mechanisms
================================================
- Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Another Quote Tutorial: http://www.grymoire.com/Unix/Quote.html

Shells read input and find and run commands, passing command line arguments
to the commands. Shells treat a number of characters specially on the
command lines; these are called shell "meta-characters".

The most common shell meta-character is the blank or space character that
shells use to separate arguments. The shell does not pass any blanks and
spaces to the command; the shell uses the blanks and spaces to separate and
identify individual command line arguments. One blank separates arguments
the same way as ten or a hundred. For example:

    $ echo a b c                # one blank between arguments
    a b c
    $ echo a     b     c        # many blanks between; same thing
    a b c

Other shell meta-characters include: "$", "*", ";", ">", "?", "&", "|", etc.

"Quoting" is the generic name given to the action of protecting shell
meta-characters from being treated specially by the shell. A "quoted"
blank does not separate arguments. A "quoted" semicolon does not separate
commands. A "quoted" asterisk (or star "*") does not match file names.

Quoting is used to prevent the shell from acting on and expanding the
meta-characters. The quoting causes the shell to ignore the special
meaning of the character, so that the character gets passed unchanged to
a command as part of an argument.

Quoting is done using either double quotes, single quotes, or backslash
characters. While technically we only need to apply the quoting to the
individual shell meta-characters we want to protect from the shell, not to
the whole command line argument, it looks better to surround the whole
argument with matching quote characters.
For example, we can use double or single quotes to "quote" the asterisk and
blanks in the argument to the echo command:

    $ echo "*   "star           # quote only the four meta-characters
    *   star
    $ echo "*   star"           # quoting the whole argument looks better
    *   star
    $ echo '*   star'           # single quotes work, too
    *   star

Backslashes may also be used to "quote" or turn off the special meaning of
each immediately following character, one at a time. For example:

    $ echo \*\ \ \ star
    *   star

Quotes and backslashes tell the shell which parts of the input to treat as
ordinary (not special) characters. The quoting delimits (identifies) a
string of characters; the quoting mechanism is removed and is not part of
the string passed to the command. Only the characters being quoted are
passed to the command. The shell removes the quoting mechanism before
passing the delimited text as an argument to a command.

In all the examples above, the echo command receives *one* command line
argument containing an asterisk, three spaces, and the word star. The
quotes and backslash characters used to protect the meta-characters are
removed before the argument is passed to the command. The command sees
only the one eight-character argument, no matter what kind of quoting was
used. All the quoting syntax is removed.

=======================================
Which programs have quoting mechanisms?
=======================================

Quotes, both double and single, and backslashes are treated as special
shell meta-characters when fed as input to the Unix shells, usually by
typing them on shell command lines or putting them inside shell scripts.

Quotes and backslashes are *not* special characters when fed as input to
most *other* non-shell programs such as "cat", "head", "sort", "wc", etc.:

    $ echo "It's a nice day"     <= shell will process a quoted string
    It's a nice day              <= note how quotes have been removed

    $ cat
    "It's a nice day"            <= typed input to cat command
    "It's a nice day"            <= unchanged program output on stdout
    ^D

    $ head -1
    "It's a nice day"            <= typed input to head command
    "It's a nice day"            <= unchanged program output on stdout
    $

    $ wc
    "hi"                         <= typed input to wc command
    ^D
          1       1       5      <= five characters, including the newline

    $ echo "wc" | wc             <= shell processes quoted argument
          1       1       3      <= only 3 characters, including the newline

The shell treats quotes and backslashes specially when they are used on
a command line or in a shell script. Shells do this because shells are
designed to find and run commands, and supply arguments to the commands.
Sometimes you need to supply arguments to commands that contain special
meta-characters that the shell would otherwise expand or process. The
quoting mechanism stops the shell from processing the special characters.

Most other programs do not treat quotes or backslashes specially when
they read them as input. They are just normal characters.

===================================
Why do we need a quoting mechanism?
===================================

The quoting mechanism protects the special shell meta-characters from
expansion by the shell. The protected characters are passed as part of
command arguments.

The shells can use single- and double-quote characters to delimit and
protect strings of characters used as arguments to commands. Backslashes
also protect special characters from the shell; but, sometimes they are
less convenient to use.

For example, one could protect all the special shell meta-characters in
a line of text, each with its own backslash:

    $ echo \*\*\*\ Here\ is\ a\ backslash\ \"\\\".\ \*\*\*
    *** Here is a backslash "\". ***

Note that quotes and backslashes are themselves meta-characters to the
shell, so if you want quotes and backslashes to appear in command line
arguments you have to quote them with backslashes to protect them!
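
As a small check of the backslash rules above, the sketch below stores the
backslash-quoted text \"\\\" in a shell variable and measures what the
shell kept. It assumes a POSIX-style shell; the variable name "arg" is only
illustrative and does not come from the original notes.

```shell
# Each backslash protects exactly one following character:
#   \"  gives  "      \\  gives  \      \"  gives  "
arg=\"\\\"

# Quote the expansion so the shell passes the value through unchanged.
printf '%s\n' "$arg"
printf '%s\n' "${#arg}"
```

The six characters typed become a three-character value: a double quote,
a backslash, and another double quote.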
Alternately, one could surround the shell meta-characters, and the whole
argument string, with matching single quotes to protect the whole string
from the shell:

    $ echo '*** Here is a backslash "\". ***'
    *** Here is a backslash "\". ***

In most cases the use of quotes is easier than using a lot of backslashes.

Which quotes can we use to surround a string that contains both single and
double quotes? We need to put the single quotes inside of double quotes
to protect them, and put the double quotes inside of single quotes to
protect them:

    $ echo "single ' quote"
    single ' quote
    $ echo 'double " quote'
    double " quote

Single quotes are not meta-characters inside of double-quoted strings, and
double quotes are not meta-characters inside of single-quoted strings.
To quote a command line argument that contains both types of quotes, we
need to alternate quoting mechanisms in the same line:

    $ echo "single ' and "'double " quote'
    single ' and double " quote

Note how the double quoted section ends at the second double quote, after
which we immediately start a single-quoted section that extends to the end
of the line. The single quote is treated as an ordinary character inside
double quotes and the double quote is treated as an ordinary character
inside single quotes.

The quotes and backslashes are used by the shells to locate a string of
characters to protect. The quotes and backslashes themselves are not part
of the string; they are removed as the string of characters is collected.
For example:

    $ echo "hi  ho"
    hi  ho
    $ echo hi\ \ ho
    hi  ho

The double quotes and backslashes above delimit a single six-character
string. The shell collects one six-character command line argument, and
passes that argument to the echo command. The quotes and backslashes used
in the quoting mechanism are not part of the string - they only delimit
the string and protect the meta-characters. The quoting mechanism is
removed from the string as it is collected and before it is passed to
the command.
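
The claim that different quoting mechanisms deliver the same string can be
checked with a minimal sketch. It assumes a POSIX-style shell; the helper
function "len" and the variable names are hypothetical and are not part of
the original notes.

```shell
# Hypothetical helper: print the length of its first argument.
len() {
    printf '%s\n' "${#1}"
}

quoted_len=$(len "hi  ho")      # double quotes protect the two blanks
escaped_len=$(len hi\ \ ho)     # backslashes protect the same two blanks

# Alternate double and single quoting inside one argument.
mixed=$(printf '%s' "single ' and "'double " quote')

echo "$quoted_len $escaped_len"
echo "$mixed"
```

Both lengths come out as 6, and the mixed argument arrives with both kinds
of quote characters in it and none of the delimiting quotes.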
Without the quoting mechanism, blanks act as meta-characters to separate
command line arguments and the shell creates two two-character arguments
for the echo command, instead of one six-character argument:

    $ echo hi  ho
    hi ho

Blanks are meta-characters to the shell unless they are hidden from the
shell by quoting. The shell splits the command line into tokens on any
un-quoted blanks. You may type one blank to separate the arguments or a
hundred blanks; it makes no difference to the shell or to the arguments:

    $ echo a b c
    a b c
    $ echo a     b     c
    a b c

The three one-character string arguments passed to the echo command in the
above two cases are identical; the echo command sees none of the blanks
because the shell uses unquoted blanks to separate arguments.

This next echo command line has six string arguments. None of the quoting
mechanism is part of the arguments; it is all stripped out:

    $ echo "a  b" c 'd  e' f "g  h" "i     j"
    a  b c d  e f g  h i     j
    ^^^^ ^ ^^^^ ^ ^^^^ ^^^^^^^
     1   2  3   4  5      6

Blanks that are quoted are not seen as special meta-characters by the
shell; they do not separate arguments; they become ordinary characters
made part of the argument string passed to the command being invoked.

You can create a file name that is a blank using any of these quoting
mechanisms to turn off its meta-character meaning:

    $ touch \                   <= there is a space after the backslash
    $ touch " "                 <= there is a space inside the quotes
    $ touch ' '                 <= there is a space inside the quotes

================
Quote Processing
================

Quotes are processed by the Unix shells from left-to-right. Let's examine
how this works:

    $ echo 'one "two" three'four"five"six"seven 'eight' nine"ten
    one "two" threefourfivesixseven 'eight' nineten

The first quote found by the shell, starting from the left, is a single
quote. The shell collects all the characters up to the next single quote.
All these characters are protected from further analysis by the shell.
The words one, two, and three and the contained blanks are all inside this
first single-quoted string. The double quotes, inside single quotes, have
no special meaning to the shell. They are simply part of the string.
Same with the blanks - they are quoted and thus do not perform their
meta-character function of separating shell arguments. The single quotes
themselves, used to delimit the quoted string, are not part of the string.

The word four is outside of any quotes. Right after four we start a
double-quoted string that extends until the next double-quote character.
The word five is inside this double-quoted string. The double quotes
themselves, used to delimit the string, are not part of the string.

The word six is outside of any quotes. Right after six we start another
double-quoted string that extends until the next double-quote character.
The words seven, eight, and nine and the contained blanks are all inside
this double-quoted string. The single quotes, inside double quotes, have
no special meaning to the shell. They are simply part of the string.
Same with the blanks - they are quoted and thus do not perform their
meta-character function of separating shell arguments. The double quotes
themselves, used to delimit the string, are not part of the string.

The word ten is outside of any quotes, but since it is not separated from
the other characters by any blanks, it is still part of the same shell
argument.

The given command line argument to "echo" was built up of text that
contains three quoted strings. Each quoted string connected immediately
to another non-blank character, and all the blanks inside quoted strings
were protected from the shell. Thus, the "echo" command is handed only
*one* command line argument. (Command line arguments are separated by
meta-character blanks. There were no unquoted blanks used.)
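
The walk-through above can be confirmed with a short sketch. It assumes a
POSIX-style shell and uses "set --" to load the example text as the
positional parameters, so that "$#" reports how many arguments the shell
built. Neither "set --" nor "$#" appears in the original notes, and the
variable names are illustrative.

```shell
# Load the example word as the shell's positional parameters.
set -- 'one "two" three'four"five"six"seven 'eight' nine"ten

count=$#        # number of arguments the shell assembled
first=$1        # the argument itself, with the quoting already removed

echo "$count"
printf '%s\n' "$first"
```

The count is 1, and the argument still contains the inner double quotes
around two and the inner single quotes around eight, because those were
protected by the outer quoting.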
The fact that this one argument was made up of several pieces on the
command line, including both quoted and unquoted parts, is invisible to
the echo command. The shell did the work; the echo command gets its
one argument.

The quotes seen by the shell are *not* part of the argument. The quotes
delimit the string; they are not part of the string. Even though the
shell used multiple quoting mechanisms to assemble the argument above,
no unquoted blanks were found to separate arguments, so the string above
is passed as *one* argument to the echo command.

========================
Single vs. Double Quotes
========================

Single quotes are "stronger" than double quotes. Nothing is special inside
single quotes; the shell treats all the characters, no matter what they
are, as part of the string being collected and it does not expand any of
them inside single quotes.

Inside double quotes, the shell still sees and expands *some* special
meta-characters. The most important character expanded by the shell
inside double quotes is the dollar sign that signals the start of a
shell variable:

    $ echo '$SHELL'
    $SHELL                      <= single quotes prevent expansion
    $ echo "$SHELL"
    /bin/bash                   <= double quotes permit expansion

Backslashes are also still special meta-characters inside double-quoted
strings, so you can use backslashes inside double quotes to protect other
double quotes (and other backslashes, and dollars, etc.):

    $ echo "This is a \"double\" quoted \$SHELL string and backslash \\."
    This is a "double" quoted $SHELL string and backslash \.

For maximum protection and maximum quoting, use single quoted strings.

=======================================
Quotes inside variables are not special
=======================================

Due to the order of processing of the command line, all quoted strings are
located and identified by the shell before any variables are expanded.
This means quotes embedded inside variables are not seen as special
characters by the shell:

    $ x='aaaaa " bbbbb'
    $ echo $x
    aaaaa " bbbbb

In the above echo command line, the shell first looks for and identifies
quoted strings (there are none) before it expands the variable $x, so
the double quote inside $x appears "too late" - it is not treated as a
special character by the shell.

Only "exposed" quotes are treated as special on shell command lines,
not quotes embedded inside variables. Exposed quotes are special;
embedded quotes are not:

    $ mkdir empty ; cd empty; touch a b c d ; ls
    a  b  c  d
    $ echo " * "
     *                          <= quotes protect the glob character
    $ x='"'
    $ echo $x * $x
    " a b c d "                 <= quotes inside variables are not special

The quotes embedded inside the variable $x are not treated as special
characters by the shell.

Note: Blanks and glob (wildcard) characters embedded inside unquoted
variables *are* seen as special to the shell and may cause multiple
arguments to be created:

    $ x="a b c"
    $ touch $x ; ls     <= creates three files; embedded blanks are special
    a  b  c
    $ touch "$x" ; ls   <= creates another single file named: a b c
    a  a b c  b  c
    $ y='*'
    $ echo "$y"         <= embedded glob char is protected by quoting
    *
    $ ls $y             <= unprotected glob char matches four file names!
    a  a b c  b  c

You must double-quote all uses of variables to prevent the embedded blanks
and glob (wildcard) characters from being seen as special by the shell
after the variable expands.

=====================================
Studies in Quoting Special Characters
=====================================

Understand how the shell handles quotes and blanks:

    $ echo hi   there
    hi there
    $ echo "hi   there"
    hi   there
    $ echo 'hi   there'
    hi   there

Explain the above three outputs. How does the shell find arguments?
How many arguments are passed to the "echo" command in each case?

The "touch" command creates empty files by name. Try this:

    $ touch "a b"
    $ ls
    a b
    $ rm a b

Explain the error message that is output by the above "rm" command.
How many arguments are passed to the "touch" and "rm" commands?
How can you remove a file name containing a special character?

Here are some more things to try, and to understand.

    $ echo "'hello'"
    'hello'
    $ echo '"hello"'
    "hello"

    $ mkdir empty ; cd empty ; touch a b c d ; ls
    a b c d
    $ echo ' * '
     *
    $ echo '" * "'
    " * "
    $ echo '"' * '"'
    " a b c d "
    $ echo '"'" * "'"'
    " * "
    $ echo ' * ' * " * "
     *  a b c d  *
    $ echo \' * \'
    ' a b c d '

You must be able to predict the output of each of the above command lines
without having to type them in to try them.

How many arguments are there to the following echo command?

    $ echo abc \  def \\ ghi \ \  jkl

How many characters are in each of the arguments? Use the "argv" program
(available in the course notes area) to help you:

    $ ./argv abc \  def \\ ghi \ \  jkl
    Argument 0 is [./argv]
    Argument 1 is [abc]
    Argument 2 is [ ]
    Argument 3 is [def]
    Argument 4 is [\]
    Argument 5 is [ghi]
    Argument 6 is [  ]
    Argument 7 is [jkl]

Since the shell is the program that parses arguments, the number of
arguments passed to the argv program will be exactly the same as the
number of arguments passed to the echo program (or to any program).
When in doubt about how the shell will parse a command line, use echo or
argv to confirm the arguments that the shell is providing.

--------------------------------------------------------------------------
ADVANCED SHELL: Discussion of using quotes inside variables: How To
--------------------------------------------------------------------------

How to make quotes inside variables work using the shell "eval" mechanism.

From: "Ian! D. Allen"
To: General Membership Discussion List
Subject: Re: [oclug] Must be Friday.....

> This doesn't work:
>    $ SUDO="/usr/bin/sudo -u root -p \"Enter password for user '%U': \""
>    $ $SUDO ls
>    "Enter

Most shell meta-characters such as quotes, backslashes, line separators,
pipes, redirection, etc. don't have special meaning coming from inside
shell variables.
They also mean nothing if they appear in the command line due to GLOB
pattern matches. The shell only treats them specially the first time it
sees them on the command line:

    $ mkdir empty ; cd empty ; touch a b c   # three visible files
    $ echo " * "
     *           # the shell sees the quotes and protects the GLOB pattern
    $ q='"'
    $ echo $q * $q
    " a b c "    # the hidden quotes have no special meaning

    $ echo hi ; date
    hi
    Sun Feb 25 18:55:34 EST 2007
    $ s=';'
    $ echo hi $s date
    hi ; date    # the hidden semicolon has no special meaning

    $ echo hi >out       # put "hi\n" into file "out"
    $ r=">out"
    $ echo hi $r
    hi >out      # the hidden >out has no special meaning

> # eval $SUDO ls
> Um.. okay, that works.
> What is special about this case that requires the use of 'eval'?

"eval" tells the shell to process the command line twice. This gives
hidden meta-characters a second chance. The first time through, the
shell expands the variable. The second time through, the shell now sees
the quoted strings and treats them correctly (since they aren't hidden
inside the variable any more).

> # LS="ls -l"
> # $LS /usr

The above only works because no shell meta-character processing is needed.
A simple variation fails:

    # LSW="ls -l | wc"
    # $LSW
    ls: |: No such file or directory
    ls: wc: No such file or directory

Don't put commands inside variables - use shell aliases instead.

> eval is the shell command that executes the contents of a shell variable.

"eval" causes the shell to process what was in the variable as if you had
typed it directly into the shell. It may "execute" and it may not,
depending what else is on the command line in front of it:

    $ v="date"
    $ date=foo
    $ eval $v
    Sun Feb 25 19:06:12 EST 2007
    $ eval echo $v
    date
    $ eval echo \$$v
    foo

The last example above is how we do crude associative arrays in older
versions of the Bourne shell.
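
A minimal sketch of that crude associative-array lookup follows. It assumes
a POSIX-style shell and keys that are safe to use in variable names; the
"color_" prefix, the sample data, and the helper function "lookup" are all
hypothetical names chosen for illustration.

```shell
# Hypothetical data: one ordinary variable per key, named color_<key>.
color_apple=red
color_sky=blue

# Hypothetical helper: print the value stored for the key given in $1.
# The first pass turns \$color_$1 into $color_apple (for key "apple");
# eval then makes the shell expand that variable on a second pass.
lookup() {
    eval "printf '%s\n' \"\$color_$1\""
}

apple=$(lookup apple)
sky=$(lookup sky)
echo "$apple $sky"
```

Because eval re-reads its argument as shell input, a lookup like this
should only be given keys that are known to be plain names.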
Assignment also has to use eval, since the shell won't assign to a name
hidden inside a variable:

    $ echo $v $date
    date foo
    $ $v=hoho                   # wrong way
    bash: date=hoho: command not found
    $ eval $v=hoho              # right way
    $ echo $v $date
    date hoho

--
| Ian! D. Allen - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/