=============================================
Return Code, Exit Status, test, if, and while
=============================================
-Ian! D. Allen - idallen@idallen.ca

-------------------------
Exit Status / Return Code
-------------------------

Every process (command) on Unix can return a small integer to the operating system when it finishes, indicating a "return status" or "exit code". By convention, returning a zero usually means "the command worked" (success, good status); any other number means "it didn't work" (failure, bad status). Think of it as "there is only one way to succeed [only one zero]; there are many ways to fail [many non-zero]".

The meaning of a particular non-zero exit code varies from command to command; there is no standard set of non-zero exit codes that applies across all commands. See the man pages for each command for details. The only standard is that "a zero exit code means success". (Don't confuse this with C language programs where a zero is FALSE!)

Unix shells record the exit status of the most recently run command in a shell variable. (The exit status does not appear on the screen unless you show the contents of the variable that records it.) In Bourne shells (including BASH) that variable has the awkward name "$?":

    $ date
    Sat Oct 11 09:07:06 EDT 2003
    $ echo $?
    0

    $ grep nosuchpattern /etc/passwd
    $ echo $?
    1

    $ grep nosuchpattern nosuchfile
    grep: nosuchfile: No such file or directory
    $ echo $?
    2

Only the exit status of the immediately preceding command is available:

    $ test 1 -gt 2
    $ echo $?        # the exit status of "test"
    1
    $ echo $?        # the exit status of the previous "echo"
    0

C programmers may find the shell's use of zero to mean "success" backwards from what they are used to, where zero means FALSE and anything else means TRUE. Consider that most commands have only one way of succeeding (exit zero), but many ways of failing (exit anything non-zero). A zero return status means that the command succeeded.

Some badly-written commands fail but still return zero. Sorry!
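Because every command overwrites "$?", a script that wants to use an exit status must save it immediately. A minimal sketch of that habit, assuming a POSIX shell; the helper name "run_and_report" is hypothetical, and "sh -c 'exit 3'" is used only as a convenient way to produce a non-zero status:

```shell
# run_and_report (a hypothetical helper) runs the command given as its
# arguments, saves the exit status at once, reports it, and passes it on.
run_and_report() {
    "$@"
    status=$?    # save $? now: the echo below replaces it with its own status
    echo "'$*' exited with status $status"
    return $status
}

run_and_report true             # 'true' exited with status 0
run_and_report false            # 'false' exited with status 1
run_and_report sh -c 'exit 3'   # 'sh -c exit 3' exited with status 3
echo "run_and_report returned $?"
```

The function ends with "return $status" so that its caller sees the same status the wrapped command produced.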
-----------------------
Setting the Return Code
-----------------------

In C or C++ language programs, the "exit(int i)" function sets the return code for the program as it exits. In shell scripts, the "exit" command takes a single numeric argument and does the same thing:

    #!/bin/sh -u
    echo "Hello World!"
    exit 0

If a script ends without any "exit" command, the return code of the script is the exit status of the last command executed by the script.

A more complex example:

    #!/bin/sh -u
    grep "$1" /etc/passwd
    status=$?        # save the return code in a variable
    echo "The grep program looked for '$1' and returned status $status"
    exit $status     # exit script with same code as returned by grep

------------------------------------
Conditional Shell Control Structures
------------------------------------

Unix shells are designed to find and run commands, not do arithmetic. This means that the programming control structures of shells are based on the exit status of running commands, not on mathematical expressions, as they are in C language. The shell version of a control structure runs a command, and chooses a thread of execution based on the exit status of the command:

    if grep nosuchpattern /etc/passwd ; then
        echo "I found nosuchpattern in file /etc/passwd"
    else
        echo "I did not find nosuchpattern in file /etc/passwd"
    fi

After the shell keyword "if" comes a *command line*, not an arithmetic or logical expression as would be found in most programming languages. If the exit status of the command is zero (success), the TRUE branch of the IF is taken; otherwise (failure; return code non-zero), the FALSE branch is taken.

The shell WHILE loop operates the same way, using a command exit status (not an arithmetic expression!) to control the loop:

    while who | grep "idallen" ; do
        echo "idallen is still online"
        sleep 60
    done

As you can see in the above example, the command being executed can be a shell pipeline. Only the exit status of the last command in a pipeline is used by the shell.
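The pipeline rule can be checked directly. In this sketch "printf" stands in for "who" (an assumption made only so the result does not depend on who happens to be logged in):

```shell
# Only the status of the last command in a pipeline counts.
false | true
echo "false | true -> status $?"    # status 0: "true" was last
true | false
echo "true | false -> status $?"    # status 1: "false" was last

# printf stands in for "who"; grep is last, so grep's status drives the IF.
if printf '%s\n' alice idallen bob | grep idallen >/dev/null ; then
    echo "idallen is online"
fi
```

(BASH has an optional "pipefail" setting that changes this rule; it is off by default.)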
As long as the "grep" command succeeds in finding the string "idallen" in the output of the "who" command, the loop will continue. When the grep command does not find "idallen", it returns a bad exit status and the loop finishes.

Sometimes you want to complement the return code of a command and execute code when the command fails rather than when it succeeds. To do this, place a "!" at the very front of the command line:

    $ grep nosuchpattern /etc/passwd
    $ echo $?
    1
    $ ! grep nosuchpattern /etc/passwd
    $ echo $?
    0

    if ! grep nosuchpattern file ; then
        echo "nosuchpattern was not found in file"
    fi

    while ! who | grep "idallen" ; do
        echo "idallen is not online yet"
        sleep 60
    done

In the WHILE example, the "!" applies to the whole pipeline, so it complements the exit status of "grep", the last command in the pipeline.

Just as in many programming languages, the "!" means "NOT". A command that normally returns 0 will appear to return 1 and a command that returns non-zero will appear to return zero.

Note that using "!" will turn a non-zero exit code into zero, making it unavailable in an error message:

    if ! somecommand ; then
        echo "somecommand: failed with exit status $?"    # WRONG WAY!
    fi

The above message always prints "status 0" on failure, since the "!" always turns a non-zero exit code into zero. If you want to echo the exact non-zero exit status of a command, you can't use "!" in front of it:

    if somecommand ; then
        : do nothing on success
    else
        echo "somecommand: failed with exit status $?"    # right way
    fi

The shell built-in command ":" does nothing and ignores all its arguments.

---------------------------
Making Shells do Arithmetic
---------------------------

Despite the command-oriented nature of the Unix shell, people often want shells to manipulate numbers, do arithmetic, and execute conditional code based on numeric comparisons.

Recent Bourne-style shells have added syntax to allow arithmetic expressions to follow the IF and WHILE keywords; but, this is not universal. We will follow the traditional shell syntax for handling numbers that works everywhere (at least back to 1972 or so).
Since IF and WHILE each expect to be followed by a single Unix command, to compare numbers we must execute a command that compares the numbers given on its command line and sets its *return status* to the result of the comparison. That comparison command is named "test", and it accepts blank-separated numeric arguments to be compared:

    $ test 1 -eq 1
    $ echo $?
    0
    $ test 1 -gt 2
    $ echo $?
    1

Note that the "test" command normally has no output, unless something goes wrong. It only sets its return code, based on the tests you ask it to do. You use the "test" command in a conditional control structure just as you would use any other Unix command:

    if test 1 -eq 1 ; then
        echo "they are equal"
    else
        echo "they are not equal"
    fi

The six numeric comparison operators are derived from the similarly named FORTRAN operators (.EQ., .NE., .LT., .GT., .LE., .GE.): -eq, -ne, -lt, -gt, -le, -ge. These operators do not use any shell meta-characters, unlike the traditional programming comparison operators "<", ">", "<=", ">=", etc. that look like redirection and would require quoting and protection from the shell.

Usually one or both of the numbers in the comparison is a shell variable:

    if test $# -eq 0 ; then          # $# contains the argument count
        echo "this script has no arguments"
    elif test $# -eq 1 ; then
        echo "this script has one argument"
    else
        echo "this script has more than one argument (I count $#)"
    fi

The "test" command can test a wide variety of things; see the man page for the details. You should know the operation of these basic tests:

    - the six numeric tests: -eq, -ne, -lt, -gt, -le, -ge
    - string equality, inequality: =, !=
    - the empty/nonempty string tests: -z, -n
    - basic file tests: -f, -d, -e, -s, -r, -w, -x
    - the Boolean NOT, AND, and OR operators: !, -a, -o

Do not confuse or mix integer and string tests: "-eq" compares numeric values, while "=" compares strings character by character.
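To see why integer and string tests must not be mixed up, here is a small sketch that compares the same two arguments both ways. The helper name "both_ways" is hypothetical; the leading zero in "010" is the point of the example, since "-eq" compares numeric values while "=" compares characters:

```shell
# both_ways (hypothetical helper): compare two arguments as integers
# with -eq and as strings with =, and print both verdicts.
both_ways() {
    if test "$1" -eq "$2" ; then num="equal" ; else num="not equal" ; fi
    if test "$1" = "$2" ; then str="equal" ; else str="not equal" ; fi
    echo "'$1' and '$2': as integers $num, as strings $str"
}

both_ways 10 010    # as integers equal, as strings not equal
both_ways 7 7       # equal both ways
```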
The "test" command will generate errors if you try to test strings (even empty strings) using any of the six numeric comparison operators:

    $ test a -eq 1
    test: a: integer expression expected
    $ test "" -eq 0
    test: : integer expression expected

---------------------------
Testing multiple conditions
---------------------------

You can test more than one condition at the same time by separating the conditions with AND and OR Boolean operators. For the "test" command these are named "-a" and "-o":

    $ echo $$                                    # $$ contains the process ID
    123
    $ test $$ -gt 1000 -o $$ -lt 3000 ; echo $?
    0                                            # return code 0 means success
    $ test $$ -gt 1000 -a $$ -lt 3000 ; echo $?
    1                                            # return code 1 means failure

The "test" command in the above example has seven arguments, consisting of two three-argument tests separated by a Boolean operator. It is still one single "test" command with one exit return code. With a process ID of 123, the OR succeeds because 123 is less than 3000; the AND fails because 123 is not greater than 1000.

As in C language, the Boolean operators must separate complete "test" expressions. You might think like this:

    test x to see if it is bigger than zero and less than ten

but, in both the shell and C language, you must code it like this:

    test x to see if it is bigger than zero \
        AND test x to see if it is less than ten

    $ test $x -gt 0 -a -lt 10            # WRONG WRONG WRONG
    $ test $x -gt 0 -a $x -lt 10         # RIGHT

When testing multiple conditions at the same time, be careful not to make the failure error message unhelpful:

    if test $x -gt 0 -a -f "$file" -a $y -lt 27 -a -z "$string" ; then
        ... do something useful ...
    else
        echo 1>&2 "$0: Error: ... ??? ..."
    fi

The ??? error message would have to say "Error: $x is <= 0 or $file does not exist, is inaccessible, or is not a file, or $y is >= 27, or '$string' is not a null string". Which is it? Such a complex error message is not helpful to the users of your scripts!

Use separate tests and separate error messages for each condition; don't bunch them together using Boolean -a or -o operators:

    # split the huge condition into more readable error messages:
    #
    if ! test $x -gt 0 ; then
        echo 1>&2 "$0: Error: $x is not > 0"
        exit 1
    fi
    if ! test -f "$file" ; then
        echo 1>&2 "$0: Error: '$file' is missing, inaccessible, or not a file"
        exit 1
    fi
    if ! test $y -lt 27 ; then
        echo 1>&2 "$0: Error: $y is not < 27"
        exit 1
    fi
    if ! test -z "$string" ; then
        echo 1>&2 "$0: Error: '$string' is not a null string"
        exit 1
    fi
    ... all tests passed; now do something useful ...

---------------
Syntactic Sugar
---------------

(Syntactic sugar is a feature added to a language that makes it easier or more elegant for humans to use, but that does not increase the power or range of things that can already be done.)

Someone in the Unix past decided that shell IF and WHILE statements should look more like the statements found in programming languages. The programmer came up with the idea of making an alias for the "test" command named "[" (left square bracket). The "test" command was rewritten so that, if it were called by the name "[", it would require a final argument of "]" (right square bracket) and then ignore it. We could now replace this:

    if test 1 -eq 1 ; then
        echo "they are equal"
    fi

with this, using "[" as an alias for "test":

    if [ 1 -eq 1 ] ; then
        echo "they are equal"
    fi

This is still the "test" command, executing under the alias of the command name "[", and ignoring the final "]". Those square brackets look similar to the parentheses used in some programming languages; but, you must remember that they are *not* punctuation. Each one of those square brackets must be a separate blank-separated token to the shell, and that means each must be surrounded by blanks.

Most shell scripts now use the "[" form of "test"; because, it looks nicer. (That's what we mean by "syntactic sugar".) Students of the shell must remember that this "[" form is *not* punctuation; it is simply a command name alias that happens to be a square bracket. Use blanks!

-----------------
Avoiding Problems
-----------------

Don't forget that shells aren't designed to do arithmetic.
This is wrong:

    if $count -gt 0 ; then ...          # WRONG WRONG WRONG

The above line will have the shell expand the variable $count into some number and then try to execute that number as a Unix command. (The IF keyword must be followed by a Unix command name.) The resulting error message (for example, "bash: 1: command not found" when $count is 1) will not be very helpful.

Always remember to code some Unix command name after IF and WHILE:

    if test $count -gt 0 ; then ...     # RIGHT
    if [ $count -gt 0 ] ; then ...      # RIGHT (syntactic sugar)

Conversely, don't double up on the command name you use after IF:

    $ if [ grep foo /etc/passwd ] ; then date ; fi     # WRONG WRONG WRONG
    [: foo: binary operator expected

The above syntactic-sugar line is equivalent to this incorrect line:

    $ if test grep foo /etc/passwd ; then date ; fi    # WRONG WRONG WRONG
    test: foo: binary operator expected

The command name after the IF, above, is "test". The "test" command will see three command-line arguments ("grep", "foo", and "/etc/passwd") and complain that the middle one isn't an operator. If you want to test the return status of "grep", "grep" must be the command name that immediately follows the IF or WHILE keywords:

    $ if grep foo /etc/passwd ; then date ; fi         # RIGHT

The IF and WHILE keywords must always be followed by a command name, and that command name is exactly "[" in the syntactic-sugar rewritten form of "test".

The following code does not work; because, the blank is missing after the first square bracket, so the shell sees an unknown command name:

    $ if [1 -eq 1 ] ; then echo "ALWAYS USE BLANKS" ; fi
    sh: [1: command not found

The shell sees "[1" as a two-character command name that doesn't exist. The following fails for the same reason:

    $ if [a=b] ; then echo "ALWAYS USE BLANKS" ; fi
    bash: [a=b]: command not found

Square brackets are not punctuation! Always use blanks around "[" and "]". The arguments to the "test" command must always be separate command line arguments.
This also fails because of the missing blank:

    $ if [ 1 -eq 1] ; then echo "ALWAYS USE BLANKS" ; fi
    [: missing `]'

The "[" command is looking for the argument "]" (not "1]") as its last command line argument.

The "test" command behaves differently depending on the number of arguments you pass to it:

    $ test 1 -eq 2       # three arguments: operator in middle
    $ test -f file       # two arguments: operator on left
    $ test string        # one argument: -n assumed on left

If the "test" command has only one single command line argument, it defaults to using "-n" as the implied operator (test for non-empty string) on the one argument. The following one-argument tests are always TRUE, though they may not appear that way at first to human eyes:

    if test a=b ; then ...     # THIS IS TRUE (good return code) !
    if [ a=b ] ; then ...      # THIS IS TRUE (good return code) !
    if test 1=2 ; then ...     # THIS IS TRUE (good return code) !
    if [ 1=2 ] ; then ...      # THIS IS TRUE (good return code) !
    if test 0 ; then ...       # THIS IS TRUE (good return code) !
    if [ 0 ] ; then ...        # THIS IS TRUE (good return code) !

In all the above lines, the "test" command has only one command line argument (not counting the trailing "]" that "[" requires and then ignores). If the single argument to "test" is not the empty string, "test" returns a good status and the IF succeeds. The "test" command is defaulting to use an implied "-n" operator on the left. In effect, "test" is performing these tests:

    if test -n "a=b" ; then ...    # THIS IS TRUE (good return code) !
    if [ -n "a=b" ] ; then ...     # THIS IS TRUE (good return code) !
    if test -n "1=2" ; then ...    # THIS IS TRUE (good return code) !
    if [ -n "1=2" ] ; then ...     # THIS IS TRUE (good return code) !
    if test -n "0" ; then ...      # THIS IS TRUE (good return code) !
    if [ -n "0" ] ; then ...       # THIS IS TRUE (good return code) !

The above are always true, because the three-character strings "a=b" and "1=2" are not empty and never will be empty, and the single-character string "0" is also never the empty string. To compare two strings, put blanks around the operator so that "test" receives three separate arguments: "test a = b".
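A quick way to see the argument-counting rule in action is a helper that passes its own arguments straight to "test" and reports the verdict. The name "verdict" is hypothetical; the blanks in each call decide how many arguments "test" receives:

```shell
# verdict (hypothetical helper): run "test" on exactly the arguments given
# and report how many arguments it saw and whether it succeeded.
verdict() {
    if test "$@" ; then
        echo "test with $# argument(s): TRUE"
    else
        echo "test with $# argument(s): FALSE"
    fi
}

verdict a=b        # 1 argument, same as -n "a=b": TRUE
verdict a = b      # 3 arguments, string comparison: FALSE
verdict 1=2        # 1 argument: TRUE
verdict 1 -eq 2    # 3 arguments, numeric comparison: FALSE
verdict ""         # 1 argument, the empty string: FALSE
```

Quoting "$@" keeps each argument separate, including the empty string in the last call, so "test" sees exactly what the caller typed.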
Another common mistake is to forget how the shell uses the redirection metacharacters "<" and ">". Here are two equivalent mistakes:

    if test 1 > 2 ; then ...     # THIS IS WRONG but TRUE - must use -gt not >
    if [ 1 > 2 ] ; then ...      # THIS IS WRONG but TRUE - must use -gt not >

The above two lines have the shell use redirection (">") to create a file named "2" and redirect the output of the "test" command into it. (The "test" command produces no output, so the file is left empty; an existing file named "2" is truncated.) The "test" command itself ends up with only one single command line argument, the digit "1". With one argument and no operators, the "test" command returns 0 if the argument is not the empty string ("test -n 1"). The string "1" is never empty, so the above test, and the IF, succeeds.

The negation operator "!" may be used to the left of any single test used inside the "test" command:

    if test ! -r "$file" ; then ...
    if [ ! -r "$file" ] ; then ...
    if test ! -z "$string" ; then ...
    if [ ! -z "$string" ] ; then ...

The test command uses "!" to invert (complement) the sense of a Boolean test.

If you combine the negation operator of "test" with the shell return code negation operator mentioned earlier that also uses "!", you can end up with unreadable code:

    if ! test ! -r file ; then ...              # UNREADABLE
    if ! [ ! "abc" != "def" ] ; then ...        # EVEN MORE UNREADABLE

Don't do that. Rework the expression to use only a single "!" (or none at all). Keep the negation operator as an argument to "test"; don't place it before the opening square bracket alias:

    if test -r file ; then ...                  # good, readable
    if [ "abc" != "def" ] ; then ...            # good, readable

C language programmers sometimes confuse the syntax of the Boolean operators AND and OR in the Unix "test" command:

    if [ $# != 1 || -z "$1" ] ; then echo hi ; fi     # WRONG WRONG WRONG
    [: missing `]'
    bash: -z: command not found

The Bourne shell "||" and "&&" operators separate commands. The shell splits the line above into the incomplete command "[ $# != 1" (hence the missing "]" error) and a second command named "-z" (hence "command not found").
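Because "&&" and "||" join whole commands, they are driven by exit status just like IF. A short sketch, assuming a POSIX shell, shows which right-hand commands run:

```shell
# "&&" runs the right-hand command only if the left-hand one succeeded;
# "||" runs it only if the left-hand one failed.
true  && echo "after true &&: runs"
false && echo "after false &&: never printed"
true  || echo "after true ||: never printed"
false || echo "after false ||: runs"

# The status of the whole list is the status of the last command that ran.
false && echo "never printed"
echo "status of 'false && echo': $?"    # status 1, from false
```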
Use "-a" and "-o" to separate Boolean clauses to the "test" command:

    if [ $# != 1 -o -z "$1" ] ; then echo hi ; fi     # RIGHT

You can use "||" and "&&" between commands if you make sure each command is complete:

    if [ $# != 1 ] || [ -z "$1" ] ; then echo hi ; fi # A BIT INEFFICIENT

Above, the "||" separates two different and complete "test" command executions. Rather than using the "test" command twice, you can simply join them into one using the correct Boolean operator:

    if [ $# != 1 -o -z "$1" ] ; then echo hi ; fi     # RIGHT

Less code is better code.

-----------------------------
Opposites and false opposites
-----------------------------

Note that the opposite of "-n" (is not an empty string) is "-z" (is an empty string), just as "=" (string equality) and "!=" (string inequality) are opposite tests:

    test ! -z "$foo"     is equivalent to     test -n "$foo"

The opposite of "-lt" is not "-gt"; it is "-ge". (If you are not younger than your sister, you are either older or the same age.) The opposite of "-gt" is not "-lt"; it is "-le".

The "test" operators "-f" and "-d" are *not* opposites. If a pathname is not a plain file, it may or may not be a directory: it could be a directory, any number of other special file types under Unix, or it might not exist at all. ("/dev/null" is a common example of a pathname that is neither a directory nor a plain file.) You cannot replace "! -f" with "-d" or vice-versa.

---------------------
Failure of file tests
---------------------

If a "test" file operator (-r, -w, -x, -f, -d, -s, -e) succeeds, you also know that the pathname exists and you have permission to traverse all the directories leading up to it. If a "test" file operator fails, the cause may not be the property you were testing at all: the test also fails if you have no permission to search one of the directories in the path, or if the pathname simply doesn't exist.

Without first testing if you can access "$path", the following error message is misleading:

    if [ ! -r "$path" ] ; then
        echo 1>&2 "$0: '$path' is not readable"    # POOR ERROR MESSAGE
    fi

While true, the above error message is incomplete. The item in "$path" might not even exist; or, you might not have permission to traverse all the directories in its pathname. Saying the overall pathname is not readable is true; but, it is only part of the truth.

A more accurate error message would be:

    if [ ! -r "$path" ] ; then
        echo 1>&2 "$0: '$path' is missing, inaccessible, or not readable"
    fi

If you want to be more specific in your error message, you need the following code:

    if [ ! -e "$path" ] ; then
        echo 1>&2 "$0: '$path' does not exist or is not accessible by you"
    else
        # the pathname exists and is accessible; test readability:
        if [ ! -r "$path" ] ; then
            echo 1>&2 "$0: '$path' is not readable by you"
        fi
    fi

The test for readability is now done only if the pathname exists and is accessible; if the test fails, you know the (existing, accessible) pathname item is truly not readable. The error message is more accurate now.

Any time one of the file operator tests fails, be accurate in your error message. State whether the failure is due to a missing or inaccessible pathname, or due to a failure of the actual test being performed on the (existing, accessible) pathname.

---------
Less code
---------

Consider this correct but amateur shell script code:

    grep foo /etc/passwd >/dev/null
    if [ $? -eq 0 ] ; then
        echo "I found foo in the password file"
    fi

The programmer forgot that the IF statement already directly tests the return code of the command it executes. Calling up the "test" command to examine the shell variable for the return code of the previous command is superfluous. The "less code" version of the above amateur code is:

    if grep foo /etc/passwd >/dev/null ; then
        echo "I found foo in the password file"
    fi

Don't write more code than you need to.
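The advice from the last two sections can be combined in one small function. This is only a sketch: the name "search_file" and its PATTERN/PATHNAME arguments are hypothetical, it returns 2 for an unusable pathname to mirror grep's own convention for errors, and the demonstration assumes the common "mktemp" command is available:

```shell
# search_file PATTERN PATHNAME (hypothetical helper)
# Gives a precise message for each file-test failure, then lets IF test
# grep's exit status directly instead of examining $? afterwards.
search_file() {
    if [ ! -e "$2" ] ; then
        echo 1>&2 "$0: '$2' does not exist or is not accessible by you"
        return 2
    fi
    if [ ! -f "$2" ] ; then
        echo 1>&2 "$0: '$2' exists but is not a plain file"
        return 2
    fi
    if [ ! -r "$2" ] ; then
        echo 1>&2 "$0: '$2' is not readable by you"
        return 2
    fi
    if grep "$1" "$2" >/dev/null ; then
        echo "found '$1' in '$2'"
    else
        status=$?    # 1 means no match; 2 would mean grep itself hit an error
        echo "grep did not succeed for '$1' in '$2' (status $status)"
        return $status
    fi
}

# Example run against a temporary file.
tmp=$(mktemp)
echo "alpha beta" > "$tmp"
search_file beta "$tmp"
search_file gamma "$tmp"
rm -f "$tmp"
```

Each file test runs only after the previous one has passed, so every error message describes exactly one cause.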