% Shell Script Problems -- arithmetic, syntax, test, boolean, etc.
% Ian! D. Allen -- [www.idallen.com]
% Winter 2018 - January to April 2018 - Updated 2021-12-11 04:23 EST

- [Course Home Page] - [Course Outline] - [All Weeks] - [Plain Text]

Avoiding Common Script Problems
===============================

There are many ways to make mistakes in script programming. Here are some
warnings about common errors.

Writing too much code to test
=============================

Shells are not particularly good about giving helpful error messages when
shell scripts contain errors. For example, having a missing or non-integer
argument to the `test` command may produce a vague error message:

    bash$ test 1 -eq
    bash: test: 1: unary operator expected

    sh$ test 1 -eq ""
    sh: 1: test: Illegal number:

A common mistake when writing a new shell script is to write too many lines
of code, run the new script, and then get too many error messages. Because
you wrote so many lines, you don't know which line contains the error.
Create shell scripts a few lines at a time, testing the script after you add
each line or two so you know where the errors lie.

Running the script using a shell with the debug options `-x` or `-v` set may
also be helpful:

    $ bash -u -x ./myscript.sh
    $ bash -u -v ./myscript.sh

Scripts don't do arithmetic
===========================

Don't forget that shells aren't designed to do arithmetic. They find and run
commands, and it is commands that you must use in `if` statements.

This `if` statement below is wrong thinking; it forgets to use the `test`
helper program to compare the numbers:

    #!/bin/sh -u
    if $# -gt 0 ; then                   # WRONG WRONG WRONG
        echo "Number of arguments is $#"
    fi

The above `if` line will have the shell expand the variable `$#` into some
number and then try to execute that number as a command.
The error message will not be very helpful and will depend on the number of
arguments:

    $ ./myscript.sh a b c
    ./myscript.sh: 2: ./myscript.sh: 3: not found

    $ ./myscript.sh a b c d e f g h
    ./myscript.sh: 2: ./myscript.sh: 8: not found

You can see the problem if you use the `-x` option to the shell:

    $ bash -ux ./myscript.sh a b c d
    + 4 -gt 0
    ./myscript.sh: line 2: 4: command not found

The shell `if` keyword must always be followed by a command name. Always
remember to code some command name after `if`:

    if test $# -gt 0 ; then      # RIGHT: use test helper command to compare
    if [ $# -gt 0 ] ; then       # RIGHT (syntactic sugar for test helper)

Don't mix square brackets and command names
===========================================

Conversely, don't double up on the command name you use after `if`:

    $ if [ fgrep foo /etc/passwd ] ; then date ; fi      # WRONG WRONG WRONG
    [: foo: binary operator expected.

The above syntactic-sugar line is equivalent to this (also incorrect) line:

    $ if test fgrep foo /etc/passwd ; then date ; fi     # WRONG WRONG WRONG
    test: foo: binary operator expected.

The command name after the `if`, above, is `test`. The `test` command will
see three command-line arguments (`fgrep`, `foo`, and `/etc/passwd`) and
complain that the middle one isn't an operator. You can't use both commands
`test` and `fgrep` at the same time. If you want to test the return status
of `fgrep`, then `fgrep` must be the command name that immediately follows
the `if` keyword:

    $ if fgrep foo /etc/passwd ; then date ; fi          # RIGHT

Don't put command names inside square brackets.

The `[...]` square bracket syntax for `test` needs surrounding blanks
=====================================================================

The `if` keyword must always be followed by a command name, and that command
name is exactly one square bracket `[` in the syntactic-sugar form of the
`test` helper command.
The following code does not work because blanks are missing around the
first square bracket, making it into an unknown command named `[1` or `[!`:

    $ if [1 -eq 1 ] ; then echo "ALWAYS USE BLANKS" ; fi
    sh: [1: command not found

    $ if [! -r /etc/passwd ] ; then echo "ALWAYS USE BLANKS" ; fi
    sh: [!: command not found

The shell sees `[1` and `[!` as two-character command names that don't
exist. The following incorrect statement fails for the same reason:

    $ if [a=b] ; then echo "ALWAYS USE BLANKS" ; fi
    bash: [a=b]: command not found

Square brackets are not punctuation! Always use blanks around `[` and `]`:

    $ if [ 1 -eq 1 ] ; then ...              # RIGHT! surround with blanks
    $ if [ ! -r /etc/passwd ] ; then ...     # RIGHT! surround with blanks

The arguments to the `test` command must always be separate command line
arguments. This next line fails because of the missing blank before the
required closing square bracket:

    $ if [ 1 -eq 1] ; then echo "ALWAYS USE BLANKS" ; fi
    [: missing `]'

The `[` command is looking for the right square bracket `]` as its last
command line argument, not `1]`. The corrected line uses blanks:

    $ if [ 1 -eq 1 ] ; then ...              # RIGHT! surround with blanks

Always surround all the `test` helper command arguments with blanks.

Don't forget blanks around `test` operators
===========================================

The `test` helper command behaves differently depending on the number of
arguments you pass to it:

    $ test 1 -eq 2      # three arguments: operator in middle
    $ test -f file      # two arguments: operator on left
    $ test string       # one argument: -n assumed on left: -n string

If the `test` command has only one single command line argument, it defaults
to using `-n` as the implied operator (test for non-empty string) on the one
argument. The following one-argument tests are always TRUE, though they may
not appear that way at first to human eyes:

    if test a=b ; then           # WRONG! THIS IS TRUE (good return code) !
    if [ a=b ] ; then            # WRONG! THIS IS TRUE (good return code) !
    if test 1=2 ; then           # WRONG! THIS IS TRUE (good return code) !
    if [ 1=2 ] ; then            # WRONG! THIS IS TRUE (good return code) !
    if test 0 ; then ...         # WRONG! THIS IS TRUE (good return code) !
    if [ 0 ] ; then ...          # WRONG! THIS IS TRUE (good return code) !

In all the above lines, the `test` command has only one command line
argument (not counting the closing `]` of the square-bracket form, which is
required but is not part of the expression). Since the single argument to
`test` is not the empty string, `test` returns a good status and the `if`
succeeds.

The `test` command is defaulting to an implied `-n` operator on the left. In
effect, `test` is performing these tests for non-empty strings:

    if test -n "a=b" ; then      # THIS IS ALWAYS TRUE (good return code) !
    if [ -n "a=b" ] ; then       # THIS IS ALWAYS TRUE (good return code) !
    if test -n "1=2" ; then      # THIS IS ALWAYS TRUE (good return code) !
    if [ -n "1=2" ] ; then       # THIS IS ALWAYS TRUE (good return code) !
    if test -n "0" ; then ...    # THIS IS ALWAYS TRUE (good return code) !
    if [ -n "0" ] ; then ...     # THIS IS ALWAYS TRUE (good return code) !

All the above tests are always true, because the three-character strings
`a=b` and `1=2` are not empty strings and never will be empty, and the
single-character string `0` is also never the empty string.

If you want to perform equality tests, you must separate each argument by
blanks so that `test` sees three separate arguments, not just one:

    if test a = b ; then         # correct 3-argument syntax
    if [ a = b ] ; then          # correct 3-argument syntax
    if test 1 = 2 ; then         # correct 3-argument syntax
    if [ 1 = 2 ] ; then          # correct 3-argument syntax

Always keep the arguments to `test` separated by blanks.
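
To see the one-argument trap for yourself, here is a small sketch, assuming
a POSIX shell such as `dash` (`/bin/sh` on Ubuntu) or `bash`. The function
name `show_result` is a hypothetical helper made up for this demonstration:
it runs its arguments as a command and prints `TRUE` or `FALSE` depending on
the exit status.

```shell
#!/bin/sh
set -u

# Hypothetical helper for this demo: run the arguments as a command
# and print TRUE or FALSE according to the command's exit status.
show_result () {
    if "$@" ; then
        echo TRUE
    else
        echo FALSE
    fi
}

# One argument: test treats it as  -n 'string'  and succeeds.
echo "[ a=b ]          -> $(show_result [ a=b ])"
echo "[ 1=2 ]          -> $(show_result [ 1=2 ])"
echo "[ 0 ]            -> $(show_result [ 0 ])"
echo "test -n 'a=b'    -> $(show_result test -n 'a=b')"

# Three arguments: a real string comparison.
echo "[ a = b ]        -> $(show_result [ a = b ])"
echo "[ 1 = 2 ]        -> $(show_result [ 1 = 2 ])"
```

Save it in a file and run it with `sh`. Only the lines that put blanks
around the `=` print `FALSE`; every one-argument line prints `TRUE`.
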

Don't use redirection operators `<` or `>` for `-lt` less or `-gt` greater
==========================================================================

Another common mistake, usually made by programmers accustomed to other
programming languages, is to use shell redirection metacharacters `<` and
`>` instead of the correct operators `-lt` and `-gt` in `test` numeric
comparisons. Here are two equivalent mistakes:

    if test 1 > 2 ; then ...     # THIS IS WRONG - must use -gt not >
    if [ 1 > 2 ] ; then ...      # THIS IS WRONG - must use -gt not >

The above two lines have the shell first use redirection (`>`) to create a
file named `2` and redirect the output of the `test` command into it. (The
`test` command produces no output; the file remains empty.) The `test`
command itself is left with only one single command line argument, the
digit `1`. With one argument and no operators, the `test` command returns
success if the argument is not the empty string (`test -n 1`). The string
`1` is never empty, so the above test, and the `if`, always succeeds.

The correct shell scripting form does not use the redirection syntax:

    if test 1 -gt 2 ; then ...   # right syntax for "greater than"
    if [ 1 -gt 2 ] ; then ...    # right syntax for "greater than"

Do not use shell redirection metacharacters inside `test` expressions!

The `test` string equality operator is `=` not `==`
===================================================

If you're a programmer, you're used to doing equality comparisons using the
`==` operator. In shell programming the `test` command uses the string
comparison operator `=` (one equals) and not `==` (two equals):

    if [ "$1" = '--help' ] ; then ...      # correct syntax uses '='
    if [ "$1" == '--help' ] ; then ...     # WRONG !

Some shells (e.g. `bash`) accept the incorrect `==` operator as well as `=`
to compare strings, but the `/bin/sh` (a link to `/bin/dash`) shell on
Ubuntu (the CLS) is not one of them:

    bash$ [ a = b ]      # correct syntax uses one '='
    bash$ [ a == b ]     # WRONG ! but bash allows it anyway

    sh$ [ a = b ]        # correct syntax uses one '='
    sh$ [ a == b ]       # WRONG ! causes error in /bin/sh
    sh: 1: [: a: unexpected operator

Always use one single equals `=` to compare strings.

Don't confuse an empty/null argument with a missing argument
============================================================

This script below has no argument:

    $ ./example.sh

Given the above script command line, inside the script the value of `$#`
(the number of arguments) is zero. The value of the first argument `$1` (and
all following arguments) is undefined.

These script command lines below both have a single empty or null string
argument:

    $ ./example.sh ''
    $ ./example.sh ""

Given the above script command lines, inside the script the value of `$#` is
one because the script has one argument. The first argument itself `$1` is
defined but has *zero* characters in it:

    test -z "$1"         # this is TRUE inside the script
    [ "$1" = '' ]        # this is TRUE inside the script

An argument with no characters in it is not the same thing as a missing
argument.

An argument that is a space character is *not* null or empty. These script
command lines below all have a single string argument that contains a space
character:

    $ ./example.sh ' '
    $ ./example.sh " "
    $ ./example.sh \      # there is a space after the backslash

    test -z "$1"         # this is FALSE inside the script
    [ "$1" = '' ]        # this is FALSE inside the script
    test -n "$1"         # this is TRUE inside the script
    [ "$1" = ' ' ]       # this is TRUE inside the script
    [ "$1" = " " ]       # this is TRUE inside the script
    [ "$1" = \  ]        # this is TRUE inside the script

Remember the difference between:

1. A missing (undefined) argument.
2. A defined but null (empty) argument.
3. An argument containing a space (or many spaces).
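
The three cases above can be told apart with a short sketch like this one,
assuming a POSIX shell. The function `describe_first_arg` is a hypothetical
name used only for this demonstration; inside a function, `$#` and `$1`
refer to the function's own arguments, just as they refer to the script's
arguments inside a script. Note that `$#` is checked first: with `-u` set,
expanding a missing `$1` is an error.

```shell
#!/bin/sh
set -u

# Hypothetical demo function: classify its own first argument.
# $# is checked first because, with -u set, expanding a missing $1
# is an error.
describe_first_arg () {
    if [ $# -eq 0 ] ; then
        echo "missing (\$# is $#)"
    elif [ -z "$1" ] ; then
        echo "empty (\$# is $#)"
    elif [ "$1" = ' ' ] ; then
        echo "one space (\$# is $#)"
    else
        echo "other: '$1' (\$# is $#)"
    fi
}

describe_first_arg           # prints: missing ($# is 0)
describe_first_arg ''        # prints: empty ($# is 1)
describe_first_arg ""        # prints: empty ($# is 1)
describe_first_arg ' '       # prints: one space ($# is 1)
describe_first_arg abc       # prints: other: 'abc' ($# is 1)
```

Calling the function with `''` or `""` reports an empty argument, not a
missing one, and `' '` is neither missing nor empty.
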

Don't use confusing double negatives or double exit status inversions
======================================================================

The exit status negation operator `!` may be used to the left of any single
expression used inside the `test` command:

    if test ! -r "$file" ; then ...
    if [ ! -r "$file" ] ; then ...
    if test ! -z "$string" ; then ...
    if [ ! -z "$string" ] ; then ...

The `test` command uses the exclamation point operator `!` to
negate/invert/complement the exit status of a Boolean test. If you combine
the negation operator of `test` with the shell return code negation operator
that also uses `!`, you can end up with confusing or unreadable code:

    if ! test ! -r file ; then                 # CONFUSING
    if ! [ ! "abc" != "def" ] ; then           # EVEN MORE CONFUSING

Don't use confusing double-negative logic. Rework the expression to use only
a single `!` or none at all:

    if test -r file ; then ...                 # same expression as above: readable
    if [ "abc" != "def" ] ; then ...           # same expression as above: readable

Keep the negation operator as an argument to `test`; don't place it before
the opening square bracket alias to negate the return code of `test`. To
test if a file is non-existent or exists but is not readable:

    if ! [ -r file ] ; then      # NO: correct but awkward (do not use)
    if [ ! -r file ] ; then      # YES: correct and preferred

The shell return code negation operator `!` is almost never used to negate
the return code of the `test` command itself. Always use `!` as an
*argument* to the `test` command, inside the square brackets, never outside.

Don't use shell Boolean operators `&&` or `||` for `-a` *AND* or `-o` *OR*
==========================================================================

**C** and **Java** programming language programmers sometimes mistakenly use
the Boolean operators **AND** `&&` and **OR** `||` inside the `test`
command, where you should be using `-a` or `-o`:

    if [ $# != 1 -o -z "$1" ] ; then ...       # YES: correct shell syntax
    if [ $# != 1 || -z "$1" ] ; then ...       # NO: incorrect C language syntax

The error messages for this incorrect use look like this:

    $ if [ $# != 1 || -z "$1" ] ; then echo hi ; fi      # WRONG SYNTAX
    [: missing `]'
    bash: -z: command not found

The Bourne shell `||` and `&&` operators separate shell commands in a manner
similar to the semicolon `;`. You cannot use them inside `test` expressions.
Use `-a` and `-o` to separate Boolean clauses to the `test` command:

    if [ $# != 1 -o -z "$1" ] ; then echo hi ; fi        # RIGHT

> Digression (optional reading):
>
> You can use the shell `||` and `&&` command separators between individual
> `test` commands if you make sure each `test` command is complete:
>
>     if [ $# != 1 ] || [ -z "$1" ] ; then echo hi ; fi   # valid but inefficient
>
> Above, the `||` separates two different and complete `test` command
> executions. Rather than using the `test` command twice, you can usually
> simply join them into one using the correct `test` Boolean operator:
>
>     if [ $# != 1 -o -z "$1" ] ; then echo hi ; fi       # RIGHT
>
> Less code is better code.
>
> Sometimes, you *must* separate a `test` expression into two different
> `test` commands. Consider these two almost equivalent lines:
>
>     Line 1:  if [ "$var" = "" -o "$var" -eq 0 ] ; then ...
>     Line 2:  if [ "$var" = "" ] || [ "$var" -eq 0 ] ; then ...
>
> Line #1 will give an *`Illegal number`* error if the variable is empty,
> because the entire single `test` expression is processed by the `test`
> command, and `test` doesn't accept an empty variable in a `-eq`
> expression. Line #2 works correctly because there are two separate `test`
> commands, and the shell never runs the second one (the `-eq` expression)
> if the variable is empty in the first expression.

Don't mix comparing strings and comparing numbers in `test`
===========================================================

The `test` helper command has six ways to compare numbers and two ways to
compare strings. Don't mix them up.
In particular, don't use the numeric operators to try to compare strings;
the error message isn't very obvious:

    $ if [ "$1" -eq "" ] ; then echo "Empty string" ; fi
    sh: [: Illegal number:

The string comparison operators are `=` and `!=`, not `-eq` and `-ne`.

> Programmer's Note! The string comparison operator is a single `=`, not a
> double `==` as used in many programming languages. Some shells let you use
> a double `==`, but it's wrong and it won't work everywhere.
>
>     if [ "$var" == 'abc' ] ; then ...     # WRONG! not portable
>     if [ "$var" = 'abc' ] ; then ...      # RIGHT! works everywhere

The nicer-looking string equality comparison operators `=` and `!=` are
sometimes used instead of the more correct numeric comparison operators
`-eq` and `-ne` when you know the numbers being compared don't have leading
zeroes (such as the shell exit status variable `$?` or the number of
arguments variable `$#`):

    if [ $? != 0 ] ; then ...    # equivalent to [ $? -ne 0 ]
    if [ $# = 1 ] ; then ...     # equivalent to [ $# -eq 1 ]

If you're not sure if the numbers have leading zeroes, then you *must* use
the numeric equality comparison operators and not the string ones:

    var=00
    if [ "$var" -eq 0 ] ; then ...    # TRUE: number 00 is equal to number 0
    if [ "$var" = 0 ] ; then ...      # FALSE: string "00" is not equal to string "0"

Opposites and false opposites in `test`
=======================================

Boolean logic has some subtle consequences when applied to the operations
performed by the `test` helper command.

True Boolean opposites: `-n` and `-z`, `=` and `!=`, `-eq` and `-ne`
--------------------------------------------------------------------

The logical opposite of the `test` operator `-n` (is not an empty string) is
`-z` (is an empty string), just as the opposite of `=` (string equality) is
`!=` (string inequality), and the opposite of `-eq` (integer equality) is
`-ne` (integer not equal). These are all correct opposites.
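
Here is a minimal sketch, assuming a POSIX shell, that shows the
leading-zero difference between `-eq` and `=`, and that the true opposites
`-n` and `-z` always give opposite answers for the same string. The name
`compare_zero` is a hypothetical helper for this demonstration only.

```shell
#!/bin/sh
set -u

# Hypothetical demo helper: compare one value to 0 both numerically
# (-eq) and as a string (=), and report both results.
compare_zero () {
    if [ "$1" -eq 0 ] ; then num=TRUE ; else num=FALSE ; fi
    if [ "$1" = 0 ] ; then str=TRUE ; else str=FALSE ; fi
    echo "var=$1: numeric -eq 0 is $num, string = 0 is $str"
}

compare_zero 0     # prints: var=0: numeric -eq 0 is TRUE, string = 0 is TRUE
compare_zero 00    # prints: var=00: numeric -eq 0 is TRUE, string = 0 is FALSE

# True opposites: -n and -z always give opposite answers for the same string.
for value in '' abc ; do
    if [ -n "$value" ] ; then n=TRUE ; else n=FALSE ; fi
    if [ -z "$value" ] ; then z=TRUE ; else z=FALSE ; fi
    echo "value='$value': -n is $n, -z is $z"
done
```
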

Subtle Boolean opposites: `-lt` and `-ge`
-----------------------------------------

The logical opposite of the `test` operator `-lt` (less than) is not `-gt`
(greater than), it is `-ge` (greater than or equal to). (If you are not
younger than your sister, you are either older or the same age.) The
opposite of the `test` operator `-gt` is not `-lt`, it is `-le`.

Files are not "not directories"
-------------------------------

The `test` operators `-f` and `-d` are *not* opposites. If a pathname is not
a file, it may or may not be a directory. It could be a directory or any
number of other special file types under Unix/Linux. (`/dev/null` is a
common example of a pathname that is not a directory or a plain file.) You
cannot replace `test ! -f` with `test -d` or vice-versa. They are not
opposites.

Negating/inverting `test` pathname operators, e.g. `! -r`
---------------------------------------------------------

The `test` pathname operators all return success (zero) only if the
pathname is accessible (all the directories can be traversed) **AND** the
pathname exists **AND** it has the given pathname property. This means that
the negation/inversion of a pathname operation has to include the
possibility that the pathname does not exist or that it can't be accessed:

    if [ -r file ] ; then ...     # succeed if pathname is accessible and readable
    if [ ! -r file ] ; then ...   # succeed if pathname inaccessible, non-existent, or not readable

Inverting the status of most of the pathname operators means that the
resulting test might succeed either because the pathname can't be reached,
**OR** the pathname doesn't exist, **OR** because the pathname exists but
fails the test. You need to apply more programming logic if you want to know
that a pathname actually exists but is not, for example, readable:

    if [ -e pathname -a ! -r pathname ] ; then ...   # if path exists AND path is *not* readable

Remember that inverting a pathname test may mean the inverted test succeeds
because the pathname is not accessible or does not exist! The opposite of
"pathname is readable" is "pathname is not accessible, OR pathname does not
exist, OR pathname is not readable".

The multiple causes of failure of `test` pathname tests
=======================================================

If a `test` pathname operator (e.g. `-r`, `-w`, `-x`, `-f`, `-d`, `-s`,
`-e`) succeeds, you also know that you have permission to traverse all the
directories leading up to it and that the pathname actually exists. If a
`test` pathname operator fails, it may also fail because you have no
permission to search one of the directories in the pathname, or because the
pathname simply doesn't exist.

Without first testing if you can access the pathname and that it actually
exists, the following error message is misleading:

    if [ ! -r "$path" ] ; then
        echo 1>&2 "$0: '$path' is not readable"    # POOR ERROR MESSAGE
    fi

While it is true that the pathname is not readable, the above error message
is incomplete. You might not have permission to traverse all the directories
in its pathname, or the pathname might not even exist. Saying the overall
pathname is not readable is true, but it is only part of the truth. A more
accurate error message would be:

    if [ ! -r "$path" ] ; then
        echo 1>&2 "$0: '$path' is inaccessible, missing, or not readable"
    fi

If you want to be more specific in your error message about why the pathname
is not readable, you need code to test for existence first:

    if [ ! -e "$path" ] ; then
        echo 1>&2 "$0: '$path' does not exist or is not accessible by you"
    else
        # the pathname exists and is accessible; test readability:
        if [ ! -r "$path" ] ; then
            echo 1>&2 "$0: '$path' exists but is not readable by you"
        fi
    fi

The test for readability is now done only if the pathname exists and is
accessible; if the test for readability fails, you know the (existing,
accessible) pathname item is truly not readable. The error message is more
accurate now.

Any time one of the `test` pathname operator tests fails, be accurate in
your error message. State whether the failure is due to a missing or
inaccessible pathname, or due to a failure of the actual test being
performed on the (existing, accessible) pathname.

Multiple `test` expressions cloud error messages
================================================

Be careful in `if` statements when testing multiple conditions at the same
time that you do not make the failure error message unhelpful:

    if [ "$x" -gt 0 -a -f "$file" -a "$y" -lt 27 -a -n "$string" ] ; then
        ... do something useful ...
    else
        echo 1>&2 "$0: Error: ... what do you say here ??? ..."
    fi

The `???` error message above would have to say what failed, and there are
so many possibilities for failure that the message becomes unreadable. The
error would have to read like this: `Error: $x is <= 0 or $file is
inaccessible, does not exist, or is not a file, or $y is >= 27, or '$string'
is a null string`. Which failure was it? Such a complex error message is not
helpful to the users of your scripts!

Use separate tests and separate error messages for each test condition;
don't bunch them together using Boolean `-a` or `-o` operators:

    # Split the huge condition into more readable error messages.
    # Test each condition separately and exit if any condition fails.
    #
    if [ "$x" -le 0 ] ; then
        echo 1>&2 "$0: Error: x value $x is <= 0"
        exit 1
    fi
    if [ ! -f "$file" ] ; then
        echo 1>&2 "$0: Error: path '$file' is inaccessible, does not exist, or is not a file"
        exit 1
    fi
    if [ "$y" -ge 27 ] ; then
        echo 1>&2 "$0: Error: y value $y is >= 27"
        exit 1
    fi
    if [ -z "$string" ] ; then
        echo 1>&2 "$0: Error: string value '$string' is a null string"
        exit 1
    fi
    ... all tests passed; now do something useful ...

Use Less Code
=============

Less code is better code. Consider this correct but amateur shell script
code:

    fgrep "foo" /etc/passwd >/dev/null
    if [ $? -eq 0 ] ; then
        echo "I found foo in the password file"
    fi

The programmer forgot that the `if` statement can directly test the return
code of the command it executes. Calling up the `test` command to examine
the shell variable for the return code of the previous command is
superfluous. The "less code" version of the above amateur code is:

    if fgrep "foo" /etc/passwd >/dev/null ; then
        echo "I found foo in the password file"
    fi

A real pro would have read the manual page for `fgrep` and would know that
Linux `fgrep` has a `--quiet` (`-q`) option to suppress output, so the pro
version becomes:

    if fgrep -q "foo" /etc/passwd ; then
        echo "I found foo in the password file"
    fi

Don't write more code than you need to. **Less code is better code.**

--
| Ian! D. Allen, BA, MMath  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

[www.idallen.com]: http://www.idallen.com/
[Course Home Page]: ..
[Course Outline]: course_outline.pdf
[All Weeks]: indexcgi.cgi
[Plain Text]: 740_script_problems.txt
[Pandoc Markdown]: http://johnmacfarlane.net/pandoc/