Week 7-8 Notes for DAT2330 - Ian Allen

Supplement to Text, outlining the material done in class and lab.

Remember - knowing how to find out the answer is more important than
memorizing the answer.  Learn to fish!  RTFM![*]
([*] Read The Fine Manual)

Look under the "Notes" button on the course web page for these study
notes for the Linux textbook:

    chapter11.txt  (Shell Script Programming [not all])

Complete these Floppix Labs on http://floppix.ccai.com/

    Floppix Lab 26: BASH Scripts

Note: An early version of "bash" is also available on ACADAIX.

In this file:

 * Shell Script Programming
 * Shell Script Header
 * Commenting Shell Programs
 * Code Commenting Style
 * Testing Return Codes of Commands
 * Complementing Return Status
 * Writing and Debugging Shell Scripts
 * Naming Shell Scripts

========================
Shell Script Programming
========================

You are computer programmers.  When writing programs (scripts are
programs), you are not simply trying to "get it to work"; you are also
(and most importantly) practicing and demonstrating good programming
techniques.  Employers tell us that programming techniques are more
important than the question of which languages you know.

Good programming techniques include such things as:

 1. Correct documentation (using guidelines given here)
 2. Writing clean, simple, robust code: "Less code is better code"!
 3. Using correct indentation, style, and mnemonic names
 4. Using correct modularization and scope (i.e. local vs. global variables)
 5. Correct use of arguments and/or prompting for user input
 6. Correct use of exit codes on success/failure
 7. Choosing the correct control structures
 8. Presenting accurate and useful help if user makes an error
 9. Checking the return status of all commands used in the script

These criteria will be assessed in marking your programs (shell scripts).

===================
Shell Script Header
===================

All Shell Scripts in this course must contain a set of comment lines
similar to the following (my explanatory remarks are on the right):

    #!/bin/sh -u                        <== specify interpreter
    # $0 [ args... ]                    <== show arguments and syntax
    # This script echoes its arguments. <== give purpose of script
    # Ian Allen - idallen@ncf.ca        <== author and email
    # Mon Jun 11 17:06:04 EDT 2001      <== date (optional)
    export PATH=/bin:/usr/bin           <== set correct PATH
    umask 022                           <== correct file creation mask
    echo "$@"                           <== your script goes here
    status=$?                           <== your script goes here
    exit $status                        <== set exit status on exit

Don't put the name of the shell script inside the shell script, since
the name will be wrong if you rename the script.  Use $0 instead.

You will need to set variable $status to the appropriate exit value for
your script.  (Zero means "okay", non-zero means "something went wrong".)

Choose PATH to include the system directories that contain the commands
your script needs.  Directories /bin and /usr/bin are almost always
necessary.  System scripts may need /sbin and /usr/sbin.  GUI programs
will need the X11 directories.  Choose appropriately.

Choose the umask to permit or restrict access to files and directories
created by the shell script.  "022" is a customary value, allowing read
and execute by group and others.  "077" is used in high-security
scripts, since it blocks all permissions for group and others.

You must test the return codes of all commands inside the script.  Your
script must exit non-zero if anything inside the script fails.  (Read
more on how to do this, below.)

Prompts and error messages should be sent to Standard Error, not to
Standard Output (Linux text p.308).

Variables must be quoted correctly to prevent unexpected special
character expansion by the shell.
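
The last two rules can be tried out in a few lines.  The following is
only a sketch, not part of the required header: the function name
"warn" and the variable "sample" are made up for illustration.  It
sends a message to Standard Error and counts how many arguments the
shell produces from the same variable with and without double quotes.

```shell
#!/bin/sh -u
# Sketch only: "warn" and "sample" are hypothetical names.

warn() {
    # Write all the arguments as one message on Standard Error.
    echo 1>&2 "$0: $*"
}

sample='one  two   three'     # value containing runs of blanks

# Quoted: the value stays a single argument, blanks intact.
set -- "$sample"
quoted_count=$#

# Unquoted: the shell splits the value into separate words.
# (Any GLOB characters in the value would also be expanded here.)
set -- $sample
unquoted_count=$#

warn "quoted: $quoted_count argument; unquoted: $unquoted_count arguments"
```

Because "warn" writes only to Standard Error, redirecting Standard
Output of a script that uses it does not swallow the message.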

If you prepare a template file containing the above script model, you
can copy and use it to begin your scripts on tests and save time (and
avoid errors and omissions).  Do not include my remarks.

=========================
Commenting Shell Programs
=========================

Comments should add to a programmer's understanding of the code.  They
don't comment on the syntax or language mechanism used in the code,
since both of these things are obvious to programmers who know the
language.  (Don't comment that which is obvious to anyone who knows the
language.)

Programmer comments deal with what the line of code means in the
*algorithm* used, not with syntax or how the language *works*.

Thus: Do not use comments that state things that relate to the syntax
or language mechanism used and are obvious to a programmer, e.g.

    # THESE ARE OBVIOUS AND NOT HELPFUL COMMENTS:
    x=$#                    # set x to $#           <== OBVIOUS; NOT HELPFUL
    date >x                 # put date in x         <== OBVIOUS; NOT HELPFUL
    test "$a" = "$b"        # see if $a equals $b   <== OBVIOUS; NOT HELPFUL
    cp /dev/null x          # copy /dev/null to x   <== OBVIOUS; NOT HELPFUL

Better, programmer-style comments:

    loop=$#                 # initialize loop index to max num arguments
    date >tmpdate           # put account starting date in temp file
    test "$itm" = "$arg"    # see if search list item matches command arg
    cp /dev/null tmpdate    # reset account starting date to empty

Do not copy "instructor-style" comments into your code.
Instructor-style comments are put on lines of code by teachers to
explain the language and syntax functions to people unfamiliar with
programming (e.g. to students of the language).  Instructor-style
comments are "obvious" comments to anyone who knows how to program;
they should never appear in your own programs (unless you become an
instructor!).

=====================
Code Commenting Style
=====================

Comments should be grouped in blocks, ahead of blocks of related code
to which the comments apply, e.g.

    # Set standard PATH and secure umask for accounting file output.
    #
    PATH=/bin:/usr/bin ; export PATH
    umask 077

    # Verify that arguments exist and are non-empty.
    #
    NARGS=3
    if test "$#" -ne $NARGS ; then
        echo 1>&2 "$0: Expecting $NARGS arguments; you gave: $#"
        exit 1
    fi
    for arg do
        if ! test -s "$arg" -a -f "$arg" ; then
            echo 1>&2 "$0: arg '$arg' is missing, empty, or a directory"
            exit 1
        fi
    done

Do not alternate comments and single lines of code!  This makes the
code hard to read:

    # THIS IS A BAD EXAMPLE OF COMMENTS MIXED WITH CODE !!
    # Set a standard PATH plus system admin directories
    PATH=/bin:/usr/bin:/sbin:/usr/sbin
    # export the PATH for other programs
    export PATH
    # Set secure umask for accounting file output.
    umask 077
    # create empty lock file
    >lockfile || exit $?
    # attempt to create link to lock file
    ln lockfile lockfile.tmp || exit $?
    # copy password file in case of error
    cp -p /etc/passwd /tmp/savepasswd$$ || exit $?
    # remove guest account (can't quick-check return code on grep)
    grep -v '^guest:' /etc/passwd >lockfile.tmp
    # copy new file back to password file file system
    cp lockfile.tmp /etc/passwd.tmp || exit $?
    # fix the mode to be readable
    chmod 444 /etc/passwd.tmp || exit $?
    # use mv to do atomic update of passwd file
    mv /etc/passwd.tmp /etc/passwd || exit $?
    # remove lock file
    rm lockfile.tmp lockfile
    # THIS IS A BAD EXAMPLE OF COMMENTS MIXED WITH CODE !!

Block comments and code are easier to read.  Here is a block-comment
version of the above code:

    # Set and export standard PATH plus system admin directories.
    # Secure umask protects files.
    #
    export PATH=/bin:/usr/bin:/sbin:/usr/sbin
    umask 077

    # Create empty lock file and attempt to create link to lock file.
    #
    >lockfile || exit $?
    ln lockfile lockfile.tmp || exit $?

    # Copy password file in case of error.
    #
    cp -p /etc/passwd /tmp/savepasswd$$ || exit $?

    # Remove guest account.  (Can't quick-check return code on grep.)
    # Copy new file back to password file file system.
    # Fix the mode to be readable.
    # Use mv to do atomic update of passwd file.
    # Remove the lock file.
    #
    grep -v '^guest:' /etc/passwd >lockfile.tmp
    cp lockfile.tmp /etc/passwd.tmp || exit $?
    chmod 444 /etc/passwd.tmp || exit $?
    mv /etc/passwd.tmp /etc/passwd || exit $?
    rm lockfile.tmp lockfile

Why can't you exit the script if grep returns a non-zero status code?

================================
Testing Return Codes of Commands
================================

Just as you would never use a C Library function without checking its
return code, you must never use commands in important shell scripts
without at least a minimal checking of their return codes.

At minimum, the shell script should exit non-zero if a command fails
unexpectedly:

    grep -v '^guest:' /etc/passwd >lockfile.tmp
    cp lockfile.tmp /etc/passwd.tmp || exit $?
    chmod 444 /etc/passwd.tmp || exit $?
    mv /etc/passwd.tmp /etc/passwd || exit $?
    rm lockfile.tmp lockfile || exit $?

The shell conditional execution syntax "||" is used here to test the
return codes of the commands on the left and execute the command on the
right if the command on the left returns a bad status (non-zero).

Some commands naturally return a non-zero exit status even when they
are doing what you expect (e.g. grep might not find what you were
looking for - this might be okay), and cannot be tested using this
simple method.  Do not exit the script after a "grep" or "cmp" command
returns non-zero!

More complex testing
--------------------

Unfortunately, simply exiting non-zero doesn't tell the user of the
script which script contained the command that failed:

    $ ./myscript
    cp: /etc/passwd.tmp: No space left on device

If this script were being run in the background along with several
other scripts also containing similar commands, or if this script were
being run by a system daemon or delayed execution scheduler (atd or
crond), we wouldn't know from which script the actual "cp" error
message came.  More work is needed to produce a truly useful error
message.
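
The limitation can be seen in a small sketch.  The function name
"minimal_check" is hypothetical, and "sh -c 'exit 3'" merely stands in
for a failing command so that the status is known in advance.  The
function body runs in a subshell, so "exit" ends only that subshell,
much as it would end a separate script.

```shell
#!/bin/sh -u
# Sketch only: "minimal_check" plays the role of a script that uses
# the minimal "|| exit $?" check.  The parentheses make the function
# body a subshell, so "exit" does not end the calling shell.

minimal_check() (
    sh -c 'exit 3' || exit $?     # stand-in for a failing command
    echo "not reached"
)

# Collect everything the check prints, and the status it returns.
status=0
output=$(minimal_check 2>&1) || status=$?

# The failure status is passed along unchanged, but the check itself
# printed nothing that names the script in which the failure happened.
echo "status=$status output='$output'"
```

The status is enough for a calling program to notice the failure; it is
the human reading a log who is left without the script name.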

The full and proper way to handle non-zero return codes in scripts is
by using error messages that contain the script name.  This means you
need "if" statements around *every* command that might fail!  This is
probably overkill for most hobby scripts, but it is necessary for
systems programming:

    # Save each exit status right away, before another command
    # replaces $?.
    #
    grep -v '^guest:' /etc/passwd >lockfile.tmp
    status="$?"
    if [ "$status" -ne 1 -a "$status" -ne 0 ] ; then
        # grep returns 2 on serious error
        echo 1>&2 "$0: grep guest /etc/passwd failed; status $status"
        exit 1
    fi
    cp lockfile.tmp /etc/passwd.tmp
    status="$?"
    if [ "$status" -ne 0 ] ; then
        echo 1>&2 "$0: cp lockfile.tmp /etc/passwd.tmp failed; status $status"
        exit 1
    fi
    chmod 444 /etc/passwd.tmp
    status="$?"
    if [ "$status" -ne 0 ] ; then
        echo 1>&2 "$0: chmod /etc/passwd.tmp failed; status $status"
        exit 1
    fi
    mv /etc/passwd.tmp /etc/passwd
    status="$?"
    if [ "$status" -ne 0 ] ; then
        echo 1>&2 "$0: mv /etc/passwd.tmp /etc/passwd failed; status $status"
        exit 1
    fi
    rm lockfile.tmp lockfile

Now, the script tells you its name in the error message:

    $ ./myscript
    cp: /etc/passwd.tmp: No space left on device
    ./myscript: cp lockfile.tmp /etc/passwd.tmp failed; status 2

Now it's easy to tell from which script the above "cp" error came.

Making your scripts detect errors and issue clear error messages is
tedious but not difficult.  Adding all the error checking makes the
code much longer and harder to read and modify.  If the script isn't
doing anything important, simply exiting after a failed command may be
sufficient; it only adds a few words to each line of the script.

For a system script that must detect errors under all conditions
(including "too many processes", "file system full", etc.), you must
have all the additional error checking.  The reward is a script that
won't let you down when things go wrong, and that will tell you exactly
what the problem is when one develops.

===========================
Complementing Return Status
===========================

Any command or command pipeline's return status can be complemented
(reversed from good to bad or bad to good) using a leading "!"
before the command, e.g.

    $ false
    $ echo $?
    1
    $ ! false
    $ echo $?
    0
    $ grep nosuchxxx /etc/passwd
    $ echo $?
    1
    $ ! grep nosuchxxx /etc/passwd
    $ echo $?
    0

This is useful in shell scripts to simplify this:

    if grep "$var" /etc/passwd ; then
        : do nothing
    else
        echo 1>&2 "$0: Cannot find '$var' in /etc/passwd"
    fi

to this:

    if ! grep "$var" /etc/passwd ; then
        echo 1>&2 "$0: Cannot find '$var' in /etc/passwd"
    fi

The "!" prefix is also useful in turning "while" loops into "until"
loops or vice-versa.

===================================
Writing and Debugging Shell Scripts
===================================

The Number One rule of writing shell scripts is:

    Start Small and Add One Line at a Time!

Students who write a 10- or 100-line script and then try to test it all
at once usually run out of time.  An unmatched quote at the start of a
script can eat the entire script until the next matching quote!

Start your script with the Script Header (name of interpreter, PATH,
umask, comments) and the single command "date".  If that doesn't work,
you know something fundamental is wrong, and you only have a few lines
of code that you need to debug.  (Is your interpreter correct?  your
PATH?)

Add to this simple script one or two lines at a time, so that when an
error occurs you know it must be in the last line or two that you
added.  Do not add 10 lines to a script!  You won't know what you did
wrong!

You can ask the shell to show you the lines of the script it is reading
and executing by using the "-v" or "-x" (or both) options to the shell:

    $ sh -v -u ./myscript arg1 arg2 ...
    $ sh -x -u ./myscript arg1 arg2 ...

The "-v" option displays the command lines as they are read by the
shell (without any shell expansion).  The "-x" option displays the
command lines as they are passed to the commands being executed, after
the shell has done all the command line expansion and processing.

These options will allow you to see the commands as they execute, and
may help you locate errors in your script.
(Double-quote your variables!)

Of course you can use -v and -x with an interactive shell too:

    $ sh -v
    $ echo $SHELL
    echo $SHELL
    /usr/bin/ksh
    $ echo *
    echo *
    a b c d

    $ sh -x
    $ echo $SHELL
    + echo /usr/bin/ksh
    /usr/bin/ksh
    $ echo *
    + echo a b c d
    a b c d

    $ sh -v -x
    $ echo $SHELL
    echo $SHELL
    + echo /usr/bin/ksh
    /usr/bin/ksh
    $ echo *
    echo *
    + echo a b c d
    a b c d

Remember that if you use a shell to read a shell script ("sh
scriptname"), instead of executing it directly ("./scriptname"), the
shell will treat all the comments at the start of the shell script as
comments.  In particular, the comment that specifies the interpreter to
use when executing the script ("#!/bin/sh -u") will be ignored, as will
all of the options listed beside that interpreter.  Only by actually
*executing* the script will you cause the Unix kernel to use the
interpreter and options given on the first line of the script.  For
example:

    $ cat test
    #!/bin/sh -u
    echo 1>&2 "$0: This is '$undefined'"

    $ ./test
    ./test: undefined: unbound variable

    $ sh test
    test: This is ''

    $ sh -u test
    test: undefined: unbound variable

    $ csh test
    Bad : modifier in $ ( ).

All shells treat #-lines as comments and ignore them.  Only the Unix
kernel treats #! specially, and only for executable scripts.

====================
Naming Shell Scripts
====================

Often, you will want to put an example of how to run a shell script
inside the shell script as a comment.  You might create a script called
"doexec" and write it as follows:

    #!/bin/sh -u
    #
    # doexec [ files... ]
    #
    # This script sets execute permissions on all its arguments.
    # -IAN! idallen@ncf.ca
    # Mon Jun 11 23:02:38 EDT 2001

    PATH=/bin:/usr/bin ; export PATH
    umask 022

    if [ "$#" -eq 0 ] ; then
        echo 1>&2 "$0: No arguments; nothing done"
        status=0
    else
        if chmod +x "$@" ; then
            status=0    # it worked
        else
            status="$?"
            echo 1>&2 "$0: chmod exit status: $status"
            echo 1>&2 "$0: Could not change mode of some argument: $*"
        fi
    fi
    exit "$status"

You would execute this script by typing:

    $ ./doexec filename1 filename2 filename3...

This comment line in the doexec script:

    # doexec [ files... ]

tells the reader that the script name is "doexec" and the files are the
(optional) arguments to the script.

But what if you rename the script to be something other than "doexec"?

    $ mv doexec fixperm
    $ ./fixperm foo bar

The use of "$0" in the echo line for the error message ensures that the
shell will print the actual script name in the error message, but the
comment in the script is now wrong, since the program name is no longer
"doexec".  I don't want to have to edit the script and make a change
such as "doexec" to "fixperm" every time I change the name of the
script.

The solution is never to put the actual name of a script inside the
script, even as a comment.  Wherever you refer to the name of the
script, even in a comment, use the "$0" convention instead.  So, the
comment changes from:

    # doexec [ files... ]

to be:

    # $0 [ files... ]

In the comment, "$0" just means "whatever the name of this script is",
without my having to actually write the script name.  I don't want to
use the actual script name, because I might change it.

Since the line is a comment, ignored by the shell, the shell will never
actually expand that "$0" to be the real name of the script; it's just
a convenient way of specifying the program name without actually naming
it inside the script.

Never put the name of a program inside the program; it might change!
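
The same convention carries over to messages the script prints, where
the shell really does expand "$0".  Here is a minimal sketch, in which
the function name "usage" is hypothetical and the syntax line simply
mirrors the comment shown above:

```shell
#!/bin/sh -u
# Sketch only: "usage" is a hypothetical helper name.

usage() {
    # Print a syntax summary on Standard Error.  "$0" is expanded
    # when the message is printed, so a renamed script reports its
    # current name without any editing.
    echo 1>&2 "Usage: $0 [ files... ]"
}

# Capture what usage writes to Standard Error so it can be inspected.
message=$(usage 2>&1)
echo "$message"
```

If this file were renamed from "doexec" to "fixperm", neither the
comment nor the message would need to change.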