Week 4 Notes for DAT2330 - Ian Allen
Supplement to Text, outlining the material done in class and lab.

Remember - knowing how to find out the answer is more important than
memorizing the answer.  Learn to fish!  RTFM![*]
([*] Read The Fine Manual)

Look under the "Notes" button on the course web page for these study notes
for the Linux textbook:

    chapter5.txt (the Shell)

Complete these Floppix Labs on http://floppix.ccai.com/

    Floppix Lab 14: redirection
    Floppix Lab 15: pipes and filters
    Floppix Lab 18: the search path
    Floppix Lab 20: configuring the bash shell
    Floppix Lab 27: superuser

Note: An early version of bash is also available on ACADAIX.
(The default shell on ACADAIX is the Korn shell - "ksh".)

--------------------------
The Unix Shell - Chapter 5
--------------------------

What is a shell for?

    To find and run programs (also called commands or utilities)!
    (It also does programming kinds of things; but, all that is usually
    to aid in the finding and running of programs.)

Most (but not all) commands take what as arguments?

    Pathnames, either file names or directory names.  The shell has
    features to make matching pathnames easier.
    (Not *all* programs take pathnames as arguments!)

How does the shell help run commands?

    Shells look for commands in various places, using a list of
    directories stored in the $PATH environment variable.

    Shells provide aliases and variables to save typing the same things
    (commands or pathnames) over and over.

    Shells provide wildcards (glob patterns) to generate lists of
    pathnames as arguments for commands.

    Shells provide ways of recalling and editing the last commands you
    enter, to save retyping them.

    Shells provide ways of completing command and file names, to save
    typing.

Where are wildcard characters (glob patterns) expanded?

    The shell does the wildcard (glob) expansion, NOT the commands.
    The wildcards are expanded before the shell looks for the command!
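A minimal sketch of the point above, assuming a POSIX shell and a scratch directory made with mktemp (the file names apple, banana and cherry are made up for illustration). The shell turns * into three separate arguments before printf ever runs, while a quoted '*' reaches printf unchanged.

```shell
# Work in a scratch directory so * matches only the files made here.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
touch apple banana cherry    # hypothetical file names

# The shell expands * to "apple banana cherry" BEFORE printf runs;
# printf just prints each argument it was given on its own line.
expanded=$(printf '%s\n' *)

# Quoted, the * is not a wildcard: printf receives the single character *.
literal=$(printf '%s\n' '*')

# "set --" loads the expanded words into $1, $2, ...; $# counts them,
# i.e. how many arguments the shell generated from the pattern.
set -- *
count=$#

echo "$expanded"
echo "literal argument: $literal"
echo "number of arguments: $count"
```

Nothing in printf knows about wildcards; swap printf for any other command and the shell hands it the same three words.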
Note how different commands treat the same set of arguments differently:

    $ echo *      - args are interpreted as a list of text words
    $ ls -l *     - args are interpreted as a list of pathnames (files or dirs)
    $ cat *       - args are interpreted as a list of file names to open
    $ mail *      - args are interpreted as a list of userids to email

The shell does not know anything about the type of command being used
when it prepares the argument list for a command.  It is quite possible
to prepare a list of arguments that don't make sense for the command
being run, e.g.

    $ sleep *     - this probably makes no sense

Define: whitespace, arguments, wildcard, prompt

-------------
Nested Shells
-------------

Your default login shell on ACADAIX is the Korn shell (ksh).
Your default login shell on Linux (including Floppix) is the
Bourne-Again shell (bash).

You can call up other shells by name: sh, ksh, bash, csh, tcsh
(Not all shells may be installed on your system.)

Each new shell is another Unix process.
To get a list of your current processes, use:  ps
To exit from a shell:  exit
When you exit from your login shell, you log out from Unix.

Another way to exit a shell is to type your EOF character.
(Your EOF character is usually CONTROL-D.)
Some shells can be told to ignore EOF.

Remember: all Unix programs (should) have manual pages!

    $ man sh
    $ man ksh
    $ man bash
    $ man csh
    $ man tcsh

The shells sh, ksh, and bash (the "Bourne" shells) all have a common
ancestry.  They are all derived from the original Bourne shell "sh", and
the programming features of these shells (if statements, for loops,
etc.) all look and work the same way.

The shells csh and tcsh (the "C" shells) are similar to each other.
Their syntax for programming is not the same as the Bourne shells.  We
do not cover the C shell syntax in this course; but, you can read about
it in your Linux textbook.
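To see that each nested shell really is a separate Unix process, compare process IDs. A small sketch, assuming sh is installed: the special parameter $$ expands to the PID of the shell that expands it, and the single quotes stop the outer shell from expanding it on the inner shell's behalf.

```shell
# $$ is the process ID of the shell running these commands.
outer=$$

# sh -c starts a new shell process to run one command string.
# The single quotes keep the OUTER shell from expanding $$, so the
# inner shell expands it itself and reports its own PID.
inner=$(sh -c 'echo $$')

# An inner shell ends with "exit"; the outer shell sees the number
# given to exit as the inner shell's exit status in $?.
status=0
sh -c 'exit 3' || status=$?

echo "outer shell PID: $outer"
echo "inner shell PID: $inner"
echo "inner shell exit status: $status"
```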
------------------
Output Redirection
------------------

Output redirection diverts (redirects) output that would normally appear
on the screen to some other place, either into the input of another
command (a pipe) or into a file.  This normal output on your screen is
called the "standard output" ("stdout") of the command.

Output redirection of stdout into files:

    $ echo hello           - stdout goes to terminal
    $ echo hello >file     - erase file; send stdout to file
    $ echo hello >>file    - append stdout to end of file

Shells don't care where on or in the command line you do the
redirection.  All these command lines do exactly the same thing to
stdout:

    $ echo hi there mom >file
    $ echo hi there >file mom
    $ echo hi >file there mom
    $ echo >file hi there mom
    $ >file echo hi there mom

Like wildcarding (called "globbing" on Unix), shells handle redirection
before they go looking for the command to run.  The command doesn't see
any part of the redirection syntax.  The redirection is done by the
shell, then the redirection information is removed from the command line
before the command is called.  Redirections are never counted as
arguments to a command.

Examples:

    $ echo hello there
        - shell calls "echo" with two arguments ==> echo(hello,there)
        - "echo" echoes two arguments
        - output appears in default location (standard output is your
          screen)

    $ echo hello there >file
        - shell creates "file" and diverts standard output into it
        - shell calls "echo" with two arguments ==> echo(hello,there)
          (note NO CHANGE in arguments to "echo" from the previous example)
        - "echo" echoes two arguments
        - standard output is captured in output "file", NOT on your screen

    $ >file echo hello there
        - this is identical to the above example
        - standard output is captured in output "file", NOT on your screen
        - you can put the redirection anywhere in the command line!
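A small sketch to check these rules, assuming a POSIX shell and a scratch directory from mktemp. The nargs function is a hypothetical helper written only for this demonstration; it prints how many arguments it received, which shows that the redirection is not one of them.

```shell
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

echo hello >file          # create or empty "file", then write "hello"
echo again >>file         # append a second line to the end of "file"

# The redirection can go anywhere on the line; echo sees the same
# three arguments (hi, there, mom) in both cases.
>copy1 echo hi there mom
echo hi >copy2 there mom

# Hypothetical helper: prints the number of arguments it was given.
nargs() { echo "$#"; }
nargs hello there >count  # ">count" is handled by the shell, not counted

cat file
cat copy1 copy2
cat count
```

The file "count" ends up holding 2, not 3: the shell removed ">count" from the line before calling nargs.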
Unix Big Redirection Mistake #1

Do not do this kind of redirection:

    $ cat * >z

    - shell creates "z" and redirects all future standard output into it
    - shell expands wildcards; wildcard "*" includes file "z" that was
      just created by the shell
      (Note: Bourne shells will do the wildcard before the file creation,
      so "z" is only matched if it already exists; C Shells do the file
      creation first.)
    - shell finds and calls cat command with all file names as arguments
        ==> e.g. cat(a,b,c,d,e,file1,file2,...etc...,z)
    - cat command processes each argument, opening each file and sending
      the output into file "z"
    - when cat opens file "z", it ends up reading from the top of file "z"
      and writing to the bottom of file "z" at the same time!
    - Result: an infinite loop that fills up the disk drive as "z" gets
      bigger and bigger

Fix #1: Use a hidden file name

    $ cat * >.z

    - uses a hidden file name not matched by the shell "*" wildcard
    - the cat command is not given ".z" as an argument, so no loop occurs

Fix #2 (two ways): Use a file in some other directory

    $ cat * >../z
    $ cat * >/tmp/z

    - redirect output into a file that is not in the current directory
      so that it is not read by the cat command and no loop occurs

Unix Big Redirection Mistake #2

Do not do this kind of redirection:

    $ cat a b >a

    - shell truncates file "a" and redirects command output into it
    - original contents of "a" are lost - truncated - GONE!
    - shell finds and calls cat command with two file name arguments
        ==> i.e. cat(a,b)
    - cat command processes contents of file "a" (now an empty file)
    - cat command processes contents of file "b"
    - output has been redirected by the shell to appear in file "a"
    - Result: file "a" gets a copy of "b"; original contents of "a" are
      lost

Fix #1: Append to "a"

    $ cat b >>a

    - double-redirect syntax appends file "b" safely to the end of "a"

Fix #2: Use a Temporary Third File

    $ cat a b >c
    $ mv c a

    - the third file safely receives the output of "a" and "b"; "mv" then
      replaces "a" with it

------------------
Input Redirection:
------------------

Most Unix commands read input from files, if file names are given on the
command line, and from standard input ("stdin") if no file names are
given.  (Not *all* commands read from standard input.  Examples of common
commands that never read from standard input: cp, mv, date, who, etc.)

If a command reads from standard input, you can tell the shell to use
input redirection to change from where the command reads:

    $ cat food      - reads from file "food"
    $ cat           - reads from stdin (keyboard)
    $ cat <food     - reads from stdin; the shell redirects stdin to come
                      from file "food" (cat is given no arguments)

A command has only one standard input and one standard output.  If you
redirect the same one more than once, Bourne shells create every output
file named, but only the final redirection wins:

    $ date >a >b >c >d >e
        - the "date" output goes into file "e"; the other files are
          created by the shell but are empty because only the final
          redirection wins

    bash$ date >out | wc
          0       0       0
        - the "date" output goes into file "out"; nothing goes into the
          pipe

Some shells (including the "C" shells) will try to warn you about silly
shell redirection mistakes:

    csh% date >a >b >c
    Ambiguous output redirect.
    csh% date >a | wc
    Ambiguous output redirect.

The C shells tell you that you can't redirect stdin or stdout to/from more
than one place at the same time.  Bourne shells do not tell you - they
simply ignore the "extra" redirections and do only one of each.

-----------
Redirection
-----------

A command line to convert lower-case to upper-case from the "who"
command:

    $ who | tr 'a-z' 'A-Z'

Shell question: Are the single quotes required around the two arguments?
(Are there any special characters in the arguments that need protection?)
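The redirection rules in this section can be checked in a scratch directory. This is a sketch assuming a POSIX shell and mktemp; the file names are made up. It shows the "only the final redirection wins" rule, the who | tr style of upper-casing a stream, and a safe way to upper-case a whole file with tr. Since tr takes no file name arguments, it is fed with input redirection and writes to a temporary third file, as in Fix #2 for Mistake #2 above.

```shell
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

# Only the last stdout redirection wins: a and b are created but stay
# empty; the output of echo lands in c.
echo x >a >b >c

# tr as a filter in a pipe, like "who | tr 'a-z' 'A-Z'".
shout=$(echo 'hello, World' | tr 'a-z' 'A-Z')

# Upper-case a file safely: read it on stdin with "<", write to a
# different file, and rename over the original only if tr succeeded.
printf 'hello\nworld\n' >myfile
tr 'a-z' 'A-Z' <myfile >myfile.tmp && mv myfile.tmp myfile

echo "$shout"
cat myfile
```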
You can use a similar command to convert a lower-case file of IBM MVS
JCL into upper-case.  (You will do this in the MVS section of the
course.)

EXPERIMENT: Why doesn't this convert the file "myfile" to upper-case?

    $ tr 'a-z' 'A-Z' <myfile >myfile

Why is the file "myfile" empty after this command is run?

What about the following command lines - what is in "myfile" when the
command finishes?

    $ cat <myfile >myfile
    $ sort <myfile >myfile
    $ head <myfile >myfile

Given the above, why is "myfile" not empty in the following case?

    $ wc <myfile >myfile

The following command line doesn't work because the programmer doesn't
understand the "tr" command syntax:

    $ tr 'a-z' 'A-Z' myfile >new

Why does this generate an error message from "tr"?  (The "tr" command is
unusual in its handling of command line pathnames.  RTFM)

The following command line redirection is faulty; however, it sometimes
works for small files:

    $ cat foo bar | tr 'a' 'b' | grep "lala" | sort | head >foo

There is a race condition between the first "cat" command trying to read
the data out of "foo" and the shell truncating "foo" to zero when it
launches the "head" command at the end of the pipeline.  Depending on
the system load and the size of the file, "cat" may or may not get out
all the data before the "foo" file is truncated or altered by the shell
in the redirection at the end of the pipeline.

Don't depend on long pipelines saving you from bad redirection!  Never
redirect output into a file that is being used as input in the same
command or anywhere in the command pipeline.

-------
Aliases
-------

Watch out for "helpful" system administrators who define aliases for
your shells when you log in.  (This is especially true on ACADAIX!)  The
aliases may mislead you about how Unix commands actually work.  (For
example, the "rm" command does *not* prompt you for confirmation.  On
some systems, "rm" is an alias for "rm -i", which *does* prompt.)

To avoid pre-defined aliases, start up a fresh copy of the shell:

    $ alias
    [...many ACADAIX aliases print here...]
    $ bash
    bash$ alias
    [...no more aliases here...]

To define your own aliases, look up "aliases" in the Linux Text index.
You must put your aliases in a file to have them saved between sessions
(for bash, usually a startup file such as ".bashrc" in your home
directory).

-------------------------------------
Studies in Quoting Special Characters
-------------------------------------

Understand how the shell handles quotes and blanks:

    $ echo hi     there
    hi there
    $ echo "hi     there"
    hi     there
    $ echo 'hi     there'
    hi     there

Explain the above three outputs.  How many arguments are passed to the
"echo" command in each case?

The "touch" command creates empty files by name.  Try this:

    $ touch "a b"
    $ ls
    a b
    $ rm a b

Explain the error message that is output by the above "rm" command.
How many arguments are passed to the "touch" and "rm" commands?

Here are some more things to try, and to understand.

    $ echo "'hello'"
    $ echo '"hello"'
    $ touch a b c d
    $ echo ' * '
    $ echo '" * "'
    $ echo '"' * '"'
    $ echo '"'" * "'"'
    $ echo ' * ' * " * "

You must be able to predict the output of each of the above command
lines without having to type them in to try them.

--------------------------------------
More shell wildcards: Character Ranges
--------------------------------------

As shown in your textbook, the shell can do globbing (wildcard expansion)
on ranges of characters.  Try these shell patterns in your home
directory on ACADAIX:

    $ echo ../[a-c]*
    $ echo ../*[3-5]
    $ echo ../[r-t]*[13579]

How would you count the number of pathnames generated by the shell?
What would be different if you changed "echo" to "ls"?

--------------------------------------------
Understanding the different types of "sort":
--------------------------------------------

Explain the difference in output of these two "sort" pipelines:

    $ list="1 11 2 22 3 33 4 44 3 33 2 22 1 11"
    $ echo "$list" | tr ' ' '\n' | sort
    $ echo "$list" | tr ' ' '\n' | sort -n

(The translate command "tr" is turning blanks into newlines so that the
numbers appear on separate lines on input to sort.)
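Capturing both pipelines side by side makes the difference easy to see. A minimal sketch, assuming a POSIX shell; the extra "paste -s -d ' ' -" stage is not part of the original pipelines and is used only to join the sorted lines back onto one line for display.

```shell
list="1 11 2 22 3 33 4 44 3 33 2 22 1 11"

# Default sort compares the lines as text, character by character.
as_text=$(echo "$list" | tr ' ' '\n' | sort | paste -s -d ' ' -)

# sort -n compares the lines by their numeric value.
as_numbers=$(echo "$list" | tr ' ' '\n' | sort -n | paste -s -d ' ' -)

echo "text order:    $as_text"
echo "numeric order: $as_numbers"
```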
Why is the sort output different in these two examples?

-------------------------------------------------------------------
Using commands and pipes to "mine" and extract data from the system
-------------------------------------------------------------------

Problem: "Print the fifth directory from your $PATH environment variable."

Iterative solution built up slowly using simple commands:

    $ echo "$PATH"
    $ echo "$PATH" | tr ':' '\n'
    $ echo "$PATH" | tr ':' '\n' | head -5
    $ echo "$PATH" | tr ':' '\n' | head -5 | tail -1

Problem: "Print the second-to-last directory from your $PATH environment
variable."

    $ echo "$PATH" | tr ':' '\n' | tail -2 | head -1

Problem: "Sort the elements in the PATH variable in ascending order."

    $ echo "$PATH"
    $ echo "$PATH" | tr ':' '\n'
    $ echo "$PATH" | tr ':' '\n' | sort
    $ echo "$PATH" | tr ':' '\n' | sort | tr '\n' ':'

Problem: "Keep only the first five elements of the PATH."

    $ echo "$PATH" | tr ':' '\n' | head -5 | tr '\n' ':'

(In these last two pipelines, the final "tr" also turns the last newline
into a colon, so the output ends with a trailing ":" and no newline.  An
empty element at the end of PATH means "the current directory", so
remove that trailing colon before using the result as a new PATH.)

Problem: "How many unique shells are in the /etc/passwd file?"

Build up the solution iteratively, starting with simple commands.
The shell is the seventh colon-delimited field in the passwd file.
Either the "awk" or "cut" commands can pick out a field from a file.
We will use "cut" to pick out the 7th field delimited by a colon.

Because the /etc/passwd file on ACADAIX is huge (and the output on our
screen would be huge), we will start with the first 10 lines of the
passwd file until we know we have the correct command line, then we will
use the whole passwd file.

    $ head /etc/passwd
    $ head /etc/passwd | cut -d : -f 7
    $ head /etc/passwd | cut -d : -f 7 | sort
    $ head /etc/passwd | cut -d : -f 7 | sort | uniq

We have the correct command line.  Now do the whole file:

    $ cat /etc/passwd | cut -d : -f 7 | sort | uniq
        - OR -
    $ cut -d : -f 7 /etc/passwd | sort | uniq
        - OR -
    $ cut -d : -f 7 /etc/passwd | sort -u

Does this pipeline (the reverse of the above) give the same output?
    $ sort -u /etc/passwd | cut -d : -f 7

-------
Filters
-------

Note that many Unix commands can act as filters - reading from stdin and
writing to stdout.  With no file names on the command line, the commands
read from standard input and write to standard output.  (You can
redirect both.)  If file names are given on the command line, the
commands usually ignore standard input and only operate on the file
names.

    $ grep "/bin/sh" /etc/passwd | sort | head -5

The "sort" and "head" commands are acting as filters.  Each command is
reading from stdin and writing to stdout.  The "grep" command is not a
filter - it is reading from the supplied argument pathname, not from
stdin.

If a command does read from file names supplied on the command line, it
is more efficient to let it open its own files than to use "cat" to open
the files and feed the data to the command on standard input.  (There is
less data copying done!)

Advice: Let commands open their own files; don't feed them with "cat".

Do this:

    $ head /etc/passwd
    $ sort /etc/passwd

Do not do this (wasteful of processes and I/O):

    $ cat /etc/passwd | head          # DO NOT DO THIS - INEFFICIENT
    $ cat /etc/passwd | sort          # DO NOT DO THIS - INEFFICIENT

Problem: "Now, count the number of each kind of shell in /etc/passwd."

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c

Problem: "Count the number of each kind of shell in /etc/passwd and
display the results sorted in descending numeric order."

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr

Problem: "Count the number of each kind of shell in /etc/passwd and
display the top two results sorted in descending numeric order."

    $ cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -nr | head -2
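Because every system's /etc/passwd is different, here is a sketch that runs the same counting pipelines on a small made-up file in passwd format (the account names and shells are invented for illustration, and the file name passwd.sample is hypothetical). It lets "cut" open its own file, as advised above, and assumes a POSIX shell with mktemp.

```shell
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

# Hypothetical sample data: seven colon-separated fields per line;
# the login shell is field 7.
cat >passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/ksh
carol:x:1002:1002:Carol:/home/carol:/bin/bash
sync:x:4:65534:sync:/bin:/bin/sync
www:x:33:33:www:/var/www:/usr/sbin/nologin
EOF

# How many unique shells?  (tr strips any padding some wc versions add.)
unique=$(cut -d : -f 7 passwd.sample | sort -u | wc -l | tr -d ' ')

# Count each shell, largest count first, keep the top two.
top2=$(cut -d : -f 7 passwd.sample | sort | uniq -c | sort -nr | head -2)

echo "unique shells: $unique"
echo "$top2"
```

On real systems, replace passwd.sample with /etc/passwd.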