====================
The Unix/Linux Shell
====================
-IAN! idallen@idallen.ca

Some basic Shell concepts.  Contents:

 * What is a shell for?
 * Most (but not all) commands take what as arguments?
 * How does the shell help run commands?
 * What is a "Bourne" shell?  What is a "C" shell?
 * Basic Command Syntax
 * Shell search PATH
 * Shell command line aliases
 * Shell "history" of previous commands
 * Shell pathname completion
 * Shell GLOB (wildcard) patterns
 * Command line order of processing

--------------------
What is a shell for?
--------------------

To find and run programs.
("Programs" are also called commands or utilities.)

Shells also have programming features, but that programming is
usually meant to aid in finding and running programs, not to do
any kind of substantial mathematical or business calculations.

Part of running a program involves supplying command line
arguments to that program; the shell helps with that, too.

---------------------------------------------------
Most (but not all) commands take what as arguments?
---------------------------------------------------

Most - but not all - Unix commands take pathnames as arguments.
Much of what people do online is manipulate files and directories.

"Pathnames" are names that might be file names or directory names.
(Unix also has names that are neither files nor directories,
e.g. the /dev/null pathname is a "character special" device.)

The shell has GLOB (wildcard) features to make matching pathnames easier.

-------------------------------------
How does the shell help run commands?
-------------------------------------

Command names are almost always the names of executable files.
(Some command names are built-in to each shell.)

Shells look for command name executable files in various places, using
a list of directories stored in the $PATH environment variable.

Shells provide aliases and variables to save typing the same
things (commands or pathnames) over and over.

Shells provide a "history" mechanism to recall and edit the last
commands you enter, to save retyping them.

Shells provide ways of completing command and file names, to save typing.

Shells provide wildcards (GLOB patterns) to generate large or small
lists of pathnames as arguments for commands.

-----------------------------------------------
What is a "Bourne" shell?  What is a "C" shell?
-----------------------------------------------

The shells sh, ksh, zsh, and bash (the "Bourne" shells) all have a
common ancestry.  They all descend from, or are modelled on, the original
shell "sh" written in the 1970s by Stephen Bourne.  The programming
features of these shells (if statements, for loops, etc.) all look and
work much the same way.  This is the best family of shells to study.

The shells csh and tcsh (the "C" shells) are similar to each other,
having a history dating back to Bill Joy at Berkeley in the late 1970s.
Their syntax for programming is not the same as the Bourne shells.  We do
not cover the C shell syntax in this course; these shells are notoriously
buggy.

--------------------
Basic Command Syntax
--------------------

Many Unix commands need both a VERB (what to do) and an OBJECT (what to
do it on).  The following attempts at Unix commands are incomplete:

    $ /etc/passwd         (missing VERB; what are you trying to DO?)
    $ cat                 (missing OBJECT; catenate WHAT file?)

Remember to tell Unix both what you want to do and to what object you
wish to do it:

    $ less /etc/passwd
    $ vim file

-----------------
Shell search PATH
-----------------

After the shell has processed and removed I/O redirection on a command
line, the first token left on the line is assumed to be a command name.
Built-in commands (e.g. "cd", "history") are executed directly by
the shell.  Anything that is not built-in is assumed to be the name of
an executable program, and the shell attempts to find a program with
that name and run it.  (Shells find and run commands.)

If the name contains any slashes (e.g. "/bin/date", "foo/bar",
"./myprog"), the shell executes that pathname directly.

If the name contains no slashes, the shell looks for the executable
file in the list of directories kept in the $PATH environment variable.
The shell tries each directory, left-to-right, and runs the first
executable program it finds.  If the same name appears in multiple PATH
directories, only the first one found is run.

The $PATH variable contains the current list of directories searched by
the shell when it tries to find a command name to run.  Directories in
PATH are separated by colons, e.g.

    $ echo "$PATH"
    /bin:/usr/bin:/usr/X11R6/bin:/usr/games:/sbin:/usr/sbin:/usr/bin/X11

A leading or trailing colon, or two adjacent colons ("::"), indicate
the current directory.  (Do not put the current directory in PATH -
it is a security risk.)

Each entry in PATH should be a directory, not a file name.  Because PATH
is an environment variable, you can change it, and it is inherited by
child processes of your shell.
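
A minimal sketch of that inheritance, assuming a POSIX "sh" is somewhere
in your current PATH.  The directory /xxjunkxx is a made-up name that
should not exist on your system:

```shell
# /xxjunkxx is a made-up directory name, used only for illustration.
# The PATH=... prefix changes PATH for this one child process only.
child_path=$(PATH="/xxjunkxx:$PATH" sh -c 'echo "$PATH"')

# Show the child's search list, one directory per line.
echo "$child_path" | tr ':' '\n'

# The parent shell's own PATH was not changed by the lines above.
echo "$PATH"
```

The child shell prints the modified list because it received its own
copy of the variable when it started.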

How it works:

    $ PATH=/bin:/usr/bin ; ls
    - shell tries to find "ls" in first directory in PATH: /bin
      - shell looks for /bin/ls - this exists, so it is executed

    $ PATH=/bin:/usr/bin ; gcc
    - shell tries to find "gcc" in first directory in PATH: /bin
      - shell looks for /bin/gcc - this is not found
    - shell tries to find "gcc" in next directory in PATH: /usr/bin
      - shell looks for /usr/bin/gcc - this exists, so it is executed

    $ PATH=/bin:/usr/bin ; nosuch
    - shell tries to find "nosuch" in first directory in PATH: /bin
      - shell looks for /bin/nosuch - this is not found
    - shell tries to find "nosuch" in next directory in PATH: /usr/bin
      - shell looks for /usr/bin/nosuch - this is not found
    - shell issues message "nosuch: Command not found"

    $ PATH=/bin:/usr/bin ; /usr/games/fortune
    - command name /usr/games/fortune contains slashes, $PATH is NOT used
    - shell executes /usr/games/fortune directly
    - $PATH is NOT used

    $ PATH=/bin:/usr/bin ; ./foo
    - command name contains slashes, $PATH is NOT used
    - shell executes ./foo directly
    - $PATH is NOT used
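
The lookup steps above can be sketched as a small shell function.
This is only an illustration of the search order, not how any real
shell is implemented.  The name find_in_path is invented for this
example, and the sketch does not handle a trailing colon in the list:

```shell
# find_in_path NAME [DIRLIST] - a made-up helper, for illustration only.
# Prints the first executable file named NAME in the colon-separated
# DIRLIST (default: $PATH), trying directories left-to-right.
# The body runs in a subshell "( ... )" so IFS and "set -f" stay local.
find_in_path () (
    name=$1
    list=${2-$PATH}
    case $name in
        */*) echo "$name" ; exit 0 ;;   # contains a slash: no PATH search
    esac
    IFS=:                               # split the list on colons
    set -f                              # no GLOB expansion while splitting
    for dir in $list ; do
        [ -n "$dir" ] || dir=.          # empty entry means current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ] ; then
            echo "$dir/$name"           # first match wins
            exit 0
        fi
    done
    echo "$name: Command not found" >&2
    exit 1
)

find_in_path sh || echo "no sh in PATH"
find_in_path ./myprog                   # slash in name: printed unchanged
```

A real shell goes on to execute the pathname it finds; the sketch only
prints it.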

Your shell has some built in commands, e.g. echo, cd, umask, pwd, history
    - built-in commands are not looked up in $PATH
    $ PATH=/xxjunkxx ; date    # fails because /xxjunkxx/date does not exist
    $ PATH=/xxjunkxx ; pwd     # works because pwd is built-in to shell
    $ PATH=/xxjunkxx ; echo hi # works because echo is built-in to shell
    $ PATH=/xxjunkxx ; cd ..   # works because cd is built-in to shell
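
One way to check whether a name is built-in is the "type" command.
The sketch below also repeats the experiment above inside a subshell,
so that your real PATH is not damaged.  The exact wording printed by
"type" differs between shells, so no output is shown for it:

```shell
# "type" reports how the shell would interpret a command name.
type cd                   # reports that cd is a shell built-in
type ls                   # normally reports a file found via $PATH

# Parentheses start a subshell, so the real PATH is left untouched.
# /xxjunkxx is the same made-up directory used in the text above.
builtin_out=$( PATH=/xxjunkxx ; cd / && pwd )
echo "$builtin_out"       # prints: /

( PATH=/xxjunkxx ; date ) 2>/dev/null || echo "date was not found"
```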

More examples:

  $ PATH=/bin:/usr/bin ; ls     # works because /bin/ls exists
  $ PATH=/xxjunkxx ; ls         # fails because /xxjunkxx/ls does not exist
  $ PATH=/bin/ls ; ls           # fails because /bin/ls/ls does not exist

Commands related to PATH:

  which   - tell which $PATH directory contains a command
  whereis - locate commands in "standard" directories (ignores $PATH)
            (also locates man pages for you, if any)

  Note that "whereis" may tell you that a command exists in some standard
  directory, but when you try to execute the command it may not be found,
  if the standard directory is not one of your PATH directories.

  The shell only looks in PATH, not in any "standard" places.
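
  A related tool, offered here as a sketch rather than as part of the
  list above, is the shell's own "command -v".  Because it is built in,
  it reports what this shell would actually run, including built-ins:

```shell
# "command -v" is specified by POSIX and is built into the shell.
command -v ls             # prints the pathname that would be executed
cd_kind=$(command -v cd)
echo "$cd_kind"           # prints: cd   (a built-in has no pathname)

# With a useless PATH the shell cannot find ls, wherever it is installed.
# /xxjunkxx is a made-up directory name.
( PATH=/xxjunkxx ; command -v ls ) || echo "ls not found using this PATH"
```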

Summary:
  - only command names without slashes are looked up in $PATH
  - a command name containing a slash is NOT looked for in $PATH
  - slashes in a command name mean no PATH lookup; direct try to execute path
    $ ./a.out      # uses the a.out in "."; does not search $PATH
    $ /bin/ls      # executes /bin/ls; does not search $PATH
    $ ../foo       # executes the foo in the parent dir; no $PATH used
    $ foo          # "foo" is looked for in $PATH

Set PATH at the start of all your shell scripts, and export it:

  If you don't set the PATH at the start of your script, the script will
  inherit the PATH from the person or program that executes your script.
  The inherited PATH may or may not contain the correct directories
  needed to find the commands used by your script - your script may fail.

  Choose PATH in your script to include the system directories that
  contain the commands your script needs.  Directories /bin and
  /usr/bin are almost always necessary.  System scripts may need
  /sbin and /usr/sbin.  GUI programs will need the X11 directories.
  Choose appropriately for the script.
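
  A minimal sketch of such a script header follows.  The two directories
  are only an example and assume a conventional system layout where "sh"
  lives in /bin or /usr/bin:

```shell
#!/bin/sh
# Set the search path first, before running any commands, and export it
# so that every program started by this script inherits the same list.
# The directories here are an example only; choose what the script needs.
PATH=/bin:/usr/bin
export PATH

# A child process sees the exported value.
child_path=$(sh -c 'echo "$PATH"')
echo "$child_path"        # prints: /bin:/usr/bin
```

  With this header, the script behaves the same way no matter what PATH
  the caller happened to have.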

--------------------------
Shell command line aliases
--------------------------

Watch out for "helpful" system administrators who define aliases for
your shells when you log in.  (This happens on most versions of Linux.)
The aliases may mislead you about how Unix commands actually work.
(For example, the "rm" command does *not* prompt you for confirmation.
On some systems, when you log in, "rm" is made to be an alias for "rm -i",
which *does* prompt.)

To avoid pre-defined aliases, sometimes you can start up a fresh copy
of the shell that has no aliases defined:

    $ alias
    [...many aliases may print here...]

    $ bash
    bash$ alias
    [...no more aliases here...]

The other thing you can do is execute "unalias -a" to remove all your
aliases for the current shell.  You can put this into your shell start-up
file (e.g. .bashrc) to do it every time you start a new shell.

To define your own aliases, look up "aliases" in a Linux text index.
You must put your own alias definitions in a file to have them saved
between sessions (e.g. put them into your .bashrc file).
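
As a rough sketch of the alias commands themselves: the alias name "ll"
below is just an example.  Note that bash normally expands aliases only
in interactive shells, so the sketch defines and inspects an alias but
does not rely on running it:

```shell
# Define an alias in the current shell only; it is gone when the shell exits.
alias ll='ls -l'
ll_def=$(alias ll)        # ask the shell for the definition of "ll"
echo "$ll_def"

# A backslash (or any quoting) in the command name skips alias lookup,
# so this runs the real "ls" even if an alias named "ls" exists:
\ls / > /dev/null

unalias ll                # remove one alias
unalias -a                # remove every alias from this shell
```

The backslash trick is handy with "rm" on systems that alias it to
"rm -i".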

------------------------------------
Shell "history" of previous commands
------------------------------------

Most shells keep a record ("history") of the commands you type.  The
history is often saved in a file and restored when you next log in.
You can see the history list using the built-in "history" command.

Many shells allow you to use the UP-ARROW and DOWN-ARROW keys to move
up and down in the command history.  (Some shells use other key
sequences, such as ^P and ^N.)

Some shells let you select items from the history list by number (e.g.
"!123").  See your shell's man page.

-------------------------
Shell pathname completion
-------------------------

If you type part of a pathname on a command line and then push the TAB
key, many shells will attempt to complete the pathname for you.  If
the pathname cannot be completed unambiguously, the shell will show
you a list of possible completions:

   bash$ echo /etc/pas<TAB>
   passwd      passwd-     passwd.OLD  

If you type part of a command name and push TAB, the shell will list
all the possible commands in your PATH that start with those letters:

   bash$ mkd<TAB>
   mkdep      mkdict     mkdir      mkdirhier  mkdosfs    

------------------------------
Shell GLOB (wildcard) patterns
------------------------------

The shell will treat words on the command line containing GLOB characters
as pathnames and try to match the patterns against pathnames to produce
a list of names.

See file: glob_patterns.txt
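
A short sketch of GLOB matching, run in a scratch directory so that it
touches no real files.  The file names are invented for the example,
and the behaviour shown for a pattern that matches nothing is the usual
default in Bourne-style shells, which shell options can change:

```shell
# Work in a scratch directory so the example touches no real files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.txt b.txt notes.log

echo *.txt                # the pattern is replaced by: a.txt b.txt
echo ?.txt                # ? matches exactly one character: a.txt b.txt
echo *.pdf                # nothing matches: the pattern is left as *.pdf
txt_files=$(echo *.txt)

cd / && rm -rf "$scratch" # clean up
```

The command (here "echo") never sees the pattern; it sees only the list
of names the shell produced.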

--------------------------------
Command line order of processing
--------------------------------

The shell parses what you type and changes your command line in a
particular order.  Ignoring history and aliases, that order is:

 1. quote processing and initial blank splitting into tokens and
    individual commands (splitting on semicolons and pipe characters)
 2. look for (and remove) pathname input/output redirection in each command
 3. expand $-variables (splitting unquoted variable contents on blanks!)
 4. look for GLOB patterns and match against pathnames

Because the shell follows the above order, different types of shell
metacharacters have meaning only at certain times.

Example - You can't put a working pathname redirect (e.g. ">foo") inside
a variable, because the shell looks for redirection metacharacters
*before* the shell looks for and expands variables:

    $ x="> out"
    $ echo hi $x      # <- the shell doesn't find any redirection
    hi > out

The redirection metacharacter is "hidden" inside the variable and doesn't work.
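
The same experiment can be run as a script that also confirms no output
file appears.  This is a sketch in a scratch directory; the variable
names "result" and "file_contents" are made up for the example:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

x="> out"
result=$(echo hi $x)      # the ">" arrives too late to be a redirection
echo "$result"            # prints: hi > out
ls                        # prints nothing: no file named "out" exists

echo hi > out             # a ">" typed on the command line does redirect
file_contents=$(cat out)
echo "$file_contents"     # prints: hi

cd / && rm -rf "$scratch" # clean up
```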

Example - You can't put a working quote inside a variable, because the
shell looks for quotes before the shell looks for and expands variables:

    $ x="'"
    $ touch a b c
    $ echo $x * $x    # <- the shell doesn't see any quotes here
    ' a b c '
    $ echo ' * '      # <- the shell does see these quotes (and removes them)
     * 

The quote character is "hidden" inside the variable and doesn't work.

Example - Even if a pathname matched by a GLOB pattern looks like a
shell variable (e.g. '$x'), it won't be expanded as a variable by the
shell because the shell looks for $-variable metacharacters before it
looks for and processes GLOB patterns against pathnames:

    $ touch '$x'       # <- create a file named $x
    $ x=foo            # <- create a variable $x containing 'foo'
    $ echo *           # <- the shell doesn't find any variable to expand
    $x

The $-variable is "hidden" inside the matched pathname and doesn't work.
The shell does the GLOB expansion *after* it has already done $-variable
processing and blank-splitting; none of the characters in a GLOB pathname
expansion are treated specially (even blanks).

If a pathname contains blanks or other special shell metacharacters
(e.g. spaces, semicolons, parentheses, etc.), none of these characters
will be treated as special by the shell because the shell does all that
special character processing before it looks for and processes GLOB patterns:

    $ touch "file with spaces"
    $ rm file*             # <- shell does not see any spaces in the GLOB name

    $ touch "date ; who"   # <- filename containing blanks and semicolon
    $ echo *               # <- shell does not see any semicolon
    date ; who

The shell does the GLOB expansion *after* it has already done all the
blank and semicolon processing; none of the characters in a pathname
are treated specially (not blanks, not redirection, not semicolons).
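
A sketch that counts the arguments produced by a GLOB pattern makes this
visible.  The helper count_args and the odd file names are invented for
the example:

```shell
count_args () { echo "$#" ; }     # made-up helper: print the argument count

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch 'date ; who' '$HOME' 'two words' '> out'

n=$(count_args *)
echo "$n"                         # prints: 4  (one argument per pathname)

for name in * ; do                # each pathname arrives whole
    echo "pathname: $name"
done

cd / && rm -rf "$scratch"         # clean up
```

Each pathname stays a single argument, whatever blanks, semicolons,
dollar signs, or redirection characters it contains.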

What you must remember is that the reverse does *not* hold for variables.
Though a GLOB pattern can't produce a working $-variable, a $-variable
*can* produce a working GLOB pattern (because GLOB processing is done
*after* $-variable expansion).  This has serious consequences!

The most critical thing to remember is that if an unquoted variable
contains a GLOB pattern or spaces, the GLOB pattern *will* be expanded,
and the interpolated text *will* be split on blanks, after the unquoted
variable is expanded:

    $ x='*'               # <- put a GLOB pattern into variable $x
    $ touch a b c         # <- create some files in the current directory
    $ echo $x             # <- shell expands $x, then expands * GLOB
    a b c

    $ touch 'file with spaces'
    $ y='file with spaces'
    $ ls $y               # <- shell expands $y, then splits on blanks
    ls: file: No such file or directory
    ls: with: No such file or directory
    ls: spaces: No such file or directory

To prevent the shell from GLOB-expanding and blank-splitting the
contents of $-variables, you must always double-quote all uses of
variables:

    $ x='*'               # <- put a GLOB pattern into variable $x
    $ echo "$x"           # <- shell expands $x in quotes, quotes hide GLOB
    *

    $ y='file with spaces'
    $ ls "$y"             # <- shell expands $y in quotes, quotes hide blanks
    file with spaces

Inside double quotes, shell variables expand but GLOB patterns do not.
Inside double quotes, spaces are not seen by the shell - the text
interpolated by the variable expansion is not split on blanks.
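
The difference can be measured by counting arguments.  This is a sketch
in a scratch directory; count_args is a made-up helper, and the default
value of IFS (blank, tab, newline) is assumed:

```shell
count_args () { echo "$#" ; }     # made-up helper: print the argument count

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a b c

x='*'
unquoted_glob=$(count_args $x)    # 3: $x became *, then * matched a b c
quoted_glob=$(count_args "$x")    # 1: the single argument *

y='file with spaces'
unquoted_split=$(count_args $y)   # 3: file, with, spaces
quoted_split=$(count_args "$y")   # 1: file with spaces

echo "$unquoted_glob $quoted_glob $unquoted_split $quoted_split"   # 3 1 3 1

cd / && rm -rf "$scratch"         # clean up
```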

Always double-quote your $-variables!