=================================
Programming Conventions and Style
=================================
-Ian! D. Allen - idallen@idallen.ca

Table of Contents
-----------------
  * Programming as Literature
  * Internal Documentation - Comments
  * Coding style
  * Program arguments and execution syntax - man page style
  * Typical Four-Part Algorithm Structure
  * Prompts and Error Messages
  * Error Messages
  * Commenting Programs - no useless comments
  * Code Commenting Style
  * Forbidden Functions

Programming as Literature
-------------------------

When writing programs, you are not simply trying to "get it to work"; but,
you are also (and most importantly in an academic setting) practicing
and demonstrating good programming techniques.  These techniques will
make it possible for other people to read and understand your programs
after you have written them.  Software maintenance costs far exceed the
costs of developing the original code - your code must be easy to read.

Employers tell us that programming techniques are more important than
the list of which languages you know.  Do you have good style?

Good programming techniques include such things as:

   *  Correct documentation (using guidelines given here)
      - internal documentation (labelling, function headers, etc.)
      - external documentation (man pages, user guides, testing, etc.)
   *  Writing clean, simple, robust code: "Less code is better code"!
      - checking the return status of important functions and syscalls
      - choosing the correct loop and branch control structures
      - rewriting the algorithm to minimize duplicated code
      - making it right before you make it clever or fast
   *  Using correct indentation, style, and mnemonic names
      - variable and file names reflect their use in the algorithm
   *  Using correct modularization and scope (i.e. local vs. global variables)
   *  Correct use of arguments and/or prompting for user input
      - some programs will prompt the user for input if no arguments are given
      - some programs will read standard input if no file names are given
   *  Correct use of exit codes on success/failure of the program
   *  Presenting accurate and useful help if user makes an error:
      1. error messages must appear on standard error, not on standard output
      2. error text must contain the name of the program issuing the error
         (use argv[0] or $0, never hard-code the name)
      3. always echo the input that is in error back to the user, and
      4. give a hint as to the valid input that was expected, e.g.:
           "myprog: Age 23 is not between 12 and 18"

These criteria will be assessed in marking your programs.

You must test the return codes of important functions and syscalls.
Your program must exit with a non-zero status code on a serious error.

Prompts and error messages must be sent to Standard Error (cerr, stderr,
System.err), not to Standard Output (cout, stdout, System.out).


Importing (Copying ) Code
-------------------------

You may import or copy code only with written permission of your
instructor.  Imported code must be clearly credited with its source.

Beware of importing code without understanding it.  If you are given
code to import, you must understand how the code works.

Internal Documentation - Comments
---------------------------------

Internal documentation is documentation placed in the same file as the
actual code you are writing, in the form of comments.  You must have
internal documentation for any code you write or import (with credit)
from other authors.  See the section below on comment style.

You must add comments to all your code, even code that you import (with
credit), and you must bring coding standards of the imported code up to
acceptable levels.

Coding style
------------

A program written in an idiosyncratic style is hard to read and hard to
maintain.  You have to write code that matches the style of the code in
your programming environment.  If the code you are working with doesn't
use TABs for indentation, your added code must not use TABs either, or
vice-versa.  Use the same brace style as the rest of the code.  Adopt the
style of the code you are modifying.

You may choose to update *all* of the code you import to your chosen
style.  If you are importing a large, active code base into your project,
making non-automated changes to the style of the imported code will make
getting future updates difficult.  Weigh the cost of making style changes
against the difficulty of being able to accept future patches to that
code.  (Patches will not work against code whose style has been modified.)

For most school assignments where imported code is allowed, the amount of
code being imported is small, no patches are anticipated, and students are
expected to bring the style of the imported code up to course standards.

In the absence of an existing code base to which you must adapt, your
style must match the published standards for your course or program
of study.

1)  Lines in source files should be kept to 80 characters or less.
    This makes it easy to do side-by-side revision comparisons
    between files.  (*Occasional* lines longer than 80 characters may
    be acceptable, if breaking the line makes it more ugly than having
    it wrap past 80 on the screen.)

    For a large real-world example of 80-column style, see the Linux kernel
    coding standard:  http://lxr.linux.no/source/Documentation/CodingStyle

2)  Indentation must be consistent.  TABs are positioned every 8 under
    Unix/Linux; code accordingly.  Leading TABs or spaces are acceptable;
    but, not both; don't have some lines use leading TABs and other
    lines use leading spaces.  The "indent" program can repair broken
    indentation (but be careful - it often mangles comments, so read
    the man page):   indent -kr -i8 *.c

Don't put the name of a program inside the program, since the name will
be wrong if you rename the executable.  Use argv[0] or $0 instead.


Program arguments and execution syntax - man page style
-------------------------------------------------------

The "man page style" for program execution syntax uses {} to surround
mandatory items and [] to surround optional items.  A program with one
mandatory item that is either a file or a directory and one optional item
that is one or more userids would have a Syntax line:

    $0  {file|directory} [userid...]

Multiple optional arguments must nest:

    $0 arg [ opt1 [ opt2 [ opt3 ] ] ]

which means that if you want to specify opt3 you have to specify opt1
and opt2 before it.  All are shown as optional.


Typical Four-Part Algorithm Structure
-------------------------------------

Functonally, most Unix run-once commands (not servers) typically have
this form:

  1) Input Stage
     - verify the correct number of command-line arguments
     - optionally prompt for any missing arguments
     - complain about missing or extra arguments and exit non-zero

  2) Validation Stage
     - validate the input: make sure numbers are numbers, files are
       files, directories are directories, etc.
     - test to make sure arguments are not empty or null strings
     - test to make sure things exist and are accessible
     - test to make sure things are readable/writable

  3) Processing Stage
     - run your algorithm on your validated input
     - solve the problem; generate the results
     - servers may stay at this stage "forever" until killed

  4) Output Stage
     - display the output

Don't mix up code among the four stages.  Keep them distinct:

    Do not duplicate your Processing Stage or Validation Stage for various
    combinations of Input arguments.  Where possible, collect all the
    input first; then Validate it *once*, then write your Processing
    code *once*.

    Do not duplicate your Output Stage for various combinations of Input
    arguments.  Generate your results and display the results *once*, at
    the end of the program.  Where possible, don't duplicate similar kinds
    of output statements all over your code.  Write one output statement
    that uses variables to contain the variable part of the output.

Less code is better code.


Prompts and Error Messages
--------------------------

Issue prompts on standard error before reading any input from the user;
otherwise, the user won't know what to enter, or when.  The prompt
should explain exactly what kind of input is expected; don't just say
"Enter input".  Menus are also considered to be part of the prompts,
and must also appear on standard error.

Always issue prompts and error messages on stderr (not on stdout), so that
the prompts and error messages don't get redirected into output files.

After reading input, it is often a good idea to echo the input back to
the user, so that they know what was entered.  This is called "echoing the
input back to the user" and improves the readability of test runs.

Optional: Your program doesn't need to prompt the user to enter input
if standard input is detected to be coming from something that is not
a terminal keyboard (e.g. it might be coming from a pipe or redirected
from a file).   See "man isatty".


Error Messages
--------------

Error messages must obey these four rules:

  1.  error messages must appear on standard error, not standard output
  2.  give the name of the program that is issuing the message (from argv[0])
  3.  state what input was expected (e.g. "expecting one file name")
  4.  display what the user actually entered (e.g. "found 3 (a b c)"),
      or at least some indication of what was entered (e.g. show a count).

*Never* say just "illegal input" or "invalid input" or "too many".
Always specify how many is "too many" or "too few":

    myprog: Expecting 3 file names; found 4 (a b c d)
    myprog: Student age 4 is not between 12 and 18
    myprog: Modify days -2 is less than zero

After detecting an error and issuing a message, the usual thing to do is
to exit the program with a non-zero return code.  Don't keep processing
bad data!

Error messages should only show information entered by the user (on the
command line or read from the keyboard) if that information is relevant
to the error:
 - if the error deals with something that isn't related to what the
   user entered, you don't need to show any of the user's input
 - if the error is related to what the user entered, you must display
   what the user entered as part of the error message


Commenting Programs - no useless comments
-----------------------------------------

Comments should add to a programmer's understanding of the code.
They don't comment on the syntax or language mechanism used in the code;
since, both these things are obvious to programmers who know the language.
(Don't comment that which is obvious to anyone who knows the language.)

"Programmer" comments deal with what the line of code means in the
*algorithm* used (the "why"), not with syntax or how the language *works*.

Comment on "why" this code is here, not "how" it works.

"Teacher" or "Instructor" comments talk about how the *language* works,
not about what the code means in the algorithm.  Do not include Instructor
comments in your own code - I already know what the language means.

Thus: Do not use comments that state things that relate to the syntax or
language mechanism used and are obvious to a programmer, e.g.

    # THESE COMMENTS BELOW ARE OBVIOUS AND NOT HELPFUL COMMENTS:
    #
    x=23               # set x to 23           <== OBVIOUS; NOT HELPFUL
    open(file,mode)    # open the file         <== OBVIOUS; NOT HELPFUL
    close(file)        # close the file        <== OBVIOUS; NOT HELPFUL
    x=sqrt(y)          # take the sqrt of y    <== OBVIOUS; NOT HELPFUL
    list.head = NULL   # set head of list to NULL <== etc.

Better, useful programmer-style comments:

    loop=argc                # initialize loop index to max num arguments
    open(acctlog,READ_ONLY)  # open the payroll accounting log r/o
    stdev=sqrt(variance)     # get stdev of student test scores
    close(pipe)              # child process must close this side of pipe

Do not copy "instructor-style" comments into your code.  Instructor-style
comments are put on lines of code by teachers to explain the language
and syntax functions to people unfamiliar with programming (e.g. to
students of the language).  Instructor-style comments are "obvious"
comments to anyone who knows how to program; they should never appear
in your own programs (unless you become an instructor!).

More on this:   http://www.c2.com/cgi/wiki?MethodCommenting
On not needing comments:  http://www.c2.com/cgi/wiki?ToNeedComments


Code Commenting Style
---------------------

Comments should usually appear on their own lines, indented to match
the code to which they apply.  In some cases you can add comments to the
ends of lines (e.g. for variable declarations); but, doing so restricts
the length of the comments and may make your code harder to edit and
maintain by other people.

Comments should be grouped in blocks, ahead of blocks of related code
to which the comments apply.  Do not alternate comments and single lines
of code, or put many comments off to the right of single lines of code.
Doing so makes the code hard to read.  Block comments and block code
are easier to read.  (*Occasional* lines longer than 80 characters,
and *occasional* end-line comments are tolerated.)

Here is a Before/After example taken from an actual student submission:

------------------------------------------------------------------------------
----- Before (bad style) -----------------------------------------------------
------------------------------------------------------------------------------
/*
This function take a command (starts with /) and parse and executes it
*/
void parsecmd(char *msg, size_t size, int fdmax, char ip_addr[][SIZE_IP],int sender){
        int startofmsg; //
        int i;
        int sent=0; // This variable is used to track if the msg was sent or not
        /*The fact that MAX_ARG is 256 this sets a limit of 255 for nicknames */
        char arg[MAX_ARG]; // Used to store IP in case of /msg or new nickname in case of /nick
        char pvt_msg[MAX_MSG]; // The pvt msg that will sent to a client
        if( strncasecmp("/msg",msg,SIZE_MSG) == 0){//Check if the first 5 characters represent the /msg (message) command
                if((startofmsg=striparg(arg,msg+SIZE_MSG,size-SIZE_MSG))==NOARG){//Get the IP and where the msg starts
                        sprintf(pvt_msg,"ERR:Your /msg messsage was mal-formulated. The message was not sent\n");
                        sendall(sender,pvt_msg,strlen(pvt_msg));// return error msg to the sender
                        return;
                }
                startofmsg += SIZE_MSG; // This gives us where the msg starts in src
                // For syntax reason, we assume that there's a \n at the end of this line
                // Also every new line must start with CHT: or it won't be reconized
                sprintf(pvt_msg,"CHT:Private message from %s:\nCHT:%.*s",ip_addr[sender-STD_FDS-1],size-startofmsg,msg+startofmsg);// Make the private msg
                for(i=0; i<fdmax; i++){
                        if(strcmp(ip_addr[i],arg) == 0){ // if the dest ip matches any that we know of
                                sendall(i+STD_FDS+1,pvt_msg,strlen(pvt_msg));// Send that IP the msg
                                //DEBUG
                                sent = 1; // Acknowledge that we sent the msg at least once
                        }
                }
                if(sent == 0){ // If our msg was never sent
                        if(arg[strlen(arg)-1] == '\n')// If no args were supplied arg will be a \n and when it's sent it will have 2 x \n which we be thought as an unknown string
                                arg[strlen(arg)-1] = ' ';// This fixes it
                        sprintf(pvt_msg,"ERR:No one was found with the IP: %s\n",arg);// Make error msg
                        sendall(sender,pvt_msg,strlen(pvt_msg));// send error msg to our original sender
                }
        }
        /*
        ADD more commands here
        */
        else{//Command was not found
                sprintf(pvt_msg,"ERR:Command unrecongnised\n");
                sendall(sender,pvt_msg,strlen(pvt_msg));
        }
        return;
}

------------------------------------------------------------------------------
----- After (better style) ---------------------------------------------------
------------------------------------------------------------------------------

#define MSG_PREFIX "CHT:"       // server->client chat msg protocol prefix
#define ERR_PREFIX "ERR:"       // server->client error protocol prefix

/*
 * Recognize the leading /command word on a string and process the line.
 * Arguments:
 *   cstr:    the string to parse from the attached client
 *   clen:    length of cstr (cstr may not end in \0)
 *   fdmax:   file descriptor max, used by /msg to loop over connected clients
 *   iplist:  list of known IP addresses for each connected client fd
 *   clifd:   fd of the client that sent cstr
 * Return value: nothing
 */
void parsecmd(char *cstr, size_t clen, int fdmax, char iplist[][SIZE_IP],
              int clifd)
{
        char pvt_msg[MAX_MSG];          // msg that will sent to a client

#define IS_COMMAND(cmd,str) (strncasecmp(cmd,str,sizeof(cmd)-1) == 0)

        /* Command: /msg <ipaddress> [message text]
         *   Send the message text to the given client <ipdaddress>
         */
        if( IS_COMMAND("/msg",cstr) ){
                int fd;                 // file descriptor
                int startofmsg;         // index of byte after IP address
                int sent = 0;           // was message sent to any client
                char ipaddr[MAX_ARG];   // parsed IP address after /msg

                /* Chop out string immediately following /msg into ipaddr.
                 * Bug: doesn't check that /msg is followed by whitespace:
                 *      /msgxxxx puts xxxx in IP and ignores actual IP
                 * Bug: striparg can't check for buffer overflow of ipaddr
                 * Bug: ipaddr may end up with \n on the end from striparg;
                 *      so, "/msg 127.0.0.1\n" fails to match 127.0.0.1 and
                 *      compensating code had to be added below
                 */
                startofmsg = striparg(ipaddr, cstr+SIZE_MSG, clen-SIZE_MSG);
                if( startofmsg == NOARG ){
                        sprintf(pvt_msg, "%sYour /msg messsage was "
                                "mal-formulated. The message was not sent\n",
                                ERR_PREFIX);
                        sendall(clifd, pvt_msg, strlen(pvt_msg));
                        return;
                }

                /* Copy the client IP address and message into pvt_message.
                 * Have to add back SIZE_MSG bytes to get actual offset
                 * of message start inside cstr so that client message
                 * starts at: cstr+startofmsg
                 * Bug: no check for buffer overflow of pvt_msg
                 */
                startofmsg += SIZE_MSG;
                sprintf(pvt_msg, "%sPrivate message from %s:\n%s%.*s",
                        MSG_PREFIX, iplist[clifd-STD_FDS-1],
                        MSG_PREFIX, clen-startofmsg, cstr+startofmsg);

                /* Loop over all clients trying to match ipaddr to IP list.
                 * Send the message to all matches and note success.
                 */
                for( fd = 0; fd < fdmax; fd++ ){
                        if( strcmp(iplist[fd], ipaddr) == 0 ){
                                sendall(fd+STD_FDS+1,pvt_msg,strlen(pvt_msg));
                                sent = 1;
                        }
                }
                if( sent == 0 ){
                        /* Message was never sent: send error back to client.
                         * Bug: have to compensate for bug where ipaddr
                         *      ends in \n (see striparg Bug above)
                         */
                        if( ipaddr[strlen(ipaddr)-1] == '\n' )
                                ipaddr[strlen(ipaddr)-1] = ' ';
                        sprintf(pvt_msg,
                                "%sNo one was found with the IP: %s\n",
                                ERR_PREFIX, ipaddr);
                        sendall(clifd, pvt_msg, strlen(pvt_msg));
                }
        }
        /* Future /command recognition code would go here:
         *   else if( IS_COMMAND("/nick",cstr) )
         */
        else {
                // Bug: Doesn't return what client typed back to client
                sprintf(pvt_msg, "%sCommand unrecongnised\n", ERR_PREFIX);
                sendall(clifd, pvt_msg, strlen(pvt_msg));
        }
        return;
}

Forbidden Functions
-------------------

Buffer overflows let malicious crackers break into machines.  Some C
library functions are unsafe, since they permit buffer overflows except
in controlled circumstances.  The following list of C library functions
are generally forbidden since each given function will generate a buffer
overflow when given sufficiently large input values:

    gets, strcpy, strcat, sprintf, vsprintf

There is no need to use any of the above functions; replace all of the
above with their buffer-safe equivalents:

   fgets, strncpy, strncat, snprintf, vsnprintf

Read the man pages for these functions!  Not all of them treat the buffer
size parameter the same way.  If the transfer operation was truncated
due to a buffer size limit, the trailing '\0' character may or may not
be present in the output buffer; or, the '\0' may be written *after* the
"n" characters have been copied (which might be *after* the end of your
buffer, if you aren't careful!).

Programs that operate as network-visible servers have zero tolerance
for unsafe coding practices.