================================= Programming Conventions and Style ================================= -Ian! D. Allen - idallen@idallen.ca Table of Contents ----------------- * Programming as Literature * Internal Documentation - Comments * Coding style * Program arguments and execution syntax - man page style * Typical Four-Part Algorithm Structure * Prompts and Error Messages * Error Messages * Commenting Programs - no useless comments * Code Commenting Style * Forbidden Functions Programming as Literature ------------------------- When writing programs, you are not simply trying to "get it to work"; but, you are also (and most importantly in an academic setting) practicing and demonstrating good programming techniques. These techniques will make it possible for other people to read and understand your programs after you have written them. Software maintenance costs far exceed the costs of developing the original code - your code must be easy to read. Employers tell us that programming techniques are more important than the list of which languages you know. Do you have good style? Good programming techniques include such things as: * Correct documentation (using guidelines given here) - internal documentation (labelling, function headers, etc.) - external documentation (man pages, user guides, testing, etc.) * Writing clean, simple, robust code: "Less code is better code"! - checking the return status of important functions and syscalls - choosing the correct loop and branch control structures - rewriting the algorithm to minimize duplicated code - making it right before you make it clever or fast * Using correct indentation, style, and mnemonic names - variable and file names reflect their use in the algorithm * Using correct modularization and scope (i.e. local vs. global variables) * Correct use of arguments and/or prompting for user input - some programs will prompt the user for input if no arguments are given - some programs will read standard input if no file names are given * Correct use of exit codes on success/failure of the program * Presenting accurate and useful help if user makes an error: 1. error messages must appear on standard error, not on standard output 2. error text must contain the name of the program issuing the error (use argv[0] or $0, never hard-code the name) 3. always echo the input that is in error back to the user, and 4. give a hint as to the valid input that was expected, e.g.: "myprog: Age 23 is not between 12 and 18" These criteria will be assessed in marking your programs. You must test the return codes of important functions and syscalls. Your program must exit with a non-zero status code on a serious error. Prompts and error messages must be sent to Standard Error (cerr, stderr, System.err), not to Standard Output (cout, stdout, System.out). Importing (Copying ) Code ------------------------- You may import or copy code only with written permission of your instructor. Imported code must be clearly credited with its source. Beware of importing code without understanding it. If you are given code to import, you must understand how the code works. Internal Documentation - Comments --------------------------------- Internal documentation is documentation placed in the same file as the actual code you are writing, in the form of comments. You must have internal documentation for any code you write or import (with credit) from other authors. See the section below on comment style. You must add comments to all your code, even code that you import (with credit), and you must bring coding standards of the imported code up to acceptable levels. Coding style ------------ A program written in an idiosyncratic style is hard to read and hard to maintain. You have to write code that matches the style of the code in your programming environment. If the code you are working with doesn't use TABs for indentation, your added code must not use TABs either, or vice-versa. Use the same brace style as the rest of the code. Adopt the style of the code you are modifying. You may choose to update *all* of the code you import to your chosen style. If you are importing a large, active code base into your project, making non-automated changes to the style of the imported code will make getting future updates difficult. Weigh the cost of making style changes against the difficulty of being able to accept future patches to that code. (Patches will not work against code whose style has been modified.) For most school assignments where imported code is allowed, the amount of code being imported is small, no patches are anticipated, and students are expected to bring the style of the imported code up to course standards. In the absence of an existing code base to which you must adapt, your style must match the published standards for your course or program of study. 1) Lines in source files should be kept to 80 characters or less. This makes it easy to do side-by-side revision comparisons between files. (*Occasional* lines longer than 80 characters may be acceptable, if breaking the line makes it more ugly than having it wrap past 80 on the screen.) For a large real-world example of 80-column style, see the Linux kernel coding standard: http://lxr.linux.no/source/Documentation/CodingStyle 2) Indentation must be consistent. TABs are positioned every 8 under Unix/Linux; code accordingly. Leading TABs or spaces are acceptable; but, not both; don't have some lines use leading TABs and other lines use leading spaces. The "indent" program can repair broken indentation (but be careful - it often mangles comments, so read the man page): indent -kr -i8 *.c Don't put the name of a program inside the program, since the name will be wrong if you rename the executable. Use argv[0] or $0 instead. Program arguments and execution syntax - man page style ------------------------------------------------------- The "man page style" for program execution syntax uses {} to surround mandatory items and [] to surround optional items. A program with one mandatory item that is either a file or a directory and one optional item that is one or more userids would have a Syntax line: $0 {file|directory} [userid...] Multiple optional arguments must nest: $0 arg [ opt1 [ opt2 [ opt3 ] ] ] which means that if you want to specify opt3 you have to specify opt1 and opt2 before it. All are shown as optional. Typical Four-Part Algorithm Structure ------------------------------------- Functonally, most Unix run-once commands (not servers) typically have this form: 1) Input Stage - verify the correct number of command-line arguments - optionally prompt for any missing arguments - complain about missing or extra arguments and exit non-zero 2) Validation Stage - validate the input: make sure numbers are numbers, files are files, directories are directories, etc. - test to make sure arguments are not empty or null strings - test to make sure things exist and are accessible - test to make sure things are readable/writable 3) Processing Stage - run your algorithm on your validated input - solve the problem; generate the results - servers may stay at this stage "forever" until killed 4) Output Stage - display the output Don't mix up code among the four stages. Keep them distinct: Do not duplicate your Processing Stage or Validation Stage for various combinations of Input arguments. Where possible, collect all the input first; then Validate it *once*, then write your Processing code *once*. Do not duplicate your Output Stage for various combinations of Input arguments. Generate your results and display the results *once*, at the end of the program. Where possible, don't duplicate similar kinds of output statements all over your code. Write one output statement that uses variables to contain the variable part of the output. Less code is better code. Prompts and Error Messages -------------------------- Issue prompts on standard error before reading any input from the user; otherwise, the user won't know what to enter, or when. The prompt should explain exactly what kind of input is expected; don't just say "Enter input". Menus are also considered to be part of the prompts, and must also appear on standard error. Always issue prompts and error messages on stderr (not on stdout), so that the prompts and error messages don't get redirected into output files. After reading input, it is often a good idea to echo the input back to the user, so that they know what was entered. This is called "echoing the input back to the user" and improves the readability of test runs. Optional: Your program doesn't need to prompt the user to enter input if standard input is detected to be coming from something that is not a terminal keyboard (e.g. it might be coming from a pipe or redirected from a file). See "man isatty". Error Messages -------------- Error messages must obey these four rules: 1. error messages must appear on standard error, not standard output 2. give the name of the program that is issuing the message (from argv[0]) 3. state what input was expected (e.g. "expecting one file name") 4. display what the user actually entered (e.g. "found 3 (a b c)"), or at least some indication of what was entered (e.g. show a count). *Never* say just "illegal input" or "invalid input" or "too many". Always specify how many is "too many" or "too few": myprog: Expecting 3 file names; found 4 (a b c d) myprog: Student age 4 is not between 12 and 18 myprog: Modify days -2 is less than zero After detecting an error and issuing a message, the usual thing to do is to exit the program with a non-zero return code. Don't keep processing bad data! Error messages should only show information entered by the user (on the command line or read from the keyboard) if that information is relevant to the error: - if the error deals with something that isn't related to what the user entered, you don't need to show any of the user's input - if the error is related to what the user entered, you must display what the user entered as part of the error message Commenting Programs - no useless comments ----------------------------------------- Comments should add to a programmer's understanding of the code. They don't comment on the syntax or language mechanism used in the code; since, both these things are obvious to programmers who know the language. (Don't comment that which is obvious to anyone who knows the language.) "Programmer" comments deal with what the line of code means in the *algorithm* used (the "why"), not with syntax or how the language *works*. Comment on "why" this code is here, not "how" it works. "Teacher" or "Instructor" comments talk about how the *language* works, not about what the code means in the algorithm. Do not include Instructor comments in your own code - I already know what the language means. Thus: Do not use comments that state things that relate to the syntax or language mechanism used and are obvious to a programmer, e.g. # THESE COMMENTS BELOW ARE OBVIOUS AND NOT HELPFUL COMMENTS: # x=23 # set x to 23 <== OBVIOUS; NOT HELPFUL open(file,mode) # open the file <== OBVIOUS; NOT HELPFUL close(file) # close the file <== OBVIOUS; NOT HELPFUL x=sqrt(y) # take the sqrt of y <== OBVIOUS; NOT HELPFUL list.head = NULL # set head of list to NULL <== etc. Better, useful programmer-style comments: loop=argc # initialize loop index to max num arguments open(acctlog,READ_ONLY) # open the payroll accounting log r/o stdev=sqrt(variance) # get stdev of student test scores close(pipe) # child process must close this side of pipe Do not copy "instructor-style" comments into your code. Instructor-style comments are put on lines of code by teachers to explain the language and syntax functions to people unfamiliar with programming (e.g. to students of the language). Instructor-style comments are "obvious" comments to anyone who knows how to program; they should never appear in your own programs (unless you become an instructor!). More on this: http://www.c2.com/cgi/wiki?MethodCommenting On not needing comments: http://www.c2.com/cgi/wiki?ToNeedComments Code Commenting Style --------------------- Comments should usually appear on their own lines, indented to match the code to which they apply. In some cases you can add comments to the ends of lines (e.g. for variable declarations); but, doing so restricts the length of the comments and may make your code harder to edit and maintain by other people. Comments should be grouped in blocks, ahead of blocks of related code to which the comments apply. Do not alternate comments and single lines of code, or put many comments off to the right of single lines of code. Doing so makes the code hard to read. Block comments and block code are easier to read. (*Occasional* lines longer than 80 characters, and *occasional* end-line comments are tolerated.) Here is a Before/After example taken from an actual student submission: ------------------------------------------------------------------------------ ----- Before (bad style) ----------------------------------------------------- ------------------------------------------------------------------------------ /* This function take a command (starts with /) and parse and executes it */ void parsecmd(char *msg, size_t size, int fdmax, char ip_addr[][SIZE_IP],int sender){ int startofmsg; // int i; int sent=0; // This variable is used to track if the msg was sent or not /*The fact that MAX_ARG is 256 this sets a limit of 255 for nicknames */ char arg[MAX_ARG]; // Used to store IP in case of /msg or new nickname in case of /nick char pvt_msg[MAX_MSG]; // The pvt msg that will sent to a client if( strncasecmp("/msg",msg,SIZE_MSG) == 0){//Check if the first 5 characters represent the /msg (message) command if((startofmsg=striparg(arg,msg+SIZE_MSG,size-SIZE_MSG))==NOARG){//Get the IP and where the msg starts sprintf(pvt_msg,"ERR:Your /msg messsage was mal-formulated. The message was not sent\n"); sendall(sender,pvt_msg,strlen(pvt_msg));// return error msg to the sender return; } startofmsg += SIZE_MSG; // This gives us where the msg starts in src // For syntax reason, we assume that there's a \n at the end of this line // Also every new line must start with CHT: or it won't be reconized sprintf(pvt_msg,"CHT:Private message from %s:\nCHT:%.*s",ip_addr[sender-STD_FDS-1],size-startofmsg,msg+startofmsg);// Make the private msg for(i=0; iclient chat msg protocol prefix #define ERR_PREFIX "ERR:" // server->client error protocol prefix /* * Recognize the leading /command word on a string and process the line. * Arguments: * cstr: the string to parse from the attached client * clen: length of cstr (cstr may not end in \0) * fdmax: file descriptor max, used by /msg to loop over connected clients * iplist: list of known IP addresses for each connected client fd * clifd: fd of the client that sent cstr * Return value: nothing */ void parsecmd(char *cstr, size_t clen, int fdmax, char iplist[][SIZE_IP], int clifd) { char pvt_msg[MAX_MSG]; // msg that will sent to a client #define IS_COMMAND(cmd,str) (strncasecmp(cmd,str,sizeof(cmd)-1) == 0) /* Command: /msg [message text] * Send the message text to the given client */ if( IS_COMMAND("/msg",cstr) ){ int fd; // file descriptor int startofmsg; // index of byte after IP address int sent = 0; // was message sent to any client char ipaddr[MAX_ARG]; // parsed IP address after /msg /* Chop out string immediately following /msg into ipaddr. * Bug: doesn't check that /msg is followed by whitespace: * /msgxxxx puts xxxx in IP and ignores actual IP * Bug: striparg can't check for buffer overflow of ipaddr * Bug: ipaddr may end up with \n on the end from striparg; * so, "/msg 127.0.0.1\n" fails to match 127.0.0.1 and * compensating code had to be added below */ startofmsg = striparg(ipaddr, cstr+SIZE_MSG, clen-SIZE_MSG); if( startofmsg == NOARG ){ sprintf(pvt_msg, "%sYour /msg messsage was " "mal-formulated. The message was not sent\n", ERR_PREFIX); sendall(clifd, pvt_msg, strlen(pvt_msg)); return; } /* Copy the client IP address and message into pvt_message. * Have to add back SIZE_MSG bytes to get actual offset * of message start inside cstr so that client message * starts at: cstr+startofmsg * Bug: no check for buffer overflow of pvt_msg */ startofmsg += SIZE_MSG; sprintf(pvt_msg, "%sPrivate message from %s:\n%s%.*s", MSG_PREFIX, iplist[clifd-STD_FDS-1], MSG_PREFIX, clen-startofmsg, cstr+startofmsg); /* Loop over all clients trying to match ipaddr to IP list. * Send the message to all matches and note success. */ for( fd = 0; fd < fdmax; fd++ ){ if( strcmp(iplist[fd], ipaddr) == 0 ){ sendall(fd+STD_FDS+1,pvt_msg,strlen(pvt_msg)); sent = 1; } } if( sent == 0 ){ /* Message was never sent: send error back to client. * Bug: have to compensate for bug where ipaddr * ends in \n (see striparg Bug above) */ if( ipaddr[strlen(ipaddr)-1] == '\n' ) ipaddr[strlen(ipaddr)-1] = ' '; sprintf(pvt_msg, "%sNo one was found with the IP: %s\n", ERR_PREFIX, ipaddr); sendall(clifd, pvt_msg, strlen(pvt_msg)); } } /* Future /command recognition code would go here: * else if( IS_COMMAND("/nick",cstr) ) */ else { // Bug: Doesn't return what client typed back to client sprintf(pvt_msg, "%sCommand unrecongnised\n", ERR_PREFIX); sendall(clifd, pvt_msg, strlen(pvt_msg)); } return; } Forbidden Functions ------------------- Buffer overflows let malicious crackers break into machines. Some C library functions are unsafe, since they permit buffer overflows except in controlled circumstances. The following list of C library functions are generally forbidden since each given function will generate a buffer overflow when given sufficiently large input values: gets, strcpy, strcat, sprintf, vsprintf There is no need to use any of the above functions; replace all of the above with their buffer-safe equivalents: fgets, strncpy, strncat, snprintf, vsnprintf Read the man pages for these functions! Not all of them treat the buffer size parameter the same way. If the transfer operation was truncated due to a buffer size limit, the trailing '\0' character may or may not be present in the output buffer; or, the '\0' may be written *after* the "n" characters have been copied (which might be *after* the end of your buffer, if you aren't careful!). Programs that operate as network-visible servers have zero tolerance for unsafe coding practices.