-----------------------
Lab #07 for CST8165 due April 9, 2007 (Week 14)
-----------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

Global weight: 5% of your total mark this term.

Due date: before 10h00 AM Monday April 9 (Week 14)

    You have three weeks to do this because you will also be working on
    your Project course. Don't leave things until the last week.

Interim submissions: in your Lab periods in Weeks 11, 12, and 13

    You will submit whatever progress you have made on this assignment
    before the end of your lab periods in Weeks 11, 12, and 13.

The on-line deliverables for this exercise are to be submitted on-line in
the Linux Lab T127 using the "cstsubmit" method described in the exercise
description, below. No paper; no email; no FTP.

Late-submission date: I will accept without penalty exercises that are
submitted late but before 12h00 (noon) on Tuesday, April 10. After that
late-submission date, the exercise is worth zero marks. Interim work
(whatever you have) must be submitted in Weeks 11, 12, and 13.

Exercises submitted by the *due date* will be marked on-line and your
marks will be sent to you by email after the late-submission date.

Code submitted without your added useful comments (no matter what its
source) will not be marked. Add useful comments to *all* submitted code
(except the PigLatinTranslator.java file).

Exercise Synopsis
-----------------

Modify an existing Java-based HTTP server to serve Pig Latin.
Document it as you modify it.
Test it thoroughly using an automated test script.
Write a "man page" for it.

Where to work
-------------

Submissions must run cleanly in the T127 Linux Lab, though you are free
to develop and work on them anywhere you like. If you develop elsewhere,
make sure the code works on the Linux Lab machines as well; that's where
I test it!

Resources / Documentation
-------------------------

See week11notes.txt : "Coding an HTTP server (Java)"

Suggested template from which to start coding a simple HTTP server:
    http://www.brics.dk/ixwt/examples/FileServer.java

Using the sample code (above), you will implement a basic HTTP RFC 2616
server with Java class name "PigLatinHTTP" that handles the two methods
that MUST BE supported by a general-purpose HTTP server.
(RFC 2616 Section 5.1.1 p.36)

The 145-line FileServer code (above) is a very good starting point; but,
note its many flaws (including the lack of comments, which you will have
to add to the code). This HTTP server does not adhere to the HTTP RFC
in many respects - it only reads a single Request line from a client and
then closes the connection.

Your PigLatinHTTP server must accept HTTP protocol versions 1.0 and 1.1
in requests from clients. (The sample code already does this.)

Your server does *not* have to implement parts of HTTP not mentioned
in this lab. In particular, you do *not* need to implement these more
advanced HTTP features:

 - NO persistent connections
 - NO continuation lines
 - NO URIs with blanks or escapes (e.g. + or %20)
 - NO reading header lines from a client (only read one client request
   line)
 - NO checking for a mandatory "Host:" field (even for HTTP 1.1)

The sample HTTP server does not conform to the HTTP client Request
protocol; the server only reads one Request line per client and then
closes the connection; this is okay. The server need only read a single
request line from a client. The given sample code already does this;
only one line is read from a client.

If in doubt about what you need to implement, ask your instructor.

Important Programming Notes:

 - Remember to return lines to your client ending in CR+LF, not just LF
 - Remember that error messages should appear on stderr, not stdout:
   Use System.err.println() not System.out.println(). You can get the
   program name string using PigLatinHTTP.class.getName().
 - Make sure you use the ".equals(String)" method to compare Java strings.

Code Quality and Portability
----------------------------

A. You must include useful comment blocks ahead of the code you write or
   modify. Code submitted without your added useful comments will not be
   marked (except for the PigLatinTranslator.java file).

B. Marks are awarded for readability and elegance, not just correctness.
   If your code can't be read, you're useless in a team project.

Manual and Automated Testing - autotest_http.sh
----------------------------

This section applies to testing your HTTP server.

To test your HTTP server, you don't need a real HTTP client. Indeed, a
real HTTP client (such as a web browser) will hide most of the Response
lines that you need to see when testing your HTTP server. We will use
the "netcat" command to act as our HTTP client.

Using two windows, we can run both our HTTP server and a netcat HTTP
client simultaneously. In one window, we can start our HTTP server on
localhost port 55555. In another window, we can start our netcat HTTP
client and tell it to connect to localhost port 55555. We can then type
HTTP client Requests into the client window, which will be read by our
HTTP server running in the server window.

To test an HTTP Request line automatically, we can "fake" a Request from
an HTTP client to our HTTP server by putting the line we want the fake
client to send in a text file and using the file as standard input
to netcat:

    $ java PigLatinHTTP 55555 . &     # start our HTTP server
    [...]
    $ nc -v localhost 55555 /tmp/foo.txt
  * $ nc -v localhost 55555
    localhost [127.0.0.1] 55555 (?) open
  * GET /foo.txt HTTP/1.0
    HTTP/1.0 200 OK
    Content-Type: text/plain
    Date: Mon Mar 19 12:57:21 EDT 2007
    Server: IXWT FileServer 1.0

    hi

4) Implement a 30-minute time-out for your server; so, it doesn't hang
   around forever. Apply method setSoTimeout(30*60*1000) to your open
   server socket, and catch the SocketTimeoutException and exit while
   your program is looping.
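   The time-out described in Step 4 might look something like the
   following sketch. It is not taken from the FileServer sample: the class
   name TimeoutSketch, the constant IDLE_TIMEOUT_MS, and the method
   serveUntilIdle() are hypothetical names chosen for illustration, and
   the request handling itself is left as a comment. The only library
   behaviour relied on is that ServerSocket.accept() throws
   SocketTimeoutException once the time set by setSoTimeout() expires.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TimeoutSketch {
    // 30 minutes, in the milliseconds that setSoTimeout() expects
    // (hypothetical constant name).
    static final int IDLE_TIMEOUT_MS = 30 * 60 * 1000;

    /**
     * Accept clients until no new connection arrives within timeoutMillis.
     * Returns the number of clients accepted before the time-out.
     */
    static int serveUntilIdle(ServerSocket server, int timeoutMillis)
            throws IOException {
        server.setSoTimeout(timeoutMillis);
        int served = 0;
        while (true) {
            try (Socket client = server.accept()) {
                served++;
                // A real server would read the Request line from
                // "client" and write the Response here.
            } catch (SocketTimeoutException e) {
                // accept() waited the full time-out with no client.
                System.err.println(TimeoutSketch.class.getName()
                    + ": no connection in " + timeoutMillis
                    + " ms; exiting");
                return served;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 55555 matches the examples in this lab.
        try (ServerSocket server = new ServerSocket(55555)) {
            serveUntilIdle(server, IDLE_TIMEOUT_MS);
        }
    }
}
```

   Passing the time-out as a parameter keeps the loop easy to exercise
   with a much shorter value than the 30 minutes used in main().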
   Test this: Set the time-out to 5 seconds and make sure your code
   catches the time-out exception and exits cleanly.

5) Reduce the number of global variables. Modify the class code to remove
   all the global class variables except port and wwwhome. Move the
   removed variables into the function that uses them and pass them as
   arguments to all other functions that need them.

6) Make the remaining two class global variables "private". This class
   has no public global variables.

7) Hide the private methods of this class. Make all the methods in this
   class except main() and PigLatinHTTP() "private".

8) Modify the processRequest() method to return one or two strings on
   any errors in parsing or accessing the browser Request. Do not call
   errorReport() - return the error message string(s) instead. Return
   null if everything worked. Why two strings? Read on:

9) On return from processRequest() use errorReport() to process any
   non-null error message string(s) returned. This will be the only
   place in your program where errorReport() will be called; all other
   uses of errorReport() will have been replaced by returning one or
   two strings of error message text.

   If you return two strings from processRequest(), you can pass both
   strings to errorReport() for output. One string should contain the
   Status-Code and (short) Reason Phrase from 6.1.1 p.40-41. The other
   (longer) string may contain more explanatory material.

   Why return two strings? One string contains the Response Status-Code
   and short Reason Phrase (Section 6.1.1 p.39) that is returned as
   the first header line of a server Response; the other string is a
   more detailed error message that would be returned in the body of
   the Response message. For example, the first string might be the
   Status-Code and Reason Phrase "404 Not Found" and the second string
   might be
   "/tmp/nosuch - The requested URL was not found on this server".
   Your server might generate this Response message using these two
   strings (this is just an example - you may format the message more
   clearly):

HTTP/1.0 404 Not Found
404 Not Found - /tmp/nosuch - The requested URL was not found on this server.

/tmp/nosuch - The requested URL was not found on this server.

/tmp/nosuch - The requested URL was not found on this server.
IDAllen IXWT PigLatinHTTP 1.0 at localhost Port 55555
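   The two-string idea from Steps 8 and 9 can be sketched roughly as
   follows. Everything here is an assumption made for illustration:
   checkRequest() is a hypothetical stand-in for processRequest(), the
   two-argument errorReport() is one possible replacement for the
   three-argument original, and the plain-text body is simpler than the
   example Response shown above.

```java
import java.io.PrintStream;

class ErrorReportSketch {
    // HTTP lines end in CR+LF, not just LF.
    private static final String CRLF = "\r\n";

    /**
     * Hypothetical stand-in for processRequest(): returns null when
     * all is well, otherwise { Status-Code + Reason Phrase, detail }.
     */
    static String[] checkRequest(String uri, boolean fileExists) {
        if (!fileExists) {
            return new String[] {
                "404 Not Found",
                uri + " - The requested URL was not found on this server."
            };
        }
        return null;
    }

    /**
     * Write an error Response: the Status-Line carries only the first
     * string; the longer detail string goes in the message body.
     */
    static void errorReport(PrintStream out, String status, String detail) {
        out.print("HTTP/1.0 " + status + CRLF);
        out.print("Content-Type: text/plain" + CRLF);
        out.print(CRLF);                       // end of header fields
        out.print(status + " - " + detail + CRLF);
        out.flush();
    }

    public static void main(String[] args) {
        String[] error = checkRequest("/tmp/nosuch", false);
        if (error != null) {
            // The single place where errorReport() is called.
            errorReport(System.out, error[0], error[1]);
        }
    }
}
```

   Returning the strings instead of printing them keeps all the Response
   formatting in one method, which is the point of Step 9.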
   The errorReport() function supplied by the original code doesn't
   handle the input strings very well; it currently accepts three strings
   but always concatenates the first two "code" and "title" together; so,
   why not just pass in two strings instead of three, or use "code" to
   index a table to find the text Reason Phrase?

   According to the RFC, the Status-Code and Reason Phrase
   (e.g. "404 Not Found") should be the only strings printed in the first
   Response line from your server (as shown in the above example).
   The longer string (named "msg") may be printed as further explanatory
   text in the body of the message seen by the browser (as shown above).

   Section 6.1.1 p.40-41 has a list of possible Status-Codes and Reason
   Phrases that you should review for possible use by your server.

10) Modify the processRequest() method to reduce indentation levels,
    using "return" where appropriate. Now that the method returns on
    error, you can remove many "else" clauses and shift all that code
    left, reducing indentation and making the program easier to read.

11) Fix the request parsing. It is broken and also not very "liberal":

    - If the client exits (^C) before issuing any request, the HTTP
      server generates a NullPointerException.
    - "GET//foo.txt HTTP/1.0" should fail as bad syntax, but does not.
    - "GET   /foo.txt   HTTP/1.0" (extra blanks between the fields)
      could be accepted "liberally"; but, it fails.

    BE LIBERAL in the syntax you accept in client requests to your
    server. For example: Allow extra white-space after the method and
    before the HTTP version string. You may assume for this lab that the
    URI in the client request does not contain any embedded whitespace.
    (Blanks would usually be escaped as %20; however, your server doesn't
    need to handle blanks or escapes in URIs.)

    Improve the server's URL parsing section to fix the errors and also
    allow extra blanks before and after the URL.

    Hints: The "split()" method of String is useful for parsing into
    an array.
    The "trim()" method of String is useful for removing leading and
    trailing whitespace.

12) Implement the following server Response header fields:

    Server:         (pick a name for your PigLatinHTTP server)
    Content-Length: (the length in bytes of the file being served)
    Content-Type:   (the MIME type of the file being returned)
    Date:           (the current date - section 14.18)
    Last-Modified:  (the last modified date of the file being served - 14.29)

    Some of the above fields are already implemented for you.

    Java note: The java.io.File class has methods to return some of the
    above required information. The "lastModified()" and "length()"
    methods of File will be useful. The java.util.Date(long) constructor
    is useful for converting the lastModified() value (milliseconds since
    the epoch) into a printable date string.

13) Continue to use the guessContentTypeFromName method to generate your
    Content-Type field. (The sample code already does this.)

14) Major Change: If the content type being requested is guessed to
    be "text/plain", return the "Pig Latin" version of the text in the
    file, not the regular text. You will find a PigLatinTranslator.java
    file in the course notes. See the comments at the top of the
    PigLatinTranslator.java file for how to use it in your PigLatinHTTP
    class. You can try the translator stand-alone by building and
    running it:

    $ javac PigLatinTranslator.java
    $ java PigLatinTranslator &1 | cat -v | tee test_out.txt

    Usage (run all tests; no prompting; no display on screen; use
    "tail -f"):

    $ ./autotest_http.sh &1 | cat -v >test_out.txt

    If you don't use "tee", then in a separate window you can run
    "tail -f test_out.txt" to see the progress of a test script in
    writing to the test_out file.

    Remember to edit autotest_http.sh.txt to update the incomplete list
    of tests at the bottom.

24) After you have run your tests, edit, title, and number each test
    output to match the test titles and numbers in your README.txt file.
    (Of course, you may set up your testing script to do this for you.)
    You don't have to copy the test output into the README.txt file if
    you can refer to the test output in test_out.txt by name or number.

25) Note: Nothing in your code or test structure can include absolute
    pathnames (except to well-known system files); in particular, you
    must not reference your own home directory or directories that are
    not public in your test scripts. Anyone must be able to run your
    tests.

Writing the "man page" - httpserver.txt
----------------------

26) Write a text "man page" file named httpserver.txt for your
    PigLatinHTTP server program that has the following standard man page
    headings:

        NAME
        SYNOPSIS
        DESCRIPTION
        ENVIRONMENT
        AUTHOR
        REPORTING BUGS
        COPYRIGHT
        SEE ALSO

    Use "man date" as your model for each of these sections.
    Lines must be shorter than 80 columns.

    (Optional Note: Unix man pages are actually written using a mark-up
    language named "troff", "nroff", or "groff" that is processed to
    create the on-line text man pages you see with the "man" command.
    You will find the nroff source to "man date" in the file
    /usr/share/man/man1/date.1.gz and you are free to optionally write
    your own man page by editing and modifying that nroff source format.
    Once you have your page written in nroff format, you can format it
    using "man ./file" where "./file" is the pathname to your man page
    source. If the argument to man has a slash, it is taken to be an
    nroff source file to format. Writing in nroff source format is
    optional.)

Assignment Review - review.txt
-----------------

In a file named review.txt fill in the following information:

27) Progress: Document how much of the lab you completed and submitted
    in your interim submissions (a) at the end of Week 11, (b) at the
    end of Week 12, (c) at the end of Week 13. Give the Step Numbers.

28) Completion: For each of the 30 steps in the assignment, document
    whether you completed the step.

29) Objectives: Comment on the assignment objectives vs.
    the course outline and indicate if, in your opinion, the assignment
    is relevant to the course and contributes to your learning of the
    course material.

30) Difficulty Level: Using a scale from 0-5, where 0 is "easy", document
    how difficult the assignment was, and how much time you spent doing
    it.

Submission and Marking Scheme
-----------------------------

Submission Standards (see earlier labs for details):

A. At the top of each and every submitted file, as comments, create
   an Exterior Assignment Submission label. This label identifies the
   file; it is not a substitute for proper documentation in the file.
   Your file will still need comments and function headers.

B. For material you copy from other sources, credit the author and
   source. If the comments in the source you copy are not sufficient,
   you must fix and add to them, just as if you wrote the code yourself.
   (You do not need to add comments to the PigLatinTranslator.java code.)

C. Submit all your source files for marking as Exercise 07 using a
   *single* cstsubmit command line (always submit all files together):

   $ ~alleni/bin/cstsubmit 07 \
       PigLatinHTTP.java httpserver.txt PigLatinTranslator.java \
       README.txt test_out.txt autotest_http.sh review.txt

   You must submit seven files. Submit all the files necessary to test
   and run your HTTP server. Do not submit object files or binary files.

   Note: Nothing in your code or test structure can include absolute
   pathnames (except to well-known system files); in particular, you
   must not reference your own home directory.

D. To be marked, the files named above must have the exact names given.
   Code submitted without your added useful comments (no matter what
   its source) will not be marked. Add proper comments to *all* the code
   except the PigLatinTranslator.java file.

E. If you aren't sure if you've submitted all the necessary files for
   your project, change to a new empty directory and use cstsubmit to
   fetch your submission back, expand it, and run your test script.
   It should work.
F. All files submitted must be named correctly and have assignment
   headers.

Marking Scheme
--------------

I) User Manual - "man page" in httpserver.txt : 10%
  - Man page file name: httpserver.txt
  - follow the heading format given in "man date"
  - How do you use your HTTP server program?
  - What are the inputs and outputs?
  - What exit status is returned?
  - What environment variables or external data are used?
  - What other programs are similar to this or useful in conjunction
    with this?

II) Your Automated Testing and README.txt : 60%
  - Your test plan is most of your mark. Even if your code doesn't work,
    you can write a comprehensive test plan.
  - File names: README.txt, autotest_http.sh, and output in test_out.txt
  - Number the sections in README.txt and test_out.txt; refer to them in
    your testing, and vice-versa (cross reference both files).
  - Your updated and enhanced autotest_http.sh and its output must refer
    to the numbered sections in your README file.
  - Your README.txt file must refer to the numbered sections in your
    updated autotest_http.sh and its test_out.txt output file.
  - Document what should happen for every possible type of good and
    bad input.
    - command line parsing and argument validation
    - HTTP method and URL parsing and validation
    - connections from local and remote HTTP clients
    - handling of the two major HTTP method types
    - other tests?
  - Submit your testing output file named "test_out.txt"
  - You must use and enhance the "autotest_http.sh" automated testing
    script for at least some of your tests. (You may need to do manual
    testing too.)
  - The HTTP RFC 2616 sets down the rules for your server behaviour; but,
    remember that you don't have to implement the tricky stuff.

III) Coding Style: 30%
  - Source File: PigLatinHTTP.java
  - see note file programming_style.txt
  - are the comments you add useful in understanding what the code does?
    - no "useless comments" - see programming_style.txt
    - comments are in block form, not excessively interleaved with code
    - sparing use of end-of-line comments (stay within 80 columns!)
  - do the error messages appear on stderr and contain full information,
    including the name of the program issuing them?
    - never say "too many" - print the limit
    - never say "not enough" - tell exactly what was expected
    - user input should never be called "illegal" unless it's against
      the law
  - is this code easy to read and understand?
    - neat, organized, well-spaced
    - is the indentation consistent (Unix tabs are every 8 - "man expand")
  - is this code easy to change and maintain?
    - no "magic numbers" - document your constants and offsets
    - no duplicate code (uses functions for common actions, repeated code)
    - all input presumed hostile and handled safely
    - all function and system call return codes checked
    - "less code is better code"
    - "be liberal in what you accept"

IV) Bonus Work: +10%
  - You may earn an additional 10% for bonus work (see above).

All files submitted must be named correctly and have assignment headers.
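
As an illustration of the Coding Style points above about error messages
(name the program, print the limit, say exactly what was expected), here
is a minimal sketch of command-line validation. It assumes the server
takes two arguments, a port number and a directory, as in the
"java PigLatinHTTP 55555 ." example earlier in this lab; the class name
UsageSketch, the method checkArgs(), and the message wording are all
hypothetical.

```java
class UsageSketch {
    // Assumed argument count: port number and document directory.
    private static final int EXPECTED_ARGS = 2;
    // Valid TCP port range.
    private static final int MIN_PORT = 1;
    private static final int MAX_PORT = 65535;

    /**
     * Validate the command line. Returns null if the arguments look
     * usable, otherwise a complete message that names the program and
     * states exactly what was expected and what was found.
     */
    static String checkArgs(String program, String[] args) {
        if (args.length != EXPECTED_ARGS) {
            return program + ": expected " + EXPECTED_ARGS
                + " arguments (port directory), found " + args.length;
        }
        int port;
        try {
            port = Integer.parseInt(args[0]);
        } catch (NumberFormatException e) {
            return program + ": port '" + args[0] + "' is not an integer";
        }
        if (port < MIN_PORT || port > MAX_PORT) {
            return program + ": port " + port + " is outside the range "
                + MIN_PORT + "-" + MAX_PORT;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = checkArgs(UsageSketch.class.getName(), args);
        if (error != null) {
            System.err.println(error);   // errors go to stderr
            System.exit(1);
        }
    }
}
```

Keeping the limits in named constants means the same values appear in
both the check and the message, so the two cannot drift apart.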