-------------------------
Week 12 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

Keep up on your readings (Course Outline: average 4 hours/week homework)

Review:
------
- design issues for HTTP (Tim Berners-Lee documents)
- structure of Requests and Responses
- using netcat with HTTP clients and servers
- three methods of session tracking
- absolute vs. relative URIs
- handling unrecognized HTTP header lines
- status codes
- persistent connections

HTTP - Hyper Text Transfer Protocol - continued
----

HTTP Methods - section 9 p.51
------------

    "Implementors should be aware that the software represents the user in
    their interactions over the Internet, and should be careful to allow
    the user to be aware of any actions they might take which may have an
    unexpected significance to themselves or others." p.51

- safe methods should not have side-effects p.51
  - GET and HEAD "SHOULD NOT" have any effect other than retrieval
  - the user did not request the side-effects, even if they happen

    "Methods can also have the property of "idempotence" in that (aside
    from error or expiration issues) the side-effects of N > 0 identical
    requests is the same as for a single request. The methods GET, HEAD,
    PUT and DELETE share this property. Also, the methods OPTIONS and TRACE
    SHOULD NOT have side effects, and so are inherently idempotent." p.51

- idempotent methods may have side-effects, but doing them once or more
  than once should not make a difference
  - e.g.
    GET, HEAD, PUT, DELETE are idempotent (can be done repeatedly)
  - OPTIONS and TRACE SHOULD NOT have side-effects, so are idempotent
  - a *sequence* of methods may not be idempotent, even if each method is:
    - "A sequence is idempotent if a single execution of the entire
      sequence always yields a result that is not changed by a reexecution
      of all, or part, of that sequence."
      e.g. "PUT, DELETE" is not an idempotent sequence because re-executing
      part of it (e.g. just the PUT) after the full "PUT, DELETE" changes
      the result
    - A sequence made up entirely of methods that never have side effects
      is idempotent, by definition

Q: Define HTTP "safe" and "idempotent" methods. What do they mean?

Q: Give examples of HTTP "safe" methods.

Q: Give examples of HTTP "idempotent" methods.

Q: T/F A sequence of idempotent methods is always itself idempotent.

GET - section 9.3 p.53
---

    "The semantics of the GET method change to a "conditional GET" if the
    request message includes an If-Modified-Since, If-Unmodified-Since,
    If-Match, If-None-Match, or If-Range header field."

    "The semantics of the GET method change to a "partial GET" if the
    request message includes a Range header field."

Q: Explain what a "conditional GET" is.

Q: Explain what a "partial GET" is.

HEAD - section 9.4 p.54
----

    "The HEAD method is identical to GET except that the server MUST NOT
    return a message-body in the response."

Q: What is the difference between the message headers returned by GET
   and HEAD?

HTTP security
-------------
- RFC 2616 was updated by 2817 to add Transport Layer Security - TLS
  http://tools.ietf.org/html/rfc2817
  ftp://ftp.rfc-editor.org/in-notes/rfc2817.txt
- 1997 meeting deprecated the practice of separate secure ports
  (having separate ports halves the number of usable ports!)

    "Parallel well-known port numbers have similarly been requested -- and
    in some cases, granted -- to distinguish between secured and unsecured
    use of other application protocols (e.g. snews, ftps).
    This approach effectively halves the number of available well known
    ports. At the Washington DC IETF meeting in December 1997, the
    Applications Area Directors and the IESG reaffirmed that the practice
    of issuing parallel "secure" port numbers should be deprecated. The
    HTTP/1.1 Upgrade mechanism can apply Transport Layer Security [6] to
    an open HTTP connection."

Q: Why does the IETF deprecate the use of separate port numbers for secure
   versions of Internet protocols?

-----------------------------------------------------------------------------

Sending electronic mail: SMTP
-----------------------------
http://tools.ietf.org/html/rfc2821

- Remember: The protocol and ports used to send email (SMTP) are completely
  separate from the ports and protocols used to fetch email (POP3, IMAP)!

SMTP - Simple Mail Transfer Protocol
 - RFC821 -> RFC2821 - April 2001 - 79 pages
   on top of TCP (95 pages) on top of IP (45 pages)
 - a "PUSH" protocol - sender initiates (HTTP is "PULL" protocol)
 - http://tools.ietf.org/html/rfc2821
    "This document is a self-contained specification of the basic protocol
    for the Internet electronic mail transport. It consolidates, updates
    and clarifies, but doesn't add new or change existing functionality of
    the following: RFC821, DNS, RFC1123"
 - did not add to or change RFC821; dropped obsolete items

Q: T/F RFC2821 replaced RFC821 and added new SMTP functionality

Algonquin SMTP server
---------------------

Algonquin network restrictions prevent access to other SMTP servers from
on campus. You must connect to the Algonquin SMTP server to send email.

In strict conformance with RFC 2821, the Algonquin SMTP server accepts only
CR+LF line ends - you have to type ^V^M^M (CTRL-V RETURN RETURN) at the end
of every line to make it work.

    $ nc -v outmail.algonquincollege.com smtp
    Connection to outmail.algonquincollege.com 25 port [tcp/smtp] succeeded!
    220 mail4.algonquincollege.com -- Server ESMTP (Sun Java System Messaging Server 6.2-7.02 (built Jun 13 2006))
    quit
    quit
    quit
    ...

- the connection hangs after the banner and appears not to accept any
  further commands, because the Sun server demands CR+LF line ends, not
  just the LF line ends given by "nc" (the Sun server is RFC-compliant,
  but not very liberal in what it accepts!)
- the fix is to type ^V^M^M (CTRL-V, then push the RETURN key twice) at
  the end of each line:

    $ nc -v outmail.algonquincollege.com smtp
    Connection to outmail.algonquincollege.com 25 port [tcp/smtp] succeeded!
    220 mail4.algonquincollege.com -- Server ESMTP (Sun Java System Messaging Server 6.2-7.02 (built Jun 13 2006))
    quit^V^M
    221 2.3.0 Bye received. Goodbye.

Q: T/F, the Algonquin SMTP server violates the SMTP RFC by requiring
   CRLF on the end of each line.

* SMTP vs. Message Format
 - the SMTP *protocol* does not define the format of the *message*
 - the *message* delivered by the *protocol* has its own description:
   RFC822 -> RFC2822 "Internet Message Format" (51 pages)
   - http://tools.ietf.org/html/rfc2822
 - the content of the message (including To/From message header lines)
   is independent of the To/From used in the SMTP protocol!

Q: T/F The SMTP protocol RFC defines the format and headers of an email
   message

* SMTP is a readable ASCII protocol on top of TCP - not binary!
 - you can run it using "nc" or telnet to port 25
 - but you can't do it here at Algonquin College!
   - port 25 blocked leaving the College (must use College servers)
   - College servers implement long wait times before answering
     - to discourage spam programs that don't wait as long
     - SMTP wait times are documented in
       http://tools.ietf.org/html/rfc1123
    "Timeouts are an essential feature of an SMTP implementation. If the
    timeouts are too long (or worse, there are no timeouts), Internet
    communication failures or software bugs in receiver-SMTP programs can
    tie up SMTP processes indefinitely.
    If the timeouts are too short, resources will be wasted with attempts
    that time out part way through message delivery."

* a sample SMTP session: see Notes file smtp_session.txt

  Note the difference between the SMTP RFC2821 "envelope" FROM/TO lines and
  the RFC2822 Message From:/To: lines. The Message From:/To: lines need
  not be related to the SMTP RFC2821 envelope FROM/TO lines, and application
  writers are warned not to try to link them: (RFC 2821 Section 7.2)

* Extending the original SMTP protocol "HELO" with "EHLO"
 - original SMTP "HELO" greeting had no protocol version number
   - no way to negotiate options or features
 - RFC1425 (1993) replaced HELO with new EHLO greeting, allowing extensions
   - http://tools.ietf.org/html/rfc1425
   - awkward way to do protocol versioning
   - latest version of extensions: http://tools.ietf.org/html/rfc2821
 - SMTP extensions (must be registered with IANA)

   ABNF: ehlo-cmd ::= "EHLO" SP domain CR LF

   Q: Is the EHLO case-sensitive?
   Q: Is the domain optional?

 - HELO vs. EHLO: http://tools.ietf.org/html/rfc2821
    "Contemporary SMTP implementations MUST support the basic extension
    mechanisms. For instance, servers MUST support the EHLO command even
    if they do not implement any specific extensions and clients SHOULD
    preferentially utilize EHLO rather than HELO."
 - response to EHLO: http://tools.ietf.org/html/rfc2821
    "Normally, the response to EHLO will be a multiline reply. Each line
    of the response contains a keyword and, optionally, one or more
    parameters. Following the normal syntax for multiline replies, these
    keyworks follow the code (250) and a hyphen for all but the last line,
    and the code and a space for the last line."
   - the response to EHLO is a list of options that indicates what optional
     features this email server offers

Q: What SHOULD an SMTP client do if the server refuses EHLO?
   (RFC2821 section 2.2.1 p.7, section 3.2 p.
   16)

* Even clever people argue about the interpretation of the RFC documents:
 - http://www.imc.org/ietf-smtp/old-archive/msg01782.html
    "Certain individuals have the impression that the correct response to
    a RSET is ``close the connection'', and insist that RFC-821 backs them
    up. That seems to be an unusually bizarre interpretation, but by golly
    they insist that they Following The Standard (TM). It quickly became
    clear that attempting to reason with such individuals was hopeless."
 - http://www.imc.org/ietf-smtp/old-archive/msg01783.html
    "having just reread the text in 821, that construing RSET as a synonym
    for QUIT must require real creativity (or trying to think with one's
    head in a normally-uncomfortable position),"

 - SMTP continuation syntax: every line but the last of a multi-line
   response contains a "-" immediately following the response number, e.g.

    $ nc -v localhost smtp
    localhost.home.idallen.ca [127.0.0.1] 25 (smtp) open
    220 elm.home.idallen.ca ESMTP Postfix (idallen@idallen.ca)
    EHLO idallen.ca
    250-elm.home.idallen.ca
    250-PIPELINING
    250-SIZE 10240000
    250-VRFY
    250-ETRN
    250-STARTTLS
    250 8BITMIME

Q: How does an SMTP server indicate continuation lines in a reply?

* Reading RFC 2821 - the SMTP protocol
  http://tools.ietf.org/html/rfc2821

  The RFC is the final word on the protocol.
  - note allowed order of SMTP commands p.39
  - you cannot reject an address if the HELO/EHLO name doesn't match the IP
  - note the structure of SMTP reply codes p.40

Q: What is the meaning of the first digit of an SMTP response code?

    1yz Positive Preliminary reply (not used in standard SMTP)
    2yz Positive Completion reply
    3yz Positive Intermediate reply
    4yz Transient Negative Completion reply
    5yz Permanent Negative Completion reply

Q: Do SMTP protocol lines end in CR+LF or just LF? (RFC2821 p.12)

Q: Do Internet Message lines end in CR+LF or just LF?
   (RFC2821 p.12, RFC2822 p.17-18)

Q: SMTP commands are given as double-quoted upper-case strings in the
   RFC 2821.
   Does this mean they must be upper-case?

Q: T/F The space following the three-digit SMTP response code is mandatory
   and all clients MUST look for it, failing if it is not found.
   (RFC 2821 Section 4.2)

Q: How must an SMTP client handle new response codes that it doesn't
   recognize? (RFC 2821 Section 4.2, 4.3.2)

Q: T/F SMTP clients can figure out how to proceed based on just the first
   digit of an SMTP reply code; they can usually ignore the rest.
   (RFC 2821 Section 4.2, 4.2.1, 4.3.2)

Q: T/F You can queue up and send multiple commands to an SMTP server
   without waiting for any responses. (RFC 2821 Section 4.3.1)

Looking at RFC 2821 Section 4.3.2, there are three codes that might be
returned by an SMTP server "if the corresponding unusual circumstances
are encountered". Clients must be prepared to see these codes in response
to any SMTP request.

Q: T/F SMTP clients only need to handle the fixed set of response codes
   listed as responses in the RFC document.

Q: Looking at RFC 2821 Section 4.5.2, how must clients handle the sending
   of email message lines that start with a period?

Q: What is the maximum length of an email address (local-part plus domain),
   as passed through the SMTP protocol? (RFC 2821 Section 4.5.3.1)

Q: How long may an SMTP server delay before issuing the initial 220
   Message greeting? (RFC 2821 Section 4.5.3.2)

Q: Based on experience, what is the suggested policy for retrying failed
   attempts at sending a message? (RFC 2821 Section 4.5.4.1)

Q: Should programs attempt to relate the MAIL and RCPT (envelope) email
   addresses with the addresses (that may be) present in the headers of
   the message body? (RFC 2821 Section 7.2)

http://teaching.idallen.com/cst8165/07w/notes/smtp_session.txt

Review of SMTP:
- http://tools.ietf.org/html/rfc2821
- Sample SMTP session (long and short) in Notes: smtp_session.txt
- SMTP controls the "envelope" TO/FROM, not the message To:/From:
- a text-based protocol, easily run using netcat.
- 3-digit numeric response codes (know these five groups)
  - 1yz Positive Preliminary reply (not used in standard SMTP)
  - 2yz Positive Completion reply
  - 3yz Positive Intermediate reply
  - 4yz Transient Negative Completion reply
  - 5yz Permanent Negative Completion reply

Q: Name the five main categories of SMTP server responses

Q: T/F SMTP clients can figure out how to proceed based on just the first
   digit of an SMTP reply code; they can usually ignore the rest.
   (RFC 2821 Section 4.2, 4.2.1, 4.3.2)

SMTP MX records
---------------

How does a mail client know to which SMTP server to connect when sending
mail to a userid at some domain? It looks up the domain MX records in
the DNS.

An SMTP client queries the DNS for a domain to obtain "MX" (mail exchange)
records that tell which machines accept SMTP mail for the domain:

    $ host -t mx algonquincollege.com
    algonquincollege.com mail is handled by 30 mailgate10.algonquincollege.com.
    algonquincollege.com mail is handled by 20 mailgate11.algonquincollege.com.

    $ host hotmail.com
    hotmail.com has address 64.4.32.7
    hotmail.com has address 64.4.33.7
    hotmail.com mail is handled by 5 mx2.hotmail.com.
    hotmail.com mail is handled by 5 mx3.hotmail.com.
    hotmail.com mail is handled by 5 mx4.hotmail.com.
    hotmail.com mail is handled by 5 mx1.hotmail.com.

    $ host idallen.ca
    idallen.ca has address 72.18.159.15
    idallen.ca mail is handled by 0 idallen.ca.

Q: How does an SMTP mailer know which computer to contact when sending
   mail to someone@domain.ca ?

* SMTP Walk-Through (old RFC 821 version) with comments by Dan Bernstein
  http://cr.yp.to/smtp.html
  - comments based on original RFC 821 not RFC 2821 (but often relevant)
  RFC2822 - message format - http://cr.yp.to/immhf.html
  - "If you're a new implementor, you'll be shocked at how badly 822 was
    designed."
  - RFC2821 standards process "incompetence" by editor Klensin
    http://cr.yp.to/smtp/klensin.html
    - group consensus about HELO/EHLO didn't make the final draft!
    - "What an incredible display of incompetence!"

Q: T/F RFC standards development has been a very organized process.

-----------------------------------------------------------------------------

Coding an HTTP server (Java)
----------------------------

HTTP RFC: http://tools.ietf.org/html/rfc2616

Testing tools:
    http://teaching.idallen.com/cst8165/07f/notes/autotest_http.sh.txt
    http://teaching.idallen.com/cst8165/07f/notes/sample_http_test_out.txt

W3C Java server (HTTP 1.1): Jigsaw http://www.w3.org/Jigsaw/

A working Java HTTP server with basic functionality (in 145 lines) is
available here:
    http://www.brics.dk/ixwt/examples/FileServer.java
 - this version does not adhere to the HTTP RFC in many respects
 - needs comments on functionality (not on how Java works)
 - has many "public" items that should be made private
 - may be missing things such as closing opened files...
 (Older version: http://www.brics.dk/~amoeller/WWW/javaweb/index.html )

An overview of TCP, HTTP and servers using Java:
    http://www.brics.dk/ixwt/http.pdf

Sun Guides/Tutorials on Java networking (mostly client side):
    http://java.sun.com/j2se/1.5.0/docs/guide/net/overview/overview.html
    http://java.sun.com/docs/books/tutorial/networking/index.html
    http://java.sun.com/docs/books/tutorial/networking/urls/index.html

java.net references:
    http://java.sun.com/j2se/1.5.0/docs/api/java/net/package-summary.html

java.net intro
    http://www.brics.dk/~amoeller/WWW/javaweb/javanet.html

Java 5.0 (also known as 1.5) package documentation:
    http://java.sun.com/j2se/1.5.0/docs/
    http://java.sun.com/j2se/1.5.0/docs/api/
    - java.io.File, java.lang.String, etc.

Java Notes (from a non-Java programmer)
----------

* On returning a pair of strings from a function

I suggested that your HTTP server error function take two input strings.
The first string is the Status Code and Reason Phrase from the HTTP RFC.
The second string is text to put into the Message Body of the Response,
giving more detail on the error, e.g.:

    "404 Not Found"
    "The Request /nosuchfile.html was not found on this server."

* How to return a pair of strings from a function in Java:

    public class IanStrings {
        private String[] foo() {
            return new String[] { "string one", "string two" };
        }

        public static void main(String[] args) {
            IanStrings istr = new IanStrings();
            String[] result = istr.foo();
            if (result != null) {
                System.out.println(result[0] + " and " + result[1]);
            }
        }
    }

* On setting and using the setSoTimeout method

 - the action of using the method to set a time-out may raise a socket
   I/O exception, at the time you set the time-out (you need a try/catch)
 - later, when the timer triggers, it will raise the SocketTimeoutException
 - the above are different exceptions and will occur in different places
   in your program (and you need different try/catch for them)
 - to set the time-out, you need to know exactly where your HTTP server
   blocks waiting for input (which is the same place as all your previous
   servers)

-------------------------------------------------------------------------

Eclipse IDE demo (in the T127 Lab - Fall 2007)
----------------------------------------------
- see also the NetBeans IDE from Sun

Warning: Eclipse will need about 10MB of file space for each workspace!
You can use your "N" drive to store unused files:

    $ share //algshare/home/
    share: Attaching smb file system //algshare/home at /tmp/smb-abcd0001.
    Password:
    Spawning /bin/bash. Exit shell to unmount Samba share.
    $ cp -a workspace-old /tmp/smb-abcd0001    # use your userid not abcd0001
    $ rm -rf workspace-old
    $ exit
    Unmounting /tmp/smb-abcd0001.

Most of the actions below have keyboard shortcuts that are much faster
than navigating menus.

Preparing to run Eclipse and starting Eclipse
 .. If you have old .eclipse or workspace directories from a previous
    version of Eclipse (old version of Java), remove or rename them.
 ..
    Start eclipse (e.g. from the command line or a menu)
 .. Select a location for your workspace (e.g. accept the default)
 .. Close the "Welcome" tab (using the X box)

Creating a new Project
 .. Select: File | New | Project
 .. Select a wizard: Java | Java Project (Next)
 .. enter Project name: PigLatinHTTP
 .. select link at bottom: Configure compliance
 .. select Compiler compliance level: 6.0
 .. Apply
 .. Yes (full rebuild)
 .. OK
 .. In the New Java Project dialog, make sure the "Configure compliance"
    warning is gone
 .. Finish
 .. Open Associated Perspective? Yes

Importing your FileServer.java file
 .. File | Import
 .. Select import source: General | File System (Next)
 .. Browse to the directory containing your FileServer.java file
    - you are selecting a *directory* here, not a *file*
    - enter "." to use your home directory
 .. In the directory listing, select FileServer.java
 .. Finish

Opening the imported FileServer.java file in the editor
 .. In the Package tab, use the drop arrow to open PigLatinHTTP
 .. Use the drop arrow to open (default package)
 .. Right Click on FileServer.java and select Open
    - or double-click on the source file name
    - the FileServer.java tab should open with the source visible
    - make sure there are no error tags in the left margin of the code

Running your project for the first time (setting arguments)
 .. Right click on the source code and select Run As | Java Application
    - you should see a "Usage:" message in the console window at the
      bottom of the screen (missing arguments)
 .. From the top menu bar select Run | Run
 .. Select the (x)= Arguments tab
 .. In "Program arguments" enter: 55555 /tmp
 .. select Apply
 .. select Run
    - you should see a successful start-up message in the console window
      "FileServer accepting connections on port 55555"
 .. Push the red square to kill the application.
    See "<terminated>" in the console window

Re-running your project
 .. select the green Arrow in the top menu bar
    See the console output window at the bottom of the screen.
 ..
    Push the red square to kill the application.
    See "<terminated>" in the console window

Adding more files to your project
 .. Use the Import facility to get more source files
 .. If the files are imported to the same project/directory, their classes
    will be available to your main program; you only need to use them

Tips:
 - in the source code, hold your cursor over any word to get help on
   that word
   - use F2 to lock focus on the help and allow scrolling
 - right-click on the source and select Source | Format

-------------------------------------------------------------------------

Automated Testing - use it right from the start
-----------------

I've provided a script that will do automated testing of your HTTP server,
and I've written a few simple automated tests. You must use this script
to test your server, and you must organize the script and add your own
tests to the script to test things that I haven't. No marks are awarded
for using my random tests without modification.

Don't be limited by the categories or tests I've coded in the script -
my list of tests is incomplete and in a random order. Rewrite the test
suite to suit yourself. Add more tests to the suite and organize and
renumber the tests that are there into logical categories.

If you start immediately using the automated testing script to test your
server, you'll save time over doing manual testing and then having to
repeat all your tests for handing in.

Some programming disciplines have you write the test suite first, then
write the code to pass all the tests. If a test doesn't exist for a
function, the function is not considered implemented (because it can't
be tested).
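
* A sketch of one automated test, written in Java

The idea of "write the test first" can be illustrated with a minimal
sketch like the one below. It is NOT the provided test script and it is
not a complete test suite. The class name StatusCheck, its method names,
the 5-second time-out, and the sample request are all assumptions made up
for this illustration. It sends one raw request, reads the Status-Line,
and prints PASS or FAIL depending on the status code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the course script.
final class StatusCheck {

    // Return the 3-digit code from a Status-Line such as
    // "HTTP/1.1 404 Not Found", or -1 if the line is missing or malformed.
    public static int statusCode(String statusLine) {
        if (statusLine == null) {
            return -1;
        }
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2
                || !parts[0].startsWith("HTTP/")
                || !parts[1].matches("[0-9]{3}")) {
            return -1;
        }
        return Integer.parseInt(parts[1]);
    }

    // Compare the code in statusLine with the expected code and report.
    public static boolean expectStatus(String name, String statusLine, int expected) {
        boolean ok = statusCode(statusLine) == expected;
        System.out.println((ok ? "PASS " : "FAIL ") + name);
        return ok;
    }

    // Send one raw request and return the first line of the response.
    // The time-out value is an assumption; a SocketTimeoutException (a kind
    // of IOException) is thrown if the server never answers.
    public static String firstResponseLine(String host, int port, String request)
            throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.setSoTimeout(5000);
            OutputStream out = s.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: StatusCheck host port");
            return;
        }
        // HTTP lines end in CR+LF; "Connection: close" asks the server to
        // hang up after one response.
        String request = "GET /nosuchfile.html HTTP/1.1\r\n"
                + "Host: " + args[0] + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        String line = firstResponseLine(args[0], Integer.parseInt(args[1]), request);
        expectStatus("missing file gives 404", line, 404);
    }
}
```

If the code is saved as StatusCheck.java, one way to try it (Java 11 or
later) is "java StatusCheck.java localhost 55555" while the server from
the Eclipse demo is running on port 55555. Each new server feature would
get its own request and its own expected status code, added before the
feature is written.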