-------------------------
Week 10 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

-------------------
INDEX to this file:
- (continued...) reading the HTTP protocol: RFC 2616
- coding an HTTP server in Java
- sniffing browser requests to servers without using Ethereal
-------------------

Review:

Q: Determine if google.ca, yahoo.ca, and facebook.com adhere to the first
   SHOULD clause in section 4.1 on p.31
   - nc -v google.ca http   OR   telnet google.ca http

HTTP RFC 2616 (continued...)
--------------

Standards: http://www.w3.org/Protocols/
RFC: http://tools.ietf.org/html/rfc2616
     ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt
Errata: http://skrb.org/ietf/http_errata.html
        http://purl.org/NET/http-errata
Issues: http://greenbytes.de/tech/webdav/draft-lafon-rfc2616bis-issues.html
Mail Archives: http://lists.w3.org/Archives/Public/ietf-http-wg/

p.15
- ABNF extended with a "#rule" for comma-separated lists:
    ( *LWS element *( *LWS "," *LWS element ))  becomes  1#element
- implied *LWS can appear between any adjacent tokens or strings in the
  grammar

Q: Describe what this ABNF HTTP rule means:  2#3("foo")

p.15-16
- HTTP ABNF grammar is unaffected by LWS between tokens
- HTTP 1.1 lines can continue ("fold") onto multiple lines if the
  continuation line begins with a space or horizontal tab
  - the only CRLF allowed is part of a continuation line
  - if you want a real CRLF, or a non-ISO-8859-1 character, in a header
    field, encode it as RFC2047 (MIME)

Q: How can you fold a long line in HTTP 1.1?

p.17
- must double-quote special characters used in message headers
- some fields allow comments in parentheses

Q: What do HTTP comments look like in message headers?
Q: Can a relative Request-URI (client message to server) begin without a
   slash, i.e. can it be a relative pathname? (5.1.2 p.36)
Q: Can an HTTP client request an empty URI? (5.1.2)
Q: T/F If a URI or "Host:" header field specifies a host name that is not
   recognized on this server, the server MUST forward the request to the
   other host name. (5.2)

- "request header fields" - section 5.3 p.38
  - can only be extended with a protocol change
  - unknown fields are treated as "entity header" fields
- for HTTP 1.1, the Host field is required (14.23 p.129), but may be empty
  - see also the "MUST" paragraph in section 9 p.51
  - for virtual hosts, an absolute URI over-rides the "Host:" header (p.38)

Q: List the names of the mandatory request header fields for HTTP 1.1
Q: T/F If you give the host name in a URI using HTTP 1.1, you don't need
   to send the Host: header field.

HTTP Status Code and Reason Phrase - section 6.1.1 p.39
----------------------------------

- 3 digits, machine-readable
- only first digit has an assigned meaning (one of five) p.40
- five "classes" of response, based on the first digit

Q: What are the five possible meanings of the first digit of an HTTP
   response?
Q: T/F The "reason phrase" is defined by the HTTP protocol and should not
   be changed or replaced.
Q: T/F HTTP 1.1 clients must understand the meaning of all the defined
   HTTP 1.1 status codes.
Q: T/F An HTTP client need only understand five classes of response code.
Q: If an HTTP server returns an unrecognized status code to a client,
   what SHOULD the client do with the response? (6.1.1 p.41)

- "response header fields" - section 6.2 p.39
  - can only be extended with a protocol change
  - unknown fields are treated as "entity header" fields

Entity (section 7 p.42)
------

- the "entity" is the thing being transferred, e.g. image, text, etc.
- "entity headers" give information about the entity being transferred
  - may include "extension header" fields
  - unrecognized extension headers SHOULD be ignored
- entity body is 8-bit clean (unlike SMTP)
  - but a transfer coding may have been applied to assist transit
- The sender of an HTTP 1.1 message SHOULD give the Content-Type
  - but if not (and only if not), the recipient MAY guess it by
    inspection (7.2.1 p.43)

Q: T/F In the HTTP 1.1 protocol, senders MUST provide the entity
   Content-Type header field.
Q: T/F A recipient may over-ride the Content-Type by inspecting the
   entity being transferred (or its URI).
Q: If no Content-Type is specified, what type is assumed? (7.2.1)

- the entity-length of a message is calculated *before* transfer
  encodings have been applied (i.e. it is the actual length of the
  entity, regardless of how it might be altered to be transferred)
- The Content-Length header, if present, MUST represent *both* the
  entity-length and the actual transfer-length. (4.4 p.33)
- You MUST NOT send a Content-Length field if you apply a Transfer
  Encoding, i.e. if a Transfer-Encoding field is present (because the
  Transfer Encoding might change the size).

Q: T/F The Content-Length is both the real size of the item being sent
   and the size of the actual data being transferred.

Persistent Connections (HTTP 1.1 - section 8.1 p.44)
---------------------------------

- persistent TCP connections have many advantages:
  - fewer TCP handshakes
  - reduced CPU, memory, latency
  - allow pipelining multiple requests without waiting for responses
  - longer connections allow better TCP congestion control
  - allows HTTP to evolve - no penalty for trying a feature then
    dropping back to previous version

Q: T/F HTTP implementations MUST implement persistent connections. (8.1.1)
Q: T/F A persistent connection MUST drop on an error condition. (8.1.2)
Q: Describe three of four advantages of persistent TCP connections (8.1)

- a "Connection:" header field can ask for explicit connection closing:
    Connection: close

Q: How can you signal the end of an HTTP 1.1 persistent connection?
Q: T/F You signal the end of an HTTP session using the same keyword as
   SMTP - QUIT.

- persistent connections require that all messages have a self-defined
  message length - you can't end the message by closing the connection

Q: Why do persistent connections need message lengths?

- clients should not pipeline non-idempotent methods or non-idempotent
  sequences of methods, to avoid inconsistent state if the connection
  drops

Q: Why not pipeline non-idempotent methods? (8.1.2.2 p.46)

- HTTP does not define any time-out for persistent connections
  (actually, I can't find any time-out for *anything*!)
- connection close events may happen at any time (asynchronous)
- clients SHOULD limit to 2 the number of persistent connections to a
  server

Premature Server Close - 8.2.4 p.50
----------------------

- uses "binary exponential backoff" of T = R * 2**N

Q: T/F HTTP servers MAY double their wait times on each retry.

Methods - section 9 p.51
-------

- safe methods should not have side-effects (GET, HEAD)
  - the user did not request the side-effects, even if they happen
- idempotent methods may have side-effects, but doing them once or more
  than once should not make a difference
  - GET, HEAD, PUT, DELETE
  - OPTIONS and TRACE never have side-effects, are idempotent
- a *sequence* of methods may not be idempotent, even if each method is
  - "A sequence is idempotent if a single execution of the entire
    sequence always yields a result that is not changed by a reexecution
    of all, or part, of that sequence."  e.g. PUT, DELETE vs. PUT
  - A sequence that never has side effects is idempotent, by definition

Q: Define "safe" and "idempotent" methods.
Q: T/F A sequence of idempotent methods is always itself idempotent.
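
The "PUT, DELETE vs. PUT" case from the Methods notes can be tried out
with a toy model. This is only a sketch, not anything taken from the RFC:
the class IdempotenceSketch, its put() and delete() helpers, and the
in-memory Map standing in for the server's resources are all invented
for illustration, and it assumes Java 9 or later (for List.of). It runs
a sequence once from an empty store, then re-runs every contiguous part
of the sequence and reports whether the store ever changes. Checking
only contiguous parts, and only from an empty starting store, is a
simplification of the RFC's "all, or part".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model (hypothetical): the server's resources are a Map of URI -> body. */
final class IdempotenceSketch {

    /** PUT stores the body at the URI; doing it twice leaves the same state. */
    static Consumer<Map<String, String>> put(String uri, String body) {
        return store -> store.put(uri, body);
    }

    /** DELETE removes the URI; doing it twice leaves the same state. */
    static Consumer<Map<String, String>> delete(String uri) {
        return store -> store.remove(uri);
    }

    /** Apply each method of the sequence, in order, to the store. */
    static void run(List<Consumer<Map<String, String>>> steps,
                    Map<String, String> store) {
        for (Consumer<Map<String, String>> step : steps) {
            step.accept(store);
        }
    }

    /**
     * Execute the whole sequence once from an empty store, then re-execute
     * every contiguous part of it; the sequence passes only if no
     * re-execution changes the result of the single full execution.
     */
    static boolean isIdempotent(List<Consumer<Map<String, String>>> seq) {
        Map<String, String> once = new HashMap<>();
        run(seq, once);
        for (int from = 0; from < seq.size(); from++) {
            for (int to = from + 1; to <= seq.size(); to++) {
                Map<String, String> again = new HashMap<>(once);
                run(seq.subList(from, to), again);
                if (!again.equals(once)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A lone PUT: re-executing it changes nothing.
        System.out.println(isIdempotent(List.of(put("/a", "x"))));                // true
        // PUT then DELETE: re-executing just the PUT brings /a back.
        System.out.println(isIdempotent(List.of(put("/a", "x"), delete("/a")))); // false
    }
}
```

Each method here is idempotent on its own, yet the two-method sequence
is not: after PUT, DELETE the resource is gone, and replaying only the
PUT (as a client might after a dropped pipelined connection) leaves it
present again.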
GET - section 9.3 p.53
---

Q: What is a "conditional GET"?
Q: What is a "partial GET"?

Coding an HTTP server (Java)
----------------------------

HTTP RFC: http://tools.ietf.org/html/rfc2616

A working Java HTTP server with basic functionality (in 145 lines) is
available here: http://www.brics.dk/ixwt/examples/FileServer.java
- this version does not adhere to the HTTP RFC in many respects
- needs comments on functionality (not on how Java works)
- has many "public" items that should be made private
- may be missing things such as closing opened files...
(Older version: http://www.brics.dk/~amoeller/WWW/javaweb/index.html )

An overview of TCP, HTTP and servers using Java:
    http://www.brics.dk/ixwt/http.pdf

Sun Guides/Tutorials on Java networking (mostly client side):
    http://java.sun.com/j2se/1.5.0/docs/guide/net/overview/overview.html
    http://java.sun.com/docs/books/tutorial/networking/index.html
    http://java.sun.com/docs/books/tutorial/networking/urls/index.html

java.net references:
    http://java.sun.com/j2se/1.5.0/docs/api/java/net/package-summary.html

Java 5.0 (also known as 1.5) package documentation:
    http://java.sun.com/j2se/1.5.0/docs/
    http://java.sun.com/j2se/1.5.0/docs/api/
    - java.io.File, java.lang.String, etc.

Sniffing Browser HTTP Requests
------------------------------

To see what line a browser sends to an HTTP server, you can use Ethereal;
or, for a quick dump, just use netcat on a spare port (e.g. 55555) and
have the browser access http://localhost:55555/foobar :

[ Start a fake HTTP server on a spare port, e.g. 55555 ]

    $ nc -v -l -p 55555 localhost     # Debian/Ubuntu
    $ nc -v -l localhost 55555        # RedHat/Mandrake
    listening on [any] 55555 ...
[ Start up your browser and connect to http://localhost:55555/foobar ]

    connect to [127.0.0.1] from localhost [127.0.0.1] 40757
    GET /foobar HTTP/1.1
    Host: localhost:55555
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060216 Debian/1.7.12-1.1ubuntu2
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Language: en-ca,en-us;q=0.9,en-gb;q=0.7,en;q=0.6,fr-ca;q=0.4,fr-fr;q=0.3,fr;q=0.1
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive

[ At this point, you can type back a server reply to the browser ]

    HTTP/1.1 200 this is my reply to the browser
    Content-Type: text/plain

    ab cd ef gh
    ^C (interrupt)
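
The same quick dump can be done from Java instead of netcat. What follows
is a minimal sketch under stated assumptions, not a conforming server and
not related to FileServer.java: the class name RequestDumper and its
helpers readHead() and buildReply() are hypothetical, it handles exactly
one connection, and it does not unfold continuation lines. Unlike the
hand-typed reply above, which is ended by interrupting netcat, the sketch
adds Content-Length and "Connection: close" so the browser can tell where
the reply ends; those two header fields are an addition made here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-shot request dumper: print what the browser sends. */
final class RequestDumper {

    /** Collect the Request-Line and header lines, stopping at the blank line. */
    static List<String> readHead(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    /** Build a text/plain reply that states its own length. */
    static String buildReply(String body) {
        // Count bytes, not chars, because Content-Length is a byte count.
        int length = body.getBytes(StandardCharsets.ISO_8859_1).length;
        return "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "Content-Length: " + length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RequestDumper.java PORT   (e.g. 55555)");
            return;
        }
        int port = Integer.parseInt(args[0]);
        // Listen on all local addresses and wait for a single connection.
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    client.getInputStream(), StandardCharsets.ISO_8859_1));
            for (String line : readHead(in)) {
                System.out.println(line);
            }
            OutputStream out = client.getOutputStream();
            out.write(buildReply("ab cd ef gh\n")
                    .getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
        }
    }
}
```

With Java 11 or later this can be run directly from the source file, then
visited with the browser as before:

    $ java RequestDumper.java 55555
    [ connect the browser to http://localhost:55555/foobar ]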