-------------------------
Week 09 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

-------------------
INDEX to this file:
 - notes on the current assignment
 - overview of Application Layer
 - reading the HTTP protocol: RFC 2616
 - example: a small HTTP server in Java
-------------------

Current assignment
------------------

- Automated Testing - use it right from the start
- flushing output to remote servers
  - Perl uses: $| = 1
  - don't want to buffer SMTP lines!

Q: Why should you turn off output buffering when sending protocol lines
   to a remote server?

Overview of the Application Layer (slides)
------------------------------------------

- from Kurose/Ross: see course notes/kurose/
  - includes HTTP slides

HTTP design issues by T.B-L
---------------------------

http://www.w3.org/Protocols/DesignIssues.html

Q: Why did Tim Berners-Lee choose "Internet Protocol" instead of RPC
   for HTTP?

Q: Name one advantage and one disadvantage of coding HTTP using RPC.

Q: Does the HTTP server need to keep state information about the client?

Q: Why is the stateless nature of HTTP a problem for such things as
   search systems?  How can the problems be mitigated?

Q: Did Tim's original "PORT" command make it into the final HTTP
   specification?

HTTP is stateless; need session tracking
----------------------------------------

http://www.brics.dk/~amoeller/WWW/javaweb/sessions.html

- URL rewriting
- hidden form fields
- cookies

Q: Name and describe briefly two of three possible ways to implement
   implicit HTTP session tracking

Q: What is an "HTTP Request"?  an "HTTP Response"?
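
The three session-tracking methods listed above can be sketched in a few
lines of Java.  This is only an illustration, not code from the course or
from the linked page: the class name, the method names and the parameter
name "sessionid" are all made up, and the session id is assumed to be
URL-safe text.

```java
// Sketch only: "SessionTracking", its method names and the parameter
// name "sessionid" are hypothetical, chosen for illustration.
class SessionTracking {

    // URL rewriting: the server appends the session id to every link.
    // Assumes the URL has no "#fragment" part.
    static String rewriteUrl(String url, String sessionId) {
        String sep = url.contains("?") ? "&" : "?";
        return url + sep + "sessionid=" + sessionId;
    }

    // Hidden form field: the id comes back when the form is submitted.
    static String hiddenField(String sessionId) {
        return "<input type=\"hidden\" name=\"sessionid\" value=\""
                + sessionId + "\">";
    }

    // Cookie: the server sends this response header; the browser sends
    // the value back in a "Cookie:" request header on later requests.
    static String setCookieHeader(String sessionId) {
        return "Set-Cookie: sessionid=" + sessionId;
    }

    public static void main(String[] args) {
        String id = "abc123"; // hypothetical session id
        System.out.println(rewriteUrl("http://example.com/cart", id));
        System.out.println(rewriteUrl("http://example.com/cart?item=7", id));
        System.out.println(hiddenField(id));
        System.out.println(setCookieHeader(id));
    }
}
```

In all three cases the server itself stays stateless at the protocol
level: the client carries the id and hands it back with each Request.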
HTTP RFC 2616
-------------

Standards:
    http://www.w3.org/Protocols/
    http://tools.ietf.org/html/rfc2616
    ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt
Errata:
    http://skrb.org/ietf/http_errata.html
    http://purl.org/NET/http-errata
Issues:
    http://greenbytes.de/tech/webdav/draft-lafon-rfc2616bis-issues.html
Mail Archives:
    http://lists.w3.org/Archives/Public/ietf-http-wg/

- usually over TCP/IP, but any reliable protocol will do (p.13)

Q: Does HTTP require a reliable protocol, or can it run over something
   unreliable such as UDP?

- 1.0 required separate connections per request
- 1.1 allows chaining requests (p.14)

Q: What big change did HTTP 1.1 bring to the HTTP "one connection per
   request" model of HTTP 1.0?

- unlike SMTP, HTTP has a version number! (p.17)

- URI "absolute" vs. "relative" paths (p.19):
    "URIs in HTTP can be represented in absolute form or relative to
    some known base URI [11], depending upon the context of their use.
    The two forms are differentiated by the fact that absolute URIs
    always begin with a scheme name followed by a colon." p.19
  - proxy servers require absolute URIs (p.36)
  - note that "absolute URI" is not the same as "absolute path"

Q: Give examples of HTTP absolute and relative URIs.

- path part of URI is case-sensitive; the host and scheme names are
  not (p.20)
    "When comparing two URIs to decide if they match or not, a client
    SHOULD use a case-sensitive octet-by-octet comparison of the entire
    URIs, with these exceptions:" p.20

Q: Which parts of an absolute URI are case-sensitive?

- The HTTP protocol does not place any a priori limit on the length of
  a URI.
  - server may issue 414 (Request-URI Too Long) status (p.19)

Q: What is the maximum length of a URI, as given in the HTTP spec?

- HTTP headers can describe:
  - "content encoding" - a property of the original entity (p.23)
    - e.g. "gzip"
  - "transfer coding" - a property of the HTTP message (p.24)
    - e.g. "chunked" (transfer content in separate chunks, p.25)
    - may change how the entity is transferred

Q: What is the difference between the "content encoding" header and the
   "transfer coding" header?

- HTTP relaxes CRLF rule - allows consistent CR or LF or CRLF in text
  (but not in control sequences!) - 3.7.1 p.27

Q: T/F HTTP permits a client to send just CR or LF when communicating
   with an HTTP server (e.g. when sending a GET or HEAD request).

- HTTP Request/Response messages do not use SMTP "continuation" method
  - message headers continue until an empty line: CRLF CRLF (p.31)

Q: T/F The same generic HTTP message type is used both to send messages
   from client to server and from server to client. (section 4.1)

Q: How do HTTP clients and servers detect the end of a series of
   message header fields (section 4.1)?

Q: Is the CRLF at the end of the message headers optional?

- leading empty lines SHOULD be ignored (section 4.1, p.31)

Q: Determine if google.ca, yahoo.ca, and facebook.com adhere to the
   SHOULD clause in section 4.1, p.31

- multiple message-header fields with the same name are allowed
  - but only if the entire field-value is a comma-separated list
  - should behave as if they were all on one long field (p.32)

Q: T/F You can always send multiple identical message header fields;
   the HTTP protocol says they will be concatenated.

- message body MUST NOT be included unless specifically allowed (p.33)
- responses to "HEAD" MUST NOT include a message body (p.33)

Q: T/F All HTTP Responses may include an optional message body.
- HTTP Request and Response messages have the same general format:

        Request       = Request-Line              ; Section 5.1
                        *(( general-header        ; Section 4.5
                         | request-header         ; Section 5.3
                         | entity-header ) CRLF)  ; Section 7.1
                        CRLF
                        [ message-body ]          ; Section 4.3

        Response      = Status-Line               ; Section 6.1
                        *(( general-header        ; Section 4.5
                         | response-header        ; Section 6.2
                         | entity-header ) CRLF)  ; Section 7.1
                        CRLF
                        [ message-body ]          ; Section 7.2

- "general header" fields apply to the message, not to the entity being
  transferred, and they can only be extended by a protocol version
  change (p.35)

Q: T/F HTTP "general header fields" can appear in both Requests and
   Responses

Q: T/F Unrecognized HTTP header fields are presumed to apply to the
   entity being transferred; they become "entity header" fields

- unlike SMTP (HELO and helo), the HTTP "method token" (e.g. "GET") is
  case-sensitive and must be UPPER CASE ONLY (p.36)
  - but HTTP header field names in HTTP messages are not! (p.31)

Q: T/F HTTP allows the use of either "HEAD" or "head" in a Request Line

- servers MUST support at least GET and HEAD (p.36)

Q: What method tokens are the minimum required of an HTTP server?

- for virtual hosts, an absolute URI over-rides the "Host:" header (p.38)

Q: If a client Request contains a host name in both the URI and the
   Host: header, which one has priority?

- RFC 2616 was updated by 2817 to add Transport Layer Security - TLS
    http://tools.ietf.org/html/rfc2817
    ftp://ftp.rfc-editor.org/in-notes/rfc2817.txt
  - 1997 meeting deprecated the practice of separate secure ports
    (having separate ports halves the number of usable ports!)

    "Parallel well-known port numbers have similarly been requested --
    and in some cases, granted -- to distinguish between secured and
    unsecured use of other application protocols (e.g. snews, ftps).
    This approach effectively halves the number of available well known
    ports.

    At the Washington DC IETF meeting in December 1997, the Applications
    Area Directors and the IESG reaffirmed that the practice of issuing
    parallel "secure" port numbers should be deprecated. The HTTP/1.1
    Upgrade mechanism can apply Transport Layer Security [6] to an open
    HTTP connection."

Q: Why does the IETF deprecate the use of separate port numbers for
   secure versions of Internet protocols?

Coding an HTTP server (Java)
----------------------------

overview of TCP, HTTP and servers using Java:
    http://www.brics.dk/ixwt/http.pdf
Java web server (145 lines):
    http://www.brics.dk/ixwt/examples/FileServer.java
older:
    http://www.brics.dk/~amoeller/WWW/javaweb/index.html
java.net intro
    http://www.brics.dk/~amoeller/WWW/javaweb/javanet.html
java.net reference: see future notes
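
Before reading the full FileServer.java, it may help to see a few of the
Request-Line rules from the RFC 2616 notes written as Java.  The sketch
below is not taken from FileServer.java; the class and method names are
invented for illustration, and the choice of 400 for a malformed
Request-Line is an assumption of this sketch.  It shows the case-sensitive
method token, the GET/HEAD minimum, and an absolute Request-URI taking
priority over the Host: header.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: "RequestLine" and its method names are hypothetical.
class RequestLine {

    // Method tokens are case-sensitive: "GET" matches, "get" does not.
    // GET and HEAD are the minimum every server must support.
    static boolean isRequiredMethod(String method) {
        return method.equals("GET") || method.equals("HEAD");
    }

    // An absolute Request-URI overrides the Host: header value.
    // Note: the Host: value is returned as given (it may include a port).
    static String effectiveHost(String requestUri, String hostHeader) {
        try {
            URI uri = new URI(requestUri);
            if (uri.isAbsolute() && uri.getHost() != null) {
                return uri.getHost();
            }
        } catch (URISyntaxException e) {
            // unparsable URI: fall back to the Host: header below
        }
        return hostHeader;
    }

    // Choose a Status-Line for a Request-Line such as
    // "GET /index.html HTTP/1.1".  This sketch only supports GET and HEAD.
    static String statusLineFor(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) {
            return "HTTP/1.1 400 Bad Request";
        }
        if (!isRequiredMethod(parts[0])) {
            return "HTTP/1.1 501 Not Implemented";
        }
        return "HTTP/1.1 200 OK";
    }

    public static void main(String[] args) {
        System.out.println(statusLineFor("GET /index.html HTTP/1.1"));
        System.out.println(statusLineFor("head /index.html HTTP/1.1"));
        System.out.println(
            effectiveHost("http://www.example.com/index.html", "other.example.org"));
        System.out.println(effectiveHost("/index.html", "other.example.org"));
    }
}
```

A real server would read the Request-Line from the socket's input stream
and must remember that a response to HEAD carries no message body.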