-------------------------
Week 10 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

-------------------
INDEX to this file:
 - (continued...) reading the HTTP protocol: RFC 2616
 - coding an HTTP server in Java
 - sniffing browser requests to servers without using Ethereal 
-------------------

Review:

Q:  Determine if google.ca, yahoo.ca, and facebook.com adhere to the
    first SHOULD clause in section 4.1 on p.31
    - nc -v google.ca http   OR   telnet google.ca http

HTTP  RFC 2616 (continued...)
--------------
  Standards: http://www.w3.org/Protocols/
  RFC: http://tools.ietf.org/html/rfc2616
       ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt
  Errata:  http://skrb.org/ietf/http_errata.html
           http://purl.org/NET/http-errata
  Issues:  http://greenbytes.de/tech/webdav/draft-lafon-rfc2616bis-issues.html
  Mail Archives: http://lists.w3.org/Archives/Public/ietf-http-wg/

p.15
- ABNF extended with a "#rule" for comma-separated lists:
  ( *LWS element *( *LWS "," *LWS element ))  becomes  1#element
- implied *LWS can appear between any adjacent tokens or strings in the grammar

Q: Describe what this ABNF HTTP rule means:   2#3("foo")

p.15-16
- HTTP ABNF grammar is unaffected by LWS between tokens
- HTTP 1.1 lines can continue ("fold") onto multiple lines
  if the continuation line begins with a space or horizontal tab
- within a header field value, the only CRLF allowed is the one that
  starts a continuation line
- if you want a real CRLF, or a non-ISO-8859-1 character, in a header
  field, encode it according to RFC 2047 (MIME)
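
A minimal Java sketch of the folding rule above, assuming the header
lines have already been split on CRLF.  The names HeaderUnfolder and
unfold are made up for illustration.  Each continuation line (one that
begins with a space or horizontal tab) is joined to the previous line
with a single space, which RFC 2616 section 2.2 allows a recipient to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HeaderUnfolder {
    /** Joins continuation lines (starting with SP or HT) onto the previous header line. */
    static List<String> unfold(List<String> rawLines) {
        List<String> headers = new ArrayList<>();
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && !headers.isEmpty()) {
                int last = headers.size() - 1;
                // Folded white space has the same meaning as a single SP.
                headers.set(last, headers.get(last) + " " + line.trim());
            } else {
                headers.add(line);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "X-Long: first part",
                "\tsecond part",
                "Host: localhost");
        for (String header : unfold(raw)) {
            System.out.println(header);
        }
    }
}
```

A continuation line with no header line before it is kept unchanged here;
a real server would probably reject such a request instead.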

Q: How can you fold a long line in HTTP 1.1?

p.17
- must double-quote special characters used in message headers
- some fields allow comments in parentheses

Q: What do HTTP comments look like in message headers?

Q:  Can a relative Request-URI (client message to server) begin without
    a slash, i.e. can it be a relative pathname?  (5.1.2 p. 36)

Q: Can an HTTP client request an empty URI?  (5.1.2)

Q:  T/F If a URI or "Host:" header field specifies a host name that is not
    recognized on this server, the server MUST forward the request to the
    other host name.  (5.2)

- "request header fields" - section 5.3 p.38
  - can only be extended with a protocol change
  - unknown fields are treated as "entity header" fields
- for HTTP 1.1, the Host field is required (14.23 p.129), but may be empty
  - see also the "MUST" paragraph in section 9 p.51
- for virtual hosts, an absolute URI over-rides the "Host:" header (p.38)
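
The virtual-host rule above (an absolute URI wins over "Host:") might be
sketched in Java as follows.  The names EffectiveHost and hostFor are
hypothetical, and the sketch assumes only "http://" absolute URIs matter.
It returns null when no host can be determined, leaving the error
response to the caller.

```java
import java.net.URI;

final class EffectiveHost {
    /**
     * Returns host[:port] for a request: taken from the Request-URI when it
     * is an absolute http URI, otherwise from the Host header value.
     */
    static String hostFor(String requestUri, String hostHeader) {
        // Case-insensitive check for an absolute "http://" Request-URI.
        if (requestUri.regionMatches(true, 0, "http://", 0, 7)) {
            return URI.create(requestUri).getAuthority();
        }
        return hostHeader; // may be null if the client sent no Host header
    }

    public static void main(String[] args) {
        System.out.println(hostFor("http://example.com/index.html", "other.example"));
        System.out.println(hostFor("/index.html", "localhost:55555"));
    }
}
```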

Q: List the names of the mandatory request header fields for HTTP 1.1

Q:  T/F If you give the host name in a URI using HTTP 1.1, you don't
    need to send the Host: header field.

HTTP Status Code and Reason Phrase - section 6.1.1 p.39
----------------------------------
- 3 digits, machine-readable
  - only first digit has an assigned meaning (one of five) p.40
  - five "classes" of response, based on the first digit
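
Since only the first digit carries an assigned meaning, a client can
classify any status code with integer division.  The sketch below is
illustrative only (StatusClass, classOf and fallbackCode are invented
names).  The fallbackCode method follows the 6.1.1 rule that an
unrecognized code is treated like the x00 code of its class.

```java
final class StatusClass {
    /** Maps a 3-digit status code to the name of its response class. */
    static String classOf(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    /** The x00 code of the same class, used when the exact code is not recognized. */
    static int fallbackCode(int statusCode) {
        return (statusCode / 100) * 100;
    }

    public static void main(String[] args) {
        System.out.println(classOf(404) + ", fallback " + fallbackCode(404));
    }
}
```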

Q: What are the five possible meanings of the first digit of an HTTP response?

Q:  T/F The "reason phrase" is defined by the HTTP protocol and should
    not be changed or replaced.

Q:  T/F HTTP 1.1 clients must understand the meaning of all the defined
    HTTP 1.1 status codes.

Q: T/F An HTTP client need only understand five classes of response code.

Q:  If an HTTP server returns an unrecognized status code to a client,
    what SHOULD the client do with the response?  (6.1.1 p.41)

- "response header fields" - section 6.2 p.39
  - can only be extended with a protocol change
  - unknown fields are treated as "entity header" fields

Entity (section 7 p.42)
------
- the "entity" is the thing being transferred, e.g. image, text, etc.
- "entity headers" give information about the entity being transferred
  - may include "extension header" fields
  - unrecognized extension headers SHOULD be ignored
- entity body is 8-bit clean (unlike SMTP)
  - but a transfer coding may have been applied to assist transit

- The sender of an HTTP 1.1 message SHOULD give the Content-Type
  - but if not (and only if not), the recipient MAY guess it by inspection
    (7.2.1 p.43)

Q:  T/F In the HTTP 1.1 protocol, senders MUST provide the entity
    Content-Type header field.

Q:  T/F A recipient may over-ride the Content-Type by inspecting the
    entity being transferred (or its URI).

Q:  If no Content-Type is specified, what type is assumed?  (7.2.1)

- the entity-length of a message is calculated *before* transfer
  encodings have been applied (i.e. it is the actual length of the
  entity, regardless of how it might be altered to be transferred)

- The Content-Length header, if present, MUST represent *both* the
  entity-length and the actual transfer-length.  (4.4 p.33)
- You MUST NOT send a Content-Length field if you apply a Transfer
  Encoding, because the Transfer Encoding might change the size: a
  message MUST NOT contain both a Content-Length field and a
  (non-identity) Transfer-Encoding field.
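
From the sender's side, the two rules above amount to choosing exactly
one length-related header.  A rough Java sketch, assuming "chunked" is
the only transfer coding the server applies (LengthHeaders and
lengthHeaders are made-up names):

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class LengthHeaders {
    /** Returns either Transfer-Encoding or Content-Length for a message, never both. */
    static Map<String, String> lengthHeaders(byte[] entity, boolean chunked) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (chunked) {
            // A transfer coding is applied, so Content-Length must be omitted.
            headers.put("Transfer-Encoding", "chunked");
        } else {
            // No transfer coding: entity-length and transfer-length are the same.
            headers.put("Content-Length", Integer.toString(entity.length));
        }
        return headers;
    }

    public static void main(String[] args) {
        byte[] entity = "ab\ncd\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(lengthHeaders(entity, false));
        System.out.println(lengthHeaders(entity, true));
    }
}
```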

Q:  T/F The Content-Length is both the real size of the item being sent
    and the size of the actual data being transferred.

Persistent Connections (HTTP 1.1 - section 8.1 p.44)
---------------------------------
- persistent TCP connections have many advantages:
  - fewer TCP handshakes
    - reduced CPU, memory, latency
  - allow pipelining multiple requests without waiting for responses
  - longer connections allow better TCP congestion control
  - allows HTTP to evolve
    - no penalty for trying a feature then dropping back to previous version

Q: T/F HTTP implementations MUST implement persistent connections.  (8.1.1)
Q: T/F A persistent connection MUST drop on an error condition.  (8.1.2)
Q: Describe three of four advantages of persistent TCP connections (8.1)

- a "Connection:" header field can ask for explicit connection closing:
  Connection: close
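
A server or client has to look for the "close" token inside the
Connection field value, which is a comma-separated list that may contain
other tokens.  A small Java sketch follows; ConnectionHeader and
asksToClose are hypothetical names, and comparing the token without
regard to case is an assumption made here.

```java
final class ConnectionHeader {
    /** True if the Connection field value contains the "close" token. */
    static boolean asksToClose(String connectionValue) {
        if (connectionValue == null) {
            return false; // no Connection header was sent
        }
        for (String token : connectionValue.split(",")) {
            // Optional white space may surround each list element.
            if (token.trim().equalsIgnoreCase("close")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(asksToClose("close"));
        System.out.println(asksToClose("keep-alive"));
    }
}
```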

Q: How can you signal the end of an HTTP 1.1 persistent connection?

Q:  T/F You signal the end of an HTTP session using the same keyword as
    SMTP - QUIT.

- persistent connections require that all messages have a self-defined
  message length - you can't end the message by closing the connection

Q: Why do persistent connections need message lengths?

- clients should not pipeline non-idempotent methods or non-idempotent
  sequences of methods, to avoid inconsistent state if the connection drops

Q: Why not pipeline non-idempotent methods?  (8.1.2.2 p.46)

- HTTP does not define any time-out for persistent connections
  (actually, I can't find any time-out for *anything*!)
  - connection close events may happen at any time (asynchronous)
  
- a single-user client SHOULD NOT keep more than 2 persistent connections
  open to any one server or proxy (8.1.4)

Premature Server Close - 8.2.4 p.50
----------------------
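- a client retrying the request uses "binary exponential backoff":
  wait T = R * 2**N seconds, where R is the round-trip time to the
  server and N is the number of previous retries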
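
As a worked example of the backoff formula in 8.2.4, with R taken as the
round-trip time in seconds and N as the number of previous retries, a
0.5 second round trip gives waits of 0.5, 1, 2 and 4 seconds on
successive retries.  In Java this could look like the sketch below
(Backoff and waitSeconds are invented names):

```java
final class Backoff {
    /** T = R * 2**N: the wait doubles with every previous retry. */
    static double waitSeconds(double roundTripSeconds, int previousRetries) {
        return roundTripSeconds * Math.pow(2, previousRetries);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 4; n++) {
            System.out.println("retry " + n + ": wait " + waitSeconds(0.5, n) + " s");
        }
    }
}
```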

Q: T/F HTTP servers MAY double their wait times on each retry.

Methods - section 9 p.51
-------

- safe methods should not have side-effects (GET, HEAD)
  - the user did not request the side-effects, even if they happen

- idempotent methods may have side-effects, but doing them once or
  more than once should not make a difference
  - GET, HEAD, PUT, DELETE
  - OPTIONS and TRACE never have side-effects, are idempotent

- a *sequence* of methods may not be idempotent, even if each method is
  - "A sequence is idempotent if a single execution of the entire sequence
    always yields a result that is not changed by a reexecution of all, or
    part, of that sequence."
  e.g.  PUT, DELETE   vs.   PUT

- A sequence that never has side effects is idempotent, by definition
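
The "PUT, DELETE vs. PUT" example can be simulated with a toy resource
store.  This is only a sketch: the store, the "/doc" resource and the
class name IdempotentSequence are all made up, and PUT and DELETE are
reduced to adding and removing one map entry.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentSequence {
    /** Applies PUT or DELETE, in order, to one resource and returns the final store. */
    static Map<String, String> run(String... methods) {
        Map<String, String> store = new HashMap<>();
        for (String method : methods) {
            if (method.equals("PUT")) {
                store.put("/doc", "contents");
            } else if (method.equals("DELETE")) {
                store.remove("/doc");
            } else {
                throw new IllegalArgumentException("unsupported method: " + method);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // PUT alone: repeating it leaves the same state (prints true).
        System.out.println(run("PUT").equals(run("PUT", "PUT")));
        // PUT, DELETE followed by a partial re-run (PUT only) ends in a
        // different state, so the sequence is not idempotent (prints false).
        System.out.println(run("PUT", "DELETE").equals(run("PUT", "DELETE", "PUT")));
    }
}
```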

Q: Define "safe" and "idempotent" methods.
Q: T/F A sequence of idempotent methods is always itself idempotent.

GET - section 9.3 p.53
---

Q: What is a "conditional GET"?

Q: What is a "partial GET"?


Coding an HTTP server (Java)
----------------------------
HTTP RFC: http://tools.ietf.org/html/rfc2616

A working Java HTTP server with basic functionality (in 145 lines)
is available here:

  http://www.brics.dk/ixwt/examples/FileServer.java

  - this version does not adhere to the HTTP RFC in many respects
  - needs comments on functionality (not on how Java works)
  - has many "public" items that should be made private
  - may be missing things such as closing opened files...

  (Older version:  http://www.brics.dk/~amoeller/WWW/javaweb/index.html )

An overview of TCP, HTTP and servers using Java:
  http://www.brics.dk/ixwt/http.pdf

Sun Guides/Tutorials on Java networking (mostly client side):
  http://java.sun.com/j2se/1.5.0/docs/guide/net/overview/overview.html
  http://java.sun.com/docs/books/tutorial/networking/index.html
  http://java.sun.com/docs/books/tutorial/networking/urls/index.html

java.net references:
  http://java.sun.com/j2se/1.5.0/docs/api/java/net/package-summary.html

Java 5.0 (also known as 1.5) package documentation:
  http://java.sun.com/j2se/1.5.0/docs/
  http://java.sun.com/j2se/1.5.0/docs/api/
    - java.io.File, java.lang.String, etc.

Sniffing Browser HTTP Requests
------------------------------

To see what lines a browser sends to an HTTP server, you can use
Ethereal; or, for a quick dump, just use netcat on a spare port (e.g.
55555) and have the browser access http://localhost:55555/foobar :

[ Start a fake HTTP server on a spare port, e.g. 55555 ]

$ nc -v -l -p 55555 localhost    # Debian/Ubuntu
$ nc -v -l localhost 55555       # RedHat/Mandrake
listening on [any] 55555 ...

[ Start up your browser and connect to http://localhost:55555/foobar ]

connect to [127.0.0.1] from localhost [127.0.0.1] 40757
GET /foobar HTTP/1.1
Host: localhost:55555
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060216 Debian/1.7.12-1.1ubuntu2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-ca,en-us;q=0.9,en-gb;q=0.7,en;q=0.6,fr-ca;q=0.4,fr-fr;q=0.3,fr;q=0.1
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

[ At this point, you can type back a server reply to the browser ]

HTTP/1.1 200 this is my reply to the browser
Content-Type: text/plain

ab
cd
ef
gh
^C (interrupt)
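
The same experiment can be done from Java instead of netcat.  The sketch
below is not a conforming HTTP server: it accepts a single connection,
collects the request line and headers up to the first empty line, sends
back a fixed text/plain reply, and closes.  The names RequestDump and
serveOnce are made up, and port 55555 is just the spare port used above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class RequestDump {
    /** Accepts one connection, replies with replyBody, and returns the request head. */
    static String serveOnce(ServerSocket server, String replyBody) throws IOException {
        try (Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    client.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder request = new StringBuilder();
            String line;
            // The request line and headers end at the first empty line.
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                request.append(line).append('\n');
            }
            byte[] body = replyBody.getBytes(StandardCharsets.ISO_8859_1);
            String head = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"
                    + "Content-Length: " + body.length + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            OutputStream out = client.getOutputStream();
            out.write(head.getBytes(StandardCharsets.ISO_8859_1));
            out.write(body);
            out.flush();
            return request.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(55555)) {
            System.out.println("listening on port 55555 ...");
            System.out.print(serveOnce(server, "ab\ncd\n"));
        }
    }
}
```

Run it, then point the browser at http://localhost:55555/foobar as
before; the browser's request lines are printed on standard output.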