-------------------------
Week 12 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Keep up on your readings (Course Outline: average 4 hours/week homework)

Review:
------
 - design issues for HTTP (Tim Berners-Lee documents)
 - structure of Requests and Responses
 - using netcat with HTTP clients and servers
 - three methods of session tracking
 - absolute vs. relative URIs
 - handling unrecognized HTTP header lines
 - status codes
 - persistent connections

HTTP - Hyper Text Transfer Protocol - continued
----

HTTP Methods - section 9 p.51
------------

 "Implementors should be aware that the software represents the user in
  their interactions over the Internet, and should be careful to allow
  the user to be aware of any actions they might take which may have an
  unexpected significance to themselves or others."  p.51

  - safe methods should not have side-effects p.51
    - GET and HEAD "SHOULD NOT" have any effect other than retrieval
    - if side-effects do happen, the user did not request them and so
      cannot be held accountable for them

 "Methods can also have the property of "idempotence" in that (aside
  from error or expiration issues) the side-effects of N > 0 identical
  requests is the same as for a single request. The methods GET, HEAD,
  PUT and DELETE share this property. Also, the methods OPTIONS and
  TRACE SHOULD NOT have side effects, and so are inherently idempotent." p.51

  - idempotent methods may have side-effects, but doing them once or
    more than once should not make a difference
    - e.g. GET, HEAD, PUT, DELETE are idempotent (can be done repeatedly)
    - OPTIONS and TRACE never have side-effects, are idempotent

  - a *sequence* of methods may not be idempotent, even if each method is:
    - "A sequence is idempotent if a single execution of the entire sequence
      always yields a result that is not changed by a reexecution of all, or
      part, of that sequence."
    e.g.  "PUT, DELETE"  is not an idempotent sequence: after the whole
    sequence runs the resource is gone, but re-executing part of it
    (just the PUT) brings the resource back, which changes the result

  - A sequence made up entirely of methods that never have side effects is
    idempotent, by definition
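
The difference between an idempotent method and an idempotent sequence
can be sketched in Java.  This is only an illustration, not real HTTP:
the class IdempotenceDemo and its in-memory store are hypothetical
stand-ins for a server, and put/delete only mimic the end state that
the PUT and DELETE methods would leave behind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "server" used only to illustrate idempotence.
class IdempotenceDemo {
    private final Map<String, String> store = new HashMap<>();

    // PUT: store the body at the URI (same end state however often repeated).
    void put(String uri, String body) { store.put(uri, body); }

    // DELETE: remove the resource (deleting a missing resource changes nothing).
    void delete(String uri) { store.remove(uri); }

    // Snapshot of the server state, used to compare outcomes.
    Map<String, String> state() { return new HashMap<>(store); }

    public static void main(String[] args) {
        IdempotenceDemo once = new IdempotenceDemo();
        once.put("/a", "v1");

        IdempotenceDemo twice = new IdempotenceDemo();
        twice.put("/a", "v1");
        twice.put("/a", "v1");
        System.out.println("PUT idempotent: " + once.state().equals(twice.state()));

        // Run the sequence "PUT, DELETE", then re-execute only part of it.
        IdempotenceDemo seq = new IdempotenceDemo();
        seq.put("/a", "v1");
        seq.delete("/a");
        Map<String, String> afterSequence = seq.state();
        seq.put("/a", "v1");
        System.out.println("Sequence idempotent: " + afterSequence.equals(seq.state()));
    }
}
```

Each method alone leaves the same state no matter how often it is
repeated; the comparison for the sequence fails because the partial
re-execution changes the result.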

Q: Define HTTP "safe" and "idempotent" methods.  What do they mean?

Q: Give examples of HTTP "safe" methods.

Q: Give examples of HTTP "idempotent" methods.

Q: T/F A sequence of idempotent methods is always itself idempotent.

GET - section 9.3 p.53
---

 "The semantics of the GET method change to a "conditional GET" if the
  request message includes an If-Modified-Since, If-Unmodified-Since,
  If-Match, If-None-Match, or If-Range header field."

 "The semantics of the GET method change to a "partial GET" if the
  request message includes a Range header field."
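
One way a server might decide how to answer a conditional GET is
sketched below, for the If-Modified-Since header only.  The class and
method names are made up for illustration, the other conditional
headers and the Range header are not handled, and a real server would
also have to cope with a malformed date (this sketch would throw an
exception).

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

class ConditionalGet {
    // Returns the status code for a GET, given the resource's modification
    // time and the If-Modified-Since header value (null if header is absent).
    static int statusFor(Instant lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;                       // ordinary GET: send the entity
        }
        Instant since = ZonedDateTime
            .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
            .toInstant();
        // HTTP dates have one-second resolution, so drop fractions first.
        Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return modified.isAfter(since) ? 200 : 304;   // 304 Not Modified
    }

    public static void main(String[] args) {
        Instant fileTime = Instant.parse("1994-10-29T19:43:31Z");
        System.out.println(statusFor(fileTime, null));
        System.out.println(statusFor(fileTime, "Sat, 29 Oct 1994 19:43:31 GMT"));
        System.out.println(statusFor(fileTime, "Fri, 28 Oct 1994 19:43:31 GMT"));
    }
}
```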

Q: What is a "conditional GET"?  Explain.

Q: What is a "partial GET"?  Explain.

HEAD - section 9.4 p.54
----

 "The HEAD method is identical to GET except that the server MUST NOT
  return a message-body in the response."
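
A server can honour this rule by building the response once and leaving
off the body for HEAD.  The following is a rough sketch under
assumptions: the class name, the fixed "200 OK" status and the
text/plain content type are all invented for the example.

```java
import java.nio.charset.StandardCharsets;

class HeadVersusGet {
    // Builds a minimal HTTP/1.1 response for the given method.
    // Only a GET response carries the message body.
    static String respond(String method, String body) {
        int length = body.getBytes(StandardCharsets.ISO_8859_1).length;
        String headers = "HTTP/1.1 200 OK\r\n"
                       + "Content-Type: text/plain\r\n"
                       + "Content-Length: " + length + "\r\n"
                       + "\r\n";
        return method.equals("HEAD") ? headers : headers + body;
    }

    public static void main(String[] args) {
        System.out.print(respond("HEAD", "hello\n"));
        System.out.print(respond("GET", "hello\n"));
    }
}
```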

Q: What is the difference between the message headers returned by GET and HEAD?


HTTP security
-------------

- RFC 2616 was updated by 2817 to add Transport Layer Security - TLS
  http://tools.ietf.org/html/rfc2817
  ftp://ftp.rfc-editor.org/in-notes/rfc2817.txt

  - 1997 meeting deprecated the practice of separate secure ports
     (having separate ports halves the number of available well-known ports!)

  "Parallel well-known port numbers have similarly been
   requested -- and in some cases, granted -- to distinguish between
   secured and unsecured use of other application protocols (e.g.  snews,
   ftps). This approach effectively halves the number of available well
   known ports.

   At the Washington DC IETF meeting in December 1997, the Applications
   Area Directors and the IESG reaffirmed that the practice of issuing
   parallel "secure" port numbers should be deprecated. The HTTP/1.1
   Upgrade mechanism can apply Transport Layer Security [6] to an open
   HTTP connection."
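
As a rough idea of what the Upgrade mechanism looks like on the wire,
the sketch below builds a request that offers to switch the open
connection to TLS, using the "Upgrade: TLS/1.0" and "Connection:
Upgrade" header lines from RFC 2817.  The class name, host and path are
hypothetical, and the sketch only builds the text of the request; it
does not perform any TLS handshake.

```java
class UpgradeRequest {
    // Builds an HTTP/1.1 request that asks to upgrade the connection to TLS.
    static String build(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Upgrade: TLS/1.0\r\n"
             + "Connection: Upgrade\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        // www.example.com and /index.html are placeholder values.
        System.out.print(build("www.example.com", "/index.html"));
    }
}
```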

Q:  Why does the IETF deprecate the use of separate port
    numbers for secure versions of Internet protocols?

-----------------------------------------------------------------------------

Sending electronic mail: SMTP
-----------------------------
  http://tools.ietf.org/html/rfc2821

- Remember: The protocol and ports used to send email (SMTP) are completely
  separate from the ports and protocols used to fetch email (POP3, IMAP)!

SMTP - Simple Mail Transfer Protocol - RFC821 -> RFC2821
 - April 2001 - 79 pages on top of TCP (95 pages) on top of IP (45 pages)
 - a "PUSH" protocol - sender initiates  (HTTP is "PULL" protocol)
 - http://tools.ietf.org/html/rfc2821
   "This document is a self-contained specification of the basic protocol
    for the Internet electronic mail transport.  It consolidates, updates
    and clarifies, but doesn't add new or change existing functionality
    of the following: RFC822, DNS, RFC1123"
 - did not add to or change RFC821; dropped obsolete items

Q: T/F RFC2821 replaced RFC821 and added new SMTP functionality

Algonquin SMTP server
---------------------

Algonquin network restrictions prevent access to other SMTP servers from
on campus.  You must connect to the Algonquin SMTP server to send email.
In strict conformance with RFC 2821, the Algonquin SMTP server accepts
only CR+LF line ends - you have to type ^V^M^M (CTRL-V RETURN RETURN)
at the end of every line to make it work.  (CTRL-V makes the terminal
send the first RETURN as a literal CR; the second RETURN sends the LF.)

  $ nc -v outmail.algonquincollege.com smtp
  Connection to outmail.algonquincollege.com 25 port [tcp/smtp] succeeded!
  220 mail4.algonquincollege.com -- Server ESMTP (Sun Java System Messaging Server 6.2-7.02 (built Jun 13 2006))
  quit
  quit
  quit
  ...

  - the connection hangs after the banner and appears not to accept any
    further commands, because the Sun server demands CR+LF line ends,
    not just the LF line ends sent by "nc" (the Sun server is
    RFC-compliant, but not very liberal in what it accepts!)
  - the fix is to enter ^V<CR><CR> (CTRL-V followed by pushing the
    RETURN key twice) at the end of each line:

  $ nc -v outmail.algonquincollege.com smtp
  Connection to outmail.algonquincollege.com 25 port [tcp/smtp] succeeded!
  220 mail4.algonquincollege.com -- Server ESMTP (Sun Java System Messaging Server 6.2-7.02 (built Jun 13 2006))
  quit^V^M
  221 2.3.0 Bye received. Goodbye.
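
A program that talks SMTP has the same problem as "nc": in Java,
println() ends lines with the platform line separator (just LF on
Unix), so a client should write the CR+LF itself.  A minimal sketch,
with a made-up class name:

```java
import java.nio.charset.StandardCharsets;

class SmtpLine {
    // Returns the bytes to send for one SMTP command, always ending in CR LF.
    static byte[] encode(String command) {
        return (command + "\r\n").getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] wire = encode("QUIT");
        // Print the numeric values of the last two bytes: CR then LF.
        System.out.println(wire[wire.length - 2] + " " + wire[wire.length - 1]);
    }
}
```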

Q: T/F, the Algonquin SMTP server violates the SMTP RFC by requiring CRLF
   on the end of each line.

* SMTP vs. Message Format
 - the SMTP *protocol* does not define the format of the *message*
   - the *message* delivered by the *protocol* has its own description:
     RFC822 -> RFC2822 "Internet Message Format"  (51 pages)
   - http://tools.ietf.org/html/rfc2822
 - the content of the message (including To/From message header lines) is
   independent of the To/From used in the SMTP protocol!

Q: T/F The SMTP protocol RFC defines the format and headers of an email message

* SMTP is a readable ASCII protocol on top of TCP - not binary!
 - you can run it using "nc" or telnet to port 25
 - but you can't do it here at Algonquin College!
   - port 25 blocked leaving the College (must use College servers)
   - College servers implement long wait times before answering
     - to discourage spam programs that don't wait as long
 - SMTP wait times are documented in RFC 1123 (Section 5.3.2)
   http://tools.ietf.org/html/rfc1123
   "Timeouts are an essential feature of an SMTP
    implementation.  If the timeouts are too long (or worse,
    there are no timeouts), Internet communication failures or
    software bugs in receiver-SMTP programs can tie up SMTP
    processes indefinitely.  If the timeouts are too short,
    resources will be wasted with attempts that time out part
    way through message delivery."

* a sample SMTP session: see Notes file smtp_session.txt

    Note the difference between the SMTP RFC2821 "envelope" FROM/TO lines
    and the RFC2822 Message From:/To: lines.  The Message From:/To:
    lines need not be related to the SMTP RFC2821 envelope FROM/TO
    lines, and application writers are warned not to try to link them:
    (RFC 2821 Section 7.2)
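
To make the envelope/message distinction concrete, the sketch below
lists the lines a client might send in one session.  All of the names
and addresses are hypothetical (example.com placeholders), the server
replies are left out, and a real client must wait for each reply and
end every line with CR+LF.

```java
import java.util.ArrayList;
import java.util.List;

class EnvelopeVersusHeaders {
    // Returns the client's side of a hypothetical SMTP session.  The envelope
    // addresses go in MAIL FROM/RCPT TO; the header addresses go after DATA.
    static List<String> clientLines(String envFrom, String envTo,
                                    String hdrFrom, String hdrTo) {
        List<String> lines = new ArrayList<>();
        lines.add("EHLO client.example.com");
        lines.add("MAIL FROM:<" + envFrom + ">");   // envelope sender
        lines.add("RCPT TO:<" + envTo + ">");       // envelope recipient
        lines.add("DATA");
        lines.add("From: " + hdrFrom);              // message header only
        lines.add("To: " + hdrTo);                  // message header only
        lines.add("Subject: envelope demo");
        lines.add("");
        lines.add("The header addresses need not match the envelope.");
        lines.add(".");
        lines.add("QUIT");
        return lines;
    }

    public static void main(String[] args) {
        for (String line : clientLines("bounces@example.com", "alice@example.com",
                                       "newsletter@example.com", "subscribers@example.com")) {
            System.out.println(line);
        }
    }
}
```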

* Extending the original SMTP protocol "HELO" with "EHLO"
 - original SMTP "HELO" greeting had no protocol version number
   - no way to negotiate options or features
 - RFC1425 (1993) added a new EHLO greeting to use in place of HELO,
   allowing extensions
   - http://tools.ietf.org/html/rfc1425
 - awkward way to do protocol versioning
 - latest version of extensions:  http://tools.ietf.org/html/rfc2821
 - SMTP extensions (must be registered with IANA)

  ABNF:  ehlo-cmd ::= "EHLO" SP domain CR LF

Q: Is the EHLO command case-sensitive?
Q: Is the domain optional?

 - HELO vs. EHLO:    http://tools.ietf.org/html/rfc2821
   "Contemporary SMTP implementations MUST support the basic extension
    mechanisms.  For instance, servers MUST support the EHLO command even
    if they do not implement any specific extensions and clients SHOULD
    preferentially utilize EHLO rather than HELO."
 - response to EHLO:  http://tools.ietf.org/html/rfc2821
     "Normally, the response to EHLO will be a multiline reply.  Each line
      of the response contains a keyword and, optionally, one or more
      parameters.  Following the normal syntax for multiline replies,
      these keyworks follow the code (250) and a hyphen for all but the
      last line, and the code and a space for the last line."
 - the response to EHLO is a list of options that indicates what optional
   features this email server offers

Q: What SHOULD an SMTP client do if the server refuses EHLO?
   (RFC2821 section 2.2.1 p.7, section 3.2 p. 16)

* Even clever people argue about the interpretation of the RFC documents:
  - http://www.imc.org/ietf-smtp/old-archive/msg01782.html
   "Certain individuals have the impression that the correct response to a
    RSET is ``close the connection'', and insist that RFC-821 backs them up.
    That seems to be an unusually bizarre interpretation, but by golly
    they insist that they Following The Standard (TM).  It quickly became
    clear that attempting to reason with such individuals was hopeless."
  - http://www.imc.org/ietf-smtp/old-archive/msg01783.html
   "having just reread the text in 821, that construing RSET as a synonym
    for QUIT must require real creativity (or trying to think with one's
    head in a normally-uncomfortable position),"

- SMTP continuation syntax: every line but the last of a multi-line
  response contains a "-" immediately following the response number, e.g.

        $ nc -v localhost smtp
        localhost.home.idallen.ca [127.0.0.1] 25 (smtp) open
        220 elm.home.idallen.ca ESMTP Postfix (idallen@idallen.ca)
        EHLO idallen.ca
        250-elm.home.idallen.ca
        250-PIPELINING
        250-SIZE 10240000
        250-VRFY
        250-ETRN
        250-STARTTLS
        250 8BITMIME
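
A client could detect the end of such a reply by looking at the
character right after the three-digit code.  The sketch below is one
possible way to do it, assuming the reply has already been split into
lines; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EhloReply {
    // True if this reply line is a continuation line: three digits then "-".
    static boolean isContinuation(String line) {
        return line.length() >= 4 && line.charAt(3) == '-';
    }

    // Collects the text after the code from each line of one reply,
    // stopping after the first line that is not a continuation line.
    static List<String> lineTexts(List<String> replyLines) {
        List<String> result = new ArrayList<>();
        for (String line : replyLines) {
            result.add(line.length() > 4 ? line.substring(4) : "");
            if (!isContinuation(line)) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> reply = Arrays.asList(
            "250-elm.home.idallen.ca", "250-PIPELINING", "250-SIZE 10240000",
            "250-VRFY", "250-ETRN", "250-STARTTLS", "250 8BITMIME");
        System.out.println(lineTexts(reply));
    }
}
```

Note that the first line of an EHLO reply carries the server's domain
name; the extension keywords start on the second line.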

Q: How does an SMTP server indicate continuation lines in a reply?

* Reading RFC 2821 - the SMTP protocol
  http://tools.ietf.org/html/rfc2821

  The RFC is the final word on the protocol.

 - note allowed order of SMTP commands p.39
 - a server MUST NOT refuse a message just because the HELO/EHLO name
   doesn't match the client's IP address
 - note the structure of SMTP reply codes p.40

Q: What is the meaning of the first digit of an SMTP response code?
   1yz   Positive Preliminary reply (not used in standard SMTP)
   2yz   Positive Completion reply
   3yz   Positive Intermediate reply
   4yz   Transient Negative Completion reply
   5yz   Permanent Negative Completion reply

Q: Do SMTP protocol lines end in CR+LF or just LF? (RFC2821 p.12)

Q: Do Internet Message lines end in CR+LF or just LF?
   (RFC2821 p.12, RFC2822 p.17-18)

Q: SMTP commands are given as double-quoted upper-case strings in the
   RFC 2821.  Does this mean they must be upper-case?

Q:  T/F The space following the three-digit SMTP response code is mandatory
    and all clients MUST look for it, failing if it is not found.
    (RFC 2821 Section 4.2)

Q:  How must an SMTP client handle new response codes that it doesn't
    recognize?  (RFC 2821 Section 4.2, 4.3.2)

Q:  T/F SMTP clients can figure out how to proceed based on just the
    first digit of an SMTP reply code; they can usually ignore the rest.
    (RFC 2821 Section 4.2, 4.2.1, 4.3.2)

Q:  T/F You can queue up and send multiple commands to an SMTP server
    without waiting for any responses.  (RFC 2821 Section 4.3.1)

Looking at RFC 2821 Section 4.3.2, there are three codes that might be
returned by an SMTP server "if the corresponding unusual circumstances
are encountered".  Clients must be prepared to see these codes in response
to any SMTP request.

Q:  T/F SMTP clients only need to handle the fixed set of reply codes
    listed as responses in the RFC document.

Q:  Looking at RFC 2821 Section 4.5.2, how must clients handle the
    sending of email message lines that start with a period?

Q:  What is the maximum length of an email address (local-part plus
    domain), as passed through the SMTP protocol?  (RFC 2821 Section 4.5.3.1)

Q:  How long may an SMTP server delay before issuing the initial 220
    Message greeting?  (RFC 2821 Section 4.5.3.2)

Q:  Based on experience, what is the suggested policy for retrying failed
    attempts at sending a message?  (RFC 2821 Section 4.5.4.1)

Q:  Should programs attempt to relate the MAIL and RCPT (envelope)
    email addresses with the addresses (that may be) present in the
    headers of the message body?  (RFC 2821 Section 7.2)

http://teaching.idallen.com/cst8165/07w/notes/smtp_session.txt

Review of SMTP:
 - http://tools.ietf.org/html/rfc2821
 - Sample SMTP session (long and short) in Notes: smtp_session.txt
 - SMTP controls the "envelope" TO/FROM, not the message To:/From:
 - a text-based protocol, easily run using netcat.
 - 3-digit numeric response codes (know these five groups)
   - 1yz   Positive Preliminary reply (not used in standard SMTP)
   - 2yz   Positive Completion reply
   - 3yz   Positive Intermediate reply
   - 4yz   Transient Negative Completion reply
   - 5yz   Permanent Negative Completion reply
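
The five groups above map directly onto the first character of a reply
line.  A small sketch of that mapping follows; the class name is
invented, and the "450" line in main() is a made-up reply used only to
show a 4yz code.

```java
class ReplyClass {
    // Names the reply group from the first digit of an SMTP reply line.
    static String classify(String replyLine) {
        if (replyLine.isEmpty()) {
            return "Unknown";
        }
        switch (replyLine.charAt(0)) {
            case '1': return "Positive Preliminary";
            case '2': return "Positive Completion";
            case '3': return "Positive Intermediate";
            case '4': return "Transient Negative Completion";
            case '5': return "Permanent Negative Completion";
            default:  return "Unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("221 2.3.0 Bye received. Goodbye."));
        System.out.println(classify("450 hypothetical temporary failure"));
    }
}
```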

Q: Name the five main categories of SMTP server responses

Q:  T/F SMTP clients can figure out how to proceed based on just the
    first digit of an SMTP reply code; they can usually ignore the rest.
    (RFC 2821 Section 4.2, 4.2.1, 4.3.2)

SMTP MX records
---------------

How does a mail client know to which SMTP server to connect when sending
mail to a userid at some domain?   It looks up the domain MX records in
the DNS.

An SMTP client queries the DNS for a domain to obtain "MX" (mail
exchange) records that tell which machines accept SMTP mail for the domain:

    $ host -t mx algonquincollege.com
    algonquincollege.com mail is handled by 30 mailgate10.algonquincollege.com.
    algonquincollege.com mail is handled by 20 mailgate11.algonquincollege.com.

    $ host hotmail.com
    hotmail.com has address 64.4.32.7
    hotmail.com has address 64.4.33.7
    hotmail.com mail is handled by 5 mx2.hotmail.com.
    hotmail.com mail is handled by 5 mx3.hotmail.com.
    hotmail.com mail is handled by 5 mx4.hotmail.com.
    hotmail.com mail is handled by 5 mx1.hotmail.com.

    $ host idallen.ca
    idallen.ca has address 72.18.159.15
    idallen.ca mail is handled by 0 idallen.ca.
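
The number in front of each mail exchanger is its preference; a client
tries the lowest number first.  The sketch below sorts MX lines into
that order.  It is an illustration only: it parses text in the format
printed by "host" above rather than querying the DNS, the class and
method names are invented, and it does not randomize among equal
preferences as a real mailer might.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MxOrder {
    // One MX record: a preference number and a mail exchanger host name.
    static final class Mx {
        final int preference;
        final String host;
        Mx(int preference, String host) {
            this.preference = preference;
            this.host = host;
        }
    }

    // Parses one "... mail is handled by <preference> <host>" line.
    static Mx parse(String line) {
        String[] words = line.trim().split("\\s+");
        int n = words.length;
        return new Mx(Integer.parseInt(words[n - 2]), words[n - 1]);
    }

    // Returns the hosts in the order to try them: lowest preference first.
    static List<String> tryOrder(List<String> hostOutput) {
        List<Mx> records = new ArrayList<>();
        for (String line : hostOutput) {
            records.add(parse(line));
        }
        records.sort(Comparator.comparingInt((Mx m) -> m.preference));
        List<String> hosts = new ArrayList<>();
        for (Mx m : records) {
            hosts.add(m.host);
        }
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(tryOrder(Arrays.asList(
            "algonquincollege.com mail is handled by 30 mailgate10.algonquincollege.com.",
            "algonquincollege.com mail is handled by 20 mailgate11.algonquincollege.com.")));
    }
}
```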

Q: How does an SMTP mailer know which computer to contact when sending
    mail to someone@domain.ca ?

* SMTP Walk-Through (old RFC 821 version) with comments by Dan Bernstein
  http://cr.yp.to/smtp.html
  - comments based on original RFC 821 not RFC 2821 (but often relevant)

  RFC2822 - message format - http://cr.yp.to/immhf.html
   - "If you're a new implementor, you'll be shocked at how badly 822
      was designed."

  - RFC2821 standards process "incompetence" by editor Klensin
    http://cr.yp.to/smtp/klensin.html
     - group consensus about HELO/EHLO didn't make the final draft!
     - "What an incredible display of incompetence!"

Q: T/F RFC standards development has been a very organized process.

-----------------------------------------------------------------------------

Coding an HTTP server (Java)
----------------------------
HTTP RFC: http://tools.ietf.org/html/rfc2616

Testing tools:
    http://teaching.idallen.com/cst8165/07f/notes/autotest_http.sh.txt
    http://teaching.idallen.com/cst8165/07f/notes/sample_http_test_out.txt

W3C Java server (HTTP 1.1): Jigsaw
    http://www.w3.org/Jigsaw/

A working Java HTTP server with basic functionality (in 145 lines)
is available here:

  http://www.brics.dk/ixwt/examples/FileServer.java

  - this version does not adhere to the HTTP RFC in many respects
  - needs comments on functionality (not on how Java works)
  - has many "public" items that should be made private
  - may be missing things such as closing opened files...

  (Older version:  http://www.brics.dk/~amoeller/WWW/javaweb/index.html )

An overview of TCP, HTTP and servers using Java:
  http://www.brics.dk/ixwt/http.pdf

Sun Guides/Tutorials on Java networking (mostly client side):
  http://java.sun.com/j2se/1.5.0/docs/guide/net/overview/overview.html
  http://java.sun.com/docs/books/tutorial/networking/index.html
  http://java.sun.com/docs/books/tutorial/networking/urls/index.html

java.net references:
  http://java.sun.com/j2se/1.5.0/docs/api/java/net/package-summary.html

java.net intro
  http://www.brics.dk/~amoeller/WWW/javaweb/javanet.html

Java 5.0 (also known as 1.5) package documentation:
  http://java.sun.com/j2se/1.5.0/docs/
  http://java.sun.com/j2se/1.5.0/docs/api/
    - java.io.File, java.lang.String, etc.

Java Notes (from a non-Java programmer)
----------

* On returning a pair of strings from a function

  I suggested that your HTTP server error function take two input strings.
  The first string is the Status Code and Reason Phrase from the HTTP RFC.
  The second string is text to put into the Message Body of the Response,
  giving more detail on the error, e.g.:

     "404 Not Found"
     "The Request /nosuchfile.html was not found on this server."

* How to return a pair of strings from a function in Java:

    public class IanStrings {
        // Return two strings at once by packing them into an array.
        private String[] foo() {
            return new String[] { "string one", "string two" };
        }
        public static void main(String[] args) {
            IanStrings istr = new IanStrings();
            String[] result = istr.foo();   // result[0] and result[1] are the pair
            if (result != null) {           // guard in case foo() ever returns null
                System.out.println(result[0] + " and " + result[1]);
            }
        }
    }
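
The error function suggested above (two input strings: the Status Code
with Reason Phrase, and the detail text for the Message Body) might look
something like the sketch below.  The name ErrorResponse.build and the
choice of Content-Type, Content-Length and Connection header lines are
assumptions made for the example, not requirements.

```java
import java.nio.charset.StandardCharsets;

class ErrorResponse {
    // status: Status Code and Reason Phrase, e.g. "404 Not Found"
    // detail: text for the Message Body giving more detail on the error
    static String build(String status, String detail) {
        String body = detail + "\r\n";
        int length = body.getBytes(StandardCharsets.ISO_8859_1).length;
        return "HTTP/1.1 " + status + "\r\n"
             + "Content-Type: text/plain\r\n"
             + "Content-Length: " + length + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"
             + body;
    }

    public static void main(String[] args) {
        // A pair of strings, as might be returned in a String[] by a helper.
        String[] pair = { "404 Not Found",
            "The Request /nosuchfile.html was not found on this server." };
        System.out.print(build(pair[0], pair[1]));
    }
}
```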

* On setting and using the setSoTimeout method

  - the action of using the method to set a time-out may raise a socket
    I/O exception, at the time you set the time-out (you need a try/catch)
  - later, when the timer triggers, it will raise the SocketTimeoutException
  - the above are different exceptions and will occur in different
    places in your program (and you need different try/catch for them)
  - to set the time-out, you need to know exactly where your HTTP server
    blocks waiting for input (which is the same place as all your
    previous servers)
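
The two exceptions and their two different places can be seen in the
sketch below.  It is a self-contained demo, not an HTTP server: the
class name and the 200 ms time-out are arbitrary choices, and the
"client" is just a local socket that connects and then sends nothing.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

class TimeoutDemo {
    // Opens a local server, connects a client that never sends anything,
    // and reports what happens when the server tries to read.
    static String demo(int timeoutMillis) {
        // Port 0 asks the system for any free port; loopback keeps it local.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentClient =
                 new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            try {
                accepted.setSoTimeout(timeoutMillis);  // exception 1: raised here
            } catch (SocketException e) {
                return "could not set time-out: " + e.getMessage();
            }
            try {
                int b = accepted.getInputStream().read();  // server blocks here
                return b < 0 ? "client closed" : "got a byte";
            } catch (SocketTimeoutException e) {
                return "timed out";                    // exception 2: raised later
            }
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(200));
    }
}
```

In a real server the second try/catch goes around whatever read call
waits for the client's request line.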

-------------------------------------------------------------------------

Eclipse IDE demo (in the T127 Lab - Fall 2007)
----------------------------------------------
 - see also the NetBeans IDE from Sun

Warning: Eclipse will need about 10MB of file space for each workspace!

You can use your "N" drive to store unused files:

    $ share //algshare/home/
    share: Attaching smb file system //algshare/home at /tmp/smb-abcd0001.
    Password:
    Spawning /bin/bash. Exit shell to unmount Samba share.
    $ cp -a workspace-old /tmp/smb-abcd0001      # use your userid not abcd0001
    $ rm -rf workspace-old
    $ exit
    Unmounting /tmp/smb-abcd0001.

Most of the actions below have keyboard shortcuts that are much faster
than navigating menus.

Preparing to run Eclipse and starting Eclipse
.. If you have old .eclipse or workspace directories from a previous
   version of Eclipse (old version of Java), remove or rename them.
.. Start eclipse (e.g. from the command line or a menu)
.. Select a location for your workspace (e.g. accept the default)
.. Close the "Welcome" tab (using the X box)

Creating a new Project
.. Select:  File | New | Project
.. Select a wizard: Java | Java Project (Next)
.. enter Project name: PigLatinHTTP
.. select link at bottom: Configure compliance
.. select Compiler compliance level: 6.0
.. Apply 
.. Yes (full rebuild)
.. OK
.. In the New Java Project dialog, make sure the "Configure
   compliance" warning is gone
.. Finish
.. Open Associated Perspective? Yes

Importing your FileServer.java file
.. File | Import
.. Select import source: General | File System (Next)
.. Browse to the directory containing your FileServer.java file
   - you are selecting a *directory* here, not a *file*
   - enter "." to use your home directory
.. In the directory listing, select FileServer.java
.. Finish

Opening the imported FileServer.java file in the editor
.. In the Package tab, use the drop arrow to open PigLatinHTTP
.. Use the drop arrow to open (default package)
.. Right Click on FileServer.java and select Open
   - or double-click on the source file name
   - the FileServer.java tab should open with the source visible
   - make sure there are no error tags in the left margin of the code

Running your project for the first time (setting arguments)
.. Right click on the source code and select Run As | Java Application
   - you should see a "Usage:" message in the console window at the
     bottom of the screen (missing arguments)
.. From the top menu bar select Run | Run
.. Select the (x)= Arguments tab
.. In "Program arguments" enter: 55555 /tmp
.. select Apply
.. select Run
   - you should see a successful start-up message in the console window
     "FileServer accepting connections on port 55555"
.. Push the red square to kill the application.
   See "<terminated>" in the console window

Re-running your project
.. select the green Arrow in the top menu bar
    See the console output window at the bottom of the screen.
.. Push the red square to kill the application.
    See "<terminated>" in the console window

Adding more files to your project
.. Use the Import facility to get more source files
.. If the files are imported to the same project/directory, their classes
   will be available to your main program; you only need to use them

Tips:
- in the source code, hold your cursor over any word to get help on that word
  - use F2 to lock focus on the help and allow scrolling
- right-click on the source and select 
  Source | Format

-------------------------------------------------------------------------

Automated Testing - use it right from the start
-----------------

I've provided a script that will do automated testing of your HTTP server,
and I've written a few simple automated tests.  You must use this script
to test your server, and you must organize the script and add your own
tests to the script to test things that I haven't.  No marks are
awarded for using my random tests without modification.

Don't be limited by the categories or tests I've coded in the script -
my list of tests is incomplete and in a random order.  Rewrite the test
suite to suit yourself.  Add more tests to the suite and organize and
renumber the tests that are there into logical categories.

If you start immediately using the automated testing script to test your
server, you'll save time over doing manual testing and then having to
repeat all your tests for handing in.

Some programming disciplines have you write the test suite first, then
write the code to pass all the tests.  If a test doesn't exist for a
function, the function is not considered implemented (because it can't
be tested).