-------------------------
Week 13 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Keep up on your readings (Course Outline: average 4 hours/week homework)

Review:
------
 - HTTP methods
 - idempotent and "safe" methods
 - conditional and partial GET methods
 - secure HTTP and the end of the use of separate ports
 - SMTP
 - the issue with SMTP on campus
 - using the Algonquin SMTP server
 - envelope addresses vs. message addresses
 - extending the original protocol
 - SMTP continuation lines
 - SMTP response codes
 - MX records
 - coding an HTTP server in Java
 - Java notes
 - using Eclipse

-------------------------------------------------------------------------

New port access to wt127-32:

Access to most ports in the Linux Lab has been disabled.  On the single
machine wt127-32, ports 49152 to 49251 have been opened.  You can run
servers on these ports and access them from other places on campus, or
via the VPN.

-------------------------------------------------------------------------

see Notes: Mail Systems Terminology - mail_systems_terms.txt

-------------------------------------------------------------------------

Protocols - Reading Mail - Post Office Protocol (POP)
-----------------------------------------------------
  http://tools.ietf.org/html/rfc1939   (23 pages)
  http://tools.ietf.org/html/rfc1957   (1 page observation)
  http://tools.ietf.org/html/rfc2449   (CAPA extensions)
   - note the "Errata" link
   - version 3:  RFC 1081 -> 1225 -> 1460 -> 1725 -> 1939 
     updated by RFC 1957 (one page observation RTFM!) and 2449 (extensions)
   - specified to use TCP port 110 (Section 3)
   - POP is supposed to stay *SIMPLE* (use IMAP for everything else: Section 1)
   - Section 10 example:  http://tools.ietf.org/html/rfc1939#page-19

  http://tools.ietf.org/html/rfc2449   (19 pages - CAPA extension)
  - on extending POP3 (RFC 2449 intro and section 7):
   "This extension to the POP3 protocol is to be used by a server to
    express policy descisions taken by the server administrator.  It is
    not an endorsement of implementations of further POP3 extensions
    generally.  It is the general view that the POP3 protocol should stay
    simple, and for the simple purpose of downloading email from a mail
    server.  If more complicated operations are needed, the IMAP protocol
    [RFC 2060] should be used.

    Future extensions to POP3 are in general discouraged, as POP3's
    usefulness lies in its simplicity.  POP3 is intended as a download-
    and-delete protocol; mail access capabilities are available in IMAP
    [IMAP4].  Extensions which provide support for additional mailboxes,
    allow uploading of messages to the server, or which deviate from
    POP's download-and-delete model are strongly discouraged and unlikely
    to be permitted on the IETF standards track.

    Clients MUST NOT require the presence of any extension for basic
    functionality, with the exception of the authentication commands"

Q: Why are extensions to POP3 discouraged?

RFC Section 3 - Basic Operation
 - eight case-insensitive 3-4 character command keywords (section 3)
 - traditional CRLF line terminators
 - single space separators
 - arguments only up to 40 characters (!) - very short lines
 - only two status indicators: +OK and -ERR (upper case)
   - no way to distinguish between temporary and permanent failure
   - no way to distinguish "not now" from "not implemented"
 - multi-line responses terminated by a single period on a line
   - leading periods are doubled and then must be removed (like SMTP)
   - called "byte-stuffing" or "dot-stuffing" (Section 3 page 3)
 - a state-oriented protocol
   AUTHORIZATION -> TRANSACTION -> UPDATE
   - must authenticate before issuing transactions
   - update happens *after* the client issues QUIT in the TRANSACTION state
 - if the server has an inactivity timer, it MUST be at least 10 minutes
   (section 3 page 4)
   - a time-out does not trigger an UPDATE - throws away updates
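
A minimal sketch (Python, not from the RFC) of how a client might handle
the two status indicators and undo byte-stuffing in a multi-line response.
The function names are made up for illustration, and the input is assumed
to be a list of lines with the CRLF already stripped, not a live socket.

```python
def is_ok(status_line):
    """True for a +OK status line, False for -ERR (the only two indicators)."""
    if status_line.startswith("+OK"):
        return True
    if status_line.startswith("-ERR"):
        return False
    raise ValueError("not a POP3 status line: %r" % status_line)


def unstuff_multiline(lines):
    """Return the payload lines of a POP3 multi-line response.

    `lines` holds the lines that follow the +OK status line, without CRLF.
    """
    payload = []
    for line in lines:
        if line == ".":
            # A lone period terminates the multi-line response.
            return payload
        if line.startswith("."):
            # The server doubled a leading period; drop the extra one.
            line = line[1:]
        payload.append(line)
    raise ValueError("response ended without a terminating '.' line")
```

Anything after the terminating line is ignored by this sketch; a real
client would leave it in the buffer for the next response.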

Q: T/F, unlike most Internet protocols, POP3 only requires LF on line ends.
Q: T/F, the POP protocol has different status indicators for temporary
    and permanent failures.
Q: How does the POP protocol handle multi-line server responses (e.g.
   when fetching a message)?
Q: What is meant by "dot-stuffing" or "byte-stuffing"?
Q: Name and describe what happens in each of the three states of a
   POP3 connection.  What triggers the entry into each state?
Q: T/F, if a POP3 client drops the connection, the server skips the
   UPDATE phase.

Authorization/Authentication State (Section 4 page 4)
 - each AUTHORIZATION method is optional, but a server must support at
   least one (!)
 - RFC defines cleartext USER and PASS or APOP methods
 - RFC says "there is no single authentication mechanism that is required
   of all POP3 servers" (!) but Section 9 lists USER and PASS as
   "Minimal POP3 Commands", implying they are required
 - APOP uses MD5 and a shared secret
   - see p.16 - you can calculate this digest in Linux via:
    $ echo -n '<1896.697170952@dbc.mtview.ca.us>tanstaaf' | md5sum
    c4c9334bac560ecc979e58001b3e22fb  -
 - neither USER/PASS nor APOP encrypts the full connection...
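
The same APOP digest can be computed in a program.  This is a small
Python sketch; the function name apop_digest is made up, and the
timestamp and shared secret are the example values from RFC 1939 used
in the md5sum command above.

```python
import hashlib


def apop_digest(timestamp, shared_secret):
    """MD5 hex digest of the server timestamp followed by the shared secret."""
    data = (timestamp + shared_secret).encode("ascii")
    return hashlib.md5(data).hexdigest()


# Same inputs as the md5sum example; prints c4c9334bac560ecc979e58001b3e22fb
print(apop_digest("<1896.697170952@dbc.mtview.ca.us>", "tanstaaf"))
```

The timestamp includes the angle brackets from the server greeting, and
there is no separator or newline between timestamp and secret (that is
why the shell version needs "echo -n").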

Q: T/F, the USER and PASS POP commands set up an encrypted connection.

  http://tools.ietf.org/html/rfc1734 - POP3 AUTH command
    "the client may request authentication types in decreasing order of
     preference, with the USER/PASS or APOP command as a last resort."  (p.2)

    "A protection mechanism provides integrity and privacy protection
     to the protocol session.  If a protection mechanism is negotiated,
  *  it is applied to all subsequent data sent over the connection.
     The protection mechanism takes effect immediately following the CRLF
     that concludes the authentication exchange for the client, and the
     CRLF of the positive response for the server.  Once the protection
     mechanism is in effect, the stream of command and response octets is
     processed into buffers of ciphertext.  Each buffer is transferred
     over the connection as a stream of octets prepended with a four
     octet field in network byte order that represents the length of
     the following data."  (p.2)
 - QUIT is also allowed in Authorization State (Section 4 p.5)
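
The length-prefixed buffers in the RFC 1734 quote above can be sketched
like this in Python.  Only the framing is shown: the encryption itself
depends on the negotiated mechanism and is not attempted here, and the
names frame and unframe are made up for illustration.

```python
import struct


def frame(ciphertext):
    """Prepend a four-octet, network-byte-order length to one buffer."""
    return struct.pack("!I", len(ciphertext)) + ciphertext


def unframe(stream):
    """Split one framed buffer off the front of `stream`.

    Returns (buffer, rest_of_stream).
    """
    if len(stream) < 4:
        raise ValueError("need at least four octets for the length field")
    (length,) = struct.unpack("!I", stream[:4])
    if len(stream) < 4 + length:
        raise ValueError("incomplete buffer")
    return stream[4:4 + length], stream[4 + length:]
```

Once protection is in effect, the receiver can no longer scan for CRLF to
find the end of a command; it has to read the length field first and then
decrypt that many octets.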

Q: How does POP3 "protection" affect data transfer between client and server?

SASL: Simple Authentication and Security Layer
 - usable via the CAPA extension http://tools.ietf.org/html/rfc2449 (19 pages)
 - see also: SASL use in SMTP http://tools.ietf.org/html/rfc2554 (11 pages)

Authorization State
 - all authorization methods are optional, but at least one must be supported

Transaction State
 - Must handle: STAT, LIST, RETR, DELE, NOOP, RSET, QUIT

Update State (can only be entered from Transaction State)
 - entered *only* via QUIT, never by hangup or disconnect
 - no commands
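
The three states can be sketched as a small table-driven state machine.
This is an illustration in Python, not server code: the optional
commands (TOP, UIDL, AUTH) are left out, the "CLOSED" state name is made
up here, and the `ok` flag stands in for whether the server answered +OK.

```python
ALLOWED = {
    "AUTHORIZATION": {"USER", "PASS", "APOP", "QUIT"},
    "TRANSACTION": {"STAT", "LIST", "RETR", "DELE", "NOOP", "RSET", "QUIT"},
    "UPDATE": set(),   # no commands are accepted in UPDATE
    "CLOSED": set(),   # hypothetical name for "session over"
}


def next_state(state, command, ok=True):
    """Return the state that follows `command` issued in `state`."""
    keyword = command.upper()          # keywords are case-insensitive
    if keyword not in ALLOWED[state]:
        raise ValueError("%s not allowed in %s state" % (keyword, state))
    if state == "AUTHORIZATION":
        if keyword == "QUIT":
            return "CLOSED"            # QUIT here does NOT enter UPDATE
        if keyword in ("PASS", "APOP") and ok:
            return "TRANSACTION"       # successful authentication
        return "AUTHORIZATION"
    if state == "TRANSACTION" and keyword == "QUIT":
        return "UPDATE"                # the only way into UPDATE
    return state
```

A dropped connection is not a command, so it never appears in the table:
that is how the sketch models "entered *only* via QUIT".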

Section 8: Scaling and Operational Considerations
 - people using POP stores as permanent message archives
  "When these facilities are used in this way by casual users, there has
   been a tendency for already-read messages to accumulate on the server
   without bound.  This is clearly an undesirable behavior pattern from
   the standpoint of the server operator.  This situation is aggravated
   by the fact that the limited capabilities of the POP3 do not permit
   efficient handling of maildrops which have hundreds or thousands of
   messages."

Q: T/F, POPmail scales well to handle hundreds or thousands of messages.

Section 11: Message Format
  "It is important to note that the octet count for a message on
   the server host may differ from the octet count assigned to that
   message due to local conventions for designating end-of-line."
   - the size of the message in the file system may not match the size
     transmitted over the wire (especially for Unix/Linux systems)
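
A small Python sketch of the size difference, assuming the message is
stored Unix-style with LF line ends and that every line end counts as
two octets (CRLF) on the wire.  The function name wire_octets is made up.

```python
def wire_octets(stored):
    """Octet count of a message when every line end is sent as CRLF.

    `stored` is the message as bytes, as kept in the file system.
    """
    # Normalize first so a file that already uses CRLF is not counted twice.
    normalized = stored.replace(b"\r\n", b"\n")
    # Each LF becomes CRLF on the wire: one extra octet per line.
    return len(normalized) + normalized.count(b"\n")


stored = b"Subject: hi\n\nbody\n"
print(len(stored), wire_octets(stored))   # 18 on disk, 21 on the wire
```

The extra period added by byte-stuffing is not part of the message and is
not included in this count.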

Q: Give the minimal set of POP3 commands needed to retrieve and delete
   one message on a POP3 server.

=============================================================================

Protocols - Reading Mail - Internet Message Access Protocol (IMAP)
------------------------------------------------------------------
  http://tools.ietf.org/html/rfc3501   (108 pages)

  - RFC 1730 -> 2060 -> 3501
    updated by RFC 4466 (collected extensions)
    updated by RFC 4468 (CATENATE extension)
    updated by RFC 4551 (conditional STORE, etc.)

  "The Internet Message Access Protocol, Version 4rev1 (IMAP4rev1)
   allows a client to access and manipulate electronic mail messages on
   a server.  IMAP4rev1 permits manipulation of mailboxes (remote
   message folders) in a way that is functionally equivalent to local
   folders.  IMAP4rev1 also provides the capability for an offline
   client to resynchronize with the server."

 - requires any reliable data stream, e.g. TCP  (TCP port 143)

 - too many pages to read!

Q: T/F, both POPmail and IMAP permit remote folders.
Q: Why are most advances in reading email done through changes to IMAP rather
    than changes to POP3?

=============================================================================

Internet Mail Consortium - extensive email archives by topic
  http://www.imc.org/

Current Draft Protocols - Stopping SPAM
---------------------------------------
Overview:
  http://mipassoc.org/csv/CSV-Intro-03dc.pdf

E-mail Authentication
  http://en.wikipedia.org/wiki/E-mail_authentication
   "Ensuring a valid identity on an e-mail has become a vital first
    step in stopping spam, forgery, fraud, and even more serious
    crimes. An essential second step will be ensuring the entity has a
    good reputation. Unfortunately, the Simple Mail Transfer Protocol
    (SMTP) that handles most e-mail today was designed in an era when
    users of the Internet were mostly honest techies who expected
    others to be equally honest. This article will explain how e-mail
    identities are forged and the steps that are being taken now to
    prevent it."

"Limiting Unsolicited Bulk Email (UBE)"
http://www.imc.org/imc-spam/
   "IMC's members have expressed a strong interest in helping to come
    up with solutions to the problem of unsolicited bulk email (UBE),
    better known as "spam". The use and abuse of UBE is spreading
    rapidly, and many Internet users are complaining loudly about the
    very negative effects it has on them."

Anti-Spam Recommendations for SMTP MTAs
http://tools.ietf.org/html/rfc2505
 - footnote mentions the Monty Python origin of the term "spam"
 - done at SMTP level:
   "Our basic assumption is that refuse/accept is handled at the SMTP
    layer and that an MTA that decides to refuse a message should do so
    while still in the SMTP dialogue. First, this means that we do not
    have to store a copy of a message we later decide to refuse and
    second, our responsibility for that message is low or none - since we
    have not yet read it in, we leave it to the sender to handle the
    error."

Q: Give two reasons why refusing spam during the SMTP dialog (refusing
   to accept the email) is a Good Thing.

 - suggests using 4xx temporary fail codes; however:
  "However, 4xx Temporary Errors may have unexpected interaction with
   MX-records. If the receiving domain has several MX records and the
   lowest preference MX-host refuses to receive mail with a "451" Response
   Code, the sending host may choose to - and often will - use the next
   host on the MX list.  [...] Our intent was to make the offending
   mail stay at the offending sender's host and fill up his mqueue disk,
   but it all ended up at our friend, the next lowest preference MX-host."
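
The MX interaction described in the quote can be simulated in a few lines
of Python.  This is a sketch only: the host names are invented, the
`responses` table stands in for a real SMTP dialogue, and real senders
differ in how they retry.

```python
def deliver(mx_records, responses):
    """Return the MX host that accepts the message, or None.

    mx_records: list of (preference, host); lower numbers are tried first.
    responses:  dict mapping host -> SMTP reply code (simulated).
    """
    for _preference, host in sorted(mx_records):
        code = responses[host]
        if 200 <= code < 300:
            return host        # accepted: this host now has the message
        if 500 <= code < 600:
            return None        # permanent failure: the sender gives up
        # 4xx temporary failure: fall through and try the next MX host
    return None                # every host deferred: stays in sender's queue


mx = [(20, "backup.example"), (10, "primary.example")]
print(deliver(mx, {"primary.example": 451, "backup.example": 250}))
```

With a 451 from the primary, the message lands on backup.example instead
of staying in the sender's queue.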

Q: What is a major drawback to refusing spam using SMTP Temporary Errors?

-------------------------------------------------------------------------

Linux Lab work (only works with on-campus/VPN access to 10.50.254.148):
  http://tools.ietf.org/html/rfc1939   (23 pages)

1.  Send email to abcd0001@localhost.localdomain via SMTP server 10.50.254.148
    where abcd0001 is replaced by your Algonquin student userid.
    - this SMTP server is liberal in accepting LF line ends!
    - you can make up any envelope From address you like
    - you can make up any message To/From addresses you like
    - you can also send email this way to your classmates (be polite)
    See Notes: smtp_session.txt

  * $ nc -v 10.50.254.148 25
    Connection to 10.50.254.148 25 port [tcp/smtp] succeeded!
    220 idallen-alinux ESMTP Postfix (Ubuntu)
  * EHLO ... see the sample session in smtp_session.txt ...
    ... etc ...
  * QUIT
    221 Bye
    $
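
The difference between envelope addresses and message addresses also
shows up when sending from a program.  A sketch using Python's standard
smtplib, assuming the same lab server; the addresses in the usage note
are invented and the function names are made up for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_message(header_from, header_to, subject, body):
    """Build the message: these addresses only appear in the headers."""
    msg = EmailMessage()
    msg["From"] = header_from
    msg["To"] = header_to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, envelope_from, envelope_to, host="10.50.254.148", port=25):
    """Send `msg`; the envelope addresses go in MAIL FROM and RCPT TO."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg, from_addr=envelope_from,
                            to_addrs=[envelope_to])
```

For example, build_message("santa@northpole.example", "you@example.com",
"Test", "Hello") followed by send(msg, "me@example.com",
"abcd0001@localhost.localdomain") delivers to abcd0001 no matter what the
To: header says.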

2.  Fetch and delete the email using "nc" to the POP3 TCP port.
    See RFC Section 10: Example POP3 Session
    - this POP3 server is liberal in accepting LF line ends!
    - login with your Algonquin userid using USER and PASS
    - your password is the letter C followed by the last 7 digits of
      your Algonquin student number

  * $ nc -v 10.50.254.148 110
    Connection to 10.50.254.148 110 port [tcp/pop3] succeeded!
    +OK Dovecot ready.
  * USER abcd0001
  * PASS C1234567
    +OK Logged in.
    ... etc ...
  * QUIT
    +OK Logging out.
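
For comparison, here is roughly the same session driven from Python's
standard poplib instead of "nc".  This is a sketch, not a replacement
for doing the lab by hand; the function name is made up and it only
handles the first message in the maildrop.

```python
import poplib


def fetch_and_delete_first(host, userid, password, port=110):
    """Retrieve message 1, mark it deleted, and QUIT to commit the delete.

    Returns the message as bytes, or None if the maildrop is empty.
    """
    conn = poplib.POP3(host, port)     # reads the +OK greeting
    try:
        conn.user(userid)              # USER
        conn.pass_(password)           # PASS (cleartext on the wire!)
        count, _size = conn.stat()     # STAT
        if count == 0:
            return None
        _resp, lines, _octets = conn.retr(1)   # RETR 1
        conn.dele(1)                   # DELE 1 (only marks the message)
        return b"\r\n".join(lines)
    finally:
        conn.quit()                    # QUIT: enters UPDATE when logged in
```

A call might look like fetch_and_delete_first("10.50.254.148", "abcd0001",
"C1234567").  The delete only takes effect because of the QUIT at the end.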

-------------------------------------------------------------------------

Linux Lab 6 Work - HTTP server testing

Testing - black box vs. white box, "behavioral" vs. "structural"
-------
 - I don't have time to read and test all your code; you have to do it

Looking at the FileServer white-box style:
  http://www.brics.dk/ixwt/examples/FileServer.java

  - what tests exercise every line of code, especially each of the exceptions?

Automated Testing - use it right from the start
-----------------

I've provided a script that will do automated testing of your HTTP server,
and I've written a few simple automated tests.  You must use this script
to test your server, and you must organize the script and add your own
tests to the script to test things that I haven't.  No marks are
awarded for using my random tests without modification.

Don't be limited by the categories or tests I've coded in the script -
my list of tests is incomplete and in a random order.  Rewrite the test
suite to suit yourself.  Add more tests to the suite and organize and
renumber the tests that are there into logical categories.

If you start immediately using the automated testing script to test your
server, you'll save time over doing manual testing and then having to
repeat all your tests for handing in.

Some programming disciplines have you write the test suite first, then
write the code to pass all the tests.  If a test doesn't exist for a
function, the function is not considered implemented (because it can't
be tested).
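
To show the idea of a named, organized list of black-box tests, here is a
small Python sketch.  It is not the provided testing script: the case
names, the run_cases and make_sender functions, and the expected status
codes are all invented for illustration.

```python
import socket


def make_sender(host, port, timeout=5.0):
    """Return a function that sends one raw request and reads the reply.

    Reads until the server closes the connection, so requests should be
    HTTP/1.0 or ask for "Connection: close".
    """
    def send(request):
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)
    return send


def run_cases(send, cases):
    """Run (name, request, expected_status) cases; return failing names."""
    failures = []
    for name, request, expected in cases:
        response = send(request)
        status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
        parts = status_line.split(" ", 2)
        code = parts[1] if len(parts) >= 2 else ""
        if code != expected:
            failures.append(name)
    return failures


CASES = [
    ("1.1 GET root", b"GET / HTTP/1.0\r\n\r\n", "200"),
    ("2.1 unknown method", b"FROB / HTTP/1.0\r\n\r\n", "501"),
]
```

Against a real server on a lab port this might be run as
run_cases(make_sender("wt127-32", 49152), CASES).  Because `send` is just
a function, the cases can be written before the server exists and checked
against a stand-in.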