-------------------------
Week 12 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

-------------------
INDEX to this file:
 - Java is now completely open source via GPL version 2
 - Review of midterm test answers
 - Review: open relays and SMTP MX records
 - Changes to Lab 5 specification (require Last-Modified header)
 - Read the comment style example in programming_style.txt
 - Testing methods
 - Sniffing Browser HTTP Requests using nc instead of ethereal
 - Programming an HTTP client
 - RFC tools by IETF
 - Fetching a raw web page: wget
 - Mail Systems Terminology
 - Mail server comparison
 - Mail Transport History
 - Protocols - Post Office Protocol (to be continued...)
-------------------

News: Sun open-sources Java under the GPL on Sunday Nov 12, 2006 (yesterday)
  http://java.sun.com/
  http://news.google.ca/nwshp?ie=UTF-8&oe=UTF-8&hl=en&tab=wn&ncl=http://www.desktoplinux.com/news/NS3337915997.html
  http://community.java.net/javadesktop/
  http://news.bbc.co.uk/1/hi/technology/6144748.stm
  http://www.desktoplinux.com/news/NS3337915997.html
  http://www.eweek.com/article2/0,1895,2055770,00.asp?kc=EWNAVEMNL111306EOAD

Hand back midterm tests.
  - go over midterm test answers:
  http://teaching.idallen.com/cst8165/06f/notes/termtest2_answers.txt

Review
------

Q: How can I use nc to tell if an SMTP server is an "open relay"?

  - see last week's notes on "open relay"

  $ nc -v localhost smtp
  EHLO somedomain.ca
  MAIL FROM:<xxx>
  RCPT TO:<yyy>

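  - connect to the server and see if you can use it to send yourself an
    email, where "xxx" and "yyy" are both addresses that are foreign to
    the network on which the SMTP server resides
  - a 250 reply to RCPT TO usually means the server will relay for
    strangers; a 5xx reply (e.g. 550 or 554) means it refuses
  - test from outside the server's network: many servers deliberately
    relay for connections from localhost, so a local test can mislead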
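The same manual check can be scripted.  Below is a minimal Java sketch,
assuming port 25 and that you supply the server and two foreign addresses
on the command line; RelayProbe and its methods are hypothetical names.
It reads SMTP replies, including multi-line EHLO replies whose
continuation lines have a '-' after the 3-digit code, and it stops after
RCPT TO without ever sending DATA.  Treat a 2xx reply as a hint only:
some servers accept RCPT and refuse or discard the message later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RelayProbe {

    /** Reads one SMTP reply and returns its 3-digit code.  A reply may span
     *  several lines such as "250-PIPELINING"; the last line has a space
     *  (or nothing) after the code instead of a '-'. */
    static int readReply(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println("S: " + line);
            if (line.length() < 3) {
                throw new IOException("malformed reply: " + line);
            }
            if (line.length() == 3 || line.charAt(3) != '-') {
                return Integer.parseInt(line.substring(0, 3));
            }
        }
        throw new IOException("server closed the connection");
    }

    /** Sends one command with the CRLF line ending SMTP requires. */
    static int command(Writer out, BufferedReader in, String cmd) throws IOException {
        System.out.println("C: " + cmd);
        out.write(cmd + "\r\n");
        out.flush();
        return readReply(in);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java RelayProbe server foreign-sender foreign-recipient");
            return;
        }
        try (Socket s = new Socket(args[0], 25)) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII);
            readReply(in);                                   // 220 greeting
            command(out, in, "EHLO somedomain.ca");
            command(out, in, "MAIL FROM:<" + args[1] + ">");
            int rcpt = command(out, in, "RCPT TO:<" + args[2] + ">");
            command(out, in, "QUIT");                        // never send DATA
            System.out.println(rcpt / 100 == 2
                ? "RCPT accepted: possible open relay"
                : "RCPT refused (" + rcpt + "): not relaying for you");
        }
    }
}
```

Run it from a machine outside the server's network, e.g.
"java RelayProbe mail.example.ca xxx@foreign.example yyy@elsewhere.example".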

SMTP MX records
---------------

Q: How does a mail client know to which SMTP server to connect when
    sending mail to a userid?

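An SMTP client queries the DNS for a domain to obtain "MX" (mail
exchange) records that tell which machines accept mail for the domain.
The number before each host is its preference: lower numbers are tried
first, and hosts with equal numbers share the load: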

    $ host hotmail.com
    hotmail.com has address 64.4.32.7
    hotmail.com has address 64.4.33.7
    hotmail.com mail is handled by 5 mx2.hotmail.com.
    hotmail.com mail is handled by 5 mx3.hotmail.com.
    hotmail.com mail is handled by 5 mx4.hotmail.com.
    hotmail.com mail is handled by 5 mx1.hotmail.com.

    $ host idallen.ca
    idallen.ca has address 72.18.159.15
    idallen.ca mail is handled by 0 idallen.ca.
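A program can ask DNS for the same MX records.  This Java sketch assumes
the JNDI DNS provider that ships with the JDK
(com.sun.jndi.dns.DnsContextFactory) and the system's configured DNS
servers; MxLookup and hostsInOrder are hypothetical names.  It sorts the
hosts by preference, lowest first.  Hosts with equal preference keep the
order DNS returned them in; RFC 5321 says a client must randomize among
them, which this sketch skips.  If a domain has no MX record at all, SMTP
falls back to the domain's own address (A) record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;

class MxLookup {

    /** Returns raw MX records such as "5 mx1.hotmail.com." (may be empty). */
    static List<String> mxRecords(String domain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        InitialDirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(domain, new String[] {"MX"});
            List<String> result = new ArrayList<>();
            Attribute mx = attrs.get("MX");
            if (mx != null) {
                NamingEnumeration<?> e = mx.getAll();
                while (e.hasMore()) {
                    result.add(e.next().toString());
                }
            }
            return result;
        } finally {
            ctx.close();
        }
    }

    /** Orders "preference host." records lowest preference first and strips
     *  the trailing root dot from each host name.  Stable sort: equal
     *  preferences keep their input order (no randomizing here). */
    static List<String> hostsInOrder(List<String> records) {
        List<String[]> parsed = new ArrayList<>();
        for (String r : records) {
            parsed.add(r.trim().split("\\s+"));
        }
        parsed.sort(Comparator.comparingInt((String[] p) -> Integer.parseInt(p[0])));
        List<String> hosts = new ArrayList<>();
        for (String[] p : parsed) {
            String h = p[1];
            hosts.add(h.endsWith(".") ? h.substring(0, h.length() - 1) : h);
        }
        return hosts;
    }

    public static void main(String[] args) throws NamingException {
        if (args.length != 1) {
            System.err.println("usage: java MxLookup domain");
            return;
        }
        List<String> records = mxRecords(args[0]);
        if (records.isEmpty()) {
            System.out.println("no MX records: deliver to the A record of " + args[0]);
        } else {
            System.out.println(hostsInOrder(records));
        }
    }
}
```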

Review changes to Lab 5 - Last-Modified: and Date:
-----------------------
  http://teaching.idallen.com/cst8165/06f/notes/lab05.txt
  http://teaching.idallen.com/cst8165/06f/notes/test_out3.txt
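Both headers use the fixed RFC 1123 date format that HTTP/1.1 requires
(RFC 2616 section 3.3.1), always expressed in GMT.  A minimal Java sketch,
assuming the server knows the file's modification time; HttpDate is a
hypothetical helper, not part of the lab code, and lab05.txt remains the
authority on what the lab requires.  It uses an explicit pattern because
java.time's built-in RFC_1123_DATE_TIME formatter does not zero-pad the
day of the month, while HTTP's format needs two digits.

```java
import java.io.File;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class HttpDate {

    // e.g. "Sun, 06 Nov 1994 08:49:37 GMT": English names, two-digit day, GMT.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    static String format(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Header lines a file server might send for a file.  Note that
     *  File.lastModified() returns 0 (1970) if the file does not exist. */
    static String headersFor(File f, long nowMillis) {
        return "Date: " + format(nowMillis) + "\r\n"
             + "Last-Modified: " + format(f.lastModified()) + "\r\n";
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : ".");
        System.out.print(headersFor(f, System.currentTimeMillis()));
    }
}
```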

Comment style
-------------
  http://teaching.idallen.com/cst8165/06f/notes/programming_style.txt

  - see the pair of example programs at the end of the file
  - if you wish to use an alternate commenting and indenting style,
    please provide me with a link to it and we'll discuss it
  - I'm open to you using any popular real-world programming style;
    I don't want you inventing your *own* style

Testing - black box vs. white box, "behavioral" vs. "structural"
-------
 - I don't have time to read and test all your code; you have to do it

  http://www.scism.sbu.ac.uk/law/Section5/chap3/s5c3p23.html

   "White box testing is concerned only with testing the software
    product, it cannot guarantee that the complete specification
    has been implemented. Black box testing is concerned only with
    testing the specification, it cannot guarantee that all parts
    of the implementation have been tested. Thus black box testing
    is testing against the specification and will discover faults of
    omission, indicating that part of the specification has not been
    fulfilled. White box testing is testing against the implementation
    and will discover faults of commission, indicating that part of the
    implementation is faulty. In order to fully test a software product
    both black and white box testing are required."

  http://www.faqs.org/faqs/software-eng/testing-faq/section-13.html

   "One has to use a mixture of different methods so that they aren't
    hindered by the limitations of a particular one.  Some call this
    "gray-box" or "translucent-box" test design, but others wish we'd
    stop talking about boxes altogether."

Looking at Lab 5 white-box:
  http://www.brics.dk/ixwt/examples/FileServer.java

  - what tests exercise every line of code, especially each of the exceptions?

Q: What is the difference between white-box and black-box testing of a
    piece of code?  Give the advantages and disadvantages of each method,
    especially with regard to testing the specification.

Sniffing Browser HTTP Requests
------------------------------

To see what lines a browser sends to an HTTP server, you can use Ethereal
(now renamed Wireshark) and trace a session; or, for a quick dump, just
use netcat on a spare port (e.g. 55555) and have the browser access
http://localhost:55555/foobar :

[ Start a fake HTTP server on a spare port, e.g. 55555 : ]

$ nc -v -l -p 55555 localhost    # Debian/Ubuntu
$ nc -v -l localhost 55555       # RedHat/Mandrake
listening on [any] 55555 ...

[ Start up your browser and connect to http://localhost:55555/foobar : ]

connect to [127.0.0.1] from localhost [127.0.0.1] 40757
GET /foobar HTTP/1.1
Host: localhost:55555
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060216 Debian/1.7.12-1.1ubuntu2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-ca,en-us;q=0.9,en-gb;q=0.7,en;q=0.6,fr-ca;q=0.4,fr-fr;q=0.3,fr;q=0.1
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

[ At this point, you can type back a server reply to the browser : ]

HTTP/1.1 200 this is my reply to the browser
Content-Type: text/plain

ab
cd
ef
gh
^C (interrupt)

  - your browser will display the body lines (ab, cd, ef, gh); interrupting
    nc closes the connection, which marks the end of the reply
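The nc trick can be reproduced in Java as a one-shot fake server.  A
sketch, assuming the same spare port 55555 and the same canned reply;
FakeHttpServer is a hypothetical name.  It reads only the request line
and headers (a GET has no body), prints them, answers, and exits.  The
"Connection: close" header is an addition: with no Content-Length,
closing the connection is how the browser learns the body has ended.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FakeHttpServer {

    static final String REPLY =
        "HTTP/1.1 200 this is my reply to the browser\r\n"
      + "Content-Type: text/plain\r\n"
      + "Connection: close\r\n"          // body ends when we close
      + "\r\n"
      + "ab\r\ncd\r\nef\r\ngh\r\n";

    /** Reads the request line and headers, up to the blank line ending them. */
    static List<String> readRequestHead(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Listen on the loopback interface only, like the nc example.
        try (ServerSocket server = new ServerSocket(55555, 1, InetAddress.getLoopbackAddress());
             Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(client.getInputStream(), StandardCharsets.ISO_8859_1));
            for (String line : readRequestHead(in)) {
                System.out.println(line);
            }
            OutputStream out = client.getOutputStream();
            out.write(REPLY.getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
        }   // closing the socket ends the reply
    }
}
```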

Programming an HTTP client
--------------------------

java.net references:
  http://java.sun.com/j2se/1.5.0/docs/api/java/net/package-summary.html

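    // imports needed: java.io.InputStream, java.net.URI, java.net.URL
    URI uri = new URI("http://java.sun.com/");    // may throw URISyntaxException
    URL url = uri.toURL();                         // may throw MalformedURLException
    try (InputStream in = url.openStream()) {      // may throw IOException
        // ... read the raw bytes of the page from "in" ...
    }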

HTTP via Java classes:
  http://java.sun.com/javase/6/docs/api/java/net/URLConnection.html
  http://java.sun.com/j2se/1.5.0/docs/api/java/net/HttpURLConnection.html

  - the obsolete reference to RFC 2068 in these pages should now be RFC 2616

  - not everyone is happy with java.net.HttpURLConnection:
    http://www.oaklandsoftware.com/product_http/overview.html

  - an alternate class (2001):
    http://www.innovation.ch/java/HTTPClient/urlcon_vs_httpclient.html

Sun tutorial on URL reading/writing
  http://java.sun.com/docs/books/tutorial/networking/urls/index.html
  http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html

  - note the need to explicitly tell the URLConnection object that we
    want to write to it (e.g. to send POST data):  connection.setDoOutput(true)
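The setDoOutput(true) step looks like this in a small POST helper.  A
sketch, assuming Java 9 or later (for readAllBytes) and a server that
answers the POST; PostExample is a hypothetical name.  Without
setDoOutput(true), HttpURLConnection's getOutputStream() throws a
ProtocolException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PostExample {

    /** POSTs a text body to a URL and returns the response body. */
    static String post(URL url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);              // we will write a request body
            conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP status: " + conn.getResponseCode());
            // For 4xx/5xx replies getInputStream() throws an IOException;
            // getErrorStream() would return the error page instead.
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PostExample url body");
            return;
        }
        System.out.println(post(URI.create(args[0]).toURL(), args[1]));
    }
}
```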

RFC tools by IETF
-----------------
  http://tools.ietf.org/

  - html cross-linked pages
    - http://tools.ietf.org/html/
  - reading tools
    - Firefox plugin
  - difference tools
    - wdiff (word diff)
  - verification tools
    - ABNF to regexp converter

Fetching a raw web page: wget
-----------------------------
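   wget http://idallen.com/                        # body saved as index.html
   wget -O output_file -S http://idallen.com/      # -S prints the server reply headers
   wget -O output_file --save-headers http://idallen.com/   # headers saved in file too
   wget --header="Host: teaching.idallen.com" http://idallen.com/  # ask for another virtual host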
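The last wget example sends a different Host header to the same machine,
which is how one IP address can serve several virtual hosts.  A
raw-socket Java sketch of the same idea, assuming plain HTTP on port 80;
RawGet is a hypothetical name.  It prints the reply headers and body
together, much like --save-headers.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawGet {

    /** Builds a minimal HTTP/1.0 GET; the Host header picks the virtual host. */
    static String request(String hostHeader, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + hostHeader + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    /** Connects to server:80 and returns the whole raw reply, headers included. */
    static String fetch(String server, String hostHeader, String path) throws IOException {
        try (Socket s = new Socket(server, 80)) {
            OutputStream out = s.getOutputStream();
            out.write(request(hostHeader, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            return new String(s.getInputStream().readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java RawGet server host-header");
            return;
        }
        System.out.print(fetch(args[0], args[1], "/"));
    }
}
```

For example, "java RawGet idallen.com teaching.idallen.com" sends the
same request as the --header example.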

Mail Systems Terminology
------------------------
 - common misconception: the place/protocol you use to fetch your email
   is the same place/protocol that you use to send your email
   - sending email uses SMTP
   - reading email uses POP3 or IMAP
   - they can be completely separate

   http://wiki.mutt.org/?MailConcept

Q: T/F, unlike POP3, SMTP can be used to both send and receive email.
Q: T/F, unlike SMTP, POP3 can be used to both receive and send email.

   - may be completely different servers
   - though note POP-before-SMTP (SMTP-after-POP) couples the two servers:

   http://tools.ietf.org/html/rfc2476 (section 3.3)

   "Requiring a POP [POP3] authentication (from the same IP address)
    within some amount of time (for example, 20 minutes) prior to the
    start of a message submission session has also been used, but this
    does impose restrictions on clients as well as servers which may
    cause difficulties.  Specifically, the client must do a POP
    authentication before an SMTP submission session, and not all clients
    are capable and configured for this.  Also, the MSA must coordinate
    with the POP server, which may be difficult.  There is also a window
    during which an unauthorized user can submit messages and appear to
    be a prior authorized user."
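One way to picture the coupling the RFC describes: the POP3 server
records where and when each successful login came from, and the MSA
consults that record before relaying.  A Java sketch, assuming both
servers can share this one object (in practice they are separate
programs, so the record usually lives in a file or database); the class
and method names are hypothetical.  The 20-minute window is the RFC's
example figure.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class PopBeforeSmtp {
    private final Map<String, Instant> lastPopLogin = new HashMap<>();
    private final Duration window;

    PopBeforeSmtp(Duration window) {
        this.window = window;
    }

    /** Called by the POP3 server after a successful login from clientIp. */
    synchronized void popLoginSucceeded(String clientIp, Instant when) {
        lastPopLogin.put(clientIp, when);
    }

    /** Called by the SMTP server before relaying for clientIp.  Anyone else
     *  using the same IP address (e.g. behind NAT) also passes this test
     *  during the window: the weakness the RFC points out. */
    synchronized boolean mayRelay(String clientIp, Instant now) {
        Instant t = lastPopLogin.get(clientIp);
        return t != null
            && !now.isBefore(t)
            && Duration.between(t, now).compareTo(window) <= 0;
    }
}
```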

Q: Describe briefly how POP-before-SMTP works to authenticate an SMTP session.

MSA - Mail Submission Agent
  http://tools.ietf.org/html/rfc2476
   "acts as a submission server to accept messages from MUAs, and either
    delivers them or acts as an SMTP client to relay them to an MTA."

  - enforce policy (no open relay)
  - enforce standards (no forged headers, etc.)
  - enforce filtering (SpamAssassin, etc.)
  - may modify messages (section 8 of RFC)

  http://en.wikipedia.org/wiki/List_of_mail_servers#Mail_filtering

Q: Briefly describe the function of a mail system MSA.

MTA - Mail Transfer Agent (mail server, mail exchange server)
   "A process which conforms to [SMTP-MTA], which acts as an SMTP server to
    accept messages from an MSA or another MTA, and either delivers them or
    acts as an SMTP client to relay them to another MTA."

  http://en.wikipedia.org/wiki/Mail_transfer_agent
   "It receives messages from another MTA (relaying), a mail
    submission agent (MSA) that itself got the mail from a mail user
    agent (MUA), or directly from an MUA, thus acting as an MSA
    itself. The MTA works behind the scenes, while the user usually
    interacts with the MUA.  The delivery of e-mail to a user's
    mailbox typically takes place via a mail delivery agent (MDA);
    many MTAs have basic MDA functionality built in, but a dedicated
    MDA like procmail can provide more sophistication."

 - transfers email between machines (other MTAs) via SMTP
 - Internet-facing, open ports: security issues
 - sendmail, postfix, qmail, exim

  http://en.wikipedia.org/wiki/List_of_mail_servers#SMTP

Q: Briefly describe the function of a mail system MTA.

MDA - Mail Delivery Agent
  http://en.wikipedia.org/wiki/Mail_delivery_agent
   "A Mail Delivery Agent (MDA) is software that accepts incoming e-mail
    messages and distributes them to recipients' individual mailboxes
    (if the destination account is on the local machine), or forwards
    back to an SMTP server (if the destination is on a remote server).
    A mail delivery agent is not necessarily a mail transfer agent (MTA),
    although on many systems the two functions are implemented by the
    same program."
 - Unix/Linux: /bin/mail, procmail 
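When delivering into an mbox inbox, an MDA appends each message after a
"From " separator line, so any message line that itself begins with
"From " has to be quoted as ">From ".  A Java sketch, assuming the
traditional mboxo quoting rule (other mbox variants differ) and that the
caller supplies the envelope sender and an already formatted date;
MboxDelivery is a hypothetical name.  It does no mailbox locking, which a
real MDA such as procmail must do before appending.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MboxDelivery {

    /** Formats one mbox entry: separator line, quoted message, blank line. */
    static String toMboxEntry(String envelopeSender, String date, String message) {
        StringBuilder sb = new StringBuilder();
        sb.append("From ").append(envelopeSender).append(' ').append(date).append('\n');
        String text = message.replace("\r\n", "\n");
        if (text.endsWith("\n")) {
            text = text.substring(0, text.length() - 1);   // avoid an extra blank line
        }
        for (String line : text.split("\n", -1)) {
            if (line.startsWith("From ")) {
                sb.append('>');                  // would otherwise look like a separator
            }
            sb.append(line).append('\n');
        }
        sb.append('\n');                         // blank line before the next message
        return sb.toString();
    }

    /** Appends one message to the mbox file (no locking in this sketch). */
    static void deliver(Path mbox, String sender, String date, String message)
            throws IOException {
        Files.write(mbox,
                    toMboxEntry(sender, date, message).getBytes(StandardCharsets.ISO_8859_1),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```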

Q: Briefly describe the function of a mail system MDA.

MRA/MAA - Mail Retrieval Agent / Mail Access Agent
  http://tools.ietf.org/html/rfc1939  - POP3 port 110
  http://tools.ietf.org/html/rfc3501  - IMAP4rev1 port 143

  - often built into mail clients (MUAs)
  - can be stand-alone
    - e.g. fetchmail gets the mail; MUA reads mail from file system

Q: Briefly describe the function of a mail system MRA/MAA.

MUA - Mail User Agent (email client)
  - the user's interface to the protocols
  - usually gives access to functionality of both MTA and MRA/MAA
    - but may not itself implement any protocols (may read/write file system)

  http://en.wikipedia.org/wiki/Mail_user_agent
   "An e-mail client, also called a Mail User Agent (MUA), is a computer
    program that is used to read and send e-mail.

    Originally, the MUA was intended to be a simple program to read the user's
    mail messages, which the mail delivery agent (MDA) in conjunction with
    the mail transfer agent (MTA) would transfer into a local mailbox.

    The most important mailbox formats are mbox and Maildir. These rather
    simple protocols for locally storing e-mails make import, export and
    backup of mailfolders quite easy.

    E-mails to be sent would be handed over to the MTA, perhaps via a
    mail submission agent, therefore an MUA would not have to provide any
    transport-related functions.

   *Since the various Microsoft Windows versions intended for home use never
   *provided an MTA, most modern MUAs have to support protocols like POP3
   *and Internet Message Access Protocol (IMAP) to communicate with a remote
   *MTA located at the e-mail providers machine."

 
 - user-visible email clients of all descriptions
 - mutt, "mail", "Mail", "mailx", pine, elm
 - KMail, Eudora, MS Outlook
 - web-browser email (Netscape Messenger, Mozilla, Thunderbird)
 - webmail, Horde, SquirrelMail

 http://en.wikipedia.org/wiki/List_of_mail_servers#POP.2FIMAP

Q: Briefly describe the function of a mail system MUA.

Mail server comparison
----------------------
  http://en.wikipedia.org/wiki/List_of_mail_servers
  - see comparison near bottom

  - PUSH protocols - sending email: MTA - SMTP
  - PULL protocols - reading email: MRA/MAA - POP3, IMAP

 Single-user PCs often don't run separate MTA or MRA/MAA programs.
 Your choice of mail reader (e.g. Pine, Elm, Outlook) itself PULLs your
 incoming email from a remote server (acting as an MRA/MAA) and then
 PUSHes your outgoing email to the remote server (acting as an MTA).

Q: What is the difference between a PUSH protocol and a PULL protocol?
Q: T/F, SMTP is a PUSH protocol.
Q: T/F, POP3 is a PUSH protocol.
Q: T/F, HTTP is a PUSH protocol.

A History of MTAs
-----------------

Q: Unix/Linux mail user agents didn't need to know how to talk to SMTP
   servers - you never had to configure your "outgoing mail" preferences.
   All the Windows MUAs need to be configured with a mail server.  Why?

I. Incoming - delivering your incoming email via SMTP:

* Sending email into Unix/Linux machines:
  Unix/Linux was traditionally multi-user and ran its own MTA
  (e.g. sendmail) that accepted incoming SMTP connections.  Remote systems
  could use SMTP to drop off your email with your local MTA (sendmail),
  and the MTA would hand the email to an MDA (/bin/mail, procmail)
  to put it in your mailbox in the local file system.  Your MUA
  (e.g. /usr/ucb/Mail) would read the mail from your inbox (no need
  for POP3 or IMAP in any MUA).  There are a few different conventions
  for inbox formats so that many different MUAs can read your email,
  all without knowing POP or IMAP.

  - sendmail (running as root!) has had many security patches
    - the Morris worm (Nov 1988), one of the first Internet worms, used
      sendmail security holes
    - http://en.wikipedia.org/wiki/Morris_worm

Q: Why don't many Unix MUAs need to know how to run POP or IMAP?

  Current single-user Unix/Linux PCs often have a local-only MTA
  that handles the sending and delivery of local on-machine email but
  doesn't accept SMTP from off-site.  (Best to keep ports closed on
  Internet-facing machines!)

  On recent single-user Unix/Linux workstations, the MUAs mimic their
  Windows counterparts and include MRA/MAA features.  Your chosen MUA
  (e.g. Elm, Pine, Mutt) is responsible for fetching your email via POP3
  or IMAP (this is an MRA/MAA function); or, you use an intermediate
  MRA/MAA program such as "fetchmail" and your MUA reads the mail out
  of the local file system after the MRA/MAA has put it there.

  - no Internet-facing MTA means fewer open ports and fewer attacks
    - don't run an Internet-facing MTA if you don't need it

* Sending email into MS Windows machines (or not):
  Windows had (has?) no MTA - you can't send an email to a Windows PC
  using SMTP.  Your personal MUA has to fetch the email itself via POP3
  or IMAP and keep a copy in the local file system.

 - no open ports for incoming email; no open port security issues

* Note that MUAs that implement POP/IMAP typically store the email in
  the local file system in a format that only that MUA can handle.
  The concept of a common inbox format usable by different MUAs was lost.

Q: T/F, the standards for inbox formats developed under Unix were adopted
   by MUAs on PCs, so that different MUAs can read the same inbox.

II. Outgoing - sending your outgoing email via SMTP:

* Unix/Linux machines have traditionally each had their own MTA (sendmail)
  that could directly deliver email on the Internet using MX record lookup.

  Every local MUA would put email into a directory where the MTA
  (sendmail) would eventually pick it up and transfer it, retrying as
  necessary.  No MUA needed to know how to do SMTP; only the MTA did that.

  You could optionally tell your machine's MTA not to send mail directly
  to its destination via SMTP over the Internet, but to use a remote
  "smart" MTA that could accept your outgoing email and figure out
  how to deliver it.  (You have to use such a "smart" host here at
  Algonquin, since you cannot connect to any off-campus SMTP servers.)
  The MTA on your machine would use SMTP to drop off the queued mail at
  the smart host, and the smart host would do the MX record lookup and
  final SMTP delivery.

  Since the local Unix MTAs were separately scheduled programs, you could
  queue email from a MUA into the file system even when your machine was
  not connected to the Internet.  The MUA or local MTA would queue up
  your email in the file system until your MTA was finally able to make
  a connection to deliver it off-machine.  (In the days of modems, the
  Internet connection was often made late at night when rates were lower.)

Q: Why don't most Unix MUAs need to know SMTP?

  Current single-user Unix/Linux PCs now have MUAs that mimic their
  Windows counterparts - the MUAs ignore the file system and the local
  MTA and expect you to give the name of a remote "smart" MTA to which
  all email will be sent via SMTP for actual delivery.

  The Algonquin Linux lab has both types of mail systems:  Command-line
  email (e.g. the "mail" command) queues up mail for the local
  MTA (sendmail) to send.  (This is currently broken.)  GUI MUAs
  (e.g. Thunderbird, Mozilla) ignore the local file system and the local
  MTA and use a "smart" remote MTA (e.g. outmail.algonquincollege.com)
  to deliver the mail.  (This supposedly still works.)

* MS Windows has no local MTA - no program exists whose job it is just
  to deliver queued email.  Each MUA has to know how to do its own
  SMTP connection and each MUA has to be configured (separately!) with
  the address of a smart MTA to which it connects.  MUAs on Windows
  machines all contain networking code to drop off email at some "smart"
  MTA that does the actual delivery.  There is no local MTA queue and
  much duplication of SMTP code in all the MUAs.

  On Windows, it is up to each MUA to deal with what happens if the
  message being composed can't be dropped off right away at the remote
  smart MTA.  Better MUAs will queue the email for later transmission.
  Poor MUAs will tell you that your mail can't be sent.

Q: Why do MUAs on Windows all need to know how to talk SMTP?

Protocols - Reading Mail - Post Office Protocol
-----------------------------------------------
  http://tools.ietf.org/html/rfc1939   (23 pages)
  http://tools.ietf.org/html/rfc2449   "CAPA extension"

  - version 3:  RFC 1081 -> 1225 -> 1460 -> 1725 -> 1939 
    updated by RFC 1957 (one page observation RTFM!) and 2449 (extensions)

  - on extending POP3 (RFC 2449 intro and section 7):
   "This extension to the POP3 protocol is to be used by a server to
    express policy descisions taken by the server administrator.  It is
    not an endorsement of implementations of further POP3 extensions
    generally.  It is the general view that the POP3 protocol should stay
    simple, and for the simple purpose of downloading email from a mail
    server.  If more complicated operations are needed, the IMAP protocol
    [RFC 2060] should be used.

    Future extensions to POP3 are in general discouraged, as POP3's
    usefulness lies in its simplicity.  POP3 is intended as a download-
    and-delete protocol; mail access capabilities are available in IMAP
    [IMAP4].  Extensions which provide support for additional mailboxes,
    allow uploading of messages to the server, or which deviate from
    POP's download-and-delete model are strongly discouraged and unlikely
    to be permitted on the IETF standards track.

    Clients MUST NOT require the presence of any extension for basic
    functionality, with the exception of the authentication commands"

Q: Why are extensions to POP3 discouraged?

 - case-insensitive 3-4 character command keywords (section 3)
 - traditional CRLF line terminators
 - single space separators
 - arguments only up to 40 characters (!) - very short lines
 - multi-line responses terminated by a single period on a line
   - a line that starts with a period gets an extra period added by the
     server (byte-stuffing), which the client removes (like SMTP)
 - state-oriented protocol
   AUTHORIZATION -> TRANSACTION -> UPDATE
 - the server MUST NOT time out an idle client in less than 10 minutes
   (section 3)
   - a time-out does not trigger an UPDATE
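The period rule for multi-line responses can be sketched in Java as
below, assuming the "+OK" status line has already been read;
Pop3MultiLine is a hypothetical name.  A line holding only "." ends the
response; otherwise one leading period, added by the server's
byte-stuffing, is removed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class Pop3MultiLine {

    /** True for a positive POP3 status line ("+OK ..."), false for "-ERR". */
    static boolean isOk(String statusLine) {
        return statusLine.startsWith("+OK");
    }

    /** Reads the body of a multi-line response (e.g. after RETR's +OK line)
     *  up to the terminating "." line, undoing byte-stuffing. */
    static List<String> readMultiLine(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {   // readLine strips the CRLF
            if (line.equals(".")) {
                return lines;                      // end of the response
            }
            if (line.startsWith(".")) {
                line = line.substring(1);          // "..x" on the wire means ".x"
            }
            lines.add(line);
        }
        throw new IOException("connection closed before the terminating '.'");
    }
}
```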

Q: T/F, unlike most Internet protocols, POP3 only requires LF on line ends.

Q: Name and describe what happens in each of the three states of a
   POP3 connection.  What triggers the entry into each state?