-------------------------
Week 13 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Keep up on your readings (Course Outline: average 4 hours/week homework)

Review:
------
- HTTP methods
- idempotent and "safe" methods
- conditional and partial GET methods
- secure HTTP and the end of the use of separate ports
- SMTP
- the issue with SMTP on campus
- using the Algonquin SMTP server
- envelope addresses vs. message addresses
- extending the original protocol
- SMTP continuation lines
- SMTP response codes
- MX records
- coding an HTTP server in Java
- Java notes
- using Eclipse

-------------------------------------------------------------------------

New port access to wt127-32:

Access to most ports in the Linux Lab has been disabled.  For the single
machine wt127-32 the ports 49152 to 49251 have been modified to permit
access (on host wt127-32 only).  You can run servers on these ports and
access the servers from other places on campus, or via the VPN.

-------------------------------------------------------------------------

see Notes: Mail Systems Terminology - mail_systems_terms.txt

-------------------------------------------------------------------------

Protocols - Reading Mail - Post Office Protocol (POP)
-----------------------------------------------------

http://tools.ietf.org/html/rfc1939 (23 pages)
http://tools.ietf.org/html/rfc1957 (1 page observation)
http://tools.ietf.org/html/rfc2449 (CAPA extensions)
- note the "Errata" link

- version 3: RFC 1081 -> 1225 -> 1460 -> 1725 -> 1939
  updated by RFC 1957 (one page observation RTFM!)
  and 2449 (extensions)
- specified to use TCP port 110 (Section 3)
- POP is supposed to stay *SIMPLE* (use IMAP for everything else: Section 1)
- Section 10 example: http://tools.ietf.org/html/rfc1939#page-19

http://tools.ietf.org/html/rfc2449 "19 pages - CAPA extension"
- on extending POP3 (RFC 2449 intro and section 7):

   "This extension to the POP3 protocol is to be used by a server to
   express policy descisions taken by the server administrator.  It is
   not an endorsement of implementations of further POP3 extensions
   generally.  It is the general view that the POP3 protocol should
   stay simple, and for the simple purpose of downloading email from a
   mail server.  If more complicated operations are needed, the IMAP
   protocol [RFC 2060] should be used.

   Future extensions to POP3 are in general discouraged, as POP3's
   usefulness lies in its simplicity.  POP3 is intended as a download-
   and-delete protocol; mail access capabilities are available in IMAP
   [IMAP4].  Extensions which provide support for additional mailboxes,
   allow uploading of messages to the server, or which deviate from
   POP's download-and-delete model are strongly discouraged and
   unlikely to be permitted on the IETF standards track.

   Clients MUST NOT require the presence of any extension for basic
   functionality, with the exception of the authentication commands"

Q: Why are extensions to POP3 discouraged?

RFC Section 3 - Basic Operation
- eight case-insensitive 3-4 character command keywords (section 3)
- traditional CRLF line terminators
- single space separators
- arguments only up to 40 characters (!)
- very short lines
- only two status indicators: +OK and -ERR (upper case)
- no way to distinguish between temporary and permanent failure
- no way to distinguish "not now" from "not implemented"
- multi-line responses terminated by a single period on a line
- leading periods are doubled and then must be removed (like SMTP)
- called "byte-stuffing" or "dot-stuffing" (Section 3 page 3)
- a state-oriented protocol AUTHORIZATION -> TRANSACTION -> UPDATE
- must authenticate before issuing transactions
- update happens *after* the client issues QUIT
- MUST not time out before 10 minutes (section 3 page 4)
- a time-out does not trigger an UPDATE - throws away updates

Q: T/F, unlike most Internet protocols, POP3 only requires LF on line ends.

Q: T/F, the POP protocol has different exit codes for temporary and
   permanent failures.

Q: How does the POP protocol handle multi-line server responses (e.g. when
   fetching a message)?

Q: What is meant by "dot-stuffing" or "byte-stuffing"?

Q: Name and describe what happens in each of the three states of a POP3
   connection.  What triggers the entry into each state?

Q: T/F, if a POP3 client drops the connection, the server skips the
   UPDATE phase.

Authorization/Authentication State (Section 4 page 4)
- each AUTHORIZATION method is optional; but, you must use at least one (!)
- RFC defines cleartext USER and PASS or APOP methods
- RFC says "there is no single authentication mechanism that is required
  of all POP3 servers" (!) but Section 9 lists USER and PASS as
  "Minimal POP3 Commands", implying they are required
- APOP uses MD5 and a shared secret - see p.16
- you can calculate this digest in Linux via:
    $ echo -n '<1896.697170952@dbc.mtview.ca.us>tanstaaf' | md5sum
    c4c9334bac560ecc979e58001b3e22fb  -
- neither USER/PASS nor APOP encrypt the full connection...

Q: T/F, the USER and PASS POP commands set up an encrypted connection.
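
The md5sum calculation above can also be sketched in Python.  This is only
an illustration: the function name apop_digest is made up for these notes,
and the timestamp and shared secret are the same sample values used in the
md5sum command line.

```python
import hashlib


def apop_digest(timestamp: str, secret: str) -> str:
    """Return the APOP digest: MD5 over the server's banner timestamp
    followed directly by the shared secret, as 32 lower-case hex digits."""
    data = (timestamp + secret).encode("ascii")
    return hashlib.md5(data).hexdigest()


# Sample values: the same input string as the md5sum command in the notes.
banner_timestamp = "<1896.697170952@dbc.mtview.ca.us>"
shared_secret = "tanstaaf"

digest = apop_digest(banner_timestamp, shared_secret)
print(digest)
```

The digest proves that the client knows the secret without sending the
secret itself.  Everything after the login still travels in the clear.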

http://tools.ietf.org/html/rfc1734 - POP3 AUTH command

   "the client may request authentication types in decreasing order of
   preference, with the USER/PASS or APOP command as a last resort." (p.2)

   "A protection mechanism provides integrity and privacy protection to
   the protocol session.  If a protection mechanism is negotiated,
 * it is applied to all subsequent data sent over the connection.  The
   protection mechanism takes effect immediately following the CRLF that
   concludes the authentication exchange for the client, and the CRLF of
   the positive response for the server.  Once the protection mechanism
   is in effect, the stream of command and response octets is processed
   into buffers of ciphertext.  Each buffer is transferred over the
   connection as a stream of octets prepended with a four octet field in
   network byte order that represents the length of the following data."
   (p.2)

- QUIT is also allowed in Authorization State (Section 4 p.5)

Q: How does POP3 "protection" affect data transfer between client and
   server?

SASL: Simple Authentication and Security Layer
- usable via the CAPA extension
  http://tools.ietf.org/html/rfc2449 (19 pages)
- see also: SASL use in SMTP
  http://tools.ietf.org/html/rfc2554 (11 pages)

Authorization State
- all authorization methods are optional; but, one must be supported

Transaction State
- Must handle: STAT, LIST, RETR, DELE, NOOP, RSET, QUIT

Update State (can only be entered from Transaction State)
- entered *only* via QUIT, never by hangup or disconnect
- no commands

Section 8: Scaling and Operational Considerations
- people using POP stores as permanent message archives

   "When these facilities are used in this way by casual users, there
   has been a tendency for already-read messages to accumulate on the
   server without bound.  This is clearly an undesirable behavior
   pattern from the standpoint of the server operator.
   This situation is aggravated by the fact that the limited
   capabilities of the POP3 do not permit efficient handling of
   maildrops which have hundreds or thousands of messages."

Q: T/F, POPmail scales well to handle hundreds or thousands of messages.

Section 11: Message Format

   "It is important to note that the octet count for a message on the
   server host may differ from the octet count assigned to that message
   due to local conventions for designating end-of-line."

- the size of the message in the file system may not match the size
  transmitted over the wire (especially for Unix/Linux systems, which
  store line ends as a single LF while POP3 sends CRLF)

Q: Give the minimal set of POP3 commands needed to retrieve and delete
   one message on a POP3 server.

=============================================================================

Protocols - Reading Mail - Internet Message Access Protocol (IMAP)
------------------------------------------------------------------

http://tools.ietf.org/html/rfc3501 (108 pages)
- RFC 1730 -> 2060 -> 3501
  updated by RFC 4466 (collected extensions)
  updated by RFC 4469 (CATENATE extension)
  updated by RFC 4551 (conditional STORE, etc.)

   "The Internet Message Access Protocol, Version 4rev1 (IMAP4rev1)
   allows a client to access and manipulate electronic mail messages on
   a server.  IMAP4rev1 permits manipulation of mailboxes (remote
   message folders) in a way that is functionally equivalent to local
   folders.  IMAP4rev1 also provides the capability for an offline
   client to resynchronize with the server."

- requires any reliable data stream, e.g. TCP (TCP port 143)
- too many pages to read!

Q: T/F, both POPmail and IMAP permit remote folders.

Q: Why are most advances in reading email done through changes to IMAP
   rather than changes to POP3?

=============================================================================

Internet Mail Consortium - extensive email archives by topic
    http://www.imc.org/

Current Draft Protocols - Stopping SPAM
---------------------------------------

Overview: http://mipassoc.org/csv/CSV-Intro-03dc.pdf

E-mail Authentication
    http://en.wikipedia.org/wiki/E-mail_authentication

   "Ensuring a valid identity on an e-mail has become a vital first step
   in stopping spam, forgery, fraud, and even more serious crimes.  An
   essential second step will be ensuring the entity has a good
   reputation.  Unfortunately, the Simple Mail Transfer Protocol (SMTP)
   that handles most e-mail today was designed in an era when users of
   the Internet were mostly honest techies who expected others to be
   equally honest.  This article will explain how e-mail identities are
   forged and the steps that are being taken now to prevent it."

"Limiting Unsolicited Bulk Email (UBE)"
    http://www.imc.org/imc-spam/

   "IMC's members have expressed a strong interest in helping to come up
   with solutions to the problem of unsolicited bulk email (UBE), better
   known as "spam".  The use and abuse of UBE is spreading rapidly, and
   many Internet users are complaining loudly about the very negative
   effects it has on them."

Anti-Spam Recommendations for SMTP MTAs
    http://tools.ietf.org/html/rfc2505
- footnote mentions the Monty Python origin of the term "spam"
- done at SMTP level:

   "Our basic assumption is that refuse/accept is handled at the SMTP
   layer and that an MTA that decides to refuse a message should do so
   while still in the SMTP dialogue.  First, this means that we do not
   have to store a copy of a message we later decide to refuse and
   second, our responsibility for that message is low or none - since we
   have not yet read it in, we leave it to the sender to handle the
   error."

Q: Give two reasons why refusing spam during the SMTP dialog (refusing to
   accept the email) is a Good Thing.
- suggests using 4xx temporary fail codes, but warns:

   "However, 4xx Temporary Errors may have unexpected interaction with
   MX-records.  If the receiving domain has several MX records and the
   lowest preference MX-host refuses to receive mail with a "451"
   Response Code, the sending host may choose to - and often will - use
   the next host on the MX list.  [...]  Our intent was to make the
   offending mail stay at the offending sender's host and fill up his
   mqueue disk, but it all ended up at our friend, the next lowest
   preference MX-host."

Q: What is a major drawback to refusing spam using SMTP Temporary Errors?

-------------------------------------------------------------------------

Linux Lab work (only works with on-campus/VPN access to 10.50.254.148):

http://tools.ietf.org/html/rfc1939 (23 pages)

1. Send email to abcd0001@localhost.localdomain via SMTP server
   10.50.254.148 where abcd0001 is replaced by your Algonquin student
   userid.
   - this SMTP server is liberal in accepting LF line ends!
   - you can make up any envelope From address you like
   - you can make up any message To/From addresses you like
   - you can also send email this way to your classmates (be polite)

   See Notes: smtp_session.txt

 * $ nc -v 10.50.254.148 25
   Connection to 10.50.254.148 25 port [tcp/smtp] succeeded!
   220 idallen-alinux ESMTP Postfix (Ubuntu)
 * EHLO ... see the sample session in smtp_session.txt ...
   ... etc ...
 * QUIT
   221 Bye
   $

2. Fetch and delete the email using "nc" to the POP3 TCP port.
   See RFC Section 10: Example POP3 Session
   - this POP3 server is liberal in accepting LF line ends!
   - login with your Algonquin userid using USER and PASS
   - your password is the letter C followed by the last 7 digits of your
     Algonquin student number

 * $ nc -v 10.50.254.148 110
   Connection to 10.50.254.148 110 port [tcp/pop3] succeeded!
   +OK Dovecot ready.
 * USER abcd0001
 * PASS C1234567
   +OK Logged in.
   ... etc ...
 * QUIT
   +OK Logging out.
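
When you move from typing into "nc" to writing a client, the two rules from
RFC 1939 Section 3 that matter most are the +OK/-ERR status indicator and
the dot-stuffed multi-line response.  Here is a minimal Python sketch of
both.  The function names is_ok and read_multiline and the sample response
are invented for illustration; they are not taken from the lab server.

```python
def is_ok(status_line: bytes) -> bool:
    """True for a positive (+OK) response; -ERR or anything else is False."""
    return status_line.startswith(b"+OK")


def read_multiline(lines):
    """Collect the body of a multi-line response.

    `lines` is an iterable of the raw lines that follow the +OK status
    line.  Reading stops at the line holding only a single period, and
    the extra leading period added by the server is removed.
    """
    body = []
    for raw in lines:
        line = raw.rstrip(b"\r\n")   # tolerate CRLF or bare LF line ends
        if line == b".":
            return body              # end of the multi-line response
        if line.startswith(b"."):
            line = line[1:]          # undo dot-stuffing
        body.append(line)
    raise ValueError("response ended without a terminating '.' line")


# A made-up multi-line response, as it might arrive on the wire.
sample = [
    b"+OK message follows\r\n",
    b"Subject: test\r\n",
    b"\r\n",
    b"..this line began with one period\r\n",
    b".\r\n",
]

if is_ok(sample[0]):
    for line in read_multiline(sample[1:]):
        print(line.decode("ascii"))
```

Because read_multiline accepts any iterable of byte lines, the same code
should also work on a file object wrapped around a real socket, for example
socket.create_connection((host, 110)).makefile("rb").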

-------------------------------------------------------------------------

Linux Lab 6 Work - HTTP server testing

Testing - black box vs. white box, "behavioral" vs. "structural"
-------
- I don't have time to read and test all your code; you have to do it

Looking at the FileServer white-box style:
    http://www.brics.dk/ixwt/examples/FileServer.java
- what tests exercise every line of code, especially each of the
  exceptions?

Automated Testing - use it right from the start
-----------------

I've provided a script that will do automated testing of your HTTP server,
and I've written a few simple automated tests.  You must use this script
to test your server, and you must organize the script and add your own
tests to the script to test things that I haven't.  No marks are awarded
for using my random tests without modification.

Don't be limited by the categories or tests I've coded in the script - my
list of tests is incomplete and in a random order.  Rewrite the test suite
to suit yourself.  Add more tests to the suite and organize and renumber
the tests that are there into logical categories.

If you start immediately using the automated testing script to test your
server, you'll save time over doing manual testing and then having to
repeat all your tests for handing in.

Some programming disciplines have you write the test suite first, then
write the code to pass all the tests.  If a test doesn't exist for a
function, the function is not considered implemented (because it can't be
tested).
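
To make the "write the tests first" idea concrete, here is a small
table-driven sketch in Python.  It is NOT the provided testing script, and
it makes several assumptions: the names status_code, run_tests, CASES and
fake_server are invented here, and the expected status codes are guesses
about how a server might behave, not assignment requirements.

```python
def status_code(response: bytes) -> int:
    """Extract the numeric status code from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split()
    return int(parts[1])   # e.g. b"HTTP/1.1 404 Not Found" -> 404


def run_tests(send, cases):
    """Run each (name, request, expected_code) case through `send`.

    `send` takes request bytes and returns the raw response bytes, so a
    real socket-based sender or a fake can be plugged in.  Returns the
    names of the failing tests.
    """
    failures = []
    for name, request, expected in cases:
        try:
            got = status_code(send(request))
        except Exception:   # a missing or malformed response is a failure
            got = None
        if got != expected:
            failures.append(name)
    return failures


# Hypothetical test table, numbered and grouped by category.
CASES = [
    ("01 GET existing file", b"GET /index.html HTTP/1.0\r\n\r\n", 200),
    ("02 GET missing file", b"GET /nosuchfile HTTP/1.0\r\n\r\n", 404),
]


def fake_server(request: bytes) -> bytes:
    """Stand-in for a real server so the harness itself can be tried out."""
    if request.startswith(b"GET /index.html "):
        return b"HTTP/1.0 200 OK\r\n\r\nhello"
    return b"HTTP/1.0 404 Not Found\r\n\r\n"


print(run_tests(fake_server, CASES))
```

Passing the sender in as a parameter keeps the test table separate from
the networking code, so new cases can be added without touching the runner.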