-------------------------
Week 07 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Lab 4 is coming up.

Review:
 - writing test cases
 - GDB
 - symptoms of buffer overflow in C programs
 - IP routing
 - subnetting / supernetting and path aggregation / CIDR
 - DNS review
 - getting a machine on the net (minimal)

-----------------------------------------------------------------------------

Linux commands for DNS testing: host and dig
--------------------------------------------

    $ host idallen.ca
    idallen.ca has address 208.76.82.6
    idallen.ca mail is handled by 0 idallen.ca.

    $ host idallen.ca ns1.totalchoicehosting.com  # use this DNS server
    Using domain server:
    Name: ns1.totalchoicehosting.com
    Address: 64.246.50.105#53
    idallen.ca has address 208.76.82.6
    idallen.ca mail is handled by 0 idallen.ca.

    $ host -t txt idallen.ca
    idallen.ca descriptive text "v=spf1 ip4:66.11.175.96/30 ip4:66.11.173.142 a mx ptr a:cpu1808.adsl.bellglobal.com mx:idallen.org include:algonquincollege.com ?all"

    $ dig idallen.ca
    ; <<>> DiG 9.3.4 <<>> idallen.ca
    ;; global options:  printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31955
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

    ;; QUESTION SECTION:
    ;idallen.ca.			IN	A

    ;; ANSWER SECTION:
    idallen.ca.		14382	IN	A	208.76.82.6

    ;; AUTHORITY SECTION:
    idallen.ca.		70604	IN	NS	ns2.totalchoicehosting.com.
    idallen.ca.		70604	IN	NS	ns1.totalchoicehosting.com.

    ;; ADDITIONAL SECTION:
    ns2.totalchoicehosting.com. 170671 IN	A	65.254.32.122
    ns1.totalchoicehosting.com. 170671 IN	A	64.246.50.105

    ;; Query time: 67 msec
    ;; SERVER: 192.168.9.254#53(192.168.9.254)
    ;; WHEN: Tue Oct 16 04:21:19 2007
    ;; MSG SIZE  rcvd: 134

    $ dig @ns1.totalchoicehosting.com idallen.ca   # use this DNS server
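
A C program can do a similar lookup with getaddrinfo().  The following
is a minimal sketch, not part of the course code: the helper name
first_ipv4() is hypothetical, and unlike "host" and "dig" it goes through
the system resolver (so /etc/hosts may answer before DNS does).

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: look up the first IPv4 address of "name" and
 * store it as dotted-quad text in out.
 * Returns 0 on success, -1 on failure. */
int first_ipv4(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 addresses only */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    int rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "lookup of '%s' failed: %s\n",
                name, gai_strerror(rc));
        return -1;
    }
    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    const char *p = inet_ntop(AF_INET, &sin->sin_addr, out, outlen);
    freeaddrinfo(res);
    return p == NULL ? -1 : 0;
}
```

Pass a buffer of at least INET_ADDRSTRLEN bytes.  A dotted-quad string
such as "127.0.0.1" is also accepted and is returned unchanged.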

----------------------------------------------------------------------------

Lab 4 - Coding the Looping Echo Server
--------------------------------------

Assignment: Modify the server to keep reading lines from the client,
until EOF, echoing those lines back to the client.

Review the PDL for any process that reads from one place and writes to
another.  The Server must be enhanced to keep reading from the client,
instead of stopping after just one line.

 * Server modifications:
 - the conversion of server2.c to a looping echo server:
 - write PDL for server2.c and the revised PDL for the converted server

Q: Give the PDL for a forking "echo" server that receives connections
   from clients and echoes the data received back to the client.

Coding the Looping Client
-------------------------

Assignment: Modify the client to keep reading lines from standard input
until EOF, sending the lines to the server, and to keep reading lines
from the server until EOF, sending the lines to standard output.

Review the PDL for any process that reads from one place and writes
to another.  The Client must be enhanced to fork() into two separate
processes, each with one of these read/write loops.  One process reads
stdin and writes to the server; the other process reads from the server
and writes to standard output.

On EOF from the keyboard, the client shuts down just the writing half of
the server socket.  On EOF from the server, the client kills the other
process that is hung reading the keyboard.  See Notes: eof_handling.txt
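
One possible shape for the two-process client is sketched below.  This is
an assumption-laden illustration, not the assignment solution: the names
copy_fd() and two_process_client() are hypothetical, the child is chosen
arbitrarily as the keyboard reader, and error reporting is left out.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy bytes from "from" to "to" until EOF on "from".
 * Returns 0 on EOF, -1 on error.  (EINTR handling omitted for brevity.) */
static int copy_fd(int from, int to)
{
    char buf[1024];
    ssize_t n;

    while ((n = read(from, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(to, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* in_fd is the keyboard, out_fd is the screen, sock is the socket
 * already connected to the server. */
int two_process_client(int in_fd, int out_fd, int sock)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: keyboard -> server.  On keyboard EOF, shut down only
         * the writing half so the parent can still read the echoes. */
        int rc = copy_fd(in_fd, sock);
        shutdown(sock, SHUT_WR);
        _exit(rc == 0 ? 0 : 1);
    }
    /* Parent: server -> screen.  EOF from the server ends the session. */
    int rc = copy_fd(sock, out_fd);
    kill(pid, SIGTERM);      /* child may be blocked reading the keyboard */
    waitpid(pid, NULL, 0);   /* reap the child */
    return rc;
}
```

The child uses shutdown() rather than close() because the parent still
holds a descriptor for the same socket: close() in the child alone would
not signal EOF to the server.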

Q: Give the PDL for a forking two-process "echo" client that sends
   keyboard input to a remote TCP/IP server and receives the echo
   of the input back and displays it on the screen.  Explain under
   what conditions one process needs to kill() the other process.
   Explain under what conditions one process needs to shut down the
   writing half of the socket that connects to the remote server.

 * Client modifications:
   Reference:  http://www.cs.rpi.edu/courses/sysprog/sockets/sock.html
   The client.c code is explained line-by-line in the above web page.
 - see the PDL for client.c and the revised PDL for the converted client
 - reorganize the command line argument parsing in front of the socket code
   - keep the parsing code separate from the socket code
   - add a check for a valid port number that is within range
 - replace the deprecated bzero() and bcopy() functions
 - note the use of socket() and connect() in client.c
   - fix the error message to say what host and port failed
   - error messages must have four qualities (see programming_style.txt)
 - fix the prompt
 - detect errors and EOF when reading standard input
 - use shutdown() to half-close the socket when finished writing to the server
   See Notes: eof_handling.txt
 - more updates to do, see the upcoming assignments
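
Two of the modifications above (the port range check and replacing
bzero()/bcopy()) can be sketched as follows.  The helper names
parse_port() and fill_sockaddr() are hypothetical and are not taken from
client.c; treat this as one possible approach.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return the port number in s, or -1 if s is not a decimal integer in
 * the range 1..65535.  (strtol() also accepts leading whitespace and
 * a sign; a negative value fails the range check.) */
int parse_port(const char *s)
{
    char *end;

    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;            /* overflow, no digits, or trailing junk */
    if (v < 1 || v > 65535)
        return -1;            /* outside the usable TCP port range */
    return (int)v;
}

/* Fill in an IPv4 socket address using memset() and memcpy() in place
 * of the deprecated bzero() and bcopy().  Note the argument order:
 * bcopy(src, dst, n) becomes memcpy(dst, src, n). */
void fill_sockaddr(struct sockaddr_in *sa, const struct in_addr *ip, int port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    memcpy(&sa->sin_addr, ip, sizeof sa->sin_addr);
    sa->sin_port = htons((uint16_t)port);
}
```

Doing the port check before any socket call keeps the argument parsing
separate and lets the program print a usage error without touching the
network.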

References to Notes files (required reading):
---------------------------------------------

    eof_handling.txt

Zero Tolerance for Buffer Overflows
-----------------------------------

http://teaching.idallen.com/cst8165/07w/notes/buffer_overflows.txt

Q: Why must Internet-facing programs avoid buffer overflows?
Q: What gcc flag turns on local symbols and line numbers for gdb and valgrind?
Q: What does "valgrind" do?
Q: Will valgrind find all buffer overflow errors?
Q: T/F As in Java, a buffer overflow in a C program stops the program
   on the line that causes the overflow.
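
As a small illustration of the "zero tolerance" idea, the sketch below
checks the destination size before copying, where a bare strcpy() would
copy without any check.  The function name copy_checked() is
hypothetical; this is one way to do it, not the method from
buffer_overflows.txt.

```c
#include <string.h>

/* Copy the string src into dst, which has room for dstsize bytes
 * including the terminating NUL.
 * Returns 0 if it fit, -1 (and copies nothing) if it would overflow. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t need = strlen(src) + 1;   /* +1 for the terminating NUL */

    if (need > dstsize)
        return -1;
    memcpy(dst, src, need);
    return 0;
}
```

The caller must test the return value and decide what to do with input
that is too long; silently truncating it is a separate design decision.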

Aside: On choosing buffer sizes
-------------------------------

When deciding how much buffer space an Internet server should allow for
incoming request lines, you have to weigh memory use against functionality.

Here's an excerpt from an RFC extending the SMTP protocol, which
originally specified a maximum buffer of just 512 bytes:

http://tools.ietf.org/html/rfc1869
http://www.rfc-editor.org/rfc/rfc1869.txt

4.1.2.  Maximum command line length

   This specification extends the SMTP MAIL FROM and RCPT TO to allow
   additional parameters and parameter values.  It is possible that the
   MAIL FROM and RCPT TO lines that result will exceed the 512 character
   limit on command line length imposed by RFC 821.  This limit is
   hereby amended to only apply to command lines without any parameters.
   Each specification that defines new MAIL FROM or RCPT TO parameters
   must also specify maximum parameter value lengths for each parameter
   so that implementors of some set of extensions know how much buffer
   space must be allocated. The maximum command length that must be
   supported by an SMTP implementation with extensions is 512 plus the
   sum of all the maximum parameter lengths for all the extensions
   supported.
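
The arithmetic in that last sentence can be written out as a short
sketch.  The function name is hypothetical, and any per-extension
maximum lengths passed to it must come from the extension
specifications themselves.

```c
#include <stddef.h>

#define SMTP_BASE_LINE_MAX 512   /* RFC 821 command line limit */

/* Return 512 plus the sum of the maximum parameter lengths of all
 * supported extensions, as RFC 1869 section 4.1.2 describes. */
size_t smtp_line_max(const size_t *param_max, size_t count)
{
    size_t total = SMTP_BASE_LINE_MAX;

    for (size_t i = 0; i < count; i++)
        total += param_max[i];
    return total;
}
```

With no extensions (count of zero) the result is the original 512 bytes.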

Reading from Network sockets
----------------------------

A program can detect when a write() or send() call doesn't write all
the bytes.  It can loop until all the bytes are sent, perhaps using a
cover function such as sendall() (see earlier notes for sendall()).
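
For reference, a loop of that kind might look like the sketch below.
It is not the sendall() from the earlier notes; the name send_fully()
is chosen here to avoid implying that it is.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes are sent.
 * Returns 0 on success, -1 on error (errno is set by send()). */
int send_fully(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* skip past the bytes already sent */
        len -= (size_t)n;
    }
    return 0;
}
```

This works because the sender knows exactly how many bytes it intends
to send.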

How about the reverse - reading from a network socket?

Simple read() or recv() calls on Internet-connected sockets are not
guaranteed to return data in the same quantities that remote applications
send it.  Just because a remote application writes 500 bytes into a socket
doesn't mean that your next read() will return those 500 bytes.  The data
may be incomplete, or those 500 bytes may be followed by another 500 bytes
(or fewer, or more) from a following write() on the same connection.

How does your program know that it has *read* all the bytes that the
remote client has sent?  Answer: Your program can't know, unless your
application provides some assistance.  You have two choices:

 A. The sending program has to send a count of the number of
    bytes (probably one of the very first things it sends), and the
    receiving program has to loop to make sure it reads that many bytes.

    With this solution, your application has to first "encapsulate"
    the data it sends with an application-specific header indicating
    how much data is being sent.

 B. Your application needs to send some trailing flag in the data
    stream indicating that the unit of data is complete, and the
    receiving program has to loop to make sure it reads all the bytes
    until it sees the flag before processing.

    The flag has to be some byte or combination of bytes that never
    appears inside the data itself.  For a single-line chat server,
    you might pick a newline character.

Note that if you encapsulate and send a header containing a size field,
the receiving program may still have to loop to collect all the bytes
of the size field itself, unless the size is just a single byte!
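
Choice A might be sketched as follows.  The header format (a 4-byte
length in network byte order) and both function names are assumptions
made for illustration; a real protocol defines its own header.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Loop until exactly len bytes are read.  Returns 1 on success,
 * 0 on EOF before any byte arrived, -1 on error or EOF part-way. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return got == 0 ? 0 : -1;
        got += (size_t)n;
    }
    return 1;
}

/* Read one message: a 4-byte big-endian length, then that many bytes.
 * Returns 1 and sets *outlen on success, 0 on clean EOF between
 * messages, -1 on error or if the payload would not fit in buf. */
int read_framed(int fd, char *buf, size_t cap, size_t *outlen)
{
    uint32_t netlen;

    int rc = read_exact(fd, &netlen, sizeof netlen);  /* header loops too */
    if (rc <= 0)
        return rc;
    uint32_t len = ntohl(netlen);
    if (len > cap)
        return -1;              /* refuse rather than overflow buf */
    if (read_exact(fd, buf, len) != 1)
        return -1;
    *outlen = len;
    return 1;
}
```

The same read_exact() loop is used for the size field and for the
payload, and the size is checked against the buffer capacity before any
payload byte is stored.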

In all of this, you also have to handle the error and EOF cases where
the data stream ends unexpectedly.

Q: T/F We can implement a readall() function, similar to our
    sendall() function, that simply loops until it has read all the bytes
    from a socket.  Then we know we have all the data sent by the client.

Q: What are the two main methods that allow an application to
    communicate that "all" the data has been sent over a socket?

Q: T/F When reading from a network socket, each read returns all the
   bytes sent by the remote client.

----------------------------------------------------------------------------

Defining Internet protocols - Request for Comment (RFC)
-------------------------------------------------------

Who sets down the standards for programs that communicate over the Internet?

IETF: Internet Engineering Task Force: "Rough consensus and running code"

http://radar.oreilly.com/archives/2007/01/what_actually_i.html

   "(FWIW, this happened to the IETF as well. It's "rough consensus and
    running code" policy was outflanked by big companies who just sent enough
    people to the meetings to affect the "rough consensus," and gradually,
    the IETF became driven more by politics than pure technical excellence
    in some areas.)"

Master RFC Index: http://tools.ietf.org/rfc/index

Q: What role do RFC documents play in the Internet?
   http://www.rfc-editor.org/
  "The  RFC (Request for Comments) series  contains technical and
   organizational documents about the Internet, including the technical
   specifications and policy documents produced by the Internet
   Engineering Task Force (IETF)."

RFC documents lay out Internet protocols, e.g. for SMTP:
    ftp://ftp.rfc-editor.org/in-notes/rfc2821.txt
    http://tools.ietf.org/html/rfc2821

- Some words have specific meanings, see:
    http://tools.ietf.org/html/rfc2119

   "The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC 2119."

- RFCs use a modified ABNF (Augmented Backus-Naur Form) to describe protocols:
    http://tools.ietf.org/html/rfc2234

  Certain rules are predefined as "CORE" rules, e.g. ALPHA, DIGIT, CHAR, etc.
  (from section 6.1 in rfc2234) so you don't have to do all the work.

Q: True/False - strings ("abc") in ABNF are case-sensitive  (RFC2234 p.4)

Q: Give an ABNF rule <algemail> that defines an Algonquin student email
   address (abcd0001@algonquincollege.com), using these definitions:
   
       ALPHA = %x41-5A / %x61-7A  ; from CORE: A-Z upper and a-z lower
       DIGIT = %x30-39            ; from CORE: 0-9
       atsign = "@"
       period = "."

   Note that ab000001 and abc00001 are also valid userids (must be eight
   characters); but, a0000001, ab0001, and abcd000001 are not.

   Hint: start with this and fix it to handle the other two valid cases:
   
      algemail = 4ALPHA 4DIGIT atsign "algonquincollege" period "com"
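
To see how an ABNF rule maps onto code, here is a sketch of a matcher
for the hint rule exactly as written (it does not handle the other two
valid cases).  The function name is hypothetical.

```c
#include <string.h>
#include <strings.h>

/* ALPHA = %x41-5A / %x61-7A */
static int is_alpha(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* DIGIT = %x30-39 */
static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Return 1 if s matches
 *   4ALPHA 4DIGIT atsign "algonquincollege" period "com"
 * and 0 otherwise. */
int match_hint_rule(const char *s)
{
    int i;

    if (strlen(s) < 8)
        return 0;
    for (i = 0; i < 4; i++)         /* 4ALPHA */
        if (!is_alpha(s[i]))
            return 0;
    for (; i < 8; i++)              /* 4DIGIT */
        if (!is_digit(s[i]))
            return 0;
    /* Quoted ABNF strings are compared without regard to case
     * (RFC 2234), hence strcasecmp() for the literal part. */
    return strcasecmp(s + 8, "@algonquincollege.com") == 0;
}
```

Each repetition count in the rule becomes a loop, and each quoted string
becomes a literal comparison.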

RFC tools by IETF
-----------------
  http://tools.ietf.org/

  - html cross-linked pages
    - http://tools.ietf.org/html/
  - reading tools
    - Firefox plugin
  - difference tools
    - wdiff (word diff)
  - verification tools
    - ABNF to regexp converter

----------------------------------------------------------------------------

IP - Internet Protocol
----------------------
- http://tools.ietf.org/html/rfc791 (45 pages, Sep 1981)
- layer 2 of the 4 (or 5) layer stack:

    4 - application layer (programs)
    3 - TCP/UDP (transport/host-to-host layer)
    2 - IP (Internet/gateway layer), ICMP
    1 - Network/hardware layer (e.g. Ethernet, ARP, MAC addresses)
        (Layer 1 may be split into Physical/Network Access)

Internet four (or five) layer stack has IP at layer 2.
Below IP are one (or two) layers; above IP are another two layers.
- Figure 2: http://www.garykessler.net/library/tcpip.html#arch

Most everything on the Internet starts with just plain IP, "the Internet's
most basic protocol" (http://www.freesoft.org/CIE/Topics/79.htm):

* Internet layer - IP
 - IP has no port information; only IP addresses
   Figure 4: http://tools.ietf.org/html/rfc791#section-3.1
   Figure 4: http://www.garykessler.net/library/tcpip.html#IP
   - simple http://www.freesoft.org/CIE/Topics/79.htm
 - large amounts of data may be "fragmented" into multiple IP packets
   - the IP Identification field marks which fragments belong to the same
     original datagram; the Fragment Offset field orders them for re-assembly
   - this was later determined to be a Very Bad Idea
   - fragmentation is now considered harmful, difficult to get right, etc.
   - more on this later
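
The header layout in RFC 791 Figure 4 can be decoded with a few shifts
and masks.  The sketch below uses hypothetical names (struct ipv4_info,
parse_ipv4_header()) and ignores IP options and checksum verification.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Selected IPv4 header fields, converted to host byte order. */
struct ipv4_info {
    unsigned version;       /* 4-bit field */
    unsigned header_len;    /* 4-bit IHL, converted to bytes */
    unsigned total_len;     /* 16-bit field */
    unsigned id;            /* 16-bit Identification */
    unsigned frag_offset;   /* 13-bit field, in units of 8 bytes */
    unsigned ttl;           /* 8-bit field */
    unsigned protocol;      /* 1 = ICMP, 6 = TCP, 17 = UDP */
    char src[INET_ADDRSTRLEN];
    char dst[INET_ADDRSTRLEN];
};

/* Decode the fixed 20-byte part of an IPv4 header.
 * Returns 0 on success, -1 if the input is too short or not IPv4. */
int parse_ipv4_header(const uint8_t *p, size_t len, struct ipv4_info *out)
{
    if (len < 20 || (p[0] >> 4) != 4)
        return -1;
    out->version = p[0] >> 4;
    out->header_len = (p[0] & 0x0Fu) * 4u;
    out->total_len = ((unsigned)p[2] << 8) | p[3];
    out->id = ((unsigned)p[4] << 8) | p[5];
    out->frag_offset = (((unsigned)p[6] & 0x1Fu) << 8) | p[7];
    out->ttl = p[8];
    out->protocol = p[9];
    if (!inet_ntop(AF_INET, p + 12, out->src, sizeof out->src) ||
        !inet_ntop(AF_INET, p + 16, out->dst, sizeof out->dst))
        return -1;
    return 0;
}
```

Nothing in the struct holds a port number: ports belong to the TCP or UDP
header, which starts header_len bytes into the packet.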

Compare protocol complexities:
 - IP RFC791 is 45 pages
 - UDP RFC768 is 3 more pages on top of IP
 - TCP RFC793 is 85 more pages on top of IP
 - DCCP RFC4340 is 129 pages on top of IP (!!)

Q: True/False - the IP packet header contains port numbers

Q: Looking at RFC791 Figure 4, what is the longest total length theoretically
   possible for an IP packet?

Q: Looking at RFC791 Figure 4, what is the largest time-to-live value possible?