-------------------------
Week 08 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Keep up on your readings (Course Outline: average 4 hours/week homework)

Review:
------
 - host and dig
 - looping echo server and client
 - reading from network sockets (how to get all the data?)
 - RFC, IETF
 - four (maybe five) layers: Application, Transport, Internet (IP),
   Network/Physical

IP - Internet Protocol
----------------------
- http://tools.ietf.org/html/rfc791 (45 pages, Sep 1981)
- layer 2 of the 4 (or 5) layer stack:

    4 - application layer (programs)
    3 - TCP/UDP (transport/host-to-host layer)
    2 - IP (Internet/gateway layer), ICMP
    1 - Network/hardware layer (e.g. Ethernet, ARP, MAC addresses)
        (Layer 1 may be split into Physical/Network Access)

Internet four (or five) layer stack has IP at layer 2.
Below IP are one (or two) layers; above IP are another two layers.
- Figure 2: http://www.garykessler.net/library/tcpip.html#arch

At or near the bottom, below IP is the Network layer (e.g. Ethernet)
- ARP converts between Ethernet hardware (MAC) and IP addresses
  - ARP is part of both "layers" (IETF doesn't like the "layers" concept)
  http://www.garykessler.net/library/tcpip.html#ARP

Most everything on the Internet starts with just plain IP, "the Internet's
most basic protocol" (http://www.freesoft.org/CIE/Topics/79.htm):

* Internet layer - IP
 - IP has no port information; only IP addresses
   Figure 4: http://tools.ietf.org/html/rfc791#section-3.1
   Figure 4: http://www.garykessler.net/library/tcpip.html#IP
   - simple http://www.freesoft.org/CIE/Topics/79.htm
 - large amounts of data may be "fragmented" into multiple IP packets
   - the IP Identification field marks which fragments belong to the same
     datagram; the Fragment Offset field orders them for later re-assembly
   - this was later determined to be a Very Bad Idea
   - fragmentation is now considered harmful, difficult to get right, etc.
   - more on this later (below)
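
A minimal sketch of unpacking the fixed 20-byte IPv4 header shown in RFC791
Figure 4, to see that it carries addresses and fragment fields but no ports.
The function name parse_ipv4_header and the sample header bytes are made up
for illustration; they are not from the RFC or from any lab code.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (RFC791 Figure 4)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,            # IHL counts 32-bit words
        "total_length": total_len,
        "identification": ident,                       # shared by all fragments of one datagram
        "more_fragments": bool(flags_frag & 0x2000),   # MF flag
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # offset counts 8-byte units
        "ttl": ttl,
        "protocol": proto,                             # 1=ICMP, 6=TCP, 17=UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical sample header, built by hand for illustration only.
sample = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 1500, 0x1C46, 0x2000 | 185, 64, 17, 0,
                     socket.inet_aton("192.0.2.1"),
                     socket.inet_aton("192.0.2.2"))
fields = parse_ipv4_header(sample)
print(fields)
```

Note that none of the unpacked fields is a port number; ports only appear
once a UDP or TCP header is carried inside the IP packet.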

Compare protocol complexities:
 - IP RFC791 is 45 pages
 - TCP and UDP are on "top" of IP (means packets go *inside* IP packets)
 - UDP RFC768 is only 3 more pages on top of IP (unreliable)
 - TCP RFC793 is 85 more pages on top of IP (reliable)
 - DCCP RFC4340 is 125 pages on top of IP (!!)
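
To make "on top of" concrete, here is a small sketch that adds up header
sizes as data moves down the stack.  It assumes minimum-size headers with
no options (UDP 8 bytes, TCP 20, IPv4 20) and Ethernet II framing (14-byte
header, trailer ignored); the function name encapsulate is hypothetical.

```python
# Minimum header sizes in bytes (no options); Ethernet II assumed at the bottom.
HEADER_BYTES = {"UDP": 8, "TCP": 20, "IP": 20, "Ethernet": 14}

def encapsulate(payload_len: int, transport: str) -> list:
    """Return (layer, size) pairs as the payload moves down the stack."""
    sizes = [("application data", payload_len)]
    size = payload_len
    for layer in (transport, "IP", "Ethernet"):
        size += HEADER_BYTES[layer]      # each layer wraps the one above it
        sizes.append((layer, size))
    return sizes

for layer, size in encapsulate(100, "UDP"):
    print(f"{layer:>16}: {size} bytes")
```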

Q: T/F packets get larger as they move down the protocol stack from
   Layer 4 (Application) down to the Physical media.

* Overloading the IP network:

  http://www.africonnect.com/tcpip_tut.htm
   "The IP protocol does not guarantee delivery, or that packets will
    arrive in the proper sequence. [...]
   "Rather than simply discarding all newly arriving packets, the routers
    are programmed to discard packets in a random fashion to prevent
    buffer overflow. This is best implemented in a "fair" way so that
    the data stream having the largest volume suffers the largest number
    of dropped packets."
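
A toy simulation of the "fair" random discard described in the quote, under
the simplifying assumption that every arriving packet is dropped with the
same probability.  The flow names, packet counts and drop probability are
invented for illustration; real routers are more elaborate than this.

```python
import random

def simulate_random_drop(flows: dict, drop_probability: float, seed: int = 1) -> dict:
    """Drop each arriving packet with equal probability; count drops per flow."""
    rng = random.Random(seed)
    drops = {name: 0 for name in flows}
    for name, packet_count in flows.items():
        for _ in range(packet_count):
            if rng.random() < drop_probability:
                drops[name] += 1
    return drops

drops = simulate_random_drop({"bulk transfer": 9000, "interactive": 1000}, 0.10)
print(drops)
```

Because each packet has the same chance of being dropped, the flow that
sends the most packets loses the most packets.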

Q: True/False - the IP packet header contains port numbers

Q: Looking at RFC791 Figure 4, what is the longest total length theoretically
   possible for an IP packet?

Q: Looking at RFC791 Figure 4, what is the largest time-to-live value possible?

Q: What does ARP stand for and how is it used in Internet networking?

Q: What happens to packets when the Internet gets overloaded?
   How do routers recover from an overload?

ICMP - Internet Control Message Protocol
----------------------------------------
- same layer as IP (layer 2)

Ref: http://www.freesoft.org/CIE/Topics/81.htm

Q: Is the delivery of ICMP messages guaranteed?

Q: What is ICMP used for on the Internet (name two of four functions)?
   Announce network errors (e.g. unreachable), network congestion (quench)
   Announce time-outs (zero TTL)
   Troubleshooting (ICMP echo)

Q: What popular program uses ICMP echo packets?
   http://www.freesoft.org/CIE/Topics/53.htm
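
A sketch of what an ICMP echo request looks like on the wire: type 8,
code 0, a 16-bit Internet checksum, then an identifier, a sequence number
and optional data.  The function names are hypothetical.  The packet is only
built and printed here; actually sending it needs a raw socket, which
normally requires root privileges.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, ICMP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"                   # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                    # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP type 8 (echo request), code 0 message with its checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum field zero
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1, payload=b"hello")
print(packet.hex())
```

A receiver can check the message by running the same checksum over the
whole packet; a result of zero means the packet arrived intact.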

Q: How does traceroute use ICMP to map a packet route?
   http://www.freesoft.org/CIE/Topics/54.htm

Q: Traceroute is not reliable.  What can go wrong (describe two things)?
   http://www.freesoft.org/CIE/Topics/54.htm

Layer Three: TCP and UDP - port numbers
---------------------------------------

Just above the IP layer is the Transport layer (layer 3).  UDP and TCP add
"port" numbers to IP, for host-to-host communication.
- http://www.garykessler.net/library/tcpip.html#transport

Just above layer 3 Transport (TCP/UDP) is the Application layer (4)
  - this is the part where you get to write the application code
  - SMTP, HTTP, POP3, etc.

Two major Linux/Unix socket types are UDP (SOCK_DGRAM) and TCP (SOCK_STREAM).
Both extend IP addressing with the concept of "ports".

Reference: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#twotypes

 - UDP is essentially raw IP plus port numbers; still unreliable
   See the RFC: http://tools.ietf.org/html/rfc768  (only 3 pages!)
   - used in DNS and TFTP
   - UDP is message-oriented - discrete datagrams, unreliable

 - TCP is like streaming UDP with reliable transmission added
   See the RFC: http://tools.ietf.org/html/rfc793 (85 pages!)
   - TCP is stream-oriented - arbitrary byte stream, reliable
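
A loopback sketch of the difference between the two socket types, assuming
a machine where 127.0.0.1 is usable.  Each UDP send arrives as its own
datagram, while the TCP receiver just sees bytes and has to keep reading
until the stream ends.  Port 0 asks the kernel to pick any free port.

```python
import socket

# UDP: each sendto() is one datagram; each recvfrom() returns one datagram.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"first", receiver.getsockname())
sender.sendto(b"second", receiver.getsockname())
udp_messages = [receiver.recvfrom(1024)[0] for _ in range(2)]
sender.close()
receiver.close()

# TCP: a byte stream; recv() may return the two writes merged or split.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname(), timeout=2.0)
server, _ = listener.accept()
server.settimeout(2.0)
client.sendall(b"first")
client.sendall(b"second")
client.close()                      # end of stream lets the read loop finish
tcp_bytes = b""
while True:
    chunk = server.recv(1024)
    if not chunk:                   # empty read means the peer closed
        break
    tcp_bytes += chunk
server.close()
listener.close()

print(udp_messages, tcp_bytes)
```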

http://www.tcpipguide.com/free/t_TCPDataHandlingandProcessingStreamsSegmentsandSequ.htm

Most UDP/TCP Port Numbers have to be Registered with IANA
 - IANA: Internet Assigned Numbers Authority
 - Master IANA List of ports: http://www.iana.org/assignments/port-numbers
 - ports are in three ranges: "Well Known", "Registered", "Dynamic/Private"
 - you SHOULD NOT use a "Well Known" or "Registered" port without first
   registering it with IANA.
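
The three IANA ranges can be written down as a small classifier.  This is
a sketch using the range boundaries from the IANA list (0-1023, 1024-49151,
49152-65535); the function name port_range is hypothetical.

```python
def port_range(port: int) -> str:
    """Classify a TCP/UDP port number into its IANA range."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0..65535")
    if port <= 1023:
        return "Well Known"
    if port <= 49151:
        return "Registered"
    return "Dynamic/Private"

for p in (80, 8080, 50000):
    print(p, port_range(p))
```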

Major service port numbers (often seen in trace output):
 http://www.tcpipguide.com/free/t_TCPCommonApplicationsandServerPortAssignments.htm
 - port numbers are given names in the Linux/Unix file /etc/services
 - see also the master list at http://www.iana.org/assignments/port-numbers

 * TCP      20 ftp-data
 * TCP      21 ftp (control)
   TCP      22 SSH
   TCP      23 telnet
 * TCP      25 SMTP (sending mail only)
 * UDP/TCP  53 domain (DNS)
   UDP      67-68 DHCP
 * TCP      80 HTTP (WWW)
 * TCP     110 POP3 (receiving mail only)
   TCP     113 ident (identifying incoming TCP connections)
   TCP     119 NNTP (Network News)
   UDP/TCP 123 NTP (Network Time)
   UDP/TCP 137-139 Microsoft netbios (SMB) (Samba)
   TCP     443 HTTPS (secure WWW)
   UDP/TCP 445 Microsoft-DS
   UDP/TCP 631 Internet Printing Protocol (IPP - CUPS)

The "*" protocols are the ones most important in this course.
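
Programs usually look these names up instead of hard-coding the numbers.
A sketch using Python's socket.getservbyname(), which consults the system
services database (normally /etc/services).  The FALLBACK table and the
function name service_port are assumptions added so the example still runs
on a stripped-down system that has no services file.

```python
import socket

# Fallback values copied from the list above; used only when the system
# services database does not have the entry.
FALLBACK = {("smtp", "tcp"): 25, ("domain", "udp"): 53,
            ("http", "tcp"): 80, ("pop3", "tcp"): 110}

def service_port(name: str, proto: str) -> int:
    """Look up a service's port number by name and protocol."""
    try:
        return socket.getservbyname(name, proto)
    except OSError:                       # raised when the name is not found
        return FALLBACK[(name, proto)]

for name, proto in FALLBACK:
    print(f"{name}/{proto} -> {service_port(name, proto)}")
```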

On Unix/Linux, individual network servers/daemons (e.g. ssh, http) may
have individual start-up scripts, or they may run on demand out of the
master "inetd" or "xinetd" super-servers.

Socket Options for UDP/TCP/IP
-----------------------------

As an application programmer, what control does your application have
over the lower-level TCP/IP layers in Unix/Linux?
  - you can set options on the sockets you open that affect the TCP/IP stack
  - "man 7 socket"  setsockopt(2)  and   getsockopt(2)
    - SO_KEEPALIVE
    - SO_RCVTIMEO and SO_SNDTIMEO (useful in port scanning)
    - SO_BINDTODEVICE
    - SO_REUSEADDR   (you already used this in labs)
    - SO_DONTROUTE
    - SO_BROADCAST
    - SO_LINGER
    - SO_PRIORITY
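
A minimal sketch of setting and reading back two of these options.  It is
written in Python, whose setsockopt() and getsockopt() socket methods wrap
the C calls of the same names; the two options chosen are just examples.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The SO_* options from socket(7) live at the SOL_SOCKET level.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# getsockopt() returns non-zero when a boolean option is switched on.
reuse = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
keepalive = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print("SO_REUSEADDR:", bool(reuse), "SO_KEEPALIVE:", bool(keepalive))
sock.close()
```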

SO_SNDTIMEO can be used in a port scanner to reduce the amount of time
that the program waits for a reply from a blocked port (a port that returns
neither a TCP reset nor an ICMP "unreachable" error) or from a machine that
does not exist.
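
A sketch of a single port probe with a short timeout.  Note the swap: this
uses Python's settimeout() on the socket as a stand-in for setting
SO_SNDTIMEO directly, because it is portable and needs no packed C struct.
The function name probe and the three result strings are hypothetical.  The
demonstration only probes a listener it creates itself on the loopback
interface; do not scan machines you do not control.

```python
import socket

def probe(host: str, port: int, timeout: float = 0.5) -> str:
    """Try one TCP connect and report 'open', 'closed' or 'no reply'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)          # give up quickly instead of waiting minutes
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:    # the target answered: nothing is listening
        return "closed"
    except OSError:                   # includes the timeout case
        return "no reply"             # blocked port, or no such machine
    finally:
        sock.close()

# Demonstration against a listener created on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))       # port 0: kernel picks a free port
listener.listen(1)
port = listener.getsockname()[1]
print(port, probe("127.0.0.1", port))
listener.close()
```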

Q: What function calls are available to C programmers to set options
    on sockets?  Give two examples of the kind of options you can set.

Q: The TCP/UDP header contains port numbers.  Why aren't the source and
   destination addresses also in the TCP/UDP header?

Q: Why is the UDP RFC 3 pages but the TCP RFC is 85 pages?

Q: What port numbers lie in the "Well Known" range?

Q: T/F your Internet application can use any port it wants outside of
   the "Well Known" range

Understanding UDP
-----------------
Ref: http://tools.ietf.org/html/rfc768  (only 3 pages!)
     http://www.freesoft.org/CIE/RFC/1122/72.htm 

"The User Datagram Protocol UDP [UDP:1] offers only a minimal
 transport service -- non-guaranteed datagram delivery -- and gives
 applications direct access to the datagram service of the IP layer.
 UDP is used by applications that do not require the level of service
 of TCP or that wish to use communications services (e.g., multicast
 or broadcast delivery) not available from TCP.

 UDP is almost a null protocol; the only services it provides over IP
 are checksumming of data and multiplexing by port number. Therefore,
 an application program running over UDP must deal directly with
 end-to-end communication problems that a connection-oriented protocol
 would have handled -- e.g., retransmission for reliable delivery,
 packetization and reassembly, flow control, congestion avoidance,
 etc., when these are required. The fairly complex coupling between IP
 and TCP will be mirrored in the coupling between UDP and many
 applications using UDP. "

- a very thin layer added inside an IP packet
- like raw IP, UDP is unreliable, no retransmission: "fire and forget"
- adds "ports" to IP and little else: any reliability or retransmission
  work has to be done by the application (as is done by TCP)
- recall that the TCP RFC is 85 pages; that's an indication of how hard
  it would be to make your application turn UDP into a reliable protocol!
- big user of UDP is basic DNS queries and replies
  (zone transfers, and replies too large for one UDP datagram, use TCP)

Q: What four fields are added to raw IP by a UDP packet header?

To ensure that a UDP packet arrives at the right destination, the checksum
in UDP includes a "pseudo-header" that includes some of the IP header
information.  See page 2 in:  http://tools.ietf.org/html/rfc768
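
A sketch of the UDP checksum over the pseudo-header, the 8-byte UDP header
and the data, following the layout on page 2 of RFC768.  The function names,
addresses, ports and payload are made up for illustration.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)                 # UDP header is 8 bytes
    # Pseudo-header: source IP, destination IP, zero, protocol 17, UDP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF                 # RFC768: a computed zero is sent as all ones

good = udp_checksum("192.0.2.1", "192.0.2.2", 5000, 53, b"query")
misdelivered = udp_checksum("192.0.2.1", "192.0.2.3", 5000, 53, b"query")
print(hex(good), hex(misdelivered))
```

The same UDP header and data give a different checksum when the destination
IP address differs, so a datagram delivered to the wrong host fails the
check even though nothing inside the UDP packet itself changed.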

A very brief history of the development of the pseudo-header, and how
the NSA messed things up by preventing encryption:
   http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html
   http://www.postel.org/pipermail/end2end-interest/2005-February/004617.html

Q: What is the purpose of the UDP or TCP "pseudo header"?

Understanding TCP
-----------------

Ref: http://tools.ietf.org/html/rfc793 (85 pages!)
     http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html
     http://www4.informatik.uni-erlangen.de/Projects/JX/Projects/TCP/tcpstate.html

 "TCP provides a connection oriented, reliable, byte stream service.
  The term connection-oriented means the two applications using TCP
  must establish a TCP connection with each other before they can
  exchange data. It is a full duplex protocol, meaning that each TCP
  connection supports a pair of byte streams, one flowing in each
  direction. TCP includes a flow-control mechanism for each of these
  byte streams that allows the receiver to limit how much data the
  sender can transmit. TCP also implements a congestion-control
  mechanism."

Q: Does TCP include flow-control and/or congestion control?

Q: Can a TCP connection be one-way or must it always be two-way?

Q: What is the purpose of the "pseudo header" used in calculating a checksum?
   http://tools.ietf.org/html/rfc793  page 16-17
   http://www.postel.org/pipermail/end2end-interest/2005-February/004617.html
   http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html

Handshaking: 3-way open, 4-way close, using SYN, ACK, FIN, etc.
  - http://www.garykessler.net/library/tcpip.html#connect
 "This three-way handshake is sometimes referred to as an exchange of
  "syn, syn/ack, and ack" segments. It is important for a number of
  reasons. For individuals looking at packet traces, recognition of
  the three-way handshake is how to find the start of a connection.
  For firewalls, proxy servers, intrusion detectors, and other systems,
  it provides a way of knowing the direction of a TCP connection setup
  since rules may differ for outbound and inbound connections."

Q: Outline the TCP flags used in the basic TCP 3-way handshake.
   Clearly indicate which is server and which is client.

You can attack some servers by doing many partial handshakes and
exhausting resources:
 - syn flood attack: http://www.vijaymukhi.com/vmis/tcp.htm
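
A toy model of why partial handshakes exhaust resources.  It assumes the
server keeps a fixed-size queue of half-open connections and frees a slot
only when the final ACK arrives; real stacks also expire old entries after
a timeout, which this sketch leaves out.  The class name, the backlog size
and the client names are all made up.

```python
class ListeningServer:
    """Toy model of a server's queue of half-open (SYN_RCVD) connections."""

    def __init__(self, backlog: int):
        self.backlog = backlog
        self.half_open = set()

    def receive_syn(self, client: str) -> bool:
        if len(self.half_open) >= self.backlog:
            return False                  # queue full: the SYN is dropped
        self.half_open.add(client)        # server replies SYN,ACK and waits
        return True

    def receive_ack(self, client: str) -> None:
        self.half_open.discard(client)    # handshake complete: slot is freed

server = ListeningServer(backlog=5)
# The attacker sends SYNs from made-up source addresses and never sends ACK.
for n in range(5):
    server.receive_syn(f"spoofed-{n}")
print("legitimate client accepted?", server.receive_syn("real-client"))
```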

Q: How does a syn-flood attack work?

UDP and TCP packet header
-------------------------

The TCP header is much more complex than the UDP header
  http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html#TH
 - has to handle issues dealing with reliability, flow control, congestion

 - note the peculiar TCP/UDP pseudo-header for checksums
   - UDP and TCP checksums include the source and destination IP addresses!
   - UDP pseudo-header: http://tools.ietf.org/html/rfc768  page 2
   - TCP pseudo-header: http://tools.ietf.org/html/rfc793  page 16-17
   http://www.postel.org/pipermail/end2end-interest/2005-February/004617.html
   http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html

Q: What is the purpose of the UDP "pseudo header" used in calculating a checksum?
Q: What is the purpose of the TCP "pseudo header" used in calculating a checksum?

Q: T/F TCP and UDP include the IP layer packet source and destination
   addresses in their checksum calculations.

TCP state transition diagram
----------------------------
   http://www4.informatik.uni-erlangen.de/Projects/JX/Projects/TCP/tcpstate.html
   http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html#ST
 - client and server both start in the CLOSED state (top of diagram)
 - graph arrows are labelled with transitions <Expect>[/<Send>]
   where <Expect> indicates either an incoming packet with a flag set,
   (e.g. ACK, FIN) or a deliberate change to another state (e.g. "Passive
   Open", "Close", "Send").

* The "three-way handshake" for a non-simultaneous connection opening:
  - a server is sitting in the LISTEN state
  - a client does an "active open"

  1. client sends:  SYN, moves to SYN_SENT
  2. server sends:  SYN,ACK, moves to SYN_RCVD
  3. client sends:  ACK, moves to ESTABLISHED
  4. server receives ACK and moves to ESTABLISHED

  Now both processes are in the "ESTABLISHED" state.
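
The four steps above can be traced with a small lookup table.  This is a
sketch covering only the handful of transitions used in this handshake, not
the whole RFC793 diagram; the table layout and the function name are
assumptions made for illustration.

```python
# (current state, event) -> (segment to send, next state)
TRANSITIONS = {
    ("CLOSED", "active open"): ("SYN", "SYN_SENT"),
    ("LISTEN", "recv SYN"): ("SYN,ACK", "SYN_RCVD"),
    ("SYN_SENT", "recv SYN,ACK"): ("ACK", "ESTABLISHED"),
    ("SYN_RCVD", "recv ACK"): (None, "ESTABLISHED"),
}

def three_way_handshake() -> list:
    client, server = "CLOSED", "LISTEN"
    trace = []
    segment, client = TRANSITIONS[(client, "active open")]       # step 1
    trace.append(("client", segment, client))
    segment, server = TRANSITIONS[(server, f"recv {segment}")]   # step 2
    trace.append(("server", segment, server))
    segment, client = TRANSITIONS[(client, f"recv {segment}")]   # step 3
    trace.append(("client", segment, client))
    segment, server = TRANSITIONS[(server, f"recv {segment}")]   # step 4
    trace.append(("server", segment, server))
    return trace

for who, sent, state in three_way_handshake():
    print(f"{who}: sends {sent}, now in {state}")
```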

Q: Give the three-way TCP handshake, showing the role of client and server

* A *simultaneous* TCP connection opening:

  1. both systems send SYN and move to SYN_SENT
  2. both send SYN,ACK (RFC793 diagram has an error) and move to SYN_RCVD
  3. both systems receive SYN,ACK and move to ESTABLISHED

 - RFC1122 4.2.2.8 says RFC793 has an error on what is sent on the transition
   from SYN_SENT directly to SYN_RCVD: should be sending SYN,ACK, not SYN
   http://tools.ietf.org/html/rfc1122
 - the corrections suggested by RFC1122 appear to break the simultaneous open;
   one has to interpret the "ACK" transition as "ACK or SYN,ACK"
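
A sketch that traces the simultaneous open exactly as read in these notes:
SYN_SENT sends SYN,ACK on receiving a SYN (the RFC1122 correction), and the
"ACK" transition out of SYN_RCVD is taken on receiving SYN,ACK.  The table
holds only these three transitions; the host names A and B and the function
name are hypothetical.

```python
# (current state, event) -> (segment to send, next state)
TRANSITIONS = {
    ("CLOSED", "active open"): ("SYN", "SYN_SENT"),
    ("SYN_SENT", "recv SYN"): ("SYN,ACK", "SYN_RCVD"),      # RFC1122 correction
    ("SYN_RCVD", "recv SYN,ACK"): (None, "ESTABLISHED"),    # "ACK or SYN,ACK"
}

def simultaneous_open() -> list:
    state = {"A": "CLOSED", "B": "CLOSED"}
    event = {"A": "active open", "B": "active open"}
    history = []
    while event["A"] or event["B"]:
        sent = {}
        for host in ("A", "B"):
            sent[host], state[host] = TRANSITIONS[(state[host], event[host])]
        history.append((sent["A"], state["A"], state["B"]))
        # Segments cross in flight: each host next receives what its peer sent.
        event = {"A": sent["B"] and f"recv {sent['B']}",
                 "B": sent["A"] and f"recv {sent['A']}"}
    return history

for sent, state_a, state_b in simultaneous_open():
    print(f"both send {sent}; A is {state_a}, B is {state_b}")
```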

Q: Looking at the TCP state transition diagram, into which state will
   a program move if it is currently in state SYN_SENT and it receives a
   TCP packet with just the SYN flag set?  When it makes that state
   transition, what flags will it set in the next outgoing packet?

- be familiar with interpreting a TCP state diagram in RFC793
   - three-way handshake for an asymmetric (non-simultaneous) open
   - trace a simultaneous open in RFC793
     - the corrections suggested by RFC1122 appear to break simultaneous open;
       one has to interpret the "ACK" transition as "ACK or SYN,ACK"
   - RFC1122 section 4.2.2.10 says:
      http://tools.ietf.org/html/rfc1122
     "It sometimes surprises implementors that if two applications attempt
      to simultaneously connect to each other, only one connection is
      generated instead of two.  This was an intentional design decision;
      don't try to "fix" it."

Q: T/F When two systems attempt simultaneous connections with each other,
   you end up with two separate TCP streams.

Lab work: Testing all three processes in a TCP echo client/server application.