-------------------------
Week 05 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Keep up on your readings (Course Outline: average 4 hours/week homework)

Midterm test coming in Week 6 - see the questions in the Notes files.
Be very familiar with the detailed pseudocode of your echo server.

Lab #2 is available - a simple port scanner.  Due February 11 (Week 6).

Review
------
- setting socket options
- Unix/Linux Network Diagnostic Tools
- Getting a machine on the net
- Domain Name System (DNS)
- Testing - black box vs. white box, "behavioral" vs. "structural"
- Coding a forking TCP Client
- Zero Tolerance for Buffer Overflows
- Reading from Network sockets

----------------------------------------------------------------------------

Defining Internet protocols - Request for Comments (RFC)
--------------------------------------------------------

Who sets down the standards for programs that communicate over the Internet?

IETF: Internet Engineering Task Force: "Rough consensus and running code"

  http://radar.oreilly.com/archives/2007/01/what_actually_i.html
  "(FWIW, this happened to the IETF as well. It's "rough consensus and
  running code" policy was outflanked by big companies who just sent
  enough people to the meetings to affect the "rough consensus," and
  gradually, the IETF became driven more by politics than pure technical
  excellence in some areas.)"

Master RFC Index: http://tools.ietf.org/rfc/index

Q: What role do RFC documents play in the Internet?

  http://www.rfc-editor.org/
  "The RFC (Request for Comments) series contains technical and
  organizational documents about the Internet, including the technical
  specifications and policy documents produced by the Internet Engineering
  Task Force (IETF)."

RFC documents lay out Internet protocols, e.g.
for SMTP:
    ftp://ftp.rfc-editor.org/in-notes/rfc2821.txt
    http://tools.ietf.org/html/rfc2821

- Some words have specific meanings, see: http://tools.ietf.org/html/rfc2119
  "The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in RFC 2119."

- RFCs use modified ABNF (Augmented Backus-Naur Form) to describe protocols:
  http://tools.ietf.org/html/rfc2234
  Certain rules are predefined as "CORE" rules, e.g. ALPHA, DIGIT, CHAR, etc.
  (from section 6.1 in rfc2234) so you don't have to do all the work.

Q: True/False - strings ("abc") in ABNF are case-sensitive (RFC2234 p.4)

Q: Give an ABNF rule that defines an Algonquin student email address
   (abcd0001@algonquincollege.com), using these definitions:

     ALPHA  = %x41-5A / %x61-7A   ; from CORE: A-Z upper and a-z lower
     DIGIT  = %x30-39             ; from CORE: 0-9
     atsign = "@"
     period = "."

   Note that ab000001 and abc00001 are also valid userids (must be eight
   characters); but, a0000001, ab0001, and abcd000001 are not.
   Hint: start with this ABNF and fix it to handle the other two valid cases:

     algemail = 4*4ALPHA 4*4DIGIT atsign "algonquincollege" period "com"

   Using section 3.7 the above rule can be slightly shortened to:

     algemail = 4ALPHA 4DIGIT atsign "algonquincollege" period "com"

RFC tools by IETF
-----------------
http://tools.ietf.org/
- html cross-linked pages - http://tools.ietf.org/html/
- reading tools - Firefox plugin
- difference tools - wdiff (word diff)
- verification tools - ABNF to regexp converter

----------------------------------------------------------------------------

IP - Internet Protocol
----------------------
- http://tools.ietf.org/html/rfc791 (45 pages, Sep 1981)
- layer 2 of the 4 (or 5) layer stack:
   4 - application layer (programs)
   3 - TCP/UDP (transport/host-to-host layer)
   2 - IP (Internet/gateway layer), ICMP
   1 - Network/hardware layer (e.g.
       Ethernet, ARP, MAC addresses)
   (Layer 1 may be split into Physical/Network Access)

The Internet four (or five) layer stack has IP at layer 2.
Below IP are one (or two) layers; above IP are another two layers.
- Figure 2: http://www.garykessler.net/library/tcpip.html#arch

Most everything on the Internet starts with just plain IP, "the Internet's
most basic protocol" (http://www.freesoft.org/CIE/Topics/79.htm).

* Internet layer - IP
  - IP has no port information; only IP addresses
    Figure 4: http://tools.ietf.org/html/rfc791#section-3.1
    Figure 4: http://www.garykessler.net/library/tcpip.html#IP
  - simple http://www.freesoft.org/CIE/Topics/79.htm
  - large amounts of data may be "fragmented" into multiple IP packets
    - the IP Identification field marks which fragments belong together
      for later re-assembly
    - this was later determined to be a Very Bad Idea
    - fragmentation is now considered harmful, difficult to get right, etc.
    - more on fragmentation issues later

Compare protocol complexities:
- TCP and UDP are on "top" of IP (means packets go *inside* IP packets):
  - IP RFC791 is 45 pages
  - UDP RFC768 is only 3 more pages on top of IP (unreliable)
  - TCP RFC793 is 85 more pages on top of IP (reliable)
  - DCCP RFC4340 is 125 pages on top of IP (!!)
    http://tools.ietf.org/html/rfc4340

* Unprivileged Unix/Linux programs cannot open sockets that send raw IP
  datagrams.  See packet(7) ("man 7 packet"):
    "Only processes with effective UID 0 or the CAP_NET_RAW capability
    may open packet sockets."
  The closest an unprivileged Unix/Linux program can get to sending raw IP
  packets is socket type "SOCK_DGRAM" (UDP) - a small veneer on top of IP.

* Handling overloading of the IP network:
  http://www.africonnect.com/tcpip_tut.htm
    "The IP protocol does not guarantee delivery, or that packets will
    arrive in the proper sequence. [...]
    "Rather than simply discarding all newly arriving packets, the routers
    are programmed discard packets in a random fashion to prevent buffer
    overflow.
    This is best implemented in a "fair" way so that the data stream
    having the largest volume suffers the largest number of dropped
    packets."

----------------------------------------------------------------------------

Q: Looking at RFC791 Figure 4, what is the longest total length
   theoretically possible for an IP packet?
Q: Looking at RFC791 Figure 4, what is the largest time-to-live value
   possible?
Q: T/F packets get larger as they move down the protocol stack from
   Layer 4 (Application) down to the Physical media.
Q: True/False - the IP packet header contains port numbers
Q: What happens to IP packets when the Internet gets overloaded?
   How do routers recover from an overload?

----------------------------------------------------------------------------

ICMP - Internet Control Message Protocol
----------------------------------------
- http://tools.ietf.org/html/rfc792 (September 1981, 21 pages)
  - extended by RFC950 (Address Mask) and RFC4884 (Multi-Part)
- same layer as IP (layer 2)
- unreliable
- ICMP packets have a "type" field (do not use "ports")
  - the "ping" echo is just one of many types of ICMP
- used to communicate regarding:
  1. network errors
     - e.g. connection refused (port not open)
  2. network congestion
     - e.g. source quench - slow your transmission rate
  3. packet TTL expiry (time-out)
     - used by traceroute to track a network path
     - but traceroute is not reliable: buggy TCP stacks don't handle TTL,
       changing paths, routing problems, ICMP doesn't show router
       outbound addresses (only inbound interfaces)
     - see http://www.freesoft.org/CIE/Topics/54.htm
  4. network troubleshooting (e.g. is a host up?)
     - e.g. "ping" ICMP echo and echo reply type

http://www.freesoft.org/CIE/Topics/53.htm
  "The Internet Protocol is not designed to be absolutely reliable.
  The purpose of these control messages is to provide feedback about
  problems in the communication environment, not to make IP reliable.
  There are still no guarantees that a datagram will be delivered or a
  control message will be returned.  Some datagrams may still be
  undelivered without any report of their loss.  The higher level
  protocols that use IP must implement their own reliability procedures
  if reliable communication is required." - RFC792

References:
  http://www.javvin.com/protocolICMP.html
  http://www.freesoft.org/CIE/Topics/81.htm
  http://www.lincoln.edu/math/rmyrick/ComputerNetworks/InetReference/81.htm
  http://www.cookcomputing.com/blog/archives/000367.html

----------------------------------------------------------------------------

Q: Is the delivery of ICMP messages guaranteed?
Q: What is ICMP used for on the Internet?  (describe two of four functions)
   Give examples of each use.
Q: What popular program uses ICMP echo packets?
Q: How does traceroute use ICMP to map a packet route?
Q: Traceroute is not reliable.  Why?  (describe two of four issues)

----------------------------------------------------------------------------

Layer Three: TCP and UDP - port numbers
---------------------------------------
Just above the IP layer is the Transport layer (layer 3).
UDP and TCP add "port" numbers to IP, for host-to-host communication.
- http://www.garykessler.net/library/tcpip.html#transport

Layer three is the layer that is implemented using the Unix/Linux network
system calls socket/bind/listen/accept and connect.

The two major Linux/Unix socket types are UDP (SOCK_DGRAM) and TCP
(SOCK_STREAM).  Both extend IP addressing with the concept of "ports".
Reference:
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#twotypes

- UDP is essentially raw IP plus port numbers; still unreliable
  See the RFC: http://tools.ietf.org/html/rfc768 (only 3 pages!)
  - used in DNS and TFTP
  - UDP is message-oriented - discrete chunks (datagrams), unreliable
- TCP is like streaming UDP with reliable transmission added
  See the RFC: http://tools.ietf.org/html/rfc793 (85 pages!)
  - TCP is stream-oriented - arbitrary byte stream, reliable
    http://www.tcpipguide.com/free/t_TCPDataHandlingandProcessingStreamsSegmentsandSequ.htm

----------------------------------------------------------------------------

Layer Four: Application Layer
-----------------------------
Just above layer 3 Transport (TCP/UDP) is the Application layer (4)
- this is the part where you get to write the application code
- this is the code that uses socket/bind/listen/accept
- DNS, SMTP, HTTP, POP3, etc.

Internet applications have to agree on which ports to use.

Most UDP/TCP Port Numbers have to be Registered with IANA
- IANA: Internet Assigned Numbers Authority
- Master IANA List of ports: http://www.iana.org/assignments/port-numbers
- ports are in three ranges: "Well Known", "Registered", "Dynamic/Private"
- you SHOULD NOT use a "Well Known" or "Registered" port without first
  registering it with IANA.

Major service port numbers (often seen in trace output):
http://www.tcpipguide.com/free/t_TCPCommonApplicationsandServerPortAssignments.htm
- port numbers are given names in the Linux/Unix file /etc/services
- see also the master list at http://www.iana.org/assignments/port-numbers

  * TCP      20       ftp-data
  * TCP      21       ftp (control)
    TCP      22       SSH
    TCP      23       telnet
  * TCP      25       SMTP (sending mail only)
  * UDP/TCP  53       domain (DNS)
    UDP      67-68    DHCP
  * TCP      80       HTTP (WWW)
  * TCP      110      POP3 (receiving mail only)
    TCP      113      ident (identifying incoming TCP connections)
    TCP      119      NNTP (Network News)
    UDP/TCP  123      NTP (Network Time)
    UDP/TCP  137-139  Microsoft netbios (SMB) (Samba)
    TCP      443      HTTPS (secure WWW)
    UDP/TCP  445      Microsoft-DS
    UDP/TCP  631      Internet Printing Protocol (IPP - CUPS)

The "*" protocols are the ones most important in this course.

On Unix/Linux, individual network servers/daemons (e.g.
ssh, http) that listen on the above ports may have individual start-up
scripts, or they may run on demand out of the master "inetd" or "xinetd"
super-servers.

----------------------------------------------------------------------------

Q: At what layer are "ports" used in the TCP/IP stack?
Q: What is the highest TCP/IP stack layer implemented using
   socket/bind/listen/accept?
Q: Differentiate between the two major types of port-oriented Unix sockets.
Q: The TCP/UDP header contains port numbers.  Why aren't the source and
   destination addresses also in the TCP/UDP header?
Q: Why is the UDP RFC only three pages but the TCP RFC is 85 pages?
Q: What port numbers lie in the "Well Known" range?  What is special
   about this range on Unix/Linux systems?
Q: T/F your Internet application can use any port it wants outside of
   the "Well Known" range.
Q: How does your Unix/Linux C program (layer 4) access Layer 3 services?
Q: What organization coordinates and registers port numbers world-wide?
Q: What Internet services are usually attached to these port numbers:
   20-21, 22, 25, 53, 67-68, 80, 110, 631
Q: What is the purpose of the inetd/xinetd super-server?

----------------------------------------------------------------------------

Understanding UDP
-----------------
Ref: http://tools.ietf.org/html/rfc768 (only 3 pages!)

http://www.freesoft.org/CIE/RFC/1122/72.htm
  "The User Datagram Protocol UDP [UDP:1] offers only a minimal transport
  service -- non-guaranteed datagram delivery -- and gives applications
  direct access to the datagram service of the IP layer.  UDP is used by
  applications that do not require the level of service of TCP or that
  wish to use communications services (e.g., multicast or broadcast
  delivery) not available from TCP.

  UDP is almost a null protocol; the only services it provides over IP are
  checksumming of data and multiplexing by port number.
  Therefore, an application program running over UDP must deal directly
  with end-to-end communication problems that a connection-oriented
  protocol would have handled -- e.g., retransmission for reliable
  delivery, packetization and reassembly, flow control, congestion
  avoidance, etc., when these are required.  The fairly complex coupling
  between IP and TCP will be mirrored in the coupling between UDP and many
  applications using UDP."

- UDP is a very thin layer added inside an IP packet
- like raw IP, UDP is unreliable, no retransmission: "fire and forget"
- adds "ports" to IP and little else: any reliability or retransmission work
  has to be done by the application (as is done by TCP)
- recall that the TCP RFC is 85 pages; that's an indication of how hard
  it would be to make your application turn UDP into a reliable protocol!
- a big user of UDP on the Internet is basic DNS queries and replies
  - DNS zone transfers (big) use TCP; everything else is UDP
  - yes, DNS queries are unreliable
- UDP uses a "pseudo-header" for the checksum; see description below

----------------------------------------------------------------------------

UDP and TCP "pseudo-header"
---------------------------
To ensure that a UDP or TCP packet arrives at the correct destination, the
checksum in the packet operates on a "pseudo-header" that includes some of
the IP header information, crossing "layers".  (Recall that the IETF
discourages thinking of TCP/IP in strict layers.)

See page 2 in: http://tools.ietf.org/html/rfc768
- note the peculiar TCP/UDP "pseudo-header" for checksums
- UDP and TCP checksums include the source and destination IP addresses!
- UDP pseudo-header: http://tools.ietf.org/html/rfc768 page 2
- TCP pseudo-header: http://tools.ietf.org/html/rfc793 pages 16-17

A very brief history of the development of the pseudo-header, and how the
NSA messed things up by preventing encryption:
  http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html
  http://www.postel.org/pipermail/end2end-interest/2005-February/004617.html

----------------------------------------------------------------------------

Q: What four fields are added to raw IP by a UDP packet header?
Q: Why is the UDP RFC only three pages but the TCP RFC is 85 pages?
Q: What purpose does the "pseudo-header" serve in calculating a checksum?
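
Worked sketch: the UDP pseudo-header checksum
---------------------------------------------
The pseudo-header idea from this section can be tried out in a few lines.
This is a minimal sketch for IPv4 only, written in Python for brevity rather
than in the C used for the labs.  The function names (ones_complement_sum16,
udp_checksum) and the example addresses and ports are made up for this
illustration; they are not part of any RFC or library.  The field layout
follows page 2 of RFC768: source address, destination address, a zero byte,
the protocol number (17 for UDP), and the UDP length.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Add 16-bit big-endian words, folding each carry back into the sum."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """Checksum over pseudo-header + UDP header + data (illustrative helper)."""
    udp_length = 8 + len(payload)        # the UDP header itself is 8 bytes
    # Pseudo-header: borrowed from the IP layer, never sent on the wire.
    pseudo_header = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                     + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_length))
    # Real UDP header, with the checksum field set to zero while computing.
    udp_header = struct.pack("!HHHH", src_port, dst_port, udp_length, 0)
    total = ones_complement_sum16(pseudo_header + udp_header + payload)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF            # a computed zero is sent as all ones


if __name__ == "__main__":
    good = udp_checksum("192.0.2.1", "192.0.2.2", 5000, 53, b"hello")
    wrong_host = udp_checksum("192.0.2.1", "192.0.2.3", 5000, 53, b"hello")
    print(hex(good), hex(wrong_host))    # the two values differ
```

Because the IP addresses are part of the sum, the same UDP header and data
delivered to a different destination address no longer match the checksum,
so the receiver can detect a misdelivered datagram.  A receiver repeats the
same sum with the received checksum left in place; a result of 0xFFFF means
the datagram passes.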