-------------------------
Week 07 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

New in Notes:
  http://teaching.idallen.com/cst8165/07w/notes/buffer_overflows.txt
  deep_indentation.txt
  header_files.txt

Use GDB!
--------

"If you have eight hours to cut down a tree, it is best to spend six
 hours sharpening your axe and then two hours cutting down the tree."

Google search: gdb tutorial

Zero Tolerance for Buffer Overflows
-----------------------------------

http://teaching.idallen.com/cst8165/07w/notes/buffer_overflows.txt

Q: Why must Internet-facing programs avoid buffer overflows?

Q: What gcc flag turns on local symbols and line numbers for gdb and
   valgrind?

Q: What does "valgrind" do?

Q: Will valgrind find all buffer overflow errors?

Q: T/F Like in Java, when you have a buffer overflow in C language the
   program stops on the line causing the buffer overflow.

Aside: On choosing buffer sizes
-------------------------------

When deciding how much buffer space an Internet server should allow for
incoming request lines, you have to weigh memory use against
functionality. Here's an excerpt from an RFC extending the SMTP
protocol, which originally specified a maximum buffer of just 512 bytes:

  http://tools.ietf.org/html/rfc1869
  http://www.rfc-editor.org/rfc/rfc1869.txt

  4.1.2. Maximum command line length

     This specification extends the SMTP MAIL FROM and RCPT TO to allow
     additional parameters and parameter values. It is possible that the
     MAIL FROM and RCPT TO lines that result will exceed the 512
     character limit on command line length imposed by RFC 821. This
     limit is hereby amended to only apply to command lines without any
     parameters.
     Each specification that defines new MAIL FROM or RCPT TO
     parameters must also specify maximum parameter value lengths for
     each parameter so that implementors of some set of extensions know
     how much buffer space must be allocated. The maximum command length
     that must be supported by an SMTP implementation with extensions is
     512 plus the sum of all the maximum parameter lengths for all the
     extensions supported.

Reading from Network sockets
----------------------------

A program can detect when a write() or send() call doesn't write all
the bytes. It can loop until all the bytes are sent, perhaps using a
cover function such as sendall() (see Lab #03 for sendall()).

How about the reverse - reading from a network socket?

Simple read() or recv() calls on Internet-connected sockets are not
guaranteed to return data in the same quantities that remote
applications send it. Just because a remote application writes 500
bytes into a socket doesn't mean that your next read() will return those
500 bytes. The data may be incomplete, or that 500 bytes may be followed
by another 500 bytes (or less, or more) from a following write() on the
same connection.

How does your program know that it has *read* all the bytes that the
client has sent? Answer: It can't, unless your application provides
some assistance. You have two choices:

1. The sending program has to send a count of the number of bytes
   (probably one of the very first things it sends), and the receiving
   program has to loop to make sure it reads that many bytes. With this
   solution, your application has to first "encapsulate" the data it
   sends with an application-specific header indicating how much data is
   being sent.

2. Your application needs to send some trailing flag in the data stream
   indicating that unit of data is complete, and the receiving program
   has to loop to make sure it reads all the bytes until it sees the
   flag before processing.
   The flag has to be some byte or combination of bytes that never
   appears inside the data itself. For a single-line chat server, you
   might pick a newline character.

Note that if you encapsulate and send a header containing a size field,
the receiving program may still have to loop a bit to get all the bytes
that make up the number indicating the size, unless the size is just a
single byte!

In all of this, you also have to handle the error and EOF cases where
the data stream ends unexpectedly.

Q: T/F We can implement a readall() function, similar to our sendall()
   function, that simply loops until it has read all the bytes from a
   socket. Then we know we have all the data sent by the client.

Q: What are the two main methods that allow an application to
   communicate that "all" the data has been sent over a socket?

Q: T/F When reading from a network socket, each read returns all the
   bytes sent by the remote client.

Routing Protocols
-----------------

Ref: http://www.freesoft.org/CIE/Topics/116.htm

Pre-classless (the old Class A,B,C way) - Classful addressing, Figure 5:
  A nice diagram of how the top bits determined the old Class A,B,C
  addressing:
  http://www.garykessler.net/library/tcpip.html#IPadd

Routing vs. Switching
  http://www.networkcomputing.com/netdesign/1122ippart2.html
  - routers forward based on IP address (OSI Layer 3)
    - must open up the Ethernet packet and look at the IP header
  - switches forward packets based on MAC address (OSI Layer 2);
    hubs simply repeat every packet out every port
    - don't look inside the Ethernet packets - no IP info used

Q: T/F Ethernet switches and hubs forward based on IP addresses.

Routing is a way of getting packets from one place to another.
Routing software determines the next hop for a datagram.
Specifically, it does two things:
  1. Determines the (optimal) path
  2. Delivers the datagram

Q: What is the purpose of network routing?
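
Aside: a sketch of count-prefixed reading
-----------------------------------------

Looking back at choice 1 in "Reading from Network sockets" (send a byte
count first, then loop until that many bytes arrive), the receiving side
might look roughly like the sketch below. The names read_exactly() and
read_message() and the 4-byte big-endian length header are assumptions
made up for this illustration; they are not taken from the labs. Note
that even the header itself is read in a loop, and that EOF and errors
are reported instead of ignored.

```c
#include <arpa/inet.h>   /* ntohl() */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until exactly n bytes have
 * arrived. Returns n on success, a short count (possibly 0) if EOF
 * came early, or -1 on a read error. */
ssize_t read_exactly(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r == 0)
            break;              /* EOF: the peer closed the connection */
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Hypothetical framing: a 4-byte length in network byte order, followed
 * by that many bytes of data. Returns the data length, or -1 on error,
 * early EOF, or a message too big for the caller's buffer. */
ssize_t read_message(int fd, char *buf, size_t bufsize)
{
    uint32_t netlen;
    uint32_t len;

    assert(buf != NULL);
    if (read_exactly(fd, &netlen, sizeof netlen) != (ssize_t)sizeof netlen)
        return -1;
    len = ntohl(netlen);
    if (len > bufsize)
        return -1;              /* refuse; never overflow the buffer */
    if (read_exactly(fd, buf, len) != (ssize_t)len)
        return -1;
    return (ssize_t)len;
}
```

The size check before the second read is the important line: the length
comes from the network, so it cannot be trusted to fit the buffer.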
Indirect routing is used when the network numbers of the source and
destination do not match:
  - the packet must leave the local network
  - must be forwarded by a known "gateway" (a router)
  - a gateway is a node that knows how to reach the destination
  - may have different gateways for different networks

Q: T/F To send packets to machines on your local network, you first
   send the packet to the gateway.

Path Determination - which way to send a packet?
  - http://www.freesoft.org/CIE/Topics/117.htm
  - http://www.freesoft.org/CIE/Topics/118.htm

Algorithms use 'metrics' to determine the path.
Metrics - cost, length, etc.
Algorithms populate routing tables; the tables are used for path
determination.

* If two routing prefixes match a destination, the *longest* match is
  preferred.
  Default route: 0.0.0.0/0 or 0.0.0.0/0.0.0.0
  - every other match has more 1-bits in its mask, so the default route
    is always the last choice

Q: If two routing prefixes match a packet, which one is chosen?

Q: Why is the default route always chosen last?

http://www.freesoft.org/CIE/Topics/116.htm
The network can adjust routing tables according to network changes:
  - Router receives change notifications and recalculates its routing
    table
  - Updates to a routing table trigger notifications sent out to other
    routers
  - Minimal human configuration required (setup)
  - Be careful that the updates don't cause a "storm" or route
    instability

Overloading the IP network:
  http://www.africonnect.com/tcpip_tut.htm
  "The IP protocol does not guarantee delivery, or that packets will
  arrive in the proper sequence. [...]
  "Rather than simply discarding all newly arriving packets, the routers
  are programmed discard packets in a random fashion to prevent buffer
  overflow. This is best implemented in a "fair" way so that the data
  stream having the largest volume suffers the largest number of dropped
  packets."

Q: What happens to packets when the Internet gets overloaded? How do
   routers recover from an overload?
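
The longest-match rule from "Path Determination" above can be sketched
in a few lines of C. This is only an illustration under stated
assumptions: the struct route layout, the function names, and the linear
search are invented here, and a real kernel routing table uses faster
data structures. Addresses are plain 32-bit numbers in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing table entry, for illustration only. */
struct route {
    uint32_t    prefix;   /* network address, host byte order */
    int         masklen;  /* number of leading 1-bits in the mask, 0..32 */
    const char *nexthop;  /* just a label in this sketch */
};

/* Build a netmask with masklen leading 1-bits. A shift by 32 is
 * undefined in C, so masklen 0 (the default route) is special-cased. */
static uint32_t mask_of(int masklen)
{
    return masklen == 0 ? 0 : ~(uint32_t)0 << (32 - masklen);
}

/* Return the matching entry with the longest mask, or NULL if nothing
 * matches. An entry with masklen 0 matches every destination, which is
 * why a default route can only win when no other entry matches. */
const struct route *route_lookup(const struct route *tab, size_t n,
                                 uint32_t dst)
{
    const struct route *best = NULL;
    size_t i;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uint32_t m = mask_of(tab[i].masklen);
        if ((dst & m) == (tab[i].prefix & m) &&
            (best == NULL || tab[i].masklen > best->masklen))
            best = &tab[i];
    }
    return best;
}
```

With entries for 0.0.0.0/0, 10.0.0.0/8 and 10.1.0.0/16, a packet for
10.1.2.3 matches all three and the /16 is chosen; a packet for
192.168.0.1 matches only the default route.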
Protocols needed to decide on global Internet routing tables:
  Ref: http://www.freesoft.org/CIE/Topics/87.htm
  - interior: http://www.freesoft.org/CIE/Topics/119.htm
    - RIP - Routing Information Protocol (old) - used internally
        http://www.freesoft.org/CIE/Topics/90.htm
      - very widely used - but 1970s design
    - OSPF - Open Shortest Path First - replaces RIP, used internally
        http://www.geocities.com/Heartland/4394/work/ospf.html
        http://www.freesoft.org/CIE/Topics/89.htm
  - exterior: http://www.freesoft.org/CIE/Topics/120.htm
    - BGP4 - Border Gateway Protocol (v4) - used between large nets
        http://www.freesoft.org/CIE/Topics/88.htm
  - Cisco protocols (IGRP, EIGRP)
  - etc.

Q: What do BGP and OSPF stand for? What is each used for?

Q: What interior/internal routing protocol is replacing the old RIP
   protocol?

Q: T/F - Both BGP and OSPF are designed to work on a local network.

Tracing: traceroute, "traceroute -n" (RTFM)
  - lost or blocked packets print "*"

Q: What do all the fields in the output of a traceroute mean?

Local Routing tables in Unix/Linux kernel:
  Commands: "ip route list" or just "ip route"
  Old way: "route" and "netstat -r"
  - shows the known network interface routes

Q: T/F "ip route" will show the IP address of your gateway

Defining protocols - Request for Comment (RFC)
----------------------------------------------

Who sets down the standards for programs that communicate over the
Internet?

IETF: Internet Engineering Task Force: "Rough consensus and running code"
  http://radar.oreilly.com/archives/2007/01/what_actually_i.html
  "(FWIW, this happened to the IETF as well. It's "rough consensus and
  running code" policy was outflanked by big companies who just sent
  enough people to the meetings to affect the "rough consensus," and
  gradually, the IETF became driven more by politics than pure technical
  excellence in some areas.)"

http://tools.ietf.org/rfc/index

Q: What role do RFC documents play in the Internet?
http://www.rfc-editor.org/
  "The RFC (Request for Comments) series contains technical and
  organizational documents about the Internet, including the technical
  specifications and policy documents produced by the Internet
  Engineering Task Force (IETF)."

RFC documents lay out Internet protocols, e.g. for SMTP:
  ftp://ftp.rfc-editor.org/in-notes/rfc2821.txt
  http://tools.ietf.org/html/rfc2821

- Some words have specific meanings, see:
    http://tools.ietf.org/html/rfc2119
  "The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in RFC 2119."

- RFC uses modified ABNF (Augmented Backus-Naur Form) to describe
  protocols:
    http://tools.ietf.org/html/rfc2234
  Certain rules are predefined as "CORE" rules, e.g. ALPHA, DIGIT, CHAR,
  etc. (from section 6.1 in rfc2234) so you don't have to do all the
  work.

Q: True/False - strings ("abc") in ABNF are case-sensitive (RFC2234 p.4)

Q: give an ABNF rule that defines an Algonquin student email address
   (abcd0001@algonquincollege.com), using these definitions:

     ALPHA  = %x41-5A / %x61-7A   ; from CORE: A-Z upper and a-z lower
     DIGIT  = %x30-39             ; from CORE: 0-9
     atsign = "@"
     period = "."

   Note that ab000001 and abc00001 are also valid userids (must be eight
   characters); but, a0000001, ab0001, and abcd000001 are not.
   Hint: start with this and fix it to handle the other two valid cases:

     algemail = 4ALPHA 4DIGIT atsign "algonquincollege" period "com"

Network Diagnostic Tools
------------------------

Please re-read the "Acceptable Use Warning" on the course home page.

Note that at Algonquin College many/most network probe ports and
protocols (even ping and traceroute) are blocked and will *NOT* work.
Sorry! Try them at home instead.
These tools are helpful in diagnosing network problems:

  arp           - show/change MAC addresses currently known to this host
  ethereal      - GUI packet sniffer
  fuser         - (-n tcp, -n udp) list processes with open TCP or UDP ports
  mtr           - ping-style traceroute: packet route diagnostic
  netcat (nc)   - Network Swiss Army Knife: connect/listen to ports
  netstat -natu - list open and listening TCP and UDP connections
  nmap          - Network Mapper - port prober
  ping          - ICMP echo generator/receiver
  tcpdump       - command-line (non-GUI) packet sniffer
  telnet        - TCP connection program: use when netcat is not available
  traceroute    - packet route diagnostic

Major service port numbers (often seen in trace output):
  http://www.tcpipguide.com/free/t_TCPCommonApplicationsandServerPortAssignments.htm
  - port numbers are given names in the file /etc/services
  - see also the master list at
    http://www.iana.org/assignments/port-numbers

  * TCP      20       ftp-data
  * TCP      21       ftp (control)
    TCP      22       SSH
    TCP      23       telnet
  * TCP      25       SMTP (sending mail only)
  * UDP/TCP  53       domain (DNS)
    UDP      67-68    DHCP
  * TCP      80       HTTP (WWW)
  * TCP      110      POP3 (receiving mail only)
    TCP      113      ident (identifying incoming TCP connections)
    TCP      119      NNTP (Network News)
    UDP/TCP  123      NTP (Network Time)
    UDP/TCP  137-139  Microsoft netbios (SMB) (Samba)
    TCP      443      HTTPS (secure WWW)
    UDP/TCP  445      Microsoft-DS
    UDP/TCP  631      Internet Printing Protocol (IPP - CUPS)

The "*" protocols are the ones most important in this course.

On Unix/Linux, individual network servers/daemons (e.g. ssh, http) may
have individual start-up scripts, or they may run on demand out of the
master "inetd" or "xinetd" super-servers.
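
To make the /etc/services naming mentioned above concrete, here is a
minimal sketch that pulls the name, port, and protocol out of one line
of that file (lines look like "smtp 25/tcp mail"). The struct service
type and the parse_service_line() function are hypothetical names made
up for this example, and the parser ignores aliases and only skips
comments that start a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for one /etc/services entry. */
struct service {
    char name[32];   /* e.g. "smtp" */
    int  port;       /* e.g. 25 */
    char proto[8];   /* e.g. "tcp" */
};

/* Parse one line in the "name port/proto [aliases...]" format.
 * Returns 1 if an entry was parsed into *out, or 0 for a blank line,
 * a comment line, or anything malformed. The field widths in the
 * sscanf() format are one less than the array sizes, so a long field
 * is truncated and cannot overflow the buffers. */
int parse_service_line(const char *line, struct service *out)
{
    assert(line != NULL && out != NULL);

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;
    if (sscanf(line, "%31s %d/%7s", out->name, &out->port, out->proto) != 3)
        return 0;
    return out->port >= 0 && out->port <= 65535;
}
```

In a real program you would normally not parse the file yourself: the
standard library call getservbyname(3) does the lookup (it returns the
port in network byte order).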
IP - Internet Protocol - and the 4 (or 5) layer stack
-----------------------------------------------------

Most everything on the Internet starts with just plain IP,
"the Internet's most basic protocol"
(http://www.freesoft.org/CIE/Topics/79.htm):

* Internet layer - IP
  - IP has no port information; only IP addresses
    Figure 4: http://tools.ietf.org/html/rfc791#section-3.1
    Figure 4: http://www.garykessler.net/library/tcpip.html#IP
  - simple
    http://www.freesoft.org/CIE/Topics/79.htm
  - large amounts of data are "fragmented" into multiple IP packets
    - the IP Identification field marks which fragments belong to the
      same original packet, for later re-assembly
    - this was later determined to be a Very Bad Idea
    - fragmentation is considered harmful

Q: True/False - the IP packet header contains port numbers

Q: Looking at RFC791 Figure 4, what is the longest total length
   theoretically possible for an IP packet?

Reference: http://beej.us/guide/bgnet/output/htmlsingle/bgnet.html#twotypes

Below IP are one (or two) layers; above IP are another two layers.

Internet four (or five) layer stack has IP at layer 2:
  Figure 2: http://www.garykessler.net/library/tcpip.html#arch

  4 - application layer (programs)
  3 - TCP/UDP (transport/host layer)
  2 - IP (Internet/gateway layer), ICMP
  1 - Network/hardware layer (e.g. Ethernet, ARP, MAC addresses)
      (Layer 1 may be split into Physical/Network Access)

Since the Internet network stack layers went from four layers to five
recently, nobody knows what the official names are.

* Near the bottom, below IP: Network layer (e.g. Ethernet)
  - ARP converts between Ethernet hardware (MAC) and IP addresses
    http://www.garykessler.net/library/tcpip.html#ARP

Q: What does ARP stand for and how is it used in Internet networking?
* Just above IP: Transport layer - layered on top of IP
  - UDP and TCP add "port" numbers to IP
    Protocols and ports:
    http://www.garykessler.net/library/tcpip.html#transport
  - UDP is essentially raw IP plus port numbers; still unreliable
    See the RFC: http://tools.ietf.org/html/rfc768 (only 3 pages!)
    - used in DNS and TFTP
  - TCP is like UDP with reliable transmission added
    See the RFC: http://tools.ietf.org/html/rfc793 (85 pages!)

Q: Why aren't the source and destination addresses in the TCP/UDP
   header?

Q: Why is the UDP RFC 3 pages but the TCP RFC is 85 pages?

Most UDP/TCP Port Numbers have to be Registered with IANA
  - IANA: Internet Assigned Numbers Authority
  - Master IANA List of ports:
    http://www.iana.org/assignments/port-numbers
  - ports are in three ranges: "Well Known", "Registered",
    "Dynamic/Private"
  - you SHOULD NOT use a "Well Known" or "Registered" port without first
    registering it with IANA.

Q: What port numbers lie in the "Well Known" range?

Q: T/F your Internet application can use any port it wants outside of
   the "Well Known" range

* Just above Transport (TCP/UDP): Application layer
  - this is the part where you get to write the code
  - SMTP, HTTP, POP3, etc.

ICMP - Internet Control Message Protocol
----------------------------------------

Ref: http://www.freesoft.org/CIE/Topics/81.htm

Q: Is the delivery of ICMP messages guaranteed?

Q: What is ICMP used for on the Internet (name two functions)?

Q: What popular program uses ICMP echo packets?
   http://www.freesoft.org/CIE/Topics/53.htm

Q: How does traceroute use ICMP to map a packet route?
   http://www.freesoft.org/CIE/Topics/54.htm

Q: Traceroute is not reliable. What can go wrong (describe two things)?
   http://www.freesoft.org/CIE/Topics/54.htm

TCP vs UDP - SOCK_STREAM vs SOCK_DGRAM
--------------------------------------

* What control do you have over the IP layer from Unix/Linux?
  - you can set options on the sockets you open that affect the TCP/IP
    stack
  - "man 7 socket" setsockopt(2) and getsockopt(2)
    - SO_KEEPALIVE
    - SO_RCVTIMEO SO_SNDTIMEO
    - SO_BINDTODEVICE
    - SO_REUSEADDR
    - SO_DONTROUTE
    - SO_BROADCAST
    - SO_LINGER
    - SO_PRIORITY

Q: What function calls are available to C programmers to set options on
   sockets? Give two examples of the kind of options you can set.

Understanding UDP
-----------------

Ref: http://tools.ietf.org/html/rfc768 (only 3 pages!)

http://www.freesoft.org/CIE/RFC/1122/72.htm
  "The User Datagram Protocol UDP [UDP:1] offers only a minimal
  transport service -- non-guaranteed datagram delivery -- and gives
  applications direct access to the datagram service of the IP layer.
  UDP is used by applications that do not require the level of service
  of TCP or that wish to use communications services (e.g., multicast or
  broadcast delivery) not available from TCP.

  UDP is almost a null protocol; the only services it provides over IP
  are checksumming of data and multiplexing by port number. Therefore,
  an application program running over UDP must deal directly with
  end-to-end communication problems that a connection-oriented protocol
  would have handled -- e.g., retransmission for reliable delivery,
  packetization and reassembly, flow control, congestion avoidance,
  etc., when these are required. The fairly complex coupling between IP
  and TCP will be mirrored in the coupling between UDP and many
  applications using UDP. "

  - unreliable, no retransmission: "fire and forget"
  - a very thin layer added inside an IP packet
  - adds "ports" to IP and little else: any reliability or
    retransmission work has to be done by the application
    - recall that the TCP RFC is 85 pages; that's an indication of how
      hard it would be to make your application turn UDP into a reliable
      protocol!
  - big user of UDP is DNS queries and replies

Q: What four fields are added to raw IP by a UDP packet header?

Q: What is the purpose of the "pseudo header" used in calculating a
   checksum?
   http://tools.ietf.org/html/rfc768 page 2
   http://www.postel.org/pipermail/end2end-interest/2005-February/004617.html
   http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html

Understanding TCP
-----------------

Ref: http://tools.ietf.org/html/rfc793 (85 pages!)
     http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html
     http://www4.informatik.uni-erlangen.de/Projects/JX/Projects/TCP/tcpstate.html

  "TCP provides a connection oriented, reliable, byte stream service.
  The term connection-oriented means the two applications using TCP must
  establish a TCP connection with each other before they can exchange
  data. It is a full duplex protocol, meaning that each TCP connection
  supports a pair of byte streams, one flowing in each direction. TCP
  includes a flow-control mechanism for each of these byte streams that
  allows the receiver to limit how much data the sender can transmit.
  TCP also implements a congestion-control mechanism."

Q: Does TCP include flow-control and/or congestion control?

Q: Can a TCP connection be one-way or must it always be two-way?

Q: What is the purpose of the "pseudo header" used in calculating a
   checksum?
   http://tools.ietf.org/html/rfc793 page 16-17
   http://www.postel.org/pipermail/end2end-interest/2005-February/004617.html
   http://www.postel.org/pipermail/end2end-interest/2005-February/004616.html

Handshaking: 3 way open, 4 way close including SYN, ACK, FIN etc
  - http://www.garykessler.net/library/tcpip.html#connect
    "This three-way handshake is sometimes referred to as an exchange of
    "syn, syn/ack, and ack" segments. It is important for a number of
    reasons. For individuals looking at packet traces, recognition of
    the three-way handshake is how to find the start of a connection.
    For firewalls, proxy servers, intrusion detectors, and other
    systems, it provides a way of knowing the direction of a TCP
    connection setup since rules may differ for outbound and inbound
    connections."

Q: Outline the TCP flags used in the basic TCP 3-way handshake.
   Clearly indicate which is server and which is client.

You can attack some servers by doing many partial handshakes:
  - http://www.vijaymukhi.com/vmis/tcp.htm (syn flood attack)

Q: How does a syn-flood attack work?
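
When reading packet traces for the "syn, syn/ack, and ack" exchange
quoted above, it helps to see where those flags live. The sketch below
looks at the control bits in byte 13 of a raw TCP header (bit values
from RFC 793, section 3.1) and names the kind of segment. The TCPF_*
constants, the enum, and classify_segment() are hypothetical names
invented for this example, and the classification is deliberately
simple: every ACK-only segment looks the same, so the third step of an
open cannot be told apart from a later acknowledgement by flags alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TCP control bits as laid out in RFC 793; the names are local to
 * this sketch. */
enum {
    TCPF_FIN = 0x01,
    TCPF_SYN = 0x02,
    TCPF_RST = 0x04,
    TCPF_PSH = 0x08,
    TCPF_ACK = 0x10,
    TCPF_URG = 0x20
};

enum segment_kind { SEG_SYN, SEG_SYN_ACK, SEG_ACK, SEG_OTHER };

/* Classify a segment by its flags. tcp_header must point at 14 or
 * more bytes of a TCP header; the control bits are in byte 13. */
enum segment_kind classify_segment(const uint8_t *tcp_header)
{
    uint8_t flags;

    assert(tcp_header != NULL);
    flags = tcp_header[13];

    if (flags & (TCPF_FIN | TCPF_RST))
        return SEG_OTHER;       /* closing or resetting, not opening */
    if ((flags & TCPF_SYN) && !(flags & TCPF_ACK))
        return SEG_SYN;         /* first segment of an open */
    if ((flags & TCPF_SYN) && (flags & TCPF_ACK))
        return SEG_SYN_ACK;     /* the reply to a SYN */
    if (flags & TCPF_ACK)
        return SEG_ACK;         /* final step, or any later ACK */
    return SEG_OTHER;
}
```

A trace showing many SEG_SYN segments that are never followed by the
matching final ACK is the pattern to look for when reading about
partial handshakes.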