-------------------------
Week 07 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Lab 4 is coming up.

Review:
- writing test cases
- GDB
- symptoms of buffer overflow in C programs
- IP routing
- subnetting / supernetting and path aggregation / CIDR
- DNS review
- getting a machine on the net (minimal)

----------------------------------------------------------------------------

Linux commands for DNS testing: host and dig
-------------------------------

$ host idallen.ca
idallen.ca has address 208.76.82.6
idallen.ca mail is handled by 0 idallen.ca.

$ host idallen.ca ns1.totalchoicehosting.com     # use this DNS server
Using domain server:
Name: ns1.totalchoicehosting.com
Address: 64.246.50.105#53

idallen.ca has address 208.76.82.6
idallen.ca mail is handled by 0 idallen.ca.

$ host -t txt idallen.ca
idallen.ca descriptive text "v=spf1 ip4:66.11.175.96/30 ip4:66.11.173.142 a mx ptr a:cpu1808.adsl.bellglobal.com mx:idallen.org include:algonquincollege.com ?all"

$ dig idallen.ca

; <<>> DiG 9.3.4 <<>> idallen.ca
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31955
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;idallen.ca.                    IN      A

;; ANSWER SECTION:
idallen.ca.             14382   IN      A       208.76.82.6

;; AUTHORITY SECTION:
idallen.ca.             70604   IN      NS      ns2.totalchoicehosting.com.
idallen.ca.             70604   IN      NS      ns1.totalchoicehosting.com.

;; ADDITIONAL SECTION:
ns2.totalchoicehosting.com. 170671 IN   A       65.254.32.122
ns1.totalchoicehosting.com. 170671 IN   A       64.246.50.105

;; Query time: 67 msec
;; SERVER: 192.168.9.254#53(192.168.9.254)
;; WHEN: Tue Oct 16 04:21:19 2007
;; MSG SIZE rcvd: 134

$ dig @ns1.totalchoicehosting.com idallen.ca     # use this DNS server

----------------------------------------------------------------------------

Lab 4 - Coding the Looping Echo Server
--------------------------------------

Assignment: Modify the server to keep reading lines from the client until
EOF, echoing those lines back to the client.

Review the PDL for any process that reads from one place and writes to
another.  The Server must be enhanced to keep reading from the client,
instead of stopping after just one line.

* Server modifications:
- the conversion of server2.c to a looping echo server:
  - write PDL for server2.c and the revised PDL for the converted server

Q: Give the PDL for a forking "echo" server that receives connections from
   clients and echoes the data received back to the client.

Coding the Looping Client
-------------------------

Assignment: Modify the client to keep reading lines from standard input
until EOF, sending the lines to the server, and to keep reading lines from
the server until EOF, sending the lines to standard output.

Review the PDL for any process that reads from one place and writes to
another.  The Client must be enhanced to fork() into two separate
processes, each with one of these read/write loops.  One process reads
stdin and writes to the server; the other process reads from the server
and writes to standard output.

On EOF from the keyboard, the client shuts down just the writing half of
the server socket.  On EOF from the server, the client kills the other
process, which is blocked reading the keyboard.
See Notes: eof_handling.txt

Q: Give the PDL for a forking two-process "echo" client that sends keyboard
   input to a remote TCP/IP server and receives the echo of the input back
   and displays it on the screen.
   Explain under what conditions one process needs to kill() the other
   process.
   Explain under what conditions one process needs to shut down the writing
   half of the socket that connects to the remote server.

* Client modifications:
  Reference: http://www.cs.rpi.edu/courses/sysprog/sockets/sock.html
  The client.c code is explained line-by-line in the above web page.
- see the PDL for client.c and the revised PDL for the converted client
- reorganize the command line argument parsing in front of the socket code
- keep the parsing code separate from the server code
- add a check for a valid port number that is within range
- replace the deprecated bzero() and bcopy() functions
- note the use of socket() and connect() in client.c
- fix the error message to say what host and port failed
- error messages must have four qualities (see programming_style.txt)
- fix the prompt
- detect errors and EOF when reading standard input
- use shutdown() to half-close the socket when finished writing to the
  server.  See Notes: eof_handling.txt
- more updates to do, see the upcoming assignments

References to Notes files (required reading):
-------------------------
   eof_handling.txt

Zero Tolerance for Buffer Overflows
-----------------------------------

http://teaching.idallen.com/cst8165/07w/notes/buffer_overflows.txt

Q: Why must Internet-facing programs avoid buffer overflows?
Q: What gcc flag turns on local symbols and line numbers for gdb and
   valgrind?
Q: What does "valgrind" do?
Q: Will valgrind find all buffer overflow errors?
Q: T/F As in Java, when a C program overflows a buffer, the program stops
   on the line causing the buffer overflow.

Aside: On choosing buffer sizes
-------------------------------

When deciding how much buffer space an Internet server should allow for
incoming request lines, you have to weigh memory use against functionality.
Here's an excerpt from an RFC extending the SMTP protocol, which originally
specified a maximum buffer of just 512 bytes:

   http://tools.ietf.org/html/rfc1869
   http://www.rfc-editor.org/rfc/rfc1869.txt

   4.1.2. Maximum command line length

   This specification extends the SMTP MAIL FROM and RCPT TO to allow
   additional parameters and parameter values. It is possible that the
   MAIL FROM and RCPT TO lines that result will exceed the 512 character
   limit on command line length imposed by RFC 821. This limit is hereby
   amended to only apply to command lines without any parameters. Each
   specification that defines new MAIL FROM or RCPT TO parameters must
   also specify maximum parameter value lengths for each parameter so
   that implementors of some set of extensions know how much buffer
   space must be allocated. The maximum command length that must be
   supported by an SMTP implementation with extensions is 512 plus the
   sum of all the maximum parameter lengths for all the extensions
   supported.

Reading from Network sockets
----------------------------

A program can detect when a write() or send() call doesn't write all the
bytes.  It can loop until all the bytes are sent, perhaps using a cover
function such as sendall() (see earlier notes for sendall()).

How about the reverse - reading from a network socket?

Simple read() or recv() calls on Internet-connected sockets are not
guaranteed to return data in the same quantities that remote applications
send it.  Just because a remote application writes 500 bytes into a socket
doesn't mean that your next read() will return those 500 bytes.  The read
may return only part of the data, or the 500 bytes may be followed by
another 500 bytes (or fewer, or more) from a following write() on the same
connection.

How does your program know that it has *read* all the bytes that the remote
client has sent?

Answer: Your program can't know, unless your application provides some
assistance.  You have two choices:

A. The sending program has to send a count of the number of bytes (probably
   one of the very first things it sends), and the receiving program has to
   loop to make sure it reads that many bytes.
   With this solution, your application has to first "encapsulate" the data
   it sends with an application-specific header indicating how much data is
   being sent.

B. Your application needs to send some trailing flag in the data stream
   indicating that the unit of data is complete, and the receiving program
   has to loop to make sure it reads all the bytes until it sees the flag
   before processing.  The flag has to be some byte or combination of bytes
   that never appears inside the data itself.  For a single-line chat
   server, you might pick a newline character.

Note that if you encapsulate and send a header containing a size field, the
receiving program may still have to loop to get all the bytes that make up
the size field itself, unless the size is just a single byte!

In all of this, you also have to handle the error and EOF cases where the
data stream ends unexpectedly.

Q: T/F We can implement a readall() function, similar to our sendall()
   function, that simply loops until it has read all the bytes from a
   socket.  Then we know we have all the data sent by the client.
Q: What are the two main methods that allow an application to communicate
   that "all" the data has been sent over a socket?
Q: T/F When reading from a network socket, each read returns all the bytes
   sent by the remote client.

----------------------------------------------------------------------------

Defining Internet protocols - Request for Comment (RFC)
-------------------------------------------------------

Who sets down the standards for programs that communicate over the Internet?

IETF: Internet Engineering Task Force: "Rough consensus and running code"

   http://radar.oreilly.com/archives/2007/01/what_actually_i.html
   "(FWIW, this happened to the IETF as well.  It's "rough consensus and
   running code" policy was outflanked by big companies who just sent
   enough people to the meetings to affect the "rough consensus," and
   gradually, the IETF became driven more by politics than pure technical
   excellence in some areas.)"

Master RFC Index: http://tools.ietf.org/rfc/index

Q: What role do RFC documents play in the Internet?

   http://www.rfc-editor.org/
   "The RFC (Request for Comments) series contains technical and
   organizational documents about the Internet, including the technical
   specifications and policy documents produced by the Internet
   Engineering Task Force (IETF)."

RFC documents lay out Internet protocols, e.g. for SMTP:
   ftp://ftp.rfc-editor.org/in-notes/rfc2821.txt
   http://tools.ietf.org/html/rfc2821

- Some words have specific meanings, see:
  http://tools.ietf.org/html/rfc2119
  "The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in RFC 2119."

- RFCs use a modified BNF called ABNF (Augmented Backus-Naur Form) to
  describe protocols:
  http://tools.ietf.org/html/rfc2234
  Certain rules are predefined as "CORE" rules, e.g. ALPHA, DIGIT, CHAR,
  etc. (from section 6.1 in rfc2234) so you don't have to do all the work.

Q: True/False - strings ("abc") in ABNF are case-sensitive (RFC2234 p.4)

Q: Give an ABNF rule that defines an Algonquin student email address
   (abcd0001@algonquincollege.com), using these definitions:

      ALPHA  = %x41-5A / %x61-7A  ; from CORE: A-Z upper and a-z lower
      DIGIT  = %x30-39            ; from CORE: 0-9
      atsign = "@"
      period = "."

   Note that ab000001 and abc00001 are also valid userids (a userid must be
   eight characters), but a0000001, ab0001, and abcd000001 are not.
   Hint: start with this and fix it to handle the other two valid cases:

      algemail = 4ALPHA 4DIGIT atsign "algonquincollege" period "com"

RFC tools by IETF
-----------------

http://tools.ietf.org/
- html cross-linked pages - http://tools.ietf.org/html/
- reading tools - Firefox plugin
- difference tools - wdiff (word diff)
- verification tools - ABNF to regexp converter

----------------------------------------------------------------------------

IP - Internet Protocol
----------------------

- http://tools.ietf.org/html/rfc791 (45 pages, Sep 1981)
- layer 2 of the 4 (or 5) layer stack:
     4 - application layer (programs)
     3 - TCP/UDP (transport/host-to-host layer)
     2 - IP (Internet/gateway layer), ICMP
     1 - Network/hardware layer (e.g. Ethernet, ARP, MAC addresses)
         (Layer 1 may be split into Physical/Network Access)
  The Internet four (or five) layer stack has IP at layer 2.
  Below IP are one (or two) layers; above IP are another two layers.
- Figure 2: http://www.garykessler.net/library/tcpip.html#arch

Almost everything on the Internet starts with just plain IP, "the
Internet's most basic protocol" (http://www.freesoft.org/CIE/Topics/79.htm):

* Internet layer - IP
- IP has no port information; only IP addresses
  Figure 4: http://tools.ietf.org/html/rfc791#section-3.1
  Figure 4: http://www.garykessler.net/library/tcpip.html#IP
- simple http://www.freesoft.org/CIE/Topics/79.htm
- large amounts of data may be "fragmented" into multiple IP packets
  - the IP Identification field marks which fragments belong to the same
    datagram, and the Fragment Offset field orders them, for later
    re-assembly
  - this was later determined to be a Very Bad Idea
  - fragmentation is now considered harmful, difficult to get right, etc.
  - more on this later

Compare protocol complexities:
- IP RFC791 is 45 pages
- UDP RFC768 is 3 more pages on top of IP
- TCP RFC793 is 95 more pages on top of IP
- DCCP RFC4340 is 125 pages on top of IP (!!)
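
The header layout in RFC 791 Figure 4 can be sketched in C as below.  This
is only an illustration, not part of the course code: the names
struct ipv4_fields, get16(), get32() and parse_ipv4_header() are
hypothetical, and the sketch ignores IP options and does not verify the
checksum.  It unpacks the fixed 20-byte part of the header from a raw byte
buffer, reading multi-byte fields in network (big-endian) byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct (not from the course code): the fixed 20-byte part
 * of the RFC 791 header, unpacked into host byte order. */
struct ipv4_fields {
    unsigned version;          /* 4 bits; always 4 for IPv4 */
    unsigned ihl_words;        /* 4 bits; header length in 32-bit words */
    uint8_t  type_of_service;  /* 8 bits */
    uint16_t total_length;     /* 16 bits; header plus data, in bytes */
    uint16_t identification;   /* 16 bits; shared by fragments of a datagram */
    unsigned flags;            /* 3 bits; reserved, DF, MF */
    unsigned fragment_offset;  /* 13 bits; in units of 8 bytes */
    uint8_t  time_to_live;     /* 8 bits */
    uint8_t  protocol;         /* 8 bits; 1 = ICMP, 6 = TCP, 17 = UDP */
    uint16_t header_checksum;  /* 16 bits */
    uint32_t source;           /* 32 bits */
    uint32_t destination;      /* 32 bits */
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a 32-bit big-endian (network order) value. */
static uint32_t get32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success; -1 if the buffer is shorter than 20 bytes,
 * the version is not 4, or the header length is less than 5 words. */
int parse_ipv4_header(const uint8_t *buf, size_t len, struct ipv4_fields *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 20)
        return -1;
    if ((buf[0] >> 4) != 4 || (buf[0] & 0x0F) < 5)
        return -1;
    out->version         = buf[0] >> 4;
    out->ihl_words       = buf[0] & 0x0F;
    out->type_of_service = buf[1];
    out->total_length    = get16(buf + 2);
    out->identification  = get16(buf + 4);
    out->flags           = buf[6] >> 5;
    out->fragment_offset = get16(buf + 6) & 0x1FFF;
    out->time_to_live    = buf[8];
    out->protocol        = buf[9];
    out->header_checksum = get16(buf + 10);
    out->source          = get32(buf + 12);
    out->destination     = get32(buf + 16);
    return 0;
}
```

Note that the struct has no port fields at all: ports belong to the TCP or
UDP header that follows the IP header, and the protocol field says which of
those to expect.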

Q: True/False - the IP packet header contains port numbers
Q: Looking at RFC791 Figure 4, what is the longest total length
   theoretically possible for an IP packet?
Q: Looking at RFC791 Figure 4, what is the largest time-to-live value
   possible?