-------------------------
Week 04 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

Midterm test coming in Week 6 - see the questions in the Notes files.
Be very familiar with the detailed pseudocode of your echo server.

Lab #2 is available - a simple port scanner. Due February 11 (Week 6).

Review
------
- encapsulation and layering in the TCP/IP stack
- properties of networks
- CIDR, Subnetting, Supernetting

----------------------------------------------------------------------------

Socket Options for UDP/TCP/IP
-----------------------------

As an application programmer, what control does your application have over
the lower-level TCP/IP layers in Unix/Linux?

- you can set options on the sockets you open that affect the TCP/IP stack
  - "man 7 socket" setsockopt(2) and getsockopt(2)
    - SO_KEEPALIVE
    - SO_RCVTIMEO and SO_SNDTIMEO (useful in port scanning)
    - SO_BINDTODEVICE
    - SO_REUSEADDR (you already used this in labs)
    - SO_DONTROUTE
    - SO_BROADCAST
    - SO_LINGER
    - SO_PRIORITY

The SO_SNDTIMEO option can be used in a port scanner to reduce the amount
of time that the program waits for a reply from a blocked port (a port that
sends back neither a TCP reset, which produces the "connection refused"
error, nor an ICMP error) or from a machine that does not exist.

Q: What function calls are available to C programmers to set options on
   sockets? Give two examples of options you can set, describing their
   purpose.

----------------------------------------------------------------------------

Unix/Linux Network Diagnostic Tools
-----------------------------------

These tools are helpful in diagnosing network problems. Please re-read
the "Acceptable Use Warning" on the course home page. Note that at
Algonquin College many/most network probe ports and protocols (even ping
and traceroute) may be blocked and may *NOT* work. Sorry! Try them at
home instead.

    arp            - show/change MAC addresses currently known to this host
    ethereal       - GUI packet sniffer
    fuser          - (-n tcp, -n udp) list processes with open TCP or UDP ports
    host, dig      - DNS resolvers
    ifconfig       - show network interfaces, MAC, IP addresses, statistics, etc.
    ip route       - show main routing tables
    mtr            - ping-style traceroute: packet route diagnostic
    netcat (nc)    - Network Swiss Army Knife: connect/listen to ports
    netstat -natup - list open and listening TCP and UDP connections and procs
    nmap           - Network Mapper - port prober
    ping           - ICMP echo generator/receiver
    route          - old version of "ip route"; show routing tables
    tcpdump        - command-line (non-GUI) packet sniffer
    telnet         - TCP connection program: use when netcat is not available
    traceroute     - packet route diagnostic

- ifconfig, "ifconfig eth0"
  - show MAC, IP address, and network mask of each network interface
  - ifconfig may be in /sbin which may not be in your default $PATH
- ip route (or "netstat -r -n" or "route -n")
  - show IP routing tables, including route to default gateway
- arp, "arp -a"
  - show known (cached, with time-out) MAC addresses on local net
- traceroute
  - using small, increasing TTLs, find the route of an outgoing packet
  - may be blocked at Algonquin College
  - see also "mtr" for a nicer display (not available at Algonquin)
- tcpdump (privileged under Linux - needs root permissions)
  - show the raw network activity on a network card
- ethereal (privileged under Linux - needs root permissions)
  - show the raw network activity on a network card (GUI)

Some of these commands (e.g. ifconfig) may not be in the standard
unprivileged search $PATH; you may need to add /sbin or /usr/sbin to your
$PATH to use them. (ifconfig is often hidden under /sbin.)

-----------------------------------------------------------------------------

Q: What Linux command shows you your network interfaces and their
   IP addresses, MAC addresses, and network masks?

Q: What Linux command(s) show you your main IP routing tables?

Q: What Unix command shows the machine's ARP MAC address tables?

Q: What Unix command traces the route a packet takes to a remote host?

Q: ...etc... for all the above commands. What is their purpose?

Q: How does your shell search path need to be modified to use these
   commands?

-----------------------------------------------------------------------------

Getting a machine on the net
----------------------------

At minimum, your machine needs two network parameters to be a good
network citizen:

  1. an IP address assigned to at least one connected network card
  2. a network mask or prefix length, so you know which IP addresses are
     on the local net and which are not

If you want to talk to more than your local network, you also need:

  3. the IP address of a gateway machine (for off-net access)

Naturally, the gateway machine must be on your local network!

If you want to use names instead of IP addresses, you need:

  4. addresses of DNS server(s) to resolve host names, and
  5. a host name for your machine (fully qualified with a domain name)

You can program your machine with all or some of these things directly
(static addressing); or, you can have your machine broadcast a request to
see if some other machine on the network has its configuration info:
DHCP, BOOTP(old), RARP

The Unix "hostname" command shows and sets the machine host name.

The Unix "ifconfig" command shows and sets IP addresses and network masks
on interfaces.

The "arp" command shows the current kernel table listing known (cached,
with time-out) MAC addresses on the local network. It can also manage the
ARP table and enter/remove addresses.

"ARP requests" are broadcast to the local network, requesting the MAC
address that matches a particular IP address.

    19:59:31.658132 arp who-has 192.168.1.251 tell 192.168.1.253
    19:59:31.658469 arp reply 192.168.1.251 is-at 00:60:08:ce:43:02
    19:59:33.542320 arp who-has 192.168.9.183 tell 192.168.9.251
    19:59:33.542736 arp reply 192.168.9.183 is-at 00:19:5b:8c:90:b8

A "default gateway" machine is a machine on your local network to which
packets will be sent if your machine doesn't know where else to send them.
(The packets are presumed to be destined for an off-network machine.)
Without a default gateway, your machine can only communicate with other
machines on the local network segment (the local ARP domain).

The "ip route" (Linux) or "route" (Unix) command shows you your routing
tables, including the "default" route to your gateway machine:

    default via 192.168.9.253 dev eth0 src 192.168.9.251 metric 30

You can run your machine without defining any DNS servers, in which case
you will have to use IP addresses (not names) for all hosts.

If you want to use the DNS, the file /etc/resolv.conf ("man resolv.conf")
contains definitions of your domain name and the IP addresses of your
DNS servers.

Your network broadcast address can be calculated from your IP and mask:
keep the network bits of the address and set all the host bits to 1.

-----------------------------------------------------------------------------

Q: What are the two minimum network parameters needed to allow your
   machine to talk on the local network?

Q: What are the three minimum network parameters needed to allow your
   machine to talk to machines that are *not* on your local network?

Q: What Internet network access is possible without a DNS server?

Q: What Internet network access is possible without a gateway machine?

Q: What Internet network access is possible without a network mask?

Q: I want my computer to talk to another computer on the same network as
   mine. What minimum network configuration do I need?

Q: I want my computer to talk to another computer on a different network
   from mine. What minimum network configuration do I need?

Q: What does ARP stand for and how is it used in Internet networking?

Q: What is the meaning of a "local ARP domain"?

Q: When an outgoing packet matches two or more rules in the routing
   tables, how does the kernel decide which rule to use?

Q: What is the network address and mask used for a "default route"?

Q: Why is the default route always chosen last in a set of routing tables?

-----------------------------------------------------------------------------

Domain Name System (DNS) (Review)
---------------------------------

DNS turns names into IP addresses. Not essential for a raw Internet
connection; but, very, very useful.

The Unix/Linux file /etc/resolv.conf ("man resolv.conf") contains your
machine's domain name and the IP addresses of your DNS servers.

Important Fact: DNS uses (almost exclusively) unreliable UDP, not reliable
TCP, for queries and responses. The maximum size of a single DNS message
carried over UDP (512 bytes in the original specification) limits the
number of DNS ROOT name servers to 13.

* Linux commands for DNS testing: host and dig

    $ host idallen.ca
    idallen.ca has address 208.76.82.6
    idallen.ca mail is handled by 0 idallen.ca.

    $ host idallen.ca ns1.totalchoicehosting.com    # use this DNS server
    Using domain server:
    Name: ns1.totalchoicehosting.com
    Address: 64.246.50.105#53

    idallen.ca has address 208.76.82.6
    idallen.ca mail is handled by 0 idallen.ca.

    $ host -t txt idallen.ca
    idallen.ca descriptive text "v=spf1 ip4:66.11.175.96/30 ip4:66.11.173.142 a mx ptr a:cpu1808.adsl.bellglobal.com mx:idallen.org include:algonquincollege.com ?all"

    $ dig idallen.ca

    ; <<>> DiG 9.3.4 <<>> idallen.ca
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31955
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

    ;; QUESTION SECTION:
    ;idallen.ca.                          IN   A

    ;; ANSWER SECTION:
    idallen.ca.                   14382   IN   A    208.76.82.6

    ;; AUTHORITY SECTION:
    idallen.ca.                   70604   IN   NS   ns2.totalchoicehosting.com.
    idallen.ca.                   70604   IN   NS   ns1.totalchoicehosting.com.
    ;; ADDITIONAL SECTION:
    ns2.totalchoicehosting.com.   170671  IN   A    65.254.32.122
    ns1.totalchoicehosting.com.   170671  IN   A    64.246.50.105

    ;; Query time: 67 msec
    ;; SERVER: 192.168.9.254#53(192.168.9.254)
    ;; WHEN: Tue Oct 16 04:21:19 2007
    ;; MSG SIZE rcvd: 134

    $ dig @ns1.totalchoicehosting.com idallen.ca    # use this DNS server

----------------------------------------------------------------------------

Q: What type of IP protocol does DNS use (most of the time)?

Q: T/F The choice of DNS transport protocol means a DNS request is
   automatically retried if it fails.

Q: Why are there only 13 ROOT name servers on the planet?

Q: What Linux commands are used for doing DNS queries?

Q: Give command-line examples of doing a DNS query for an NS or MX record

----------------------------------------------------------------------------

Testing - black box vs. white box, "behavioral" vs. "structural"
-------

- I don't have time to read and test all your code; you have to do it

http://www.scism.sbu.ac.uk/law/Section5/chap3/s5c3p23.html

  "White box testing is concerned only with testing the software product,
  it cannot guarantee that the complete specification has been
  implemented. Black box testing is concerned only with testing the
  specification, it cannot guarantee that all parts of the implementation
  have been tested. Thus black box testing is testing against the
  specification and will discover faults of omission, indicating that part
  of the specification has not been fulfilled. White box testing is
  testing against the implementation and will discover faults of
  commission, indicating that part of the implementation is faulty. In
  order to fully test a software product both black and white box testing
  are required."

http://www.faqs.org/faqs/software-eng/testing-faq/section-13.html

  "One has to use a mixture of different methods so that they aren't
  hindered by the limitations of a particular one.
  Some call this "gray-box" or "translucent-box" test design, but others
  wish we'd stop talking about boxes altogether."

----------------------------------------------------------------------------

Q: What type of tests exercise every line of code, especially each of the
   exceptions?

Q: What type of tests verify that the code matches the specifications?

Q: What is the difference between white-box and black-box testing of a
   piece of code? Give the advantages and disadvantages of each method,
   especially with regard to testing the specification.

Q: Which type of testing is most likely to discover code security flaws?

----------------------------------------------------------------------------

Coding a TCP Client
-------------------

You have already coded a TCP echo server that receives bytes from a remote
client and returns those bytes back to the client. We turn now to the
code for that client.

A TCP client has two functions. One function is to read from standard
input (e.g. from your keyboard) and send the bytes read to the remote
server; the other function is to read bytes from the remote server and put
those bytes onto standard output (e.g. on your screen).

How does a Unix/Linux program handle multiple simultaneous I/O streams?
If your client process is blocked reading your keyboard, it can't read and
display input coming from the server; if it is blocked reading from the
server, it can't read and send characters coming from your keyboard.

* Handling multiple simultaneous I/O streams

Unix/Linux has three major solutions to handling concurrent I/O streams:
fork(), select(), and threads

  A. fork() separate processes to handle each file descriptor
     - e.g.
       client forks into one process for keyboard, one for server socket
     - forking (duplicating process address space) is expensive
     - forked processes have limited means to communicate with each other
       - they don't share any address space; they are separate
       - you can use process signal()s to communicate simple events
       - you can explicitly set up inter-process communication sockets;
         but, then you again have the problem of how to manage
         reading/writing from two different places, which is why you used
         fork() in the first place!

  B. use one single process and the system call select() to listen to
     multiple file descriptors at the same time
     - one single process handles all the file descriptors
     - shared address space
     - no need to fork()
     - more complex than a forking server

  C. run separate process threads

We will code a fork()ing client - one process for handling standard input
to server, another process for handling server to screen.

* Coding the Looping Client - fork() version

Create a TCP Client that reads bytes from standard input until EOF or
error, sending the bytes to the remote server, and that simultaneously
reads bytes from the remote server until EOF or error, sending the bytes
received to standard output.

Review the detailed pseudocode for any process that reads from one place
and writes to another.

The Client must fork() into two separate processes, each with one of these
read/write loops. One process reads stdin and writes to the server; the
other process reads from the server and writes to standard output.

On EOF from the keyboard, the Client shuts down just the writing half of
the server socket. On EOF from the server, the client kills the other
fork()ed client process that is hung reading the keyboard.

* Client code and modifications:

Reference code downloaded from:
http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/sock.html

The client.c code is explained line-by-line in the above web page.
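
The two-process structure described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
reference client.c: the names copy_stream() and run_forking_client() are
hypothetical, the socket is assumed to be already connected, and the
input and output descriptors are passed in so the function is not tied to
the keyboard and screen.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy bytes from one descriptor to another until
 * EOF (returns 0) or error (returns -1).  Loops on short writes. */
static int copy_stream(int from, int to)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(from, buf, sizeof buf);
        if (n == 0)
            return 0;                   /* EOF on the input side */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: try again */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(to, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;                   /* short write: send the rest */
        }
    }
}

/* Hypothetical sketch of the forking client.  "sock" is assumed to be a
 * connected stream socket.  Returns 0 if the server side ended with EOF,
 * -1 on fork() failure or a read/write error in the parent. */
int run_forking_client(int sock, int in_fd, int out_fd)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: input -> server.  On EOF (or error), shut down only
         * the writing half so the server sees EOF while the parent can
         * still read the remaining replies. */
        int rc = copy_stream(in_fd, sock);
        shutdown(sock, SHUT_WR);
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: server -> output.  When the server sends EOF, the child
     * may still be blocked reading input, so kill it and reap it. */
    int rc = copy_stream(sock, out_fd);
    kill(child, SIGTERM);
    waitpid(child, NULL, 0);
    return rc;
}
```

A caller would connect the socket first and then call something like
run_forking_client(sockfd, STDIN_FILENO, STDOUT_FILENO). The sketch does
not handle SIGPIPE, prompts, or error messages.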

- devise the pseudocode for client.c and the revised pseudocode for the
  converted client that forks and loops continually
- reorganize the command line argument parsing in front of the socket code
  - keep the parsing code separate from the network code
- add a check for a valid port number that is within range
- replace the deprecated bzero() and bcopy() functions
- note the use of the Unix network socket() and connect() in client.c
- fix the error message to say what host and port failed
  - error messages must have four qualities (see programming_style.txt)
- fix the prompt to appear on standard error
- detect errors and EOF when reading standard input
- use shutdown() to half-close the socket when finished writing to the
  server
- more updates to do, see the upcoming assignment (Lab 3)

See Notes: eof_handling.txt

----------------------------------------------------------------------------

Q: Give the detailed pseudocode for a forking two-process "echo" client
   that sends keyboard input to a remote TCP/IP server and receives the
   echo of the input back and displays it on the screen. Explain under
   what conditions one process needs to kill() the other process. Explain
   under what conditions one process needs to shut down the writing half
   of the socket that connects to the remote server.
   See Notes: eof_handling.txt

Q: Describe the sequence of events that happens when your client receives
   EOF from the keyboard. How does that EOF result in the termination of
   the client and server processes, and in what order?
   See Notes: eof_handling.txt

Q: Why must the client process reading your keyboard call
   shutdown(fd,SHUT_WR) on the server socket? Wouldn't calling close() on
   the socket also send EOF to the server? Why or why not?
   See Notes: eof_handling.txt

Q: What issues can arise trying to send EOF in processes that fork()?
   See Notes: eof_handling.txt

----------------------------------------------------------------------------

Zero Tolerance for Buffer Overflows
-----------------------------------

See Notes: buffer_overflows.txt

Q: Why must Internet-facing programs avoid buffer overflows?

Q: What gcc flag turns on local symbols and line numbers for gdb and
   valgrind?

Q: What does "valgrind" do?

Q: Will valgrind find all buffer overflow errors?

Q: T/F Like in Java, when you have a buffer overflow in the C language
   the program stops on the line causing the buffer overflow.

Aside: On choosing buffer sizes
-------------------------------

When deciding how much buffer space an Internet client or server should
allow for incoming request lines, you have to weigh memory use against
functionality. Here's an excerpt from an RFC extending the SMTP protocol,
which originally specified a maximum buffer of just 512 bytes:

http://tools.ietf.org/html/rfc1869
http://www.rfc-editor.org/rfc/rfc1869.txt

  4.1.2. Maximum command line length

  This specification extends the SMTP MAIL FROM and RCPT TO to allow
  additional parameters and parameter values. It is possible that the
  MAIL FROM and RCPT TO lines that result will exceed the 512 character
  limit on command line length imposed by RFC 821. This limit is hereby
  amended to only apply to command lines without any parameters. Each
  specification that defines new MAIL FROM or RCPT TO parameters must
  also specify maximum parameter value lengths for each parameter so that
  implementors of some set of extensions know how much buffer space must
  be allocated. The maximum command length that must be supported by an
  SMTP implementation with extensions is 512 plus the sum of all the
  maximum parameter lengths for all the extensions supported.

Reading from Network sockets
----------------------------

A program can detect when a write() or send() call doesn't write all the
bytes.
It can loop until all the bytes are sent, perhaps using a cover function
such as sendall() (see earlier notes for sendall()).

How about the reverse - reading from a network socket?

Simple read() or recv() calls on Internet-connected sockets are not
guaranteed to return data in the same quantities that remote applications
send it. Just because a remote application writes 500 bytes into a socket
doesn't mean that your next read() will return those 500 bytes. The data
may be incomplete, or that 500 bytes may be concatenated with another 500
bytes (or less, or more) from a following write() on the same connection.

How does your program know that it has *read* all the bytes that the
remote client has sent?

Answer: Your program can't know, unless your application provides some
assistance. You have two choices:

  A. The sending program has to send a count of the number of bytes
     (probably one of the very first things it sends), and the receiving
     program has to loop to make sure it reads that many bytes. With this
     solution, your application has to first "encapsulate" the data it
     sends with an application-specific header indicating how much data
     is being sent.

  B. Your application needs to send some trailing flag in the data stream
     indicating that unit of data is complete, and the receiving program
     has to loop to make sure it reads all the bytes until it sees the
     flag before processing. The flag has to be some byte or combination
     of bytes that never appears inside the data itself. For a
     single-line chat server, you might pick a newline character to
     indicate that "a full line of text" has been read.

Note that if you encapsulate and send a header containing a size field,
the receiving program may still have to loop a bit to get all the bytes
that make up the bytes of the number indicating the size, unless the size
is just a single byte!

In all of this, you also have to handle the error and EOF cases where the
data stream ends unexpectedly.
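
Choice A above (a byte count sent ahead of the data) might look something
like the sketch below. Everything specific in it is an assumption made
for illustration: the 4-byte size header in network byte order, the
return-value conventions, and the function names read_exact(),
write_all(), send_frame() and recv_frame(), none of which come from the
course code.

```c
#include <arpa/inet.h>      /* htonl(), ntohl() */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Loop until exactly n bytes are read.  Returns 1 on success, 0 if EOF
 * arrives before the first byte, -1 on error or EOF part-way through. */
int read_exact(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t got = 0;

    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r == 0)
            return got == 0 ? 0 : -1;   /* clean EOF vs. truncated data */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)r;
    }
    return 1;
}

/* Loop until all n bytes are written.  Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Send one unit of data: assumed 4-byte size header, then the bytes. */
int send_frame(int fd, const void *data, uint32_t len)
{
    uint32_t netlen = htonl(len);

    if (write_all(fd, &netlen, sizeof netlen) < 0)
        return -1;
    return write_all(fd, data, len);
}

/* Receive one unit of data into buf (capacity cap).  Returns 1 and sets
 * *len_out on success, 0 on clean EOF between units, -1 on error,
 * unexpected EOF, or a unit too large for the buffer. */
int recv_frame(int fd, void *buf, size_t cap, size_t *len_out)
{
    uint32_t netlen;
    int rc = read_exact(fd, &netlen, sizeof netlen); /* header may arrive in pieces too */

    if (rc <= 0)
        return rc;
    uint32_t len = ntohl(netlen);
    if (len > cap)
        return -1;                      /* never overflow the buffer */
    if (len > 0 && read_exact(fd, buf, len) != 1)
        return -1;                      /* stream ended inside a unit */
    *len_out = len;
    return 1;
}
```

The receiver checks the announced size against its buffer capacity before
reading, so a hostile or buggy sender cannot cause a buffer overflow.
Choice B (a trailing flag such as a newline) would replace the header
with a loop that accumulates bytes until the flag is seen.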

----------------------------------------------------------------------------

Q: T/F We can implement a readall() function, similar to our sendall()
   function, that simply loops until it has read all the bytes from a
   socket. Then we know we have all the data sent by the client.

Q: What are the two main methods that allow an application to communicate
   that "all" the data has been sent over a socket?

Q: T/F When reading from a network socket, each read returns all the
   bytes sent by the remote client.

Q: How many of the above loops are coded in an echo server (server2.c)?

Q: How many of the above loops are coded in a client (client.c) that reads
   from your keyboard and writes to a server, and reads from the server
   and writes to your screen?