-------------------------
Week 02 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Midterm test dates are posted on the Course Home Page.

Lab #01 is available in the Class Notes area.

Comments on lab work
 - Read the whole question before starting to answer it!
   - The hints that make life easier are at the end of the question.

Review
------
- programming_style.txt - style counts
- four-part error messages
- difference between I/O errors and EOF
- four new networking system calls for a TCP/IP server
- endian-ness of the Network - htonl, htons, etc.
- correct use of errno, perror() and error()

Using the GNU error() function: man 3 error

Client/Server Programming
-------------------------

References:
    Diagram:
      http://community.borland.com/article/0,1410,26022,00.html
    Sockets Tutorial:
      http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/sock.html
      (alternate: http://www.linuxhowtos.org/C_C++/socket.htm )
    Sockets programming:
      http://beej.us/guide/bgnet/
    Sample code:
      http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/server2.c
      (alternate: http://www.linuxhowtos.org/data/6/server2.c )
    FAQ:
      http://www.faqs.org/faqs/unix-faq/socket/
    TCP/IP Slides:
      http://www.cs.rpi.edu/~hollingd/netprog.2003/

Four new Unix networking system calls for servers:
    socket, bind, listen, accept

Client/Server Diagram:
    http://community.borland.com/article/0,1410,26022,00.html
    - this toy client/server pair only reads/writes one line from a client
    - Google for more TCP/IP client server socket examples and tutorials:
      - http://www.perl.com/doc/manual/html/pod/perlipc.html
      - http://beej.us/guide/bgnet/

To see active TCP connections: netstat -natp

The equivalence of read() and recv(), write() and send() for sockets:

For socket programming, you may see recv() used instead of read() and
send() instead of
write().  Both work equally well; recv() and send() allow socket options
to be passed using an extra parameter.  Warning: read/write work on any
type of file descriptor (sockets, files, pipes, devices, etc.) while
recv/send *only* work on network sockets.
  - man 2 recv
  - man 2 send

If you don't set any special TCP/IP flags in recv() or send(), the system
calls recv() and read() are the same/equivalent for accessing sockets,
as are the syscalls send() and write().  You can't use the socket syscalls
recv() or send() on file descriptors that are *not* sockets (even if the
TCP/IP flags are zero); using read() and write() works for both sockets
and ordinary files.

Socket programming is similar to low-level Unix file I/O using
open/read/write/close
- the Unix socket() and accept() system calls return small integer file
  descriptors, just as open() does
- socket descriptors are just like file descriptors - you can use them with
  read() and write() (many socket programs use the equivalent recv()
  and send())
- see the simple non-forking sockets server examples:
  http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/sock.html
  (alternate: http://www.linuxhowtos.org/C_C++/socket.htm )
  http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/server2.c
  (alternate: http://www.linuxhowtos.org/data/6/server.c )
  - read the explanation of the code in the above socket tutorial
  - note that you should replace the deprecated bzero() with memset()
    - see "man bstring"

Sockets are not like files: I/O may not complete
- When writing to a network socket, the write() may not write all the
  bytes you requested; you need to loop to keep trying to write the bytes
  that were not sent the first time.
- When reading from a network socket, the number of bytes read may not
  match the number sent by the remote client.  Buffering and packetization
  may split up the client data into arbitrary chunks.  You need a way of
  knowing when you have received all of the data sent by the remote client.
- See the "sendall()" cover function from beej.

The server2.c sample code
-------------------------

For our base code we use (with credit) this forking server2.c code:
  http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/server2.c
  (alternate: http://www.linuxhowtos.org/data/6/server2.c )
  - a fork()ing server that handles multiple connections
  - the child only reads one single line from a connection, then exits
  - this code does not correctly detect or handle EOF
  - this code inefficiently uses bzero() - must be fixed

The usual order of four network system calls to initialize a TCP/IP server:
- 1. socket(), 2. bind(), 3. listen(), 4. accept()
- most TCP/IP servers loop calling accept() to receive multiple connections
- server may fork() separate child processes to deal with each connection
- each connection may loop reading/writing the accepted socket, to
  read/write multiple lines from/to the incoming connection
- the rpi.edu "server.c" only accepts one connection and then exits
- the rpi.edu "server2.c" loops and fork()s, accepting many connections
  - each connection reads one line and exits; it does not itself loop
  - this server code does not correctly detect or handle EOF
- we must modify server2.c to read/write multiple lines for each connection
  - add a loop in the child function dostuff() (rename this function!)
  - remember to check for EOF and error after read() or recv()
  - remember to check for error after write() or send()
  - recode the function not to need to bzero() the whole buffer!

Coding a looping echo-style TCP server
--------------------------------------

The server2.c code used as a starting point for our TCP server process
only reads and writes single lines from a client.  We must fix the server
to write *all* the lines received from a client.

Writing a looping process that reads one fd and writes to another fd:
- know before you code: what are the terminating conditions for the loop?
- for clients/servers you need to terminate on these three conditions:
  - when reading the fd: (1) break loop on errors, and (2) break loop on EOF
  - when writing the fd: (3) break loop on errors
- start coding a loop with "while(1)" - don't worry about putting any
  conditions in the while loop test at the top until you've finished
  writing the loop.  Perhaps the loop will be cleaner if each of the
  terminating conditions uses "break" in the body of the loop and you
  keep "while(1)" at the top?

    /* this loop has three terminating conditions, as given above */
    WHILE 1
        numread = CALL READ to get some data into a fixed-size buffer
        IF read error THEN print error message and break loop    /* (1) */
        IF end-of-file THEN print EOF message and break loop     /* (2) */
        ASSERT( numread > 0 )    /* man 3 assert */
        numwrite = WRITE from buffer the numread bytes of data that was read
        IF write error THEN print error message and break loop   /* (3) */
    END WHILE

The "WRITE from buffer" should use the sendall() function, to make sure
all the bytes are written.  Only write the number of bytes "numread" that
were actually read from the client; don't write the whole buffer back to
the client!

Q: Write the detailed pseudocode for any process that wishes to read data
   from one place and write it to another place.
   - what are the three terminating conditions for the loop?

Q: T/F "EOF" is an error condition that should be followed by perror()

Writing to Network sockets - sendall.c
--------------------------------------

Unlike writing to files, writes to sockets can be incomplete!  The system
may not write all the bytes you asked - the write() or send() will return
fewer bytes than the size you asked to send.  You need to loop to send
the remaining bytes.

Fetch the sendall() function:
- http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#sendall
- can "*len" be replaced by "len" in this function?
- BUG: what value is returned by sendall() if *len <= 0 ?
- why can't sendall() just return the number of bytes written?
- if you want to generalize sendall() to write to a file descriptor that
  is not a socket, you must ensure that sendall() uses write() and not
  send().  send() only works for sockets; write() works for anything
  (including sockets, pipes, fifos, files, etc.)

Q: T/F writes to network sockets may only write some of the requested bytes

Q: T/F the sendall() function may write some bytes but still return -1
   indicating an error

Q: why does the sendall() function need both a return value and a
   pass-by-reference number of bytes written?

Q: under what circumstances will sendall() indicate a positive number of
   bytes written but still return -1 indicating failure?

Note: Reading from network sockets can also be incomplete!  How do you
know you have received "everything" sent by a client?  We'll talk about
handling that later.

Coding: assert() and VARIADIC functions
---------------------------------------

Checking for internal consistency using the assert() macro:
  http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_28.html

  "When you're writing a program, it's often a good idea to put in checks
  at strategic places for "impossible" errors or violations of basic
  assumptions.  These checks are helpful in debugging problems due to
  misunderstandings between different parts of the program."

  - mentions the assert() macro that will abort your program, printing the
    file and line number where the abort happened
  - use assert() ("man assert") to find bugs in your program
    - e.g. assert( numread > 0 );

Q: What purpose does using the assert() macro have in a program?

Do you know how to write va_list (variadic, varargs) functions?
  - VARIADIC/VARARGS functions take multiple arguments (e.g. like printf)
  - http://www.gnu.org/software/libc/manual/html_node/Variadic-Functions.html

==========================================================================

Interim submission for Lab #1 required next week - see Lab #1.

Notes to read:
  programming_style.txt - keep lines less than 80 characters
                        - indentation is critical
  deep_indentation.txt  - fix deeply indented code
  buffer_overflows.txt  - handling binary data safely
  header_files.txt      - know what goes in header files, and what
                          #includes too
  makefiles.txt         - how to write a minimal Makefile, using the
                          defaults

Read the whole question before starting to answer it!
 - The hints that make life easier are at the end of the question.

Error messages should only show information from the command line if that
information is relevant to the cause of the error:
 - errors from bind() and connect() can depend on command line arguments
   - the error messages must include the user's supplied arguments
 - errors from socket() and listen() have nothing to do with the command
   line
   - the error messages do not need to show command line arguments

Q: Which of the socket, bind, listen, accept, fork, read, write syscalls
   generate errors that may be related to the command line arguments?
   (Which syscalls require that command line arguments be echoed to the
   user as part of the error message?)

==========================================================================

Debugging C language using gdb
------------------------------

"If you have eight hours to cut down a tree, it is best to spend six
hours sharpening your axe and then two hours cutting down the tree."

Google search: gdb tutorial

- gdb reference card:
  http://sources.redhat.com/gdb/download/onlinedocs/refcard.ps.gz
- the full manual
  http://www.gnu.org/software/gdb/
  http://www.gnu.org/software/gdb/documentation/
  http://sourceware.org/gdb/current/onlinedocs/gdb_toc.html
  http://sources.redhat.com/gdb/download/onlinedocs/gdb.html
- debugging multi-process programs (fork):
  http://sourceware.org/gdb/current/onlinedocs/gdb_5.html#SEC28
  http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gdb/processes.html
  http://www.delorie.com/gnu/docs/gdb/gdb_26.html
  http://www.cs.toronto.edu/~maclean/csc209/ddd-gdb-children.html
  - gdb normally follows the parent; to debug a child process
      (gdb) set follow-fork-mode child
    You can put these kinds of init commands in file .gdbinit

Q: If you overflow an auto variable buffer in main() and then return,
   your program faults and dies.  If you call exit(), the program doesn't
   die.  In both cases, the program is terminating.  Why does "return"
   fault while exit() terminates cleanly?

Sending EOF from the keyboard - ^D
----------------------------------

The truth about keyboard ^D and end-of-file (EOF)

Typing the character ^D at your keyboard does not actually signal EOF to
a process.  What ^D does is act somewhat like pushing the RETURN key, in
that whatever characters have been typed since the last RETURN or ^D are
sent to the receiving process that is reading your keyboard.  The ^D
character itself is never sent, it just tells the terminal driver to
"send whatever characters you have buffered, right now".

Unlike pushing the RETURN key, using ^D does not send a newline - it
simply flushes the characters.  If there are no characters to flush,
e.g. ^D is typed right after starting your process or right after pushing
RETURN or typing a previous ^D, then the ^D sends zero characters to the
process.  When a process reads zero characters, it interprets that to
mean EOF.

If you type a few characters on your keyboard and then type ^D instead of
RETURN, the ^D sends those few characters to the process without a
newline on the end.  (The ^D character itself is never sent.)  If you
then immediately type a second ^D, the second ^D sends zero characters,
and the process interprets that read of zero characters as EOF.

The ^D means "send now", and if you send zero bytes, that's interpreted
as EOF.

============================================================================

Q: When can write() be used in place of send() in accessing a socket fd?
   (see "man 2 send")
   (Note: You cannot use send() in place of write(), unless you are
   writing to a socket!)

Q: When can read() and recv() be interchanged in accessing a socket fd?

Q: Can recv() and send() be used on non-sockets?

Q: In one column list in flow-chart form the Unix system calls made to
   set up a TCP/IP server that loops accepting clients, forking children
   that each read one packet, write one packet, and exit.  In a parallel
   column list the system calls used in a TCP/IP client that sends one
   packet and receives one packet then exits.  Connect the two columns
   with arrows, showing the relationship of the system calls and the
   direction of data travel.

Q: what are the basic inputs and return values of the Unix syscalls:
   socket,bind,listen,accept,read/recv,write/send,close

Q: What is the purpose/inputs/return of the socket() syscall?

Q: What is the purpose/inputs/return of the bind() syscall?

Q: What is the purpose/inputs/return of the listen() syscall?

Q: What is the meaning of the small integer second parameter to listen()?

Q: What is the purpose/inputs/return of the accept() syscall?

Q: T/F the socket() and accept() syscalls return file descriptors that
   can be used directly with standard I/O functions fread/fwrite/fclose

Q: T/F the successful accept() system call returns a socket file
   descriptor that is a copy of the socket file descriptor that is its
   first argument

Q: Under what circumstances can one omit a call to bind() a socket?
   What happens when a server calls accept() using such an unbound socket?
   What happens when a client calls connect() using such an unbound socket?
   Give an example of a client application that uses unbound sockets.
   Ref: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#bind
   Ref: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#connect

Q: What happens if you forget to call bind() before you call listen() in
   a server?  Does your server program fail to start up?  Can clients
   connect to a server with an unbound socket?  (Why/how, or why not?)
   Ref: http://www.cs.rpi.edu/~hollingd/netprog.2003/code/simptcp/server.c

Q: True/False - after a call to accept() you have *two different* open
   socket file descriptors.

Q: True/False - if you close the socket descriptor that is the return
   value from an accept(), you also close the original socket() descriptor
   (and vice-versa - they are the same descriptor).

Q: You want a server to accept only a single client: True/False - after
   the accept() call, you can close the original socket file descriptor
   (the first argument to accept()) and use only the socket descriptor
   returned by the accept() call.

Q: Give the detailed pseudocode for a forking "echo" server that creates
   a new process for each new client connection and then reads and echoes
   incoming data back to the client.

==============================================================================

The Internet
------------

http://en.wikipedia.org/wiki/Internet
  "The Internet is a worldwide, publicly accessible series of
  interconnected computer networks that transmit data by packet switching
  using the standard Internet Protocol (IP).  It is a "network of
  networks" that consists of millions of smaller domestic, academic,
  business, and government networks, which together carry various
  information and services, such as electronic mail, online chat, file
  transfer, and the interlinked Web pages and other documents of the
  World Wide Web."

- the Internet is not just the WWW (HTTP)
  - HTTP is just one of many, many Internet protocols!
  - but Algonquin College blocks most non-HTTP Internet traffic
    - in particular, the SMTP port (25) is blocked to external sites
    - blocks are "drop packet", not "refuse packet" types; they time out

- Internet not developed as a pay-per-view or proprietary system
  - standards-based vs. product-based
  - based on defined protocols, not on vendor products or implementations
  - nobody pays license fees to use TCP/IP, SMTP, HTTP, etc.
  - Tim Berners-Lee doesn't get royalties for your web site
  - why do companies still write web pages that only work in one browser?
    - e.g. Algonquin Blackboard
    - http://www.anybrowser.org/campaign/
    - the mistake of designing for a vendor's product, not for an
      international standard protocol

Role of Unix (now Linux or BSD) and the Internet:
------------------------------------------------
- WWW slashes are "forward" slashes because the WWW grew up on open-source
  Unix machines.  (DOS/Windows came much later, and was closed-source.)
- text-based Internet protocols pre-date XML (everything is text in Unix)
  - Unix was full of tools to deal with text and text files
  - an "ethereal" or "netcat" text dump of most Internet protocols is often
    very readable (no binary junk)
- Be aware of the history and importance of Open Source in the development
  of the Internet and its protocols (e.g. RFC).  The Internet could not
  have evolved under a closed-source, pay-per-view business model.
  (Don't let it head that way!)
- Internet development was Open Source:
  - "FLOSS": Free/Libre Open Source Software (or "FOSS" in the USA)
  - open-source discussions occur with source code samples

Is the Internet smart about content?
-----------------------------------
- The Internet is dumb.  It wasn't designed to give priority to different
  owners of packet traffic.  The intelligence is "at the edges" of the net.
- Some say you could implement the Internet using two cans and a string;
  or, even using carrier pigeons:
  - pigeons: http://tools.ietf.org/html/1149 (1 April 1990)
  - pigeons: http://www.blug.linux.no/rfc1149/

Net Neutrality - not for long?
--------------
- Like the downtown streets at rush hour, the Internet doesn't (yet) pass
  traffic based on how much money you have.  You can't get higher priority
  by paying more; though, this may change (on the Internet) in the next
  year or two if the backbone carriers have their way.
- http://www.digital-copyright.ca/taxonomy/term/396

  * AT&T blocks Pearl Jam's Bush slam : Pearl Jam calls for Net Neutrality
    A Salon article discusses how AT&T unilaterally censored political
    speech at a Pearl Jam concert: The band says the company's actions
    highlight the need for action on "network neutrality" -- the fight
    for regulations prohibiting broadband firms from making decisions
    about what content is and is not allowed on their networks.  AT&T is
    currently fighting network neutrality, helping the NSA spy on
    Americans, and developing a way for Hollywood to police the Internet.

  * Rogers Must Come Clean on Traffic Shaping:
    Michael Geist's weekly Law Bytes column (Toronto Star version,
    Homepage version) focuses on Rogers, a leading Canadian ISP, actively
    engaging in "traffic shaping", a process that limits the amount of
    bandwidth available for certain applications.  Although this was
    initially limited to peer-to-peer file sharing applications, there
    is mounting speculation that the practice may be affecting basic
    functionality such as email and the use of virtual private networks.

The Internet - who owns it? who controls it?
------------
- IP and port address space is coordinated by ICANN/IANA
  - Internet Corporation for Assigned Names and Numbers: icann.org
  - Internet Assigned Numbers Authority http://www.iana.org/

- Internet Engineering Task Force (IETF): http://www.ietf.org/
  - Motto: "Rough consensus and running code."

    "When I was studying Physics the quickest way to end an argument was
    to show the explanation in mathematics (albeit a lot of handwaving
    mathematics!).  Most software developers on the otherhand do not grok
    math, however they surely do grok code.  Therefore if you could
    explain your arguments through code then you would have improved
    your odds of getting your message through."
    http://www.manageability.org/blog/stuff/rest-explained-in-code/view

    "Be liberal in what you accept, and conservative in what you send"
    (Jon Postel, TCP/IP developer)

    * BUT: "If we were all conservative in what we do, then we wouldn't
      do much that is new, or different.  This would seem to retard
      progress.  Of course, the same would be true in protocols so
      perhaps we need a "where possible" qualifier."
      http://www.aaronsw.com/weblog/000776

- Internet standards: evolved from the ARPAnet Request for Comment - RFC
    http://tools.ietf.org/html/
    IP:   http://tools.ietf.org/html/791   (45 pages)
    UDP:  http://tools.ietf.org/html/768   (3 pages on top of IP)
    TCP:  http://tools.ietf.org/html/793   (85 pages on top of IP)
    SMTP: http://tools.ietf.org/html/2821  (79 pages on top of TCP)
    TCP tutorial: http://tools.ietf.org/html/1180

* Who controls handing out the IP numbers and port numbers?
  - the Internet Corporation for Assigned Names and Numbers (ICANN)
    through its operating unit the Internet Assigned Numbers Authority
    (IANA)
      "Dedicated to preserving the central coordinating functions of the
      global Internet for the public good."
    ICANN: http://www.icann.org/
    IANA:  http://www.iana.org/
  - IANA delegates to a few Regional Internet Registries (RIRs) to
    distribute the large blocks of IP addresses
      http://www.iana.org/ipaddress/ip-addresses.htm
      http://www.iana.org/assignments/ipv4-address-space
    - e.g. American Registry for Internet Numbers (ARIN) IP address list
      http://www.arin.net/
  - special addresses (historical and current) are documented in RFC3330
      http://tools.ietf.org/html/3330
    - note: hosts on this net are allocated: 0.0.0.0/8
    - note the important RFC1918 private address space:
        10.0.0.0    - 10.255.255.255  (10/8 prefix)
        172.16.0.0  - 172.31.255.255  (172.16/12 prefix)
        192.168.0.0 - 192.168.255.255 (192.168/16 prefix)
      "the Internet does not inherently protect against abuse of these
      addresses; if you expect (for instance) that all packets from the
      10.0.0.0/8 block originate within your subnet, all border routers
      should filter such packets that originate from elsewhere.  Attacks
      have been mounted that depend on the unexpected use of some of
      these addresses."
  - IANA TCP/UDP port list (see RFC4340 for the three big divisions):
      http://www.iana.org/assignments/port-numbers
    - Well Known Ports are those from 0 through 1023
      - only Unix privileged (root) programs can bind to these ports
    - Registered Ports are those from 1024 through 49151
    - Dynamic and/or Private Ports are those from 49152 through 65535
      - note that 65536 - 16384 = 49152  (2**16 - 2**14 = 49152)
    - a shorter Unix/Linux specific copy of this file is kept in
      /etc/services
    - to register a new port, see [RFC4340], Section 19.9
      http://tools.ietf.org/html/rfc4340#section-19.9

============================================================================

Q: T/F the Internet is patented; companies pay royalties to use the WWW
   and IP protocols

Q: T/F you can pay more to have your data packets given priority on the
   global Internet

Q: What organization is the ultimate authority on all IP addresses and
   ports?  Give the full name.

Q: Which organization is delegated to manage IP addresses in North
   America?  Give the full name.

Q: What does "Be liberal in what you accept, and conservative in what you
   send" mean?

Q: What does the acronym "FLOSS" mean?

Q: What do the initials RFC mean with regard to Internet standards
   documents?

Q: Give the three RFC1918 private address space blocks and their masks

Q: What is the last IP address in the RFC1918 block 172.16.0.0/12 ?

Q: Is 172.15.0.0 a RFC1918 private address?

Q: Is 172.17.0.0 a RFC1918 private address?

Q: What is the last (highest) private address in the RFC1918 10.0.0.0
   block?

Q: What is the last (highest) private address in the RFC1918 172.16.0.0
   block?

Q: What is the last (highest) private address in the RFC1918 192.168.0.0
   block?

Q: T/F the Internet will not route RFC1918 private addresses

Q: T/F Special address block 0.0.0.0 is reserved for hosts on your local
   network.  [see RFC3330]

Q: T/F IP address 0.0.0.0 is not a valid address.
   [see RFC3330]

Q: What Unix/Linux file is used to turn "smtp" into "25" when you do
     $ telnet localhost smtp
     $ nc -v localhost smtp

Q: Name and give the port ranges of the three RFC4340 divisions of ports
   ( http://tools.ietf.org/html/rfc4340#section-19.9 )

Q: Which port numbers can only be bound to by the super-user on
   Unix/Linux?  What is the IANA name for this reserved-for-super-user
   port range?  (Not all operating systems restrict access to these
   low-numbered ports.)
   Ref: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#bind
   See the paragraph: "Another thing to watch out for when calling bind():
   don't go underboard with your port numbers. ..."