-------------------------
Week 01 Notes for CST8165
-------------------------
-Ian! D. Allen - idallen@idallen.ca - www.idallen.com

Remember - knowing how to find out an answer is more important than
memorizing the answer. Learn to fish! RTFM! (Read The Fine Manual)

CS Department News

    http://www.algonquincollege.com/sat/cs/news.htm

Home Page and Course Outline

    Find the Winter 2008 CST8165 course home page via Algonquin Blackboard
    or via: http://teaching.idallen.com/cst8165/08w/
    Make sure you find the page for this term, not last term! Bookmark it.
    I do not keep any course files on Algonquin Blackboard.

    Read the course home page carefully, including the parts about
    plagiarism and course notes. Note the important dates.
    Write down *on paper* the location of the Alternate Web Notes.

    Read the Course Outline, including the parts about tests and lab
    attendance. Ensure that you are registered in both the lecture
    section of the course (010) and Lab section (also 010).

    Find and look at my Timetable. Know how to set up an office
    appointment with me by email.

    Review the Course Outline: cst8165-08w.pdf
    Midterm test dates are posted on the Course Home Page.

EMail and Web-based EMail archives

    EMail is a critical component of course delivery for this course.
    Forward your Algonquin email (see the link on the course home page).
    Test to make sure that your forwarded Algonquin email works!
    Send yourself a test message.

    You must have a working Algonquin EMail address for this course (that
    you can forward elsewhere). You must read your course email regularly,
    either in your mailbox or via the web archives.

    For class online discussion I'm considering using a Course Mailing
    List, available as a link on the top left of the Course Home Page.
    Please post questions related to course content to this mailing list.
    (Please answer the questions if you know the answer!)
    Do not send me private email questions about course content; post
    course questions to the mailing list so I can answer them there.

Attendance and Attention

    Attendance is also critical to course success. If you know the
    material and don't need to come to classes, ask for a Prior Learning
    Assessment. If you paid to be here, please be here.

    If you are in class, shut your laptop and pay attention to your
    lecturer. The person at the front of the room cannot compete with the
    entire Internet for your attention - he doesn't have the budget.
    If you're bored or falling asleep, take notes.

    Lab attendance is recorded - make sure you're signed in each week.
    I often give out a small lab exercise to submit as proof that you
    attended the lab that week.

Taking Notes

    You will need to take notes in class. Not everything I say ends up in
    these online files. Passing the information through your body onto
    paper helps you remember it, even if you never read the notes later.

    If you have a question about course content, the first thing I will
    ask is to see your notes, to see what you wrote down about the topic.
    Often the answer is there!

Textbook

    There is no assigned textbook. The Internet is your (cheap) friend.
    I will post URL references for much of the course material.

Workload

    The overall term workload sometimes overwhelms students who try to
    leave everything to the last minute. You need to put in approximately
    an extra hour per day, per course, to keep up. There aren't enough
    hours in a day to catch up in mid-term.

Timeliness

    Late assignments are penalized, usually resulting in a mark of zero.
    The due date for an assignment is given in the assignment.
    Read each assignment to know the due date.

Preparation

    Lab time is precious. Most lab exercises are time-limited and will
    require you to have done advance preparation. If you haven't read the
    material and done the preparation, you won't finish on time.

Standards

    Like any company, this course has standards for its documents.
    Assignments must adhere to the published standards.
    See the course web page for the Standards document.

Linux Working Environment

    The material taught and used in this course is intended to be
    portable; you can use most any Linux machine to write and test your
    code. The final test run must be on the machine given in the
    assignment specification. This might be one of the Lab Linux machines,
    or it might be a different machine. Read carefully.

Off-hours Lab Access

    You are encouraged to use the lab outside of assigned lab hours.
    The hours of operation are posted on the door. You may ask other
    instructors if you may work quietly at the back of their classes.

Remote Access to the T127 Linux Lab

    You can access the lab machines remotely; you don't need to be at
    school. For Windows users, the "ssh" protocol is available in such
    programs as "ewan" and "PuTTY", which you can download for free from
    the Internet. Do a Google search for: putty download

    You need to use the Algonquin VPN to get to the T127 Linux Lab; but,
    be aware that this VPN is not a split-tunnel and *all* your Internet
    traffic will pass through it, which will slow down the rest of your
    home network severely as long as the VPN is running.
        http://algonquincollege.com/its/
        http://algonquincollege.com/its/support/connecthome/index.htm

Running Linux at Home

    You may download (and optionally install) most any Linux distribution
    at home for free. Most any distribution will support the material
    taught in this course.

    Many Linux distributions (e.g. Knoppix, Mandrake Live, Ubuntu) will
    boot directly from a CDROM and run entirely in memory, bypassing the
    need to do any disk installation at all. Be aware that when you shut
    down such an in-memory system, everything is lost - you must save any
    important files on real disk before you shut down.

Linux User Groups

    See the Ottawa Canada Linux User Group (OCLUG): http://www.oclug.on.ca/
    They meet on the first Tuesday of every month.

Plagiarism

    You may not copy code from anywhere else without clearing the copying
    with me, in writing or by email, first. If your code contains enough
    unique lines found in other files, I am required to inquire whether
    you are the author of this code.

    If I authorize copying, you must attribute the source of material you
    use that isn't yours. You earn marks for the new material that you
    write, not code that comes from other sources.

Check that you can login to machines in the T127 Linux Lab:

    Your initial password is/was: just2day
    Change it. Remember it.
    I cannot reset your passwords; see Dick Campbell down the hall.

    From: Richard Campbell
    Subject: Linux lab status
    Date: Fri, 04 Jan 2008 15:58:11 -0500

    Disk block quota on the /home file system for all existing users has
    been changed to 500 MB.

    The student's login name is the same as the Windows Network login
    name or e-mail name. The initial password for the new student
    accounts is 'just2day'.

    Please have the students use either the 'yppasswd' or 'passwd'
    command to change the password. The 'yppasswd' command uses up to
    eight characters. The 'passwd' command can use more characters and
    has better encryption.

    The password must be changed. The systems are configured for password
    expiry. For new accounts, if the initial password has not been
    changed by January 21, 2008, it will be forced on login. Previous
    student accounts may also experience password expiry. In this case,
    the user will be prompted for a new password at login.

    Later/Dick,
    --
    Richard Campbell
    Information Technology Services
    Algonquin College
    (613)727-4723 extension 3459

Review VIM:

    You will need to know how to use the VIM text editor to modify files
    under Unix/Linux. Review the VIM tutorial mentioned in this Notes file:
        vi_basics.txt   The VI (VIM) Editor - Basics
    * At most Linux shell prompts you can type: vimtutor
      A copy of the VIM tutorial file is in the Course Notes.
    See file vi_basics.txt in the course notes for details.
    See the vi_refcard reference card in the course notes.

A Program With A Bug: stdxxx.c++.txt

    $ wget http://teaching.idallen.com/cst8165/08f/notes/stdxxx.c++.txt
    $ g++ -g -x c++ stdxxx.c++.txt
    $ gdb a.out
    (gdb) run

    (The "-x c++" option is needed because g++ does not recognize a
    ".txt" file name as C++ source.)

Guidelines for lab work

 - Read the whole question before starting to answer any of it:
   - the hints on solutions are at the *end* of the question
 - Don't use casts to solve C declaration problems; fix the declaration:
   - casts hide errors
 - How do you search for text in a web page?
 - What is "wget" and how does it work?
 - When inserting the SO_REUSEADDR code, how do you know what arguments
   to pass? Do you define your own variables or use the variables
   already in your code?

=============================================================================

The structure of your code matters!
----------------------------------
 - see the opt_iocc files in the course notes:
   http://teaching.idallen.com/cst8165/08w/notes/indexcgi.cgi
   http://www.ioccc.org/years-spoiler.html

Client/Server programming
------------------------
Background: Know the low-level Unix system calls:
 - man 2 open (returns a small integer file descriptor)
   - unit 0 is already open in your program as standard input
   - unit 1 is already open in your program as standard output
   - unit 2 is already open in your program as standard error
   - unit 3 is usually the next integer returned by open() in your program
 - man 2 read
 - man 2 write
 - man 2 close

The low level Unix system calls "open()", "read()", "write()", and "close()":
 - have no buffering (are not like stdio fopen/fgets/fread/fwrite/fclose)
 - return -1 on error and set errno (which can be used by perror())
   - errno is only set after a system call *fails*, not when it succeeds

You should use perror() or error() to print errno after a system call fails
 - man 3 perror and man 3 error
 - errno is only set after a system call *fails*, not when it succeeds
 - you must only call error() or perror() if the system call *fails*

A successful system call does *NOT* clear or
set errno to zero!
 - you cannot test errno to know if a system call failed
 - errno is only set after a system call *fails*, not when it succeeds

A Unix read() returns zero (zero bytes read) on end-of-file (EOF)
 - you must not use the descriptor after EOF
 - the contents of the read() buffer are undefined after EOF; don't use it
 - EOF is not an error - errno is not set - never call perror() after EOF

==============================================================================

Q: T/F the output of perror() appears on standard error, not standard output

Q: T/F after a successful system call, perror() prints nothing

Q: T/F when most Unix syscalls fail, the return value is zero

Q: T/F when most Unix syscalls fail, the external global errno is set to -1

Q: T/F on error, the open() syscall returns zero

Q: T/F on error, the read() syscall returns zero

Q: T/F on error, the write() syscall returns zero

Q: T/F on EOF, perror() prints "end of file"

Q: T/F after a successful fork() system call, the parent process receives
   a non-zero child pid

Q: T/F after a successful fork() system call, the child process receives
   a non-zero parent pid

Q: what IP address is this (as a dotted quad): int ipaddr = -1; ?

Q: T/F the opposite of x > 0 is x < 0 ?

Q: T/F if(x>0) is the same as if(!(x<0)) ?

Q: T/F usually the fd to be returned by the first call to socket() or
   open() in your Unix program will be fd 3 (why or why not?)

Q: why doesn't the first call to socket() or open() in a Unix program
   return file descriptor 1?

Q: what is the small integer value usually returned by the first
   successful call to accept() in a TCP/IP server program?
   (Hint: accept() is called *after* socket())

Helpful code:
------------

a. printf size

   You can printf at most 9 bytes from a buffer (no \0 needed) using:

       printf("%.9s",buf);    // print only 9 bytes from buf

   AND it gets better if you use '*' instead of 9 (more useful in this case):

       n = read(fd,buf, .... );
       ...
       printf("%.*s",n,buf);  // the "*" means pick up the current value of "n"

   This kind of printf can be useful for buffers that don't have \0 in
   them; but, you can't use printf with binary data (since printf stops
   on \0). (You must use fread() and fwrite() to handle binary data
   correctly.)

   You can also output n bytes in a buffer directly using write(fd,buf,n).
    - standard input (usually your keyboard) is Unix fd 0.
    - standard output (usually your screen) is Unix fd 1.
    - standard error (usually your screen) is Unix fd 2.
    - the first unit you open yourself in your program is usually fd 3.
   You can use read() and write() safely with binary data.

b. buffer size

   Never do this:  char buf[256]; ... read(fd,buf,256);
   Do this:        char buf[OUT_BUFSIZE]; ... read(fd,buf,sizeof(buf));
   Buffer sizes must be set in only *one* place for easy maintenance.

Client/Server Programming
-------------------------
References:
  Diagram: http://community.borland.com/article/0,1410,26022,00.html
  Sockets Tutorial:
    http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/sock.html
    (alternate: http://www.linuxhowtos.org/C_C++/socket.htm )
  Sockets programming: http://beej.us/guide/bgnet/
  Sample code:
    http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/server2.c
    (alternate: http://www.linuxhowtos.org/data/6/server2.c )
  FAQ: http://www.faqs.org/faqs/unix-faq/socket/

Four new Unix networking system calls for servers: socket,bind,listen,accept
 - man 2 socket
 - man 2 bind
 - man 2 listen
 - man 2 accept

Sending byte data on the network: Big Endian / Little Endian:
 - what does the function call htons(portno) do?
   - it puts the short integer "portno" into network byte order
   - http://www.cs.rpi.edu/academics/courses/fall96/sysprog/sockets/byteorder.html
   - http://www.netrino.com/Publications/Glossary/Endianness.php
   - http://www.rdrop.com/~cary/html/endian_faq.html
   - http://www.unixpapa.com/incnote/byteorder.html
 - "network byte order" is Big Endian (send the most significant byte first)
 - Motorola 680x0, mainframe, and Sun Sparc hardware are big-endian
 - Intel/AMD x86 hardware (e.g. your PC) is little-endian
 - little-endian hardware incurs a byte-swap penalty handling network traffic

Q: What do htons()/htonl() do and why are they necessary?

Q: T/F "network byte order" is Big Endian

Q: T/F a Big Endian processor stores the Big End (most significant byte)
   of a number in the first (lowest) memory location

Q: T/F a Little Endian processor sends the Little End (least significant
   byte) of a number first over a byte-stream communications channel

Q: in a memory dump that shows bytes numbered in ascending order from
   left-to-right on the page, which Endian order shows multi-byte
   quantities as written "backwards" ?
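
The byte-order points above can be checked with a short C sketch. This is
an illustration only: the helper names port_to_wire() and
host_is_little_endian() are made up for this example and are not part of
any library or of the course code. Only htons() is a real library call.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons(), ntohs() */

/* Hypothetical helper: store the two bytes of a port number in out[],
 * in the order they would be sent on the network. */
void port_to_wire(uint16_t portno, unsigned char out[2])
{
    uint16_t net = htons(portno);     /* host order -> network (Big Endian) order */
    memcpy(out, &net, sizeof(net));   /* copy the bytes exactly as they sit in memory */
}

/* Hypothetical helper: return 1 if this host is little-endian, else 0. */
int host_is_little_endian(void)
{
    uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);          /* lowest-addressed byte of the value 1 */
    return first == 1;
}
```

After port_to_wire(0x1234, b), b[0] holds the most significant byte (0x12)
on any host, big-endian or little-endian; that is what "network byte
order" means. On a big-endian host htons() changes nothing; on a
little-endian host (e.g. x86) it swaps the two bytes.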

Unix read/recv and write/send system call return values:
 - for low-level I/O syscalls such as read() and write() that return an integer:
   - a return of less than zero means an error
     - the error reason is put in errno; use perror() or error() to print it
   - a return of zero bytes means EOF when reading via read() or recv()
     - no more reading can be done after EOF is seen
     - the contents of the read() buffer are undefined after EOF; don't use it
     - EOF is not an error - errno is not set
     - do not call perror() after EOF, EOF is not an error
   - a return of zero means nothing was written when writing with write/send
     - this is not an error: you may need to loop to write everything
     - do not call perror() after writing zero bytes; try again
     - see the sendall() function below under "writing to network sockets"
   - a return of > 0 means you did read or write (some) of the data
     - you may not have read or written *all* of the data!
     - see the sendall() function below under "writing to network sockets"
 - EOF is not an error - never call perror() after seeing EOF

Note that a *successful* Unix system call may or may not change errno:
 - see "man 3 errno"
 - errno is only set for sure after a system call *fails*
 - errno is *undefined* after a successful syscall
 - Thus, you cannot test errno to know if a system call failed
 - Thus, the perror() function is only usable on the most recent syscall.
   - If you execute other syscalls (e.g. using printf()), they may
     overwrite errno and you will lose the preceding syscall error.
   - The following perror() is incorrect, since printf() may overwrite errno:
       n = write(...);                       /* the syscall we want to check */
       printf("%d bytes written\n", n);      /* another successful syscall */
       if ( n < 0 ) perror("write failed");  /* WRONG CHECK OF ERRNO */
   - Read the NOTES section of "man errno" for how to save/restore errno
     across a call to another system call, e.g. across printf()

Q: Why can't I use a printf() before calling perror()?
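
One way to follow the return-value and errno rules above is sketched here
in C. The function names write_all() and checked_write() are hypothetical,
invented for this illustration; this is not the sendall() function that
the notes refer to. The sketch loops on short writes and saves errno
before calling any other library function.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: call write() until all len bytes are written.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0)
            return -1;         /* failure: errno is valid right now */
        done += (size_t)n;     /* n may be less than requested, or zero; loop again */
    }
    return 0;
}

/* Hypothetical helper: report the result without losing errno. */
int checked_write(int fd, const char *buf, size_t len)
{
    int rc = write_all(fd, buf, len);
    int saved_errno = errno;   /* save it before any other library call */
    fprintf(stderr, "write_all returned %d\n", rc);   /* may change errno */
    if (rc < 0) {
        errno = saved_errno;   /* restore so perror() prints the right reason */
        perror("write failed");
    }
    return rc;
}
```

Note that perror() is called only when the write actually failed, and only
after errno has been restored to the value it had right after the failing
call.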

Pedantic Coding
---------------
We used htons(portno) but not htonl(INADDR_ANY) - why?
 - Look at: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#bind
   and find the paragraph starting "If you are into noticing little
   things, you might have seen that I didn't put INADDR_ANY into Network
   Byte Order! Naughty me.". Read the fix; fix your own code.
 - If you don't fix this, then when you later use some other value than
   INADDR_ANY here, your code will break. The code is wrong; fix it now!

Q: Why isn't the short int AF_INET put into network byte order?

    my_addr.sin_family = AF_INET;                 // host byte order
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);  // long, network byte order
    my_addr.sin_port = htons(MYPORT);             // short, network byte order

 - "man bind" refers us to "man 7 ip" which contains these lines:

    sa_family_t    sin_family; /* address family: AF_INET */
    u_int32_t      s_addr;     /* address in network byte order */
    u_int16_t      sin_port;   /* port in network byte order */

   "Note that the address and the port are always stored in network byte
   order. In particular, this means that you need to call htons(3) on the
   number that is assigned to a port. All address/port manipulation
   functions in the standard library work in network byte order."

 - the sin_family is never sent over the network; it doesn't have to be
   in network byte order

Q: Why doesn't the sin_family = AF_INET need to use htonl() or htons()?

See "man bind" for the correct cast to use on the second argument to
bind() and connect():
   "The only purpose of this structure is to cast the structure pointer
   passed in my_addr in order to avoid compiler warnings."

Q: Do you mean AF_INET or PF_INET? I see both - which is correct?
 - from "man socket"
   "The manifest constants used under 4.x BSD for protocol families are
   PF_UNIX, PF_INET, etc., while AF_UNIX etc. are used for address families.
   However, already the BSD man page promises: "The protocol family
   generally is the same as the address family", and subsequent standards
   use AF_* everywhere."

Q: T/F PF_INET and AF_INET are effectively the same thing everywhere

References to Notes files (required reading):
-------------------------
    programming_style.txt
    header_files.txt
    makefiles.txt
    screendumps.txt
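
Looking back at the "Pedantic Coding" section: the address setup and the
bind() cast discussed there might look like the following C sketch. The
helper names fill_any_addr() and bind_any() are hypothetical, made up for
this illustration; the structure fields, constants and system calls are
the ones named in the man pages quoted above.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill in an IPv4 address meaning "any local interface". */
void fill_any_addr(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;                 /* host byte order; never sent on the network */
    addr->sin_addr.s_addr = htonl(INADDR_ANY);  /* long, network byte order */
    addr->sin_port = htons(port);               /* short, network byte order */
}

/* Hypothetical helper: create a TCP socket and bind it to the given port.
 * Returns the socket fd, or -1 on error with errno set by the failing call. */
int bind_any(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fill_any_addr(&addr, port);
    /* the cast to (struct sockaddr *) is the one "man bind" describes */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved_errno = errno;   /* close() might change errno */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

INADDR_ANY happens to be zero, so leaving out htonl() appears to work; the
call matters as soon as any other address value is used in its place.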