-----------------------
Lab #04 for NET2003 due February 5, 2007
-----------------------
-Ian! D. Allen - idallen@idallen.ca

Remember - knowing how to find out an answer is more important than
memorizing the answer.  Learn to fish!  RTFM!  (Read The Fine Manual)

Global weight: 3% of your total mark this term.

Due date: before 10h00 Monday February 5

The deliverables for this lab exercise are to be submitted online on the
Course Linux Server using the "netsubmit" method described in the lab
exercise description, below.  No paper; no email; no FTP.

Late-submission date: I will accept without penalty lab exercises that
are submitted late but before 12h00 (noon) on Wednesday, February 7.
After that late-submission date, the lab exercise is worth zero marks.

Lab exercises submitted by the *due date* will be marked online and your
marks will be sent to you by email after the late-submission date.

Lab Synopsis
------------

Examine an Apache 2 log file and discover recent Internet attacks.
Create an executable shell script to document and decode the attack.

NOTE: For full marks, keep your lines shorter than 80 columns in this
course.  Short lines allow for easy printing and side-by-side comparison
of files on a screen.

Where to work: See Lab #1.

Easy access to Course Notes: See Lab #1.

Folding long lines
------------------

Instead of writing this unreadable long line:

  wget -O - http://teaching.idallen.com/net2003/07w/notes/vi_basics.txt | cat -n | sort -nr | tac | cut -b 8- | tail -32

Break the line on spaces and end the preceding line with a backslash.
Indent the continuation.  Multiple commands on a pipeline may be placed
on separate lines for readability:

  wget -O - http://teaching.idallen.com/net2003/07w/notes/vi_basics.txt \
     | cat -n \
     | sort -nr \
     | tac \
     | cut -b 8- \
     | tail -32

No trailing blanks are allowed after the backslash.
The vim command ":set list" will show trailing blanks and tabs.
(Use ":set nolist" to turn that display off again.)

----------------------------------------
Lab Details (on the Course Linux Server)
----------------------------------------

Make sure you are working on the Course Linux Server.  You can do some of
this work on Knoppix; however, none of the Course Notes are on Knoppix
and your files will disappear when you log off unless you copy them to
the Course Linux Server.

*) For full marks, keep your lines shorter than 80 columns in this
   course.  Short lines allow for easy printing and side-by-side
   comparison of files on a screen.

*) You may find it useful to create separate directories in your Linux
   server account to store the files for each lab exercise, e.g.

      $ mkdir lab4
      $ cd lab4
      $ vim lab04cmds.txt

Part I - lab04cmds.txt
------

My Apache web server logged a successful attack via an insecure PHP
program named "cacti" this month.  To analyse this attack, you need to
know the meanings of the fields on the lines in the server log file.

A) Code the Part II script sections 11a) to 11e), below.  Execute the
   script to create the access.txt and error.txt log files in the
   current directory.

Decoding the fields in the Apache log file entries:

Each line in the access.txt file is output by Apache using the format
string named "CustomLog" found in the Apache config file here (on the
Course Linux Server):

   /etc/apache2/conf.d/idallen

B) Read the above file and find the CustomLog line.  (Use "less" and
   search forward.)

The meanings of each %X format directive in that line are given in the
Apache 2 documentation:

   http://httpd.apache.org/docs/2.0/mod/mod_log_config.html#formats

Here is the first line of access.txt with some fields explained:

   2006-11-26_07:58:18     # date and time
   206.47.37.39            # remote IP address
   206.47.37.39            # [... fill in ...]
   -                       # remote user (if any)
   192.168.9.250           # [... fill in ...]
   ian.idallen.ca          # canonical server name
   ian.idallen.ca          # UseCanonicalName setting
   "GET ..."               # first line of the HTTP request received
   200                     # last HTTP status code issued
   5923                    # [... fill in ...]
   "http: ..."             # supplied HTTP Referer field
   "Mozilla ..."           # [... fill in ...]
   504                     # [... fill in ...]
   6242                    # [... fill in ...]
   "/home/idallen ..."     # local filename accessed by request
   0                       # time (seconds) to service the request
   1302                    # time (microseconds) to service the request
   FMT=idallen2            # flag to put at end of each log line

C) Copy the above 18 lines of log fields into file lab04cmds.txt and
   fill in the missing CustomLog field explanations using the Apache 2
   documentation (above).

D) Now finish the rest of your Part II script.

Part II - lab04script.sh
-------

You will build an executable shell script containing a commented list of
Unix commands to be executed.

Make sure you are working on the Course Linux Server.  You can do some of
this work on Knoppix; however, none of the Course Notes are on Knoppix
and your files will disappear when you log off unless you copy them to
the Course Linux Server.

In shell scripts, lines starting with octothorpe characters ('#') are
ignored as comment lines.

Steps 1-10 build the same script template as you used for the last
assignment.  You may copy that template rather than recreating it.

1.  Fetch a copy of the argv.sh.txt file from the course notes directory
    (copy the file) and rename your copy to be: lab04script.sh

    Make the file executable using the chmod command (note the plus):

       $ chmod +x lab04script.sh

2.  Verify that the copied script executes correctly and without error:

       $ ./lab04script.sh a b c
       ./lab04script.sh: This script has 3 command line arguments.
       Argument 0 is [./lab04script.sh]
       Argument 1 is [a]
       Argument 2 is [b]
       Argument 3 is [c]

3.  At the top of the lab04script.sh file, fix the existing Assignment
    Submission label comment to be your own label.  Make sure the lines
    of the label all start with the shell comment character '#'.

4.  Delete the line (line 2) starting with
    "# Display on standard error ...".

5.  At the top, under "Syntax", replace "[args...]" with "(no arguments)"
    so that the line reads:

       # $0 (no arguments)

    This script takes no command line arguments.

6.  At the top, under "Purpose", replace the paragraph with a short
    description of what this script does (come back to this - see
    below).  Don't forget this step!

7.  Change the line starting with PATH= to be this exact line below by
    adding "/sbin" and ":" to the start of the PATH string:

       PATH=/sbin:/bin:/usr/bin ; export PATH

    Do not add any blanks or quotes to this line.  Make sure you see the
    two characters ":" and ";" as different.

    (Adding /sbin to the PATH allows the shell to find executable
    commands in the /sbin directory as well as in /bin and /usr/bin .)

    At this point, you might re-execute the script to make sure you
    haven't introduced any typing errors.  The output should be the same
    as above.  There should be no error messages on the screen.

8.  Delete everything below and after the line "umask 022" in the file.

9.  Add a blank line after "umask 022".

10. Delete all the "I!" comment lines.  Never submit these lines in your
    own scripts - they are "instructor" comments from me to you.

11. You will now add, to the bottom of this executable script, at least
    three lines for every section in this question.  The first line will
    echo the question number onto the screen in the exact format shown.
    The second (or more) line(s) will be shell comment(s) that will not
    print on the screen.  The last line will be a shell command line or
    pipeline (one or more commands) that will do the action required by
    the question.  The output from the command(s) will appear on the
    screen, if the command generates output.  (Many Unix/Linux commands
    work silently and have no output on the screen.)

    For example, if asked to "display a count of the non-hidden names in
    the /bin directory" you would echo the question number, enter at
    least one comment line (less than 80 characters), and enter the
    correct Unix/Linux command line (one or more commands) into the
    bottom of the script file like this:

       echo "--- Question 11y"
       # count the non-hidden names under /bin
       ls /bin | wc -l

    Use the exact format and syntax shown for the echo line that shows
    the Question number.  (Use the cut-and-paste features of vim to copy
    the line for the next question; don't re-type it every time!)
    You may have more than one comment line; each line must be less than
    80 columns.

    You may want to open up another login window and work with the shell
    interactively to test the commands you come up with.  Copy each
    working command into the shell script as you debug it in the
    interactive window, and run the script after each addition.

    As you write the script, leave one blank line between each question
    you answer, to make the script source more readable.  The result
    will be groups of three or more lines, separated from one another by
    one blank line.

    Test your script periodically (after each edit) by saving it and
    running it from the command line (perhaps in a different window):

       $ ./lab04script.sh

    Make sure there are no error messages.  Run the script after each
    edit you add; that way you debug only the few lines you added.

    NOTE: Do not use any "change directory" commands in your script
    unless the instructions ask for it explicitly.  Do not add commands
    or create output that is not required.  Do only the commands listed,
    in the order listed.

    If you can't answer a question: echo the question number, add the
    comment line, but leave out the actual command itself.

    Here are the individual questions to add to your executable script.
    Find executable commands that look up and produce the given output.
    All the command names needed are given in this lab (below) or are
    already known to you via the Notes file: unix_command_list.txt

11 a) Count the number of characters in the output of the "date"
      command.  (If this command fails, check the spelling of your PATH
      line.)

   Next, fetch and uncompress two Apache log files into two local files:

   b) Uncompress the file "access_01.txt.gz" from the "Data Files"
      section of the course notes into a file named "access.txt" in your
      current directory using the "gunzip" program.
      (HINT: Either feed the file contents as standard input to gunzip
      or use the gunzip option that tells gunzip to write to standard
      output.  Redirect the standard output of gunzip into file
      "access.txt".)  The access.txt file should contain 19,718 lines.
      If you get an error about "permission denied", reread the HINT.

      For full marks, pay attention to folding your long lines with
      backslashes to keep them less than 80 columns, as shown above.

   c) Similarly, uncompress the file "error_01.txt.gz" into "error.txt".
      The error.txt file should contain 361 lines.

   d) Using one command line, display the checksums of both files you
      just created.  (22695 and 33386)

   e) Using one command line, count the number of lines/words/characters
      in both files you just created.
         (19718  555757 7576609 access.txt)
         (  361    4140   33756 error.txt)

   Files access.txt and error.txt are the actual Apache access and error
   log files from my home computer this month.

   f) Extract and display the line containing the CustomLog directive
      from the "idallen" Apache config file (given in Part I above).
      One long CustomLog line should display on your screen.

   g) Find a pattern that will extract and display from the error.txt
      file only the nine lines that indicate that the Apache server was
      shutting down.  You can see in the log file that the server is
      restarted every week, to roll over the log file.
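
   The two approaches named in the HINT for b) can be tried safely on a
   throwaway file first.  The lines below are only a sketch: the scratch
   directory and the file names sample.txt.gz, copy1.txt and copy2.txt
   are made up for the demonstration and are not lab files.

```shell
#!/bin/sh
# Sketch only: every file name here is invented for the demonstration.
workdir=$(mktemp -d)            # scratch directory, removed at the end
printf 'one\ntwo\nthree\n' | gzip > "$workdir/sample.txt.gz"

# Approach 1: feed the compressed bytes to gunzip on standard input.
gunzip < "$workdir/sample.txt.gz" > "$workdir/copy1.txt"

# Approach 2: use -c so gunzip writes to standard output.
gunzip -c "$workdir/sample.txt.gz" > "$workdir/copy2.txt"

# One command line can report on both files at once.
wc -l "$workdir/copy1.txt" "$workdir/copy2.txt"

# Keep the counts in variables so they can be checked afterwards.
lines1=$(( $(wc -l < "$workdir/copy1.txt") ))
lines2=$(( $(wc -l < "$workdir/copy2.txt") ))

# Leave nothing behind.
rm -r "$workdir"
```

   Both approaches leave the compressed original untouched and need
   write permission only in the directory that receives the output,
   which is why they avoid "permission denied" errors on read-only
   course files.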

   One of the ways to break into an HTTP server is by sending in an HTTP
   request that is so long that it overflows the server buffer and
   causes it to execute attacker code.  Let's see if any long-line
   attacks were mounted against my server in the past month or so:

   h) Display the length of the longest line in file access.txt.
      (HINT: see "man wc" for an option that can do this.)
         (26095 access.txt)

   Someone sent at least one 26,000 byte request string to the server!
   Let's find out how many really long lines are in the access.txt file.
   The access.txt file contains over 19,000 lines; we don't want to do
   this search manually:

   i) The grep pattern '^.\{2000\}' (with quotes) will find lines
      containing 2,000 or more characters.  Display the count (one
      number) of how many lines are at least this long in the access.txt
      file.  Use the single quotes when you use this pattern on a shell
      command line, to protect the braces and backslashes from
      interpretation by the shell.  (4 lines found)

   The command "cut -b 1-5" will display the bytes 1 through 5 of each
   line read from standard input.  Use a similar command as part of the
   next command pipeline:

   j) Display the Date/Time field and at least one Remote IP address
      field from the beginnings of the above four long log file lines.
      Don't display the whole lines - just display enough bytes from the
      start of the line to show the required information!

   From the above output, we know the times of the long-line attacks.
   By manually (not part of this script) reading the error.txt log file
   (using "less" and searching forward for the above four dates) you can
   locate the Apache server's logged response to these long lines.

   k) Find a pattern that will extract and display from the error.txt
      file only the four lines that indicate that the Apache server
      rejected an incoming request because it was too long.
      The error.txt log entries show that the Apache server successfully
      repelled the long-line attacks.
      Output: four lines from error.txt

   Though Apache repelled all attacks, a PHP program running on the
   server was not as well-written.  Crackers broke into my machine by
   tricking a PHP script into executing the "wget" command as part of a
   MySQL database query.  Let's find out how they broke in, and what
   they executed.

   By manually (not part of this script) reading the error.txt log file
   (using "less"), look for the visible evidence of "wget" output.
   You'll find the first two uses of wget from site thote.info in the
   early morning hours of January 09 in the error.txt file.

   l) By picking a unique string that only appears once in wget output,
      display the count of the number of times wget was used in the
      error.txt log file.  (count 5 times in error.txt)

   By manually (not part of this script) reading the access.txt log file
   (using "less" and searching forward for the January 9 date) you can
   find what HTTP commands were being executed just before the January 9
   wget attack showed up in the error.txt file.  On those lines in
   access.txt you can discover the remote IP address and HTTP request
   URL that were used to break into my server.

   You will see in access.txt that on January 9 the GET request was
   fetching a PHP script named "/cacti/cmd.php" at the time just before
   the wget executed.  The full HTTP request string included UNION,
   SELECT, and some CHAR() functions.

   m) From the log file access.txt count just the lines containing the
      pattern "GET /cacti/cmd.php".  You will need to use double (or
      single) quotes around the search pattern to hide the blank from
      the shell.  (count 7 lines)

   The two log files tell us that the cmd.php script was (ab)used seven
   times, starting January 9.  Five of those uses resulted in wget
   executions (a successful break-in).

   Take a moment (not part of the script) to do a Google search for the
   string "GET /cacti/cmd.php" and read the January 12 FreeBSD article
   on the "Cacti remote injection exploit".

   The cacti package maintains a MySQL database of system activity.
   The attack injected shell commands into the MySQL request stream
   using the CHAR() function.  One of the shell commands injected was
   wget.  Let's find out exactly what else was injected by the CHAR()
   function.

   n) Extract from log file access.txt just the lines containing "CHAR"
      into file attack.txt and then count the lines/words/chars in the
      attack.txt file.
      (Two separate command lines; 5 100 4300 attack.txt)

   These five lines you just extracted correspond to the five uses of
   wget that broke into the server.

   Each of the numbers to the MySQL CHAR() function is the decimal
   number of an ASCII character.  To decode the CHAR string, and see
   what commands were executed by the cracker's injection, we need to
   turn all the decimal numbers on these lines into printable ASCII
   characters.  You can do this the hard way ("man ascii"); or, you can
   write a single command pipeline that does it for you automatically.

   The lines in the attack.txt file contain digits, non-digits, and
   punctuation.  An easy way to isolate just the numbers on separate
   lines is to turn everything in the file that is not a digit into a
   newline using the "tr" (translate) command.  The result will be a
   list of blank lines and numbers that we can then decode to ASCII.

   o) Write a single pipeline of commands that will decode all the
      attack.txt decimal numbers contained in the CHAR strings into
      printable characters.

      I've provided some advanced scripting commands below that you can
      use to build this pipeline.  Each of the command lines below reads
      from standard input and outputs to standard output; you can try
      each line separately to see how it works.  Most (but not all) of
      the commands used in these lines will read from file(s) if you
      want to add the file name(s) to the end of the command line.
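
      To see the idea of "isolate the numbers, then turn each number
      into a character" on something harmless, the sketch below decodes
      a made-up CHAR() string.  The sample string is invented (it does
      not come from attack.txt), and awk is used for the second step
      purely as an assumption for illustration; it is not one of the
      building blocks supplied for the lab pipeline.

```shell
#!/bin/sh
# Sketch only: the sample string is invented, not taken from attack.txt.
sample='CHAR(119,103,101,116)'

# Step 1: turn every run of non-digits into a single newline.
# -c selects everything that is NOT a digit; -s squeezes repeats.
numbers=$(printf '%s\n' "$sample" | tr -sc 0-9 '\n')
printf '%s\n' "$numbers"

# Step 2: turn each decimal number into its ASCII character.
# (awk is an illustration only; blank lines are skipped with NF.)
decoded=$(printf '%s\n' "$numbers" \
    | awk 'NF { printf "%c", $1 + 0 } END { print "" }')
printf '%s\n' "$decoded"        # prints: wget
```

      The intermediate output of step 1 is a blank line followed by the
      four numbers, one per line, which matches the "list of blank
      lines and numbers" described above.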

      These are your command pipeline building blocks:

      sed -e 's/^.*"GET //' -e 's/HTTP.1.0".*$//'
         - This command reads lines from a file (or stdin) and removes
           everything on each input line up to and including the string
           '"GET ' and also everything on each line from the pattern
           'HTTP.1.0"' to the end of the line.  This removes the parts
           of a log line that are not part of the actual HTTP request.
           The quotes are required to protect characters from the shell.

      tr -sc 0-9 '\n'
         - This translate command reads lines from stdin (only - no file
           names allowed) and changes strings of non-digits into
           newlines.  Output from this command is blank lines and lines
           with numbers on them.  ("man tr" to explain -s and -c)
           The quotes are required to protect characters from the shell.

      perl -n -e 'print( ($_ >= 32 && $_ <= 126) ? chr($_) : "\n")'
         - This Perl command fragment reads lines from a file (or
           stdin).  If a line contains a decimal number that is in the
           printable ASCII character range, it turns the number into a
           character; non-digits and unprintable numbers are turned into
           newlines.
           The quotes are required to protect characters from the shell.

      tr ';' '\n'
         - This command reads lines from stdin (only - no file names
           allowed) and changes all semicolons into newlines.  This is
           useful for turning single command lines such as "a;b;c" into
           separate lines for easier reading.
           The quotes are required to protect characters from the shell.

      cat -s
         - This command reads lines from a file (or stdin) and removes
           adjacent blank lines.  (Anything we don't want to process
           gets turned into a newline; this cat reduces the number of
           lines.)

      Assemble the above five commands into a single Unix pipeline that
      will display the contents of the decimal CHAR sequences in the
      attack.txt file as printable lines of characters on your screen.
      You will not need any other commands.
      You will not need to change the arguments on any of the above
      commands; you only need to get them in the correct order to
      produce the most readable output.

   My desktop web server at Algonquin College was also attacked via
   cacti.  Let's find out what they tried to execute on that machine:

   I've already placed the CHAR lines from my desktop log file into file
   attack_02.txt.gz in the "Data Files" section of the course notes -
   all you have to do is uncompress the file to standard output and pipe
   the uncompressed output into the same kind of pipeline you devised
   above:

   p) Uncompress the file "attack_02.txt.gz" directly into the same kind
      of pipeline as you did above.  This time, instead of reading from
      the file attack.txt, the first command in your pipeline must read
      the uncompressed stream coming from standard input (from the
      output of the decompression command).  No temporary files are
      needed.

   q) Remove any files created by this script.
      (The current directory will be as you found it before the script
      started.)

   r) Last command: Cause the script to exit.

   If you are curious about other attacks of this type, you might like
   to Google search for pages containing "cmd.php CHAR UNION SELECT" and
   use your command pipeline to decode the CHAR strings in those web
   pages.  Someone wrote a full Perl script that takes 35 lines to try
   to decode the CHAR strings; but, it is fussy about the log format and
   doesn't always work.  (The pipeline we use above always works!)
   The securityfocus article is here:

      http://www.securityfocus.com/bid/21799

   All the command names needed are given in this lab (above) or are
   already known to you via the Notes file: unix_command_list.txt

12. How to work: Execute your script as you add each of the above
    command lines.  Make sure the script generates the correct output
    and no error messages when you execute it:

       $ ./lab04script.sh

13. Verify the script format.
    Above each of the lines in your working script, make sure you have
    added a line that echoes the question number in the exact format
    given.  Under that echo line, make sure you have comment lines (one
    or more, each less than 80 characters) explaining in your own words
    what the command line that follows the comment does.

    Each answer will look similar to this:

       echo "--- Question 11z"
       # Exit the shell script.
       # An exit code of zero means everything worked.
       exit 0

    In your script source, leave one blank line between each question
    you answer, to make the script readable.  (You may also echo blank
    lines to the screen to separate the output sections, if you wish.)

14. Go back and finish the Purpose section comment at the top of the
    script.  Scripts without your added comments will not be marked.

NOTE: For full marks, keep your lines shorter than 80 columns in this
course!

Submission
----------

15. Submit the above two files (see below).

Submission Standards: See Lab #1 for details.

A. Make sure both files contain an Exterior Assignment Submission label.
   For full marks, lines must be shorter than 80 columns.

B. Submit your files for marking as Lab 04 using the following *single*
   netsubmit command line exactly as given here:

      $ netsubmit 04 lab04cmds.txt lab04script.sh

   Always submit *all* files at the same time for every submission.
   Files submitted under the wrong names are worth zero marks.

P.S. Did you spell all the assignment label fields and file names
correctly?
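
One possible way to check the 80-column rule before submitting is
sketched below.  It is only an illustration and is not part of the lab
script: the file demo.txt is made up, and the check counts characters,
so a line containing tabs may display wider than the number reported.

```shell
#!/bin/sh
# Sketch only: demo.txt is an invented file used to show the check.
workdir=$(mktemp -d)            # scratch directory, removed at the end
short='echo "--- Question 11y"'
long=$(printf '%085d' 0)        # an 85-character line of zeros
printf '%s\n%s\n' "$short" "$long" > "$workdir/demo.txt"

# Report the line number and length of each line of 80 or more
# characters; no output at all means every line is short enough.
toolong=$(awk 'length($0) >= 80 { print NR ": " length($0) }' \
    "$workdir/demo.txt")
printf '%s\n' "$toolong"        # prints: 2: 85

rm -r "$workdir"
```

To check real work, the same awk command could be given the names
lab04cmds.txt and lab04script.sh in place of the demonstration file.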