--------------------------------
Practice Unix/Linux Questions #2
--------------------------------
-IAN! idallen@idallen.ca

Remember to start shell scripts with five things:

  A - interpreter line:                    #!/bin/sh -u
  B - search path setting:                 PATH=/bin:/usr/bin ; export PATH
  C - character sort ordering:             LC_COLLATE=C ; export LC_COLLATE
  D - use only one-byte character sets:    LANG=C ; export LANG
  E - umask setting:                       umask 022

1.  Write a script named labelcheck.sh that checks the spelling of each
    of the seven lines of the course Assignment Label.  The script
    should contain Unix commands that look for the correctly spelled
    lines in a file named label.txt in the current directory.

    Hint: Write seven Unix commands.  Each command tries to find one of
    the seven label lines in the label.txt file.  If it fails, it tells
    the user that the line is missing or misspelled.

2.  Write a Unix pipeline to print just your numeric Unix userid and
    nothing else.

    Hint: You can find this number in the output of the "id" command or
    by looking for your userid in the Unix password file.

3.  In the /usr/bin directory of commands, we suspect that some of the
    command files are actually the same program under different names.
    Use Unix tools to identify all the command files under /usr/bin that
    actually contain the same programs.

    Hint: The "sum" command produces a quick checksum of the bytes in a
    file; files with identical content have identical checksums.  The
    shell can easily generate all the pathnames under /usr/bin.  Have
    the shell pass that list of names to the checksum command and save
    the output in a temporary file.  Extract from that file just the
    column of checksums and process it to find duplicate checksums,
    indicating some program has more than one name under /usr/bin.
    (The "uniq" command has an option to find adjacent duplicate lines.
    Checksum 355593 appears 4 times.)  Look for each of the duplicate
    checksums in the temporary file to find the lines of programs that
    have different names but are the same.

4.  The simplest use of the Unix "find" command (using only pathname
    arguments) is to give a recursive list of all the pathnames under a
    directory.  For all pathnames under the /var/www/ directory, show a
    list of the ten most frequently occurring pathname components along
    with their occurrence counts.  (A pathname component is the part
    between, before, or after a slash.)

    Hint: Put each pathname component on its own line by changing the
    slashes in the pathnames to newlines.  Count the unique lines.

5.  Write a script to count the number of unique "words" in a file.
    (Words are separated by blanks or punctuation.)  This count will be
    much less than the actual total number of words in the file.

    Hint: Translate blanks and punctuation characters into newlines.
    Take the resulting output and count the number of unique lines.

    Part II: Show the ten most frequently occurring words.

6.  Indiana University has an index of musicians at
        http://www.music.indiana.edu/music_resources/
    A list of artists starting with "S" is under:
        http://www.music.indiana.edu/music_resources/artistss.html
    Write a script to dump this page (in formatted form) and count how
    many lines contain the string "San" on the formatted page.
    (Note: Count only lines containing the artists' names in the page;
    do not count http links to the artists that may contain "San".)

    Hint: The answer (in October 2005) is 4-5 lines.  You may need to
    select lines that do not contain the pattern "http" to give an
    accurate count.

7.  You can search for the word "foo" using Yahoo search with the URL:
        http://ca.search.yahoo.com/search?p=foo
    Write a script to do a Yahoo search for the artist "Coldplay" and
    count how many lines contain the phrase "Rock and Pop".  Note the
    exact use of capital letters in the phrase!  (The answer in October
    2005 is 6 references.)  Perform the same count for the artist
    "Britney Spears", and then for "Beatles".

8.  Produce a listing of the Unix password file showing only the userid
    field and the shell field, in sorted order (by userid).

9.  Write a command pipeline that shows only the name of the current
    month (e.g. October, November).

    Hint: What Unix command displays a calendar?  (You can also get
    this information if you know some fancy options to the Unix "date"
    command.)

10. For all non-hidden paths in the /bin directory, find out what type
    of file each is and produce a list of the unique first words of the
    type.  The output will look something like this:

        Bourne
        Bourne-Again
        ELF
        setuid
        symbolic

    Hint: The shell can easily generate all the pathnames under /bin.
    Have the shell pass that list of names to a command that can tell
    you what type of thing each name is, then extract just the first
    word of the type information from the resulting output lines.
    Process the list to remove duplicate lines.

11. Write a command pipeline to display a list of the unique permission
    strings (as output by "ls -l") for all pathnames (including hidden
    files) in the current directory.  (Do not output the "total" line
    produced by the "ls" command.)  Then use another pipeline to output
    the permission string that occurs most frequently, preceded by its
    count.  For the /bin directory on a typical Linux machine, the
    output was:

        -r-xr-xr-x
        -rwsr-xr-x
        -rwxr-xr-x
        drwxr-xr-x
        lrwxrwxrwx

        72 -rwxr-xr-x

    The actual output may differ, depending on what is in /bin.

12. You can find the length of the longest line in a file using this
    method:

    (1) Change all the characters that are not newlines into periods.
        [You now have a file that contains only periods on every line.]
    (2) Sort the resulting output and extract one of the longest lines.
        [When you sort lines of identical characters, they sort by
        length.  Select one of these longest lines.]
    (3) Count the characters in the longest line.  The actual answer is
        one less than the resulting count.  (Why is the answer one
        less?)

    What is the length of the longest line in the text file
    /etc/termcap ?  (Answer: 335 minus one.)

    Hint: The "tr" command has an option that complements the first
    (source) character set, translating all characters that are *not*
    in the first set.  For example (RTFM):

        # Example use of "tr" with the "complement" option:
        # Translate to X all characters that are NOT letters or newlines
        $ date | tr -c 'a-zA-Z\n' 'X'
        SunXOctXXXXXXXXXXXXXEDTXXXXX
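
    The three steps in question 12 can be sketched as a single
    pipeline.  This is only one possible arrangement, not the official
    answer; the function name "longest_line" is made up for this
    illustration, and the sketch assumes a non-empty text file.  It is
    a fragment to paste into a script that already has the usual
    header lines.

```shell
# longest_line FILE  (hypothetical helper name, for illustration only)
# Print the length of the longest line in FILE, assuming FILE is a
# non-empty text file.
longest_line () {
    # (1) tr -c '\n' '.'  turns every character that is NOT a newline
    #     into a period, so each line becomes a run of periods.
    # (2) sort puts lines of identical characters in order of length,
    #     so tail -1 selects one of the longest lines.
    # (3) wc -c counts the characters in that line, including the
    #     newline at its end.
    count=$(tr -c '\n' '.' <"$1" | sort | tail -1 | wc -c)

    # Subtract one for the newline that wc counted.
    echo $((count - 1))
}
```

    To try it on the file named in the question:

        $ longest_line /etc/termcap

    The result depends on the copy of /etc/termcap installed on your
    machine, so it may not match the figure given above.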