===================================
Regular Expression Skill Assessment
===================================
-IAN! idallen@ncf.ca

Here are some descriptions of text manipulation problems of varying
levels of difficulty.  These are all examples of text and data
manipulation.  Some problems may be solved using Unix utilities that
don't use regular expressions.  Many of the problems require more than
one Unix utility, or the same utility used repeatedly.

To succeed in this course, you must be able to do all the Elementary
manipulations given here.  You must be able to do most of the Basic
manipulations.

Elementary

 1. Change the letters "dog" to "HORSE" everywhere they occur on all
    lines.
 2. Change all occurrences of the letters "Man" at the beginning of a
    line to "Person".
 3. Change all occurrences of "stick" followed by any punctuation at
    the end of a line to "Stick.".  (The punctuation is replaced by a
    period.)
 4. Change all occurrences of "Dog" or "dog" to "COW".
 5. Change all Canadian or American spellings of colour (color) to
    "Color".
 6. Double all vowels in every word on every line.
 7. Triple the amount of space between every word.
 8. Find and print lines that contain "dog" followed by any number of
    digits then "cat".
 9. Find and print lines that contain the letters "dog" followed
    anywhere by the letters "cat".
10. Change all occurrences of one or more digits to the single word
    "NUMBER".
11. Replace all occurrences of one or more blanks with a single blank.
12. Replace all occurrences of one or more tabs or blanks with a single
    blank.
13. Remove the first 8 characters from every line.
14. Remove all leading blanks or tabs from all lines.
15. Remove all trailing blanks or tabs from all lines.
16. Replace all tab characters with eight spaces.
17. Change all punctuation so that the sentence period lies outside of
    the closing double quote, e.g. "Hello there." becomes
    "Hello there".
18. Remove everything leading up to and including the last blank on
    each line.
19. Remove the first blank on each line and everything after it.
20. Put double quotes around every occurrence of the phrase
    "user-friendly".

Basic

 1. Add an extra blank after every period at the end of a sentence.
 2. Make sure that every period at the end of a sentence is followed by
    exactly two blanks.
 3. Truncate every line to ten characters.
 4. Exchange the first 10 characters with the next 15 characters on
    every line.
 5. Exchange the first number with the second number on every line.
 6. Remove all leading zeroes from the first number on each line.
    Don't mishandle single-digit zeroes.
 7. Find and print lines that contain all the vowels in alphabetical
    order, a before e before i before o before u.  Test using
    /usr/dict/words.
 8. Find and print lines that contain all the vowels in any order.
    Test using /usr/dict/words.
 9. Change all occurrences of one or more digits surrounded by spaces
    to the word "NUMBER" also surrounded by spaces.
10. Change only the second occurrence of a single blank to a colon in
    each line.
11. Change only the second-to-last single blank to a colon in each
    line.
12. Change only the second occurrence of a string of one or more blanks
    to a colon in each line.
13. Change only the second-to-last occurrence of a string of one or
    more blanks to a colon in each line.
14. Remove all occurrences of HTML tags whose opening and closing angle
    brackets are on the same line.  Remove all of them, not just the
    first ones.
15. Remove everything on every line that appears between double quotes,
    leaving only the quotes.
    (Example:  a "bcd" efg "h i" j  -->  a "" efg "" j )
    Handle empty strings (adjacent quotes) correctly.
16. Find lines that contain only one single quote character (an
    unmatched quote).
17. Put double quotes around every occurrence of the phrase
    "user-friendly", unless the phrase already has double quotes
    around it.
18. Find all numbers prefixed by a dollar sign, remove the dollar sign,
    and suffix the number with "CDN", e.g. $123.45 becomes 123.45CDN.
    Now do the reverse.
19. Find all numbers with periods separating decimals and change the
    periods to commas, e.g. 123.45 becomes 123,45.  Now do the reverse.
20. Find all numbers with commas separating sets of three digits and
    change all the commas to spaces, e.g. 1,234,567.23 becomes
    1 234 567.23.  (You may assume the only use of a comma immediately
    followed by three digits is as a separator.)
21. Locate common misspellings and mistypings of "@algonquinc.on.ca"
    and fix them all.  (e.g. fix algonqinc.ont.can, etc.)
22. Find all occurrences of your name with or without initials and
    embedded spaces.  (e.g. "Ian D. Allen", "Ian Allen", "I. D. Allen",
    "ID Allen", "IDAllen", "iallen", etc.)  Try to minimize false hits
    in the middle of words.  (e.g. fallen, challenge, Wallenstein, etc.)
23. Remove either single or double quotes from around all strings of
    one or more digits, e.g. "10" or '10' become just 10.  Now do the
    reverse (add quotes to all numbers).
24. Locate hexadecimal numbers having the form "0xA0FF2375C3" and
    prefix them with the string "(HEX:)", e.g. 0xDEAD would appear as
    (HEX:)0xDEAD and 0xBEAD00BEAD00 would appear as
    (HEX:)0xBEAD00BEAD00.  Now do the reverse (remove the prefixes).
25. Use a single regular expression to change every occurrence of the
    word "dog" to be "dog-eat-dog" and "cat" to be "cat-eat-cat".  Now
    do the reverse.
26. Produce a plain list of email addresses and home pages for everyone
    with an account on this system.
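
To show the kind of answer being asked for, here is a minimal sketch of
a few of the problems above (Elementary 1, 2, 9, 10, 12 and 15, and
Basic 5).  It uses Python's "re" module instead of the Unix utilities
this course has in mind; that choice is an assumption made only to keep
the sketch self-contained, and the function names are hypothetical.
The patterns would need adjusting to the regular-expression dialect of
whichever Unix utility is actually used.

```python
import re

def dog_to_horse(line):
    # Elementary 1: replace the letters "dog" wherever they occur.
    return re.sub(r"dog", "HORSE", line)

def man_to_person(line):
    # Elementary 2: replace "Man" only at the beginning of the line.
    return re.sub(r"^Man", "Person", line)

def has_dog_then_cat(line):
    # Elementary 9: "dog" followed anywhere later on the line by "cat".
    return re.search(r"dog.*cat", line) is not None

def digits_to_number(line):
    # Elementary 10: each run of one or more digits becomes NUMBER.
    return re.sub(r"[0-9]+", "NUMBER", line)

def squeeze_whitespace(line):
    # Elementary 12: each run of tabs or blanks becomes a single blank.
    return re.sub(r"[ \t]+", " ", line)

def strip_trailing(line):
    # Elementary 15: remove trailing blanks or tabs.
    return re.sub(r"[ \t]+$", "", line)

def swap_first_two_numbers(line):
    # Basic 5: capture the first two runs of digits and the text around
    # them, then write the two numbers back in the opposite order.
    pattern = r"^([^0-9]*)([0-9]+)([^0-9]+)([0-9]+)"
    return re.sub(pattern, r"\1\4\3\2", line)

if __name__ == "__main__":
    print(dog_to_horse("dogma and hotdog"))        # HORSEma and hotHORSE
    print(man_to_person("Man bites Man"))          # Person bites Man
    print(swap_first_two_numbers("a 12 b 345 c"))  # a 345 b 12 c
```

Note that the first function also changes "dog" inside longer words
such as "hotdog", which is what the wording of Elementary 1 ("the
letters") asks for.  A line with fewer than two numbers is left
unchanged by the last function.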