============================================
Regular Expressions - more practice examples
============================================
-Ian! D. Allen  idallen@idallen.ca

Here are some examples of manipulating text with regular expression
substitutions.  See the chapter03guide.txt for explanations of some of
these patterns.

The patterns below operate on a copy of the Unix password file (named
pswd in the commands) that contains some Algonquin student userids of
the form abcd0001.

If several command lines are given, they all do the same thing.

To see only the lines on which the substitutions succeed, append the "p"
command to the substitution expression and use sed option "-n" to turn
off default output, e.g.

    sed -n -e 's/[^:]*$//p' pswd

*) Use a regular expression substitution to remove all text after the
   last colon on each line.  Don't remove the colon itself.

    sed -e 's/:[^:]*$/:/' pswd
    sed -e 's/[^:]*$//' pswd

*) Use a regular expression substitution to add the text /bin/bash to
   the end of every line that ends in a colon.

    sed -e '/:$/s/$/\/bin\/bash/' pswd
    sed -e '/:$/s|$|/bin/bash|' pswd
    sed -e 's/:$/:\/bin\/bash/' pswd
    sed -e 's|:$|:/bin/bash|' pswd

*) Use a regular expression substitution to change all home directories
   from /home/foo to /var/foo/home on every line.

    sed -e 's|/home/\([^:]*\)|/var/\1/home|' pswd

   Note: The above assumes that the first occurrence of /home/ is always
   part of a home directory, since we aren't limiting the match to a
   particular field in the line.

*) Use a regular expression substitution to reverse the order of letters
   and digits in the first field in the file, so that abcd0001 becomes
   0001abcd, on every line.  (Assume that letters always precede digits
   in the userids.)

    sed -e 's/^\([a-zA-Z]*\)\([0-9]*\)/\2\1/' pswd

*) Use a regular expression substitution to swap the first two fields in
   the file, so that abcd0001:x: becomes x:abcd0001: on every line, no
   matter what text is in the two fields.

    sed -e 's/^\([^:]*\):\([^:]*\)/\2:\1/' pswd

*) Use a regular expression substitution to swap the first two adjacent
   numeric fields on every line, so that :9:13: becomes :13:9:, on every
   line.

    sed -e 's/:\([0-9]\+\):\([0-9]\+\):/:\2:\1:/' pswd

*) Use a regular expression substitution to remove all blanks that
   appear between letters (upper or lower case).  Do not touch blanks
   that have non-letters on either side.

    sed -e 's/\([a-zA-Z]\) \+\([a-zA-Z]\)/\1\2/' pswd

   Note: The above removes only the first such run of blanks on each
   line.  Adding the "g" flag is not enough either: in text such as
   "a b c" the letter "b" is consumed by the first match and is not
   available to start the second.  To remove every run, repeat the
   substitution until it fails:

    sed -e ':a' -e 's/\([a-zA-Z]\) \+\([a-zA-Z]\)/\1\2/' -e 'ta' pswd

*) Delete all lines (including any blank lines) that do not contain any
   colon or semicolon characters.  (Compare this with the next one!)

    sed -e '/[:;]/!d'

*) Delete all lines that contain any character that is not a colon or
   semicolon.  (Compare this with the previous one!  It is not the same.)

    sed -e '/[^:;]/d'

   Important Note: /[:;]/!d and /[^:;]/d are *not the same*!  A line
   that contains characters that are not X (e.g. aXbcXdeXf) is not the
   same as a line that doesn't contain any X characters!  A blank line
   is a line that doesn't contain any X characters; a blank line is
   *not* a line that contains characters that are not X (because a blank
   line doesn't contain any characters at all).

*) Find lines beginning with "umask 022" followed by zero or more blanks
   or tabs followed by a comment character ("#"), semicolon (";"), or
   end of line.  Delete these lines.  (They don't belong in a password
   file!)

    sed -e '/^umask 022[[:space:]]*\(#\|;\|$\)/d'
    sed -e '/^umask 022[[:space:]]*\([#;]\|$\)/d'

*) Find lines beginning with "xyzzy 123" followed by zero or more blanks
   or tabs followed by a comment character ("#"), semicolon (";"), or a
   real dollar sign ("$").  Delete these lines.

    sed -e '/^xyzzy 123[[:space:]]*\(#\|;\|\$\)/d'
    sed -e '/^xyzzy 123[[:space:]]*[#;$]/d'

   Important Note: (#|;|$) is not the same as [#;$] if it is used where
   the '$' in (#|;|$) can be interpreted as the end-of-line anchor.  To
   be sure that '$' is taken as a real dollar sign, backslash it.
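The two Important Notes above are easy to check on a few made-up lines.
The following is only a sketch: the sample input lines and the shell
variable names are invented for illustration, and the use of \| inside
a basic regular expression assumes GNU sed, as in the commands above.

```shell
# Five made-up lines: "::", a blank line, ";;", "a:b", and "plain".
sample=$(printf '%s\n' '::' '' ';;' 'a:b' 'plain')

# Keep only lines that contain at least one colon or semicolon.
# The blank line and "plain" are deleted.
has_delim=$(printf '%s\n' "$sample" | sed -e '/[:;]/!d')

# Delete lines that contain any character other than colon or semicolon.
# The blank line survives; "a:b" and "plain" are deleted.
only_delim=$(printf '%s\n' "$sample" | sed -e '/[^:;]/d')

# Four more made-up lines for the dollar-sign comparison.
xyzzy=$(printf '%s\n' 'xyzzy 123' 'xyzzy 123 $' 'xyzzy 123 # c' 'xyzzy 1234')

# Unescaped $ before \) is the end-of-line anchor, so the line that
# ends right after "123" is deleted and the one with a real "$" is kept.
anchor=$(printf '%s\n' "$xyzzy" | sed -e '/^xyzzy 123[[:space:]]*\(#\|;\|$\)/d')

# Backslashed \$ is a real dollar sign, so the opposite happens.
literal=$(printf '%s\n' "$xyzzy" | sed -e '/^xyzzy 123[[:space:]]*\(#\|;\|\$\)/d')

printf '%s\n' "--- has_delim" "$has_delim" "--- only_delim" "$only_delim"
printf '%s\n' "--- anchor" "$anchor" "--- literal" "$literal"
```

In both comparisons the "xyzzy 1234" style of line (text that continues
past the pattern with an ordinary character) is untouched; only the
blank line and the line ending in a real "$" change sides.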

*) Tricky: Use a regular expression substitution to swap the first field
   in the file with the last numeric field in the file, on every line.

    sed -e 's/^\([^:]*\)\(:.*:\)\([0-9]\+\)/\3\2\1/' pswd

   Note: The above assumes that each line has at least three fields,
   since it has to match a minimum of two colon characters.

   Note: The middle tagged expression :.*: matches any number of
   characters between colons.

   Note: It doesn't make sure that the last numeric field is *all*
   numeric, only that it *starts* with digits.  To make sure that the
   last numeric field is all digits right up to the end of the field,
   add to the last part of the regexp a fourth tagged expression that
   matches either the field delimiter or end-of-line.

*) Tricky: Use a regular expression substitution to swap the first field
   in the file with the first numeric field in the file, on every line.

    sed -e 's/^\([^:]*\)\(:[^0-9]*:\)\([0-9]\+\)/\3\2\1/' pswd

   Note: The above assumes that each line has at least three fields,
   since it has to match a minimum of two colon characters.

   Note: The middle tagged expression :[^0-9]*: matches any number of
   non-numeric fields (fields containing only non-digit characters).

   Note: It doesn't make sure that the first numeric field is *all*
   numeric, only that it *starts* with digits.  Making sure that the
   first numeric field is all digits right up to the end of the field is
   harder than in the previous example, since the second tagged
   expression would have to change to match not only purely non-digit
   fields but also repeated empty fields and fields that contain digits
   mixed with non-digits, e.g.

    (:(|[^:]*[^0-9:][^:]*))*

   where [^:]*[^0-9:][^:]* means "a field (no colon character)
   containing at least one non-digit".
   It's doable, but messy, and here it is:

    sed -e 's/^\([^:]*\)\(\(:\(\|[^:]*[^0-9:][^:]*\)\)*:\)\([0-9]\+\)\(:\|$\)/\5\2\1\6/' pswd

   The above complex mess makes sure the swap only happens with the
   first *all-numeric* field, not just with the first field that happens
   to start with some digits.  Note the use of parentheses in front of
   "*", so that "*" repeats the entire group zero or more times.  Note
   also the use of extra parentheses for grouping, as well as tagging,
   meaning that we use \5 and \6 in the right-hand-side substitution.
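
To see why the extra work matters, and why the tags end up numbered \5
and \6, it may help to run both versions on one line.  This is a sketch
only: the sample line and the variable names are invented, and the \+
and \| operators assume GNU sed, as above.  Tagged expressions are
numbered by counting opening parentheses from the left, which the last
command shows on a small case.

```shell
# Made-up line: the field "9lives" starts with a digit but is not numeric.
line='abcd0001:x:9lives:13:Name'

# Simple version: swaps with the first field that merely starts with
# digits, splitting "9lives" apart.
loose=$(printf '%s\n' "$line" |
    sed -e 's/^\([^:]*\)\(:[^0-9]*:\)\([0-9]\+\)/\3\2\1/')

# Strict version: skips "9lives" and swaps with the all-numeric "13".
strict=$(printf '%s\n' "$line" |
    sed -e 's/^\([^:]*\)\(\(:\(\|[^:]*[^0-9:][^:]*\)\)*:\)\([0-9]\+\)\(:\|$\)/\5\2\1\6/')

# Numbering demo: \1 is the outer group, \2 and \3 are the inner groups.
numbering=$(printf '%s\n' 'ab12' |
    sed -e 's/\(\([a-z]*\)\([0-9]*\)\)/\3\2 was \1/')

printf '%s\n' "$loose" "$strict" "$numbering"
# 9:x:abcd0001lives:13:Name
# 13:x:9lives:abcd0001:Name
# 12ab was ab12
```

In the strict command the groups used only for repetition and
alternation take numbers 3 and 4, which is why the digits and the
trailing delimiter are \5 and \6.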