Updated: 2016-10-28 17:25 EDT

1 Matching Patterns: GLOB vs. Regular Expressions

There are two different pattern matching facilities that we use in Unix/Linux: GLOB patterns and Regular Expressions.

Regular Expressions are another way to match patterns in text, similar to but more powerful than simple GLOB patterns.

Pay close attention to which of the two situations you’re in, because some of the same special characters common to GLOB and Regular Expressions have different meanings!

1.1 GLOB patterns (review)

There are several major places where GLOB patterns are used:

1.1.1 File GLOB in the Shell: *.txt

In the shell, GLOB patterns may be used to match existing pathnames in the file system:

$ ls *.txt
$ echo ?????.txt
$ touch [ab]*.txt

The shell tries to expand the GLOB to match existing pathnames before the associated command runs.
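The expansion step can be observed directly. The following is a minimal sketch, assuming a scratch directory made with mktemp; the file names and the variable names demo_dir, expanded, and literal are invented for illustration:

```shell
# Scratch directory so the demonstration does not touch real files.
demo_dir=$(mktemp -d)
touch "$demo_dir/apple.txt" "$demo_dir/berry.txt" "$demo_dir/notes.md"

# Unquoted: the shell replaces *.txt with the matching names before echo runs.
expanded=$(cd "$demo_dir" && echo *.txt)

# Quoted: the GLOB is hidden from the shell, so echo receives the pattern itself.
literal=$(cd "$demo_dir" && echo '*.txt')

echo "unquoted: $expanded"
echo "quoted:   $literal"

rm -rf "$demo_dir"
```

The unquoted form prints the two existing .txt names; the quoted form prints the pattern text unchanged.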

1.1.2 case statement GLOB in the Shell

GLOB patterns are used in shell case statements to match the text at the top of the case statement:

case "$1" in
/* ) type='Absolute Pathname' ;;
*  ) type='Relative Pathname' ;;
esac
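To try the case statement above outside a script, it can be wrapped in a function. This is only a sketch; the function name classify_path is hypothetical and the two pathnames are sample arguments:

```shell
# classify_path is a hypothetical helper name used only for this sketch.
classify_path () {
    case "$1" in
        /* ) echo 'Absolute Pathname' ;;   # GLOB: a slash followed by anything
        *  ) echo 'Relative Pathname' ;;   # GLOB: anything else
    esac
}

classify_path /etc/passwd
classify_path notes/todo.txt
```

No files are consulted here: the GLOB is matched against the text of the argument, not against the file system.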

1.1.3 GLOB in the find command: -name '*.txt'

The find command -name operator also matches GLOB patterns against the file system, but it does so recursively in every directory, not just in one directory:

$ find . -name '*.txt'
$ find . -name '?????.txt'
$ find . -name '[ab]*.txt'

We quote the patterns above to hide them from the shell so that the find command receives the pattern and the shell doesn’t try to expand them.
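A small sketch of the recursive behaviour, assuming a scratch directory; the directory layout and the names demo_dir and found are made up for this example:

```shell
# Build a tiny directory tree to search.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/top.txt" "$demo_dir/sub/deep.txt" "$demo_dir/sub/other.log"

# The quotes stop the shell from expanding *.txt; find receives the
# pattern and applies it to names in every directory it visits.
found=$(cd "$demo_dir" && find . -name '*.txt' | LC_ALL=C sort)
echo "$found"

rm -rf "$demo_dir"
```

Both top.txt and sub/deep.txt are reported, even though only one of them is in the starting directory.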

1.2 Regular Expressions – Basic and Extended

Regular Expressions (short form: regexp) are text matching patterns similar to GLOB patterns but more powerful. Regexp patterns use all the GLOB pattern matching characters and add more. The characters work slightly differently between GLOB and regexp.

Regexp are used by many Unix/Linux programs and programming languages such as grep, sed, awk, vim, less, more, man, Perl, Python, etc.

In an editor (such as vim or sed), a Regular Expression may be used to select characters to be deleted, replaced, or exchanged:

:%s/colou*r/COLOUR/g                   # vim replacement regular expression

$ echo "Colouur bad.  Colour red.  Color tan." | sed -e 's/Colou*r/COLOUR/g'
COLOUR bad.  COLOUR red.  COLOUR tan.

Regexp have a Basic set of pattern matching characters and an Extended set of characters. The grep program family is a very popular user of both Basic and Extended Regular Expressions.

The grep command itself accepts Basic Regular Expression syntax, and needs backslashes in front of some operators to access Extended Regular Expression features. The egrep command accepts Extended Regular Expression syntax and does not need the backslashes. You can do the same text search using either command, but the syntax changes:

$ grep 'publickey for \(idallen\|cst8207[abc]\?\)' /var/log/auth.log   # Basic
$ egrep 'publickey for (idallen|cst8207[abc]?)' /var/log/auth.log      # Extended

From the section REGULAR EXPRESSIONS in the man page for the grep command:

Basic vs Extended Regular Expressions
  In basic regular expressions the meta-characters ?, +, {, |,
  (, and ) lose their special meaning; instead use the backslashed
  versions \?, \+, \{, \|, \(, and \).

Even the bash shell has extended syntax that allows the use of regular expressions instead of simple GLOB patterns.

IMPORTANT: Regular Expressions use some of the same special characters as GLOB patterns, but they mean different things! In particular, *, ?, and . work differently! There are others!

1.3 GLOB patterns are anchored; Regular Expressions float

GLOB patterns are said to be anchored to the start and end of the line; they must always match the entire text string (usually a file name) from the start to the end.

The GLOB pattern a*b matches only text that starts with a and ends with b – that GLOB pattern doesn’t match just the ab in the middle of xxxabxxx.

The modified GLOB pattern *a*b* now matches the whole text that contains a followed by b anywhere in the text. The modified GLOB pattern does match the entire text xxxabxxx.

Regular Expressions are not anchored by default. They “float” down the text and they may match anywhere in the text string unless you explicitly anchor them to the start or end of the text using the regexp characters ^ and/or $.

The Regular Expression a.*b matches inside any text that contains a followed by b anywhere in the text. The floating regexp does match the ab in the middle of xxxabxxx.

The modified Regular Expression ^a.*b$ is now anchored to the start and end of the text. The modified expression now matches exactly the same text as the GLOB pattern a*b because it forces the a to match at the start and the b to match at the end. It does not match inside xxxabxxx.

You must remember to anchor the ends of your Regular Expressions if you want to be sure that they match the whole piece of text and not just some part of the text.
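The difference can be checked with a short sketch. The helper names glob_match and regex_match are hypothetical; the first uses a shell case statement (GLOB), the second uses grep (Regular Expression):

```shell
# glob_match PATTERN TEXT: a GLOB must match the whole TEXT.
glob_match () {
    case "$2" in
        $1 ) echo yes ;;    # unquoted $1 is treated as a GLOB pattern
        *  ) echo no ;;
    esac
}

# regex_match REGEXP TEXT: a regexp may match anywhere inside TEXT.
regex_match () {
    if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

glob_match  'a*b'    xxxabxxx    # no:  text does not start with a
glob_match  '*a*b*'  xxxabxxx    # yes: padded with * on both sides
regex_match 'a.*b'   xxxabxxx    # yes: floats to the ab in the middle
regex_match '^a.*b$' xxxabxxx    # no:  anchored at both ends
regex_match '^a.*b$' axxxb       # yes: a first, b last
```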

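Summary: GLOB patterns are anchored and must match the entire text; Regular Expressions float and may match anywhere in the text unless you anchor them with ^ and/or $.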

2 Regular Expressions compared with Algebraic Expressions

Like algebraic expressions, more complex Regular Expressions are built up by combining simpler expressions. Regular Expressions have operators similar to algebraic operators, but they mean different things than in algebra. Like algebraic operators, Regular Expression operators have bindings and precedence when combined with other operators.

Before we look at Regular Expressions, let’s take a look at some Algebraic Expressions you’re already comfortable with. Larger Algebraic Expressions are formed by putting smaller expressions together:

Algebraic Expressions

Expression  Meaning            Comment
----------  -------            -------
a           a                  a simple expression
b           b                  another simple expression
ab          a x b              ab is a larger expression formed from two smaller ones; concatenating two expressions together means to multiply them
b²          b x b              we might have represented this with b^2, using ^ as an exponentiation operator
ab²         a x (b x b)        not (a x b) x (a x b)
(ab)²       (a x b) x (a x b)  parentheses for grouping

2.1 Basic Regular Expressions using * repetition (zero or more) and parentheses

Similar to an algebraic exponent, the asterisk/star * Regular Expression operator binds tightly to the immediately preceding Regular Expression and repeats it zero or more times. Parentheses (a feature of Extended Regular Expressions) can be used for grouping, e.g.

$ grep 'suc*eed' document.txt        # find sueed, suceed, succeed, succceed, etc.
$ grep 'Bar\(bar\)*a' document.txt   # find Bara, Barbara, Barbarbara, etc.
$ egrep 'Bar(bar)*a' document.txt    # use egrep Extended regexp syntax

Rhabarbara: https://www.youtube.com/watch?v=dD2mhVc6C_8

Parentheses need backslashes in front of them when using a program such as grep that uses Basic Regular Expression syntax. The egrep program accepts Extended Regular Expression syntax and does not need the backslashes.
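The commands above can be tried without a document.txt by feeding grep a few lines on standard input. This is a sketch only; the word list and the variable names are invented for the example:

```shell
# Sample lines, one word per line.
words=$(printf '%s\n' Bara Barbara Barbarbara Barbra sueed succeed)

# * repeats only the single preceding character c.
star_matches=$(printf '%s\n' "$words" | grep 'suc*eed')

# \( \) groups bar so that * repeats the whole group (Basic syntax).
group_matches=$(printf '%s\n' "$words" | grep 'Bar\(bar\)*a')

# The same grouping in Extended syntax needs no backslashes.
group_matches_ere=$(printf '%s\n' "$words" | grep -E 'Bar(bar)*a')

echo "$star_matches"
echo "$group_matches"
```

Barbra is not selected by the grouped pattern, because after Bar the next characters are neither a repeat of bar nor the final a.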

Regular Expressions using * repetition (zero or more) and parentheses

Expression  Meaning                                                        Comment
----------  -------                                                        -------
a           match single ‘a’                                               a simple expression
b           match single ‘b’                                               another simple expression
ab          match strings consisting of single ‘a’ followed by single ‘b’  “ab” is a larger expression formed from two smaller ones; concatenating two regular expressions together means “followed immediately by” and we’ll say “followed by”
b*          match zero or more ‘b’ characters                              a big difference in meaning from the ‘*’ in globbing! This is the regular expression repetition operator.
ab*         ‘a’ followed by zero or more ‘b’ characters                    why not repeat the two characters ‘ab’ zero or more times? Hint: think of “ab²” in algebra.
\(ab\)*     (‘a’ followed by ‘b’), zero or more times                      We can use parentheses; in Basic Regular Expressions, we use \( and \)

2.2 Concatenating and repeating Regular Expressions using * and \(...\)

As with algebraic multiplication, there is no operator to concatenate Regular Expressions to match longer strings. Simply write one expression and follow it with the next one.

Similar to an algebraic exponent, the asterisk/star * Regular Expression operator binds tightly to the immediately preceding Regular Expression and repeats it zero or more times. Parentheses can be used for grouping, e.g.

Concatenating and repeating Regular Expressions using * and \(...\)

Expression                                Matches                                                       Example  Example Matches                         Comment
----------                                -------                                                       -------  ---------------                         -------
one expression followed by another        first followed by second                                      xy       “xy”                                    like globbing
expression followed by *                  zero or more matches of the immediately preceding expression  x*       “” or “x” or “xx” or “xxx” …etc         NOT like the * in globbing, although .* behaves like * in globbing
expression in parentheses                 the expression                                                \(ab\)   “ab”                                    parentheses are used for groups
expression in parentheses, followed by *  the expression repeated zero or more times                    \(ab\)*  “” or “ab” or “abab” or “ababab”, etc.  parentheses are used for groups

2.3 Special Characters in Basic Regular Expressions

Regular Expressions have more special characters than GLOB patterns. Some special characters need backslashes in front of them to enable them in Basic Regular Expressions.

Special Characters in Basic Regular Expressions

Character                              Matches                                           Example  Example Matches                               Comment
---------                              -------                                           -------  ---------------                               -------
non-special character                  itself                                            x        “x”                                           like globbing
.                                      any single character                              .        “x” or “y” or “!” or “.” or “*” …etc          like the ‘?’ in globbing
^ used at start of regexp              beginning of a line of text                       ^x       “x” if it’s the first character on the line   anchors the match to the beginning of a line
^ when not used at start of regexp     ^ (itself)                                        a^b      “a^b”                                         ^ has no special meaning unless it’s first
$ at end of regexp                     end of a line of text                             x$       “x” if it’s the last character on the line    anchors the match to the end of a line
$ when not used at end of regexp       $ (itself)                                        a$b      “a$b”                                         $ has no special meaning unless it’s last
\ followed by a special character      that character with its special meaning removed  \.       “.”                                           like globbing
\ followed by a non-special character  the non-special character (no change)             \a       “a”                                           \ before a non-special character is ignored
[ and ]                                character class                                   [abc]    “a” or “b” or “c”                             see Class below

2.4 Regular Expressions match anywhere in a line: anchoring with ^ and $

GLOB Patterns are said to be anchored to the start and end of the string being matched. The GLOB pattern a*b matches text axb but not abx or xab. The a has to be at the start, and the b has to be at the end.

To allow a GLOB pattern to be unanchored and match anywhere inside a string, you need to pad the GLOB with * on both sides:

$ echo a*b                  # anchored: matches axb not abx or xab
$ echo *a*b*                # now matches abx or xab or xabx or xaxbx

The GLOB pattern has to match the whole string, and may need * at each end to allow it to do that.

Unlike GLOB Patterns, which are anchored, Regular Expressions are not anchored unless you make them so using the explicit anchor characters ^ and/or $. Unanchored Regular Expressions “float” down the string until a match is found, and they don’t have to extend to the end of the string.

Regular Expressions can match just a piece of text in the middle of a line; they don’t have to match the whole line.

The GLOB pattern a*b doesn’t match the string xabx because GLOB is anchored and has to match the whole string, but the Regular Expression a.*b does match inside the line, because it is unanchored at both ends: it “floats” down the string and matches the ab in the middle of the string.

Use the line start ^ and line end $ meta-characters to anchor a Regular Expression to the start or end of a line. Here are some examples of how GLOB patterns and regexp compare:

GLOB        Regular Expression (may use anchors)
----        ------------------------------------
foo         ^foo$
bar[abc]    ^bar[abc]$
[!abc]      ^[^abc]$                 # note in complement GLOB uses ! vs. ^
foo?        ^foo.$
a*b         ^a.*b$
*foo*       foo                      # unanchored GLOB needs * at ends
*a*b*       a.*b                     # unanchored GLOB needs * at ends

Remember that an unanchored Regular Expression may match only part of a line, e.g. the regexp ab matches only the ab part of xxxabxxx, not the whole xxxabxxx. GLOB patterns must always match the entire line from start to end; they can’t match a substring inside a line the way regexp can.

3 Simple Basic Regular Expression Examples

When testing regular expressions with grep:

These grep commands select lines that match these Basic Regular Expressions:

grep 'ab'        # a followed by b
grep 'a*b'       # zero or more a followed by b
grep 'aa*b'      # one or more a followed by b
grep 'aaa*b'     # two or more a followed by b
grep 'a.b'       # a then one of anything then b
grep 'a.*b'      # a then zero or more of anything, then b
grep 'a..*b'     # a then one or more of anything then b
grep 'a...*b'    # a then two or more of anything then b
grep '^a'        # a must be the first character
grep 'b$'        # b must be the last character
grep '^a.*b$'    # a must be first, zero or more anything, b must be last

Find any line that contains at least (or exactly) one, two, or three characters of any kind (“any kind” includes spaces and other unprintable characters):

grep '.'         # contains at least one character (or more)
grep '..'        # contains at least two characters (or more)
grep '...'       # contains at least three characters (or more)

grep '^.$'       # contains exactly one character
grep '^..$'      # contains exactly two characters
grep '^...$'     # contains exactly three characters
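A few of the patterns above can be compared by counting how many sample lines each one selects. This is a sketch under stated assumptions: the sample lines and the helper name count_matches are invented, and grep -c is used to print a count instead of the lines:

```shell
# Seven sample lines: ab, aab, b, axb, a--b, xabx, abc
sample=$(printf '%s\n' ab aab b axb a--b xabx abc)

# count_matches REGEXP: print how many sample lines the regexp selects.
count_matches () {
    printf '%s\n' "$sample" | grep -c -- "$1" || true
}

count_matches 'ab'        # 4: ab, aab, xabx, abc
count_matches 'a*b'       # 7: zero a is allowed, so any line with a b
count_matches 'a.b'       # 2: aab, axb
count_matches '^a.*b$'    # 4: ab, aab, axb, a--b
count_matches '^.$'       # 1: b
```

Note how 'a*b' selects every line that contains a b, because the a part may be repeated zero times.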

4 Regular Expression Character Classes [...] – similar to GLOB

The characters inside the square brackets of a character class form a set of characters where order doesn’t matter and repeats don’t affect the meaning. All these below are equivalent and match only one single character a or z or 3:

grep '[az3]'             # match one single a or z or 3
grep '[3az]'             # same - order doesn't matter
grep '[aaazzzz3333]'     # same - bad form - no need to repeat characters

Most Regular Expression special characters lose their meaning when inside square brackets, but watch out for ^, ], and - which do have special meaning inside square brackets, depending on where they occur.

Regular Expressions Character classes [...]

Expression                              Matches                              Example  Example Matches                        Comment
----------                              -------                              -------  ---------------                        -------
character classes [...]                 a SINGLE character from the list     [abc]    “a” or “b” or “c”                      like globbing
complement of a character class [^...]  a SINGLE character not in the list   [^abc]   any SINGLE character not a or b or c   NOT like GLOB! GLOB uses ! as in [!abc]
special character inside [...]          as if the character is not special   [\]      \                                      conditions: ] must be first, ^ must not be first, and - must be last

4.1 Using ^ to complement a character class set: [^abc]

Don’t confuse GLOB with Regular Expressions.

4.2 Having closing ] as part of a character class set

A ] character can be placed inside square brackets to be part of the character class set, but it has to be the first character in the set. []az3] means one of the four characters ], a, z, or 3 and [^]az3] means any single character that is not one of the four characters ], a, z, or 3.

Putting a closing square bracket ] inside square brackets in any other position does not work: the first ] that is not at the start of the set closes the character class, and whatever follows is no longer part of the set.

You can put an opening [ anywhere in a character class, e.g.

$ grep '[([{]' doc.txt     # search for lines with '(' or '[' or '{'
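The positional rules for ], ^, and - can be exercised with a short sketch; the four sample lines and the helper name count_matches are invented for illustration:

```shell
# Four sample lines: a]b, a-b, a^b, axb
sample=$(printf '%s\n' 'a]b' 'a-b' 'a^b' 'axb')

# count_matches REGEXP: print how many sample lines the regexp selects.
count_matches () {
    printf '%s\n' "$sample" | grep -c -- "$1" || true
}

# ] first, ^ not first, - last: all three are ordinary set members here.
count_matches '[]^-]'      # 3: a]b, a-b, a^b

# ^ first complements the set; the ] right after it is still a member.
count_matches 'a[^]]b'     # 3: a-b, a^b, axb
```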

4.3 POSIX character classes – e.g. [:digit:]

POSIX Character Class expressions represent an entire range of characters, such as “all the digits” or “all the letters”. The classes have an awkward syntax: The POSIX class name is preceded by [: and followed by :], e.g. [:digit:]. These are the resulting class names:

POSIX Class   Description
-----------   -----------
[:alnum:]     alphanumeric characters
[:alpha:]     alphabetic characters
[:cntrl:]     control characters
[:digit:]     digit characters
[:lower:]     lower case alphabetic characters
[:print:]     visible characters, plus the space character
[:punct:]     punctuation and other symbol characters
[:space:]     white space (space, tab, CR, LF) characters
[:upper:]     upper case alphabetic characters
[:xdigit:]    hexadecimal digit characters
[:graph:]     visible characters (anything except spaces and control characters)

These POSIX class names only work inside an enclosing Regular Expression character class expression using (more) square brackets. What looks like double square brackets is really an enclosing square bracket character class expression containing a POSIX class name (which unfortunately also uses square brackets and colons as part of its name), e.g.

grep '[0123456789]'     # a digit (a list of all the digits)
grep '[[:digit:]]'      # a digit - the POSIX class name [:digit:] inside []
grep '[abcd[:digit:]]'  # a digit or letter a or b or c or d
grep '[ab[:digit:]cd]'  # same -- a digit or a or b or c or d
grep '[[:digit:]abcd]'  # same -- a digit or a or b or c or d

Of course you can use multiple POSIX class names inside the character class expression:

grep '[[:alpha:][:digit:]]'   # a letter or a digit
grep '[^[:alpha:][:digit:]]'  # *NOT* a letter or a digit

WARNING: You cannot interchange the [:alpha:] class and a list of all the upper- and lower-case letters; they are not always the same because the POSIX [:alpha:] class changes depending on the local language:

grep '[[:alpha:]]'   # a letter, using the POSIX class name [:alpha:]
grep '[a-zA-Z]'      # NOT THE SAME AS [:alpha:] - DO NOT USE !
grep '[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]' # NOT THE SAME

4.4 POSIX Regular Expression Examples, e.g. ^[[:digit:]]*$

These expressions could be given to grep:
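As a hedged sketch, here are a few anchored patterns built from POSIX classes, run against invented sample lines; the helper name count_matches is hypothetical and these are illustrations, not necessarily the examples originally listed here:

```shell
# Six sample lines: digits, an empty line, mixed text, blanks, a word, digits.
sample=$(printf '%s\n' '12345' '' 'abc123' '   ' 'Hello' '42')

# count_matches REGEXP: print how many sample lines the regexp selects.
count_matches () {
    printf '%s\n' "$sample" | grep -c -- "$1" || true
}

count_matches '^[[:digit:]]*$'               # 3: 12345, 42, and the empty line
count_matches '^[[:digit:]][[:digit:]]*$'    # 2: at least one digit required
count_matches '^[[:space:]]*$'               # 2: empty line and blank line
count_matches '^[[:upper:]][[:lower:]]*$'    # 1: Hello
count_matches '[[:alpha:]][[:digit:]]'       # 1: abc123 (letter then digit)
```

The first pattern also selects the empty line, because * allows zero digits between the two anchors.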

4.5 Character class ranges using [.-.]

What determines which characters lie between other characters? The result depends on your current Locale and is not well-defined.

4.6 Do not use Alphabetic Ranges that depend on Locale, e.g. [a-z]

Do not use alphabetic ranges (e.g. [a-z])! The ranges change depending on your system Locale and may change in unexpected ways:

$ touch A B C Z a b c z

$ LC_ALL=C

$ echo *
A B C Z a b c z

$ echo [a-z]
a b c z

$ LC_ALL=en_CA.UTF-8

$ echo *
A a B b C c Z z

$ echo [a-z]
a B b C c Z z

5 Extended Regular Expressions: ? + | { } ( )

Some features of Regular Expressions are called Extended features. These features are described below and use more special characters: ? + | { } ( )

5.1 Basic versus Extended Regular Expression syntax: \| vs. |

The difference between Basic and Extended Regular expressions is whether the program requires you to use a backslash to make use of the Extended features:

Basic:     . * ^ $ \   \| \? \+ \{ \} \( \)       # must use backslash
Extended:  . * ^ $ \    |  ?  +  {  }  (  )       # do *NOT* use backslash

The ordinary grep program uses Basic Regular Expressions, so you have to use backslashes in front of the Extended characters to turn on Extended features. The egrep Extended Regular Expression program (short for grep -E) doesn’t need the backslashes:

$ grep 'Accepted publickey for \(idallen\|cst8207[abc]\?\)' /var/log/auth.log
$ egrep 'Accepted publickey for (idallen|cst8207[abc]?)' /var/log/auth.log

Basic Regular Expressions are used in programs such as grep, sed, vi, and more; you need to use backslashes to turn on Extended features.

Extended Regular Expressions are used in programs such as egrep (grep -E) and less; you do not need backslashes to enable Extended features.

The perl program (and grep -P) has its own set of special Perl-compatible Regular Expression features, not described here.

5.2 Extended Feature: Repetition: ? + {n,m}

Extended Regular Expressions give you more options when repeating a preceding expression:

Regular Expressions – repeat preceding (Repetition)

Basic     Extended   Repetition Meaning
-----     --------   ------------------
*         *          zero or more times
\?        ?          zero or one times
\+        +          one or more times
\{n\}     {n}        n times, n is an integer
\{n,\}    {n,}       n or more times, n is an integer
\{,m\}    {,m}       m or fewer times, m is an integer (GNU extension)
\{n,m\}   {n,m}      at least n, at most m times, n and m are integers

Examples:

$ egrep 'colou?r' doc.txt                       # color or colour not colouur
$ egrep 'has +spaces' doc.txt                   # one or more spaces between
$ egrep '[0-9]{9}' doc.txt                      # 123456789
$ egrep '[0-9]{3}-[0-9]{3}-[0-9]{4}' doc.txt    # 123-456-7890
$ egrep '^.{80}$' doc.txt                       # 80 character lines
$ grep '^.\{,80\}$' doc.txt                     # 80 character or fewer lines

Note that the {,m} capability is not available in all Extended Regular Expressions, since it is a GNU extension.
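The repetition operators can be compared side by side on standard input. A minimal sketch, assuming invented sample lines and a hypothetical helper count_ere that runs grep -E:

```shell
sample=$(printf '%s\n' 'color' 'colour' 'colouur' \
    'call 613-555-1234 now' 'call 613-55-1234 now')

# count_ere REGEXP: count sample lines selected by an Extended regexp.
count_ere () {
    printf '%s\n' "$sample" | grep -E -c -- "$1" || true
}

count_ere 'colou?r'                       # 2: color, colour
count_ere 'colou+r'                       # 2: colour, colouur
count_ere 'colou{2}r'                     # 1: colouur
count_ere '[0-9]{3}-[0-9]{3}-[0-9]{4}'    # 1: only the 3-3-4 digit number

# The same interval in Basic syntax needs backslashes on the braces.
basic_count=$(printf '%s\n' "$sample" |
    grep -c '[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}')
echo "$basic_count"
```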

5.3 Extended Feature: Alternation (one or the other): ab|cd

Extended Regular Expressions give you a way of matching one expression or another expression using the logical or bar | operator:

$ grep -E 'dog|cat'  doc.txt              # find lines with dog or cat
$ grep 'dog house\|cat fight' doc.txt     # find lines with "dog house" or "cat fight"

You can do a crude form of alternation using the -e option to give the alternatives (as many as you like) in the grep family of programs:

$ fgrep -e 'dog' -e 'cat' doc.txt         # find lines containing dog or cat
$ grep -e '^dog$' -e '^cat$' doc.txt      # find lines with *only* dog or cat

The or | operator binds very loosely. Everything else has higher precedence:

$ grep -E '^a|b$' doc.txt                 # lines starting with a or ending with b

5.4 Extended Feature: Grouping with parentheses a(b|c)d

Parentheses ( and ) are an Extended feature that can be used to group Regular Expressions for repetition, and to override the precedence rules.

$ egrep 'ab|cd' doc.txt              # ab or cd
$ egrep 'a(b|c)d' doc.txt            # a followed by "b or c" followed by d

$ grep -E '^a|b$' doc.txt            # lines starting with a or ending with b
$ grep -E '^(a|b)$' doc.txt          # lines containing only a or only b

$ egrep 'Bar(bar)+a' doc.txt         # Barbara, Barbarbara, etc.

(Visit Barbara at the Rhababer-Barbara-Bar.)
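The effect of grouping on the loosely-binding | operator shows up clearly when the patterns are counted against the same input. This is a sketch; the sample lines and the helper name count_ere are assumptions made for the example:

```shell
sample=$(printf '%s\n' a b apple crab xyz acd abd xacdx)

# count_ere REGEXP: count sample lines selected by an Extended regexp.
count_ere () {
    printf '%s\n' "$sample" | grep -E -c -- "$1" || true
}

count_ere '^a|b$'      # 6: starts with a, OR ends with b
count_ere '^(a|b)$'    # 2: the whole line is a or b
count_ere 'ab|cd'      # 4: crab, acd, abd, xacdx
count_ere 'a(b|c)d'    # 3: acd, abd, xacdx
```

Without parentheses the anchors belong to only one alternative each; with parentheses both anchors apply to the whole group.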

5.5 Extended Feature: Tags or Backreferences; \1 \2 \3

Another extended regular expression feature allows you to match later what matched earlier in a pattern:
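A hedged sketch of backreferences using grep: \1 matches again whatever text the first parenthesized group matched. The sketch sticks to the backslashed Basic syntax \( \); the sample lines and the helper name count_matches are invented:

```shell
sample=$(printf '%s\n' 'hello' 'world' 'abcabc' 'noon' 'the the end')

# count_matches REGEXP: print how many sample lines the regexp selects.
count_matches () {
    printf '%s\n' "$sample" | grep -c -- "$1" || true
}

# Any character followed immediately by the same character.
count_matches '\(.\)\1'                           # 2: hello (ll), noon (oo)

# A whole line made of some text written twice.
count_matches '^\(.*\)\1$'                        # 1: abcabc

# A word, a space, then the same word again.
count_matches '\([[:alpha:]][[:alpha:]]*\) \1'    # 1: the the end
```

The backreference repeats the matched text, not the pattern: '\(.\)\1' does not match two different characters in a row.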

6 Regular Expression Precedence

As in mathematics, Regular Expression precedence can be overridden with explicit parentheses to do grouping.

Precedence rules summary (BEDMAS for Regexp)

Operation      Regex                                        Algebra
---------      -----                                        -------
grouping       () or \(\)                                   parentheses, brackets
repetition     * or ? or + or {n} or {n,} or {n,m}          exponentiation
               * or \? or \+ or \{n\} or \{n,\} or \{n,m\}
concatenation  ab                                           multiplication or division
alternation    | or \|                                      addition or subtraction

7 Backslash to remove regexp meaning of a meta-character: \.

To remove the Regular Expression meaning of any Regular Expression meta character, put a backslash in front of it. This applies to both Basic and Extended Regular Expressions. In all types of Regular Expressions:

In Extended Regular Expressions, you need more backslashes to hide the additional Extended Regular Expression meta-characters, e.g. \+ hides the meaning of + and matches a real plus sign in an Extended Regular Expression, just as \? matches a real question mark:

$ egrep 'foo\++' doc.txt        # match one or more plus signs (Extended)
$ grep 'foo+\+' doc.txt         # match one or more plus signs (Basic)

$ egrep 'foo\??' doc.txt        # match an optional question mark (Extended)
$ grep 'foo?\?' doc.txt         # match an optional question mark (Basic)
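The effect of the backslash can be seen by counting matches with and without it. A minimal sketch with invented sample lines; the variable names are chosen only for this example:

```shell
sample=$(printf '%s\n' 'a.b' 'axb' '2+2' '2222')

# Unescaped dot matches any single character; escaped dot matches a real dot.
any_char=$(printf '%s\n' "$sample" | grep -c 'a.b')        # a.b and axb
real_dot=$(printf '%s\n' "$sample" | grep -c 'a\.b')       # a.b only

# In Extended syntax + is a repeat operator unless it is backslashed.
ere_repeat=$(printf '%s\n' "$sample" | grep -E 'x*2+2')    # selects 2222
ere_literal=$(printf '%s\n' "$sample" | grep -E '2\+2')    # selects 2+2

# In Basic syntax a plain + is already an ordinary character.
bre_literal=$(printf '%s\n' "$sample" | grep '2+2')        # selects 2+2

echo "$any_char $real_dot $ere_repeat $ere_literal $bre_literal"
```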

8 Regular Expression Traps and Pitfalls

8.1 POSIX character class names are indivisible

The POSIX class name includes the surrounding colons and square brackets and nothing should ever be placed inside those brackets. This is a common mistake:

grep '[[^:digit:]]'    # WRONG ! no longer a POSIX class name !
grep '[^[:digit:]]'    # correct - match any single non-digit character

Using what you think is a POSIX character class outside of the enclosing character class square brackets does not work. On some systems, grep will warn you that it doesn’t work:

$ grep '[:alnum:]'       # WRONG !
grep: character class syntax is [[:space:]], not [:space:]

On other systems, the character class expression will quietly match the list of characters inside the outer square brackets, i.e. match one of the characters :, a, l, n, u, or m!

8.2 Regexp matches are as long as possible

Any Regular Expression match will be as long as possible. They are called “greedy”:
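Greedy matching is easy to see with sed, where & in the replacement stands for the text that matched. This is a sketch; the input string and the variable names greedy and shorter are invented:

```shell
# a.*b runs from the first a to the LAST b on the line.
greedy=$(echo 'xxaXXbYYbxx' | sed -e 's/a.*b/[&]/')

# Replacing . with [^b] stops the match at the first b instead.
shorter=$(echo 'xxaXXbYYbxx' | sed -e 's/a[^b]*b/[&]/')

echo "$greedy"     # xx[aXXbYYb]xx
echo "$shorter"    # xx[aXXb]YYbxx
```

Excluding the terminating character from the repeated part, as in [^b]*, is a common way to get the shorter match when that is what you want.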

8.3 Don’t use repeat operators at line boundaries in grep

All the expressions below match the same set of lines containing a letter a, but the first expression uses a lot less processing power than the others:

$ grep 'a'      file.txt    # this is the cleanest and fastest one
$ grep 'aa*'    file.txt
$ grep 'a.*'    file.txt
$ grep '.*a'    file.txt
$ grep '.*a.*'  file.txt

If you’re looking for lines containing a piece of text, don’t complicate the regexp with repeat operators that waste computer time but don’t change which lines the regexp finds.

8.4 Unix/Linux regex processing is line based

8.5 Regular Expressions match anywhere in a line: anchoring with ^ and $

Unlike GLOB Patterns, which are anchored, Regular Expressions are not anchored unless you make them so using the explicit anchor characters ^ and/or $. Unanchored Regular Expressions “float” down the string until a match is found, and they don’t have to extend to the end of the string.

$ echo a*b                  # anchored: matches axb not abx or xab
$ ls | grep '^a.*b$'        # equivalent anchored Regular Expression
$ ls | grep 'a.*b'          # NOT equivalent unanchored Regular Expression

Regular Expressions “float” down the string unless they are anchored.

8.6 Expressions matching zero length strings match everywhere
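A short sketch of this pitfall, with invented sample lines and a hypothetical helper count_matches: a pattern such as x* is satisfied by zero x characters, so it matches every line, including lines with no x at all.

```shell
sample=$(printf '%s\n' apple banana cherry)

# count_matches REGEXP: print how many sample lines the regexp selects.
count_matches () {
    printf '%s\n' "$sample" | grep -c -- "$1" || true
}

count_matches 'x*'        # 3: zero x matches on every line
count_matches 'x*y*z*'    # 3: every part may match nothing
count_matches 'xx*'       # 0: at least one real x is now required
```

If a pattern selects far more lines than expected, check whether every part of it is allowed to match nothing.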

8.7 Quote all regexp to hide them from the shell

This Regular Expression below sometimes works, and sometimes does not, depending on what file names match the aa* GLOB pattern in the current directory:

grep aa* foo.txt                      # no quotes, GLOB expands: bad idea
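The failure can be reproduced in a scratch directory. A minimal sketch, assuming a file named aardvark happens to sit beside foo.txt; all names here are invented for the demonstration:

```shell
demo_dir=$(mktemp -d)
printf '%s\n' 'aaa' 'aardvark' > "$demo_dir/foo.txt"
touch "$demo_dir/aardvark"      # a file name that the GLOB aa* matches

# Deliberately unquoted: the shell turns aa* into aardvark before grep
# runs, so grep searches for the fixed text aardvark.
unquoted=$(cd "$demo_dir" && grep -c aa* foo.txt)

# Quoted: grep receives the regexp aa* (one or more a) as intended.
quoted=$(cd "$demo_dir" && grep -c 'aa*' foo.txt)

echo "unquoted: $unquoted line, quoted: $quoted lines"
rm -rf "$demo_dir"
```

In a directory with no names matching aa*, the unquoted command would happen to work, which is why the mistake can go unnoticed.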

8.8 Alphabetic ranges are not well-defined in all Locales

9 Regular Expressions in programs: vi sed less

vi reference: http://www.tutorialspoint.com/unix/unix-vi-editor.htm

You can search and replace in vi using a Basic Regular Expression in a Substitution line command. The substitution command by default uses slashes to delimit the text to match and the replacement text:

:%s/colou\?r/COLOUR/g      # make all color and colour upper-case

The program sed (Stream EDitor) can apply a Basic Regular Expression substitution non-interactively by reading a file (or standard input) and writing to standard output:

$ sed -e 's/colou\?r/COLOUR/g' input.txt >output.txt

You can search using Regular Expressions in the interactive programs vi, more, and less (and also man, that uses less) by typing a slash followed by the Regular Expression to search for:

/^ *read                    # find "read" at the start of a line

(Remember that vi and more use Basic Regular Expressions and less uses Extended Regular Expressions.)

9.1 Example: capitalize sentences repeatedly in a document using vi

Task: Any lower-case letter following a period and two spaces should be made upper-case. Easy to do using Regular Expressions in vi:

9.2 Example: uncapitalize in middle of words

Any upper-case character following a lower case character should be made lower case, e.g. uNcapitalize or aWkward or iN

Advanced: In vim you can also use the syntax /[[:lower:]][[:upper:]]/b1 to both match the text and move the cursor right one position. Then you can just repeat the two characters n. as many times as necessary. The vim editor has very advanced pattern search and cursor position capabilities; type :help regexp

10 Regular Expression Resources

10.1 http://lynda.com

Lynda.com has a course on regular expressions

The problem is that it covers our material as well as some more advanced topics that we won’t cover

It is a good presentation, and the following chapters should have minimal references to the “too advanced” material

10.2 Interactive Regular Expression Tutorial

For a quick interactive tutorial on Regular Expressions, see http://regexone.com/ but be aware that this tutorial uses some short-hand expressions that we don’t use in this course because they don’t work everywhere:

Shortcut POSIX Character Class
\w similar to [[:alnum:]_]
\W similar to [^[:alnum:]_]
\s similar to [[:space:]]
\S similar to [^[:space:]]

The tutorial does not use or understand the POSIX character classes that are more standard in Unix/Linux programs.

Author: 
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/

