A Brief Introduction to Regular Expressions

LiveFire Labs' UNIX Tip, Trick, or Shell Script of the Week

A Brief Introduction to UNIX Regular Expressions

One of the examples in last week's tip used the following awk statement to extract, from a file named unixfile, the lines (records) containing the string "learn":

awk '/learn/ { print $2 " " $1 }' unixfile

The string "learn" in this statement is a regular expression that is delimited on each end by the forward slash (/) character. In addition to awk, regular expressions are often used with other UNIX utilities such as grep, sed, and vi.

Regular expressions, often abbreviated as regex or regexp, describe a pattern or particular sequence of characters and are used to search for and replace strings.

Most characters used in a regex will represent themselves, but there are special characters (known as metacharacters) that take on special meaning in the context of the UNIX utility/tool in which they are used.

Since the topic of regular expressions is quite extensive, this brief overview focuses on only two frequently used positional (anchor) metacharacters, the caret (^) and the dollar sign ($).

The caret matches at the beginning of a line, and the dollar sign matches at the end of a line. When used as anchors, the caret therefore appears at the left-hand end of a regex and the dollar sign at the right-hand end.

To demonstrate these two positional metacharacters, the data file from last week's tip is used again this week. The only change is that a blank line has been inserted between each pair of text lines, for a total of 4 blank lines. The file unixfile now contains the following data:

unix training

learn unix

unix class

learning unix

unix course

Using grep, all lines in unixfile that begin with "unix" will be extracted with the help of the caret metacharacter:

# grep '^unix' unixfile
unix training
unix class
unix course

Removing the caret from the beginning of the regex and adding a dollar sign to the end will cause grep to display lines ending with "unix":

# grep 'unix$' unixfile
learn unix
learning unix
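
Since the same anchors are understood by awk and sed, the two grep examples above can be sketched with those utilities as well. This is a minimal sketch, not part of the original tip: the scratch directory created with mktemp and the variable names awk_out and sed_out are assumptions made only so the example is self-contained.

```shell
# Work in a scratch directory (an assumption of this sketch) and recreate
# the sample file: five text lines separated by single blank lines.
cd "$(mktemp -d)"
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# awk: for lines that begin with "unix", print only the second field.
awk_out=$(awk '/^unix/ { print $2 }' unixfile)
echo "$awk_out"

# sed: -n suppresses the default output, and p prints only the lines
# that end with "unix".
sed_out=$(sed -n '/unix$/p' unixfile)
echo "$sed_out"
```

The awk command should print the second word of the three lines that grep '^unix' selected, and the sed command should print the same two lines as grep 'unix$'.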

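These two metacharacters can also be combined in a single regex to identify/manipulate blank lines. A regex consisting of a caret immediately followed by a dollar sign matches a line whose beginning is directly followed by its end, that is, a line containing no characters. The -c option for grep will be used with this regex to count the number of blank lines in unixfile: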

# grep -c '^$' unixfile
4

You may recognize that this regex would be useful for removing blank lines from a file when needed.
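
One way to remove blank lines with the '^$' regex is sketched below, using grep's -v option (print lines that do not match) and sed's d command (delete matching lines). This is a minimal sketch under stated assumptions: the scratch directory and the variable names grep_out and sed_out are introduced here for illustration only.

```shell
# Work in a scratch directory (an assumption of this sketch) and recreate
# the sample file: five text lines separated by single blank lines.
cd "$(mktemp -d)"
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# grep -v inverts the match, so every line that is NOT empty is printed.
grep_out=$(grep -v '^$' unixfile)
echo "$grep_out"

# sed: d deletes each line matching the regex; all other lines are printed.
sed_out=$(sed '/^$/d' unixfile)
echo "$sed_out"
```

Both commands leave unixfile itself unchanged and write the five text lines to standard output, so the result can be redirected to a new file if it needs to be kept. Note that '^$' matches only lines that are completely empty; a line holding nothing but spaces or tabs would need a regex such as '^[[:space:]]*$' instead.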

Experienced UNIX system administrators and shell script programmers understand that becoming skilled in the use of UNIX regular expressions is essential for using standard UNIX utilities (e.g. grep, awk, sed, and vi) to their fullest potential.
