"Taking a LiveFire Labs' course is an excellent way to learn
LiveFire Labs' UNIX Tip, Trick, or Shell Script of the Week
A Brief Introduction to UNIX Regular Expressions
One of the examples in last week's tip used the following awk
statement to extract the lines (records) of a file named unixfile
that contain the string "learn":
awk '/learn/ { print $2 " " $1 }' unixfile
The string "learn" in this statement is a regular expression that is
delimited on each end by the forward slash (/) character. In addition
to awk, regular expressions are often used with other UNIX utilities
such as grep, sed, and vi.
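As a minimal sketch of how the same regex carries over to other utilities, the commands below run "learn" through awk, grep, and sed. The sketch assumes unixfile holds only the five text lines listed later in this tip; the printf line recreates it purely so the commands can be run as-is.

```shell
# Assumption: recreate unixfile with the five text lines listed later in this tip.
printf '%s\n' 'unix training' 'learn unix' 'unix class' 'learning unix' 'unix course' > unixfile

# awk: the statement quoted above; prints field 2, a space, then field 1
awk '/learn/ { print $2 " " $1 }' unixfile

# grep: the regex is passed as a quoted argument, without slash delimiters
grep 'learn' unixfile

# sed: -n suppresses automatic printing; p prints only lines matching /learn/
sed -n '/learn/p' unixfile
```

Both grep and sed print "learn unix" and "learning unix", because "learn" also occurs inside "learning". awk selects the same two records and prints them with the fields swapped.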
Regular expressions, often abbreviated as regex or regexp, describe a
pattern or particular sequence of characters and are used to search
for and replace strings.
Most characters used in a regex will represent themselves, but there
are special characters (known as metacharacters) that take on special
meaning in the context of the UNIX utility/tool in which they are
used.
Since the topic of regular expressions is quite extensive, this brief
overview focuses on only two frequently used positional, or anchor,
metacharacters: the caret (^) and the dollar sign ($).
The caret matches at the beginning of a line, and the dollar sign
matches at the end of a line. As anchors, a caret is therefore placed
at the start of a regex and a dollar sign at the end.
To demonstrate these two positional metacharacters, this week's tip
reuses the data file from last week's tip. The only change is that
one blank line was inserted between each pair of adjacent text lines,
for a total of 4 blank lines. The text lines in unixfile are:
unix training
learn unix
unix class
learning unix
unix course
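To follow along, the file can be recreated with a single printf command. This is only a sketch: it assumes exactly one empty line between each pair of text lines, which is what the count of 4 reported by grep -c later in this tip implies.

```shell
# Assumption: one empty line between each pair of text lines (4 in total).
# printf reuses the '%s\n' format for every argument, so each '' becomes an empty line.
printf '%s\n' 'unix training' '' 'learn unix' '' 'unix class' '' 'learning unix' '' 'unix course' > unixfile

# Confirm the layout: 9 lines in total (5 text lines plus 4 empty ones)
wc -l < unixfile
```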
Using grep, all lines in unixfile that begin with "unix" will be
extracted with the help of the caret metacharacter:
# grep '^unix' unixfile
unix training
unix class
unix course
Removing the caret from the beginning of the regex and adding a dollar
sign to the end will cause grep to display lines ending with "unix":
# grep 'unix$' unixfile
learn unix
learning unix
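These two metacharacters can also be combined in a single regex to
identify or manipulate blank lines: a caret immediately followed by a
dollar sign matches only a line with nothing between its beginning
and its end. Note that a line containing only spaces or tabs is not
empty and will not match. The -c option for grep will be used with
this regex to count the number of blank lines in unixfile: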
# grep -c '^$' unixfile
4
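Placing text between the two anchors pins a match to the entire line, and grep's -n option shows where the empty lines sit. The sketch below assumes unixfile has one empty line between each pair of text lines; the printf line recreates it so the commands can be run as-is.

```shell
# Assumption: unixfile has one empty line between each pair of text lines.
printf '%s\n' 'unix training' '' 'learn unix' '' 'unix class' '' 'learning unix' '' 'unix course' > unixfile

# Both anchors around literal text: the line must be exactly "learn unix"
grep '^learn unix$' unixfile

# -n prefixes each matching line with its line number
grep -n '^$' unixfile
```

The first command prints only "learn unix". The second prints "2:", "4:", "6:", and "8:", the line numbers of the four empty lines followed by their (empty) content.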
You may recognize that this regex would be useful for removing blank
lines from a file when needed.
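One way this might be done is sketched below, using grep's -v option and sed's d command. The output file name unixfile.noblank is hypothetical, and the printf line recreates unixfile (assuming one empty line between each pair of text lines) so the commands can be run as-is.

```shell
# Assumption: unixfile has one empty line between each pair of text lines.
printf '%s\n' 'unix training' '' 'learn unix' '' 'unix class' '' 'learning unix' '' 'unix course' > unixfile

# grep -v inverts the match, printing every line that is NOT empty.
# unixfile.noblank is a hypothetical name for the cleaned copy.
grep -v '^$' unixfile > unixfile.noblank

# sed alternative: d deletes each line matching /^$/; the rest is printed
sed '/^$/d' unixfile
```

Both commands leave unixfile itself untouched and produce the five text lines with the empty lines removed.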
Experienced UNIX system administrators and shell script programmers
understand that becoming skilled in the use of UNIX regular
expressions is essential for using standard UNIX utilities (e.g. grep,
awk, sed, and vi) to their fullest potential.