July 7, 2003 - A Brief Introduction to Regular Expressions

One of the examples in last week's tip used the following awk statement to extract, from a file named unixfile, the lines (records) that contain the string "learn" and print the second and first fields of each, in that order:

awk '/learn/ { print $2 " " $1 }' unixfile

The string "learn" in this statement is a regular expression, delimited on each end by the forward slash (/) character. In addition to awk, regular expressions are often used with other UNIX utilities such as grep, sed, and vi.
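To see the same regex at work in more than one tool, here is a minimal sketch (assuming a POSIX shell with awk, grep, and sed available) that recreates last week's five-line version of unixfile, without this week's blank lines, and applies "learn" with each utility. Note that "learning unix" matches as well, because an unanchored regex can match anywhere in a line.

```sh
# Recreate last week's five-line unixfile (no blank lines).
printf 'unix training\nlearn unix\nunix class\nlearning unix\nunix course\n' > unixfile

# awk: for each record matching /learn/, print field 2, a space, then field 1
awk '/learn/ { print $2 " " $1 }' unixfile
# unix learn
# unix learning

# grep: print each line matching the same regex
grep 'learn' unixfile
# learn unix
# learning unix

# sed: -n turns off automatic printing, and p prints each matching line
sed -n '/learn/p' unixfile
# learn unix
# learning unix
```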
Regular expressions, often abbreviated as regex or regexp, describe a pattern, or a particular sequence of characters, and are used to search for and replace text. Most characters in a regex simply represent themselves, but special characters known as metacharacters take on special meaning in the context of the UNIX utility or tool in which they are used.

Since the topic of regular expressions is quite extensive, this brief overview focuses on only two of the most frequently used positional, or anchor, metacharacters: the caret (^) and the dollar sign ($).

The caret matches at the beginning of a line, and the dollar sign matches at the end of a line. Logically, then, a caret used as an anchor appears on the left-hand side of a regex, and a dollar sign on the right.
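Because ^ and $ match positions rather than characters, substituting for them inserts text without replacing anything. A minimal sketch with sed, assuming a hypothetical two-line file named demo.txt created only for this illustration:

```sh
# demo.txt is a hypothetical sample file for this illustration.
printf 'unix training\nlearn unix\n' > demo.txt

# ^ matches the start of each line, so "> " is inserted there.
sed 's/^/> /' demo.txt
# > unix training
# > learn unix

# $ matches the end of each line, so " <" is appended there.
sed 's/$/ </' demo.txt
# unix training <
# learn unix <
```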
To demonstrate these two positional metacharacters, the data file from last week's tip will be used again this week. The only change is that a blank line has been inserted between each pair of text lines, adding 4 blank lines in total. The file unixfile now contains the following data:

unix training

learn unix

unix class

learning unix

unix course
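If you want to follow along, the file can be recreated with printf. This is a sketch assuming a POSIX shell; each "\n\n" in the format string ends a text line and then adds one of the 4 empty lines.

```sh
# Five lines of text, with one empty line between each pair.
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# An empty pattern matches every line, so this counts all lines:
# 5 lines of text + 4 blank lines.
grep -c '' unixfile
# 9
```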
Using grep, all lines in unixfile that begin with "unix" will be extracted with the help of the caret metacharacter:

# grep '^unix' unixfile
unix training
unix class
unix course
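The caret works the same way in the other utilities mentioned above. A minimal sketch, recreating unixfile so the snippet stands alone: in awk, a pattern with no action prints each matching record, and sed -n with the p command does the same.

```sh
# Recreate this week's unixfile (a blank line between each text line).
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# awk: a pattern with no action prints every matching record
awk '/^unix/' unixfile
# unix training
# unix class
# unix course

# sed: print only the lines that begin with "unix"
sed -n '/^unix/p' unixfile
# unix training
# unix class
# unix course
```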
Removing the caret from the beginning of the regex and adding a dollar sign to the end will cause grep to display lines ending with "unix":

# grep 'unix$' unixfile
learn unix
learning unix
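To see how each anchor narrows the search, this sketch (again recreating unixfile so it runs on its own) counts matching lines for the unanchored, start-anchored, end-anchored, and fully anchored forms of "unix". The last count is 0 because no line consists of exactly "unix".

```sh
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

for re in 'unix' '^unix' 'unix$' '^unix$'; do
  # grep -c prints the number of matching lines. It exits with status 1
  # when nothing matches, so "|| true" keeps that from counting as a failure.
  count=$(grep -c "$re" unixfile || true)
  echo "$re: $count"
done
# unix: 5
# ^unix: 3
# unix$: 2
# ^unix$: 0
```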
These two metacharacters can also be combined in a single regex to identify or manipulate blank lines: in '^$', the beginning of the line is immediately followed by its end, so only an empty line matches. Here, grep's -c option, which prints a count of matching lines instead of the lines themselves, is used with this regex to count the blank lines in unixfile:

# grep -c '^$' unixfile
4
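The same '^$' regex works in awk and sed. In this minimal sketch (recreating unixfile so it stands alone), awk tallies the empty records in a variable, and sed's = command prints the line number of each blank line, showing where they sit.

```sh
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# awk: add 1 to n for each empty record; "n+0" prints 0 instead of an
# empty string if nothing matches
awk '/^$/ { n++ } END { print n+0 }' unixfile
# 4

# sed: -n suppresses normal output; = prints the line number of each match
sed -n '/^$/=' unixfile
# 2
# 4
# 6
# 8
```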
You may recognize that this regex is also useful for removing blank lines from a file.
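Here is a minimal sketch of two common ways to do that, recreating unixfile so the snippet runs on its own. grep's -v option inverts the match, and sed's d command deletes matching lines; both write the result to standard output. The output file name unixfile.new is just an illustrative choice.

```sh
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# grep -v prints only the lines that do NOT match '^$'
grep -v '^$' unixfile

# sed deletes every line matching '^$' and prints the rest
sed '/^$/d' unixfile

# To keep the result, write it to a new file. Redirecting back onto
# unixfile itself would truncate it before grep could read it.
grep -v '^$' unixfile > unixfile.new
```

Note that '^$' matches only truly empty lines; a line containing just a space or a tab is not empty, so it would be kept.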
Experienced UNIX system administrators and shell script programmers understand that becoming skilled with regular expressions is essential for using standard UNIX utilities (e.g., grep, awk, sed, and vi) to their fullest potential.
Learn more...

If you are new to the UNIX or Linux operating system and would like to learn more, you may want to consider registering for LiveFire Labs' UNIX and Linux Operating System Fundamentals online training course. Our innovative hands-on training model allows you to learn UNIX by completing hands-on exercises on real servers in our Internet Lab.