LiveFire Labs - Online UNIX Training




LiveFire Labs' UNIX Tip, Trick, or Shell Script of the Week



July 7, 2003 - A Brief Introduction to Regular Expressions

One of the examples in last week's tip used the following awk statement to find the lines (records) in a file named unixfile that contain the string "learn" and print their second and first fields:

awk '/learn/ { print $2 " " $1 }' unixfile

The string "learn" in this statement is a regular expression, delimited at each end by a forward slash (/). In addition to awk, regular expressions are commonly used with other UNIX utilities such as grep, sed, and vi.
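As a rough sketch of that point, the commands below apply the same regex, learn, with awk, grep, and sed (vi is left out because it is interactive). The five-line unixfile is an assumption: it is this week's listing with the blank lines removed, since the text says that insertion was the only change. The variable names are hypothetical.

```shell
#!/bin/sh
# Assumed sample data: last week's unixfile, without blank lines.
printf 'unix training\nlearn unix\nunix class\nlearning unix\nunix course\n' > unixfile

# awk: for each line matching /learn/, print field 2, a space, then field 1.
from_awk=$(awk '/learn/ { print $2 " " $1 }' unixfile)

# grep: print each whole line that matches the regex.
from_grep=$(grep 'learn' unixfile)

# sed: -n turns off automatic printing; p prints only the matching lines.
from_sed=$(sed -n '/learn/p' unixfile)

printf '%s\n' "$from_awk" "$from_grep" "$from_sed"
```

grep and sed select the same two lines; awk additionally rearranges their fields.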

Regular expressions, often abbreviated as regex or regexp, describe a pattern (a particular sequence of characters) and are used to search for, and replace, strings that match it.
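A minimal illustration of both uses, assuming a POSIX grep and sed: grep searches for lines that match a regex, and sed's s command replaces the text a regex matches. The two input lines and the variable names are made up for this sketch.

```shell
# Search: keep only the input lines that contain a match for "learn".
found=$(printf 'learn unix\nunix class\n' | grep 'learn')

# Replace: s/unix/UNIX/ rewrites the first "unix" on each line.
replaced=$(printf 'learn unix\nunix class\n' | sed 's/unix/UNIX/')

printf '%s\n' "$found" "$replaced"
```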

Most characters in a regex represent themselves, but some special characters, known as metacharacters, take on a special meaning that depends on the UNIX utility in which they are used.
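For instance, in a grep regex ordinary letters match themselves, while a caret at the start of the regex is a metacharacter; preceding it with a backslash makes it match a literal caret instead. The input x^2 and the variable names are arbitrary choices for this sketch.

```shell
# Ordinary characters such as u, n, i and x represent themselves.
literal=$(printf 'unix\n' | grep -c 'unix')

# At the start of a regex, ^ is a metacharacter (an anchor): the line "x^2"
# does not begin with "2", so nothing matches.
anchored=$(printf 'x^2\n' | grep -c '^2')

# A backslash removes the special meaning, so \^ matches a literal caret.
escaped=$(printf 'x^2\n' | grep -c '\^2')

echo "$literal $anchored $escaped"    # prints: 1 0 1
```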

Because regular expressions are an extensive topic, this brief overview focuses on only two frequently used positional (anchor) metacharacters: the caret (^) and the dollar sign ($).

The caret matches at the beginning of a line, and the dollar sign matches at the end of a line. When used as anchors, a caret therefore appears on the left-hand (beginning) side of a regex, and a dollar sign on the right-hand (end) side.
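The effect of each anchor can be checked one line at a time with a small shell function. matches is a hypothetical helper written for this sketch, not a standard command; it simply wraps grep -q, which reports a match through its exit status without printing anything.

```shell
# Hypothetical helper: print "yes" if line $1 matches basic regex $2, else "no".
matches() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then
        echo yes
    else
        echo no
    fi
}

matches 'learn unix' 'unix'     # yes: "unix" may appear anywhere
matches 'learn unix' '^unix'    # no: the line does not begin with "unix"
matches 'learn unix' 'unix$'    # yes: the line ends with "unix"
matches 'unix class' '^unix'    # yes: the line begins with "unix"
matches 'unix class' 'unix$'    # no: the line ends with "class"
```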

To demonstrate these two positional metacharacters, this week's examples reuse the data file from last week's tip. The only change is that a blank line has been inserted between each line of text, adding 4 blank lines in total. The file unixfile now contains the following data:

unix training

learn unix

unix class

learning unix

unix course
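To follow along, unixfile can be recreated with printf. This sketch assumes the blank lines are completely empty (no spaces or tabs), which is consistent with the count that grep -c '^$' reports for this file.

```shell
# Recreate unixfile: five lines of text, with one empty line between each pair.
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# 5 lines of text + 4 empty lines = 9 lines in total.
wc -l < unixfile
```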

Using the caret metacharacter, grep extracts all lines in unixfile that begin with "unix":

# grep '^unix' unixfile
unix training
unix class
unix course

Removing the caret from the beginning of the regex and adding a dollar sign to the end makes grep display the lines that end with "unix":

# grep 'unix$' unixfile
learn unix
learning unix

These two metacharacters can also be combined in a single regex to identify or manipulate blank lines. With nothing between the caret and the dollar sign, '^$' matches only empty lines; adding grep's -c option counts those lines in unixfile instead of printing them:

# grep -c '^$' unixfile
4
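Placing text between the two anchors works the same way: the line must then consist of exactly that text. The three input lines and the sample helper below are invented for illustration.

```shell
# Hypothetical helper: three made-up lines that all contain "unix class".
sample() {
    printf 'unix class\nunix classes\nold unix class\n'
}

anywhere=$(sample | grep -c 'unix class')     # 3: matches anywhere in a line
at_start=$(sample | grep -c '^unix class')    # 2: must begin the line
whole=$(sample | grep -c '^unix class$')      # 1: must be the entire line

echo "$anywhere $at_start $whole"             # prints: 3 2 1
```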

The same regex is also useful for removing blank lines from a file.
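Two common ways to do this, shown as a sketch: grep -v prints the lines that do not match the regex, and sed's d command deletes the lines that do. As above, the blank lines are assumed to be truly empty, and unixfile.noblank is a hypothetical output file name.

```shell
# Recreate unixfile (one empty line between each line of text).
printf 'unix training\n\nlearn unix\n\nunix class\n\nlearning unix\n\nunix course\n' > unixfile

# grep -v inverts the match: write every line that is NOT empty to a new file.
grep -v '^$' unixfile > unixfile.noblank

# sed: delete (d) every line matching ^$ and print the remaining lines.
sed '/^$/d' unixfile
```

Neither command modifies unixfile itself; the grep result goes to a separate file and the sed result to standard output.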

Experienced UNIX system administrators and shell script programmers know that skill with regular expressions is essential for getting the most out of standard UNIX utilities such as grep, awk, sed, and vi.
 

©2002-2003 LiveFire Labs.  All rights reserved.
Linux® is a registered trademark of Linus Torvalds, author and developer of this public domain operating system.
UNIX® is a registered trademark of The Open Group in the United States and other countries.