For a full definition of regular expressions, see our glossary entry. But to get started with this tutorial, all you need to know is that a regular expression is just a form of syntax that allows you to do pattern matching within Perl strings.
Say, for instance, that someone gives you a file full of customer records and asks you to extract the account reference for each customer. They tell you that a reference starts with a pound symbol (#) and is followed by a combination of letters and numbers.
All your program needs to do at this stage is read in each record and return true if it contains a customer reference (i.e. a pound symbol) and false if not.
This is a job for Perl’s match operator, written ‘m//’. Why the double slash? Because the pattern you want to search for goes between the two slashes:
m//
The expression that defines what you want to search for is enclosed (or ‘delimited’) by the forward slashes. You don’t have to use forward slashes, though; just about any punctuation character will do as a delimiter. So, for instance, you could have an ‘m##’ operator or an ‘m!!’ operator. But the convention is to use forward slashes, so let’s stick with this format for now.
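If you want to convince yourself that the delimiter really is interchangeable, here’s a minimal sketch that runs the same search four different ways against a sample record (borrowed from later in this tutorial). It also shows that when you stick with forward slashes you can leave off the ‘m’ altogether.

```perl
use strict;
use warnings;

# A sample customer record containing a '#' reference.
my $line = 'SMITH,GEORGE A. #1234556';

# The same search, written with four different delimiter styles.
my @results = (
    ($line =~ m/#/) ? 1 : 0,   # conventional forward slashes
    ($line =~ m!#!) ? 1 : 0,   # exclamation marks
    ($line =~ m{#}) ? 1 : 0,   # a matching pair of braces
    ($line =~ /#/)  ? 1 : 0,   # forward slashes with the 'm' left off
);

print "@results\n";   # prints "1 1 1 1"
```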
Our expression therefore would take the following form:
m/#/
The thing between the forward slashes is the thing we’re looking for.
Testing the Match Expression with the Binding Operator
Now all we need to do is pass in our customer record and check whether ‘m//’ returns true (found) or false (not found).
Let’s assume that we’ve read in the first line from our file and stored the result in a variable called $line. (If you don’t know how to read in data from files, check out our ‘Reading and Writing’ files tutorial.)
In order to test the result of the match operation, we need another operator called the binding operator (=~). This operator applies the match on its right to the string on its left:
if ($line =~ m/#/) {
    print "True\n";
}
else {
    print "False\n";
}
So if $line contains “SMITH,GEORGE A. #1234556”, our little program above will print True. However, if $line contains “SMITH,GEORGE A. *1234556”, it will print False.
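Putting it together, here’s a minimal sketch of the whole job: read each record and print True or False. To keep it self-contained, it assumes the records live in a string (the two sample records above) and reads them through an in-memory filehandle; in a real program you would open the customer file itself.

```perl
use strict;
use warnings;

# Stand-in for the customer file: the two sample records from above.
# In a real program you would open the file on disk instead.
my $records = "SMITH,GEORGE A. #1234556\nSMITH,GEORGE A. *1234556\n";
open my $fh, '<', \$records or die "Cannot open records: $!";

my @answers;
while (my $line = <$fh>) {
    chomp $line;                                    # strip the trailing newline
    my $answer = ($line =~ m/#/) ? "True" : "False";
    push @answers, $answer;
    print "$answer\n";
}
close $fh;
```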
Here we are only matching on a single character. However, we can search for a whole string if we wish:
if ($line =~ m/hello/) {
    print "Hello right back at ya.\n";
}
else {
    print "Be like that then.\n";
}
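A whole-string pattern looks for those characters in exactly that order, anywhere in the line. This sketch wraps the test above in a hypothetical helper, reply_to, so you can try a few inputs. Note that matching is case-sensitive by default, so ‘Hello’ with a capital H doesn’t count.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the if/else above; returns the reply instead of printing it.
sub reply_to {
    my ($line) = @_;
    if ($line =~ m/hello/) {
        return "Hello right back at ya.";
    }
    return "Be like that then.";
}

print reply_to("well hello there"), "\n";  # matches
print reply_to("Othello"), "\n";           # matches: 'hello' appears inside 'Othello'
print reply_to("goodbye"), "\n";           # no match
print reply_to("Hello"), "\n";             # no match: capital 'H' is not 'h'
```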
But Perl has many more ways of specifying patterns for more complex matches, which we’ll look at in later sections.
You may also find it useful to know that Perl has a variation on the binding operator (!~) which returns true for no match and false for a match. Use it as follows:
if ($line !~ m/#/) {
    print "True\n";
}
else {
    print "False\n";
}
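As a quick sanity check, this sketch shows that !~ gives the same answer as putting a ! in front of an ordinary =~ match, for one record with a reference and one without.

```perl
use strict;
use warnings;

# One record with a '#' reference and one without.
my @records = ('SMITH,GEORGE A. #1234556', 'SMITH,GEORGE A. *1234556');

my @rows;
for my $line (@records) {
    my $no_ref  = ($line !~ m/#/)  ? 1 : 0;   # true when there is NO '#'
    my $negated = !($line =~ m/#/) ? 1 : 0;   # the same test using =~ and !
    push @rows, [$no_ref, $negated];
    print "$line => !~ gives $no_ref, negated =~ gives $negated\n";
}
```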
Dealing with Perl’s Metacharacters
Suppose our customer references begin with ‘$’ instead. You would think that our match expression would need to look like this:
m/$/
But unfortunately, you would be wrong! In a regular expression the ‘$’ character has a special meaning: it matches the end of the string (or just before a newline at the end), not a literal dollar sign, so m/$/ succeeds on every line. ‘$’ is one of Perl’s metacharacters.
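You can see the problem for yourself with this short sketch, which uses two made-up records, only one of which contains a ‘$’. Both report True, because the unescaped ‘$’ is matching the end of each line rather than a dollar sign.

```perl
use strict;
use warnings;

# Two hypothetical records: only the first has a '$' reference.
my @records = ('JONES,MARY $998877', 'JONES,MARY *998877');

my @hits;
for my $line (@records) {
    # Unescaped, '$' is the end-of-string anchor, not a dollar sign,
    # so this test succeeds for every line.
    my $hit = ($line =~ m/$/) ? "True" : "False";
    push @hits, $hit;
    print "$line => $hit\n";
}
```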
Here are Perl’s metacharacters, in all their glory:
^$+*?.|(){}\[]
To use a metacharacter as an ordinary character in your regular expression, you need to ‘escape’ it by putting a backslash in front of it:
m/\$/
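To check that a leading backslash works for every character in the metacharacter list, here’s a sketch that builds an escaped pattern for each one and tests it against a string containing that character as plain text.

```perl
use strict;
use warnings;

# Perl's metacharacters, as listed in this tutorial (14 characters).
my @meta = split //, '^$+*?.|(){}\[]';

my $all_ok = 1;
for my $c (@meta) {
    my $escaped = "\\" . $c;     # a backslash followed by the metacharacter
    my $text    = "a${c}b";      # the metacharacter surrounded by ordinary text
    my $ok      = ($text =~ m/a${escaped}b/) ? 1 : 0;
    printf "%s  escaped as  %-2s  %s\n", $c, $escaped, $ok ? "matches" : "FAILS";
    $all_ok &&= $ok;
}
```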
This can make regular expressions look very confusing to the uninitiated. Let’s imagine that you have to match against a backslash, followed by a pipe (‘|’), followed by another backslash, followed by two forward slashes, followed by a caret (‘^’). (Phew! Did you get all that?)
Your match expression would look like this (the forward slashes have to be escaped too, because they are also the delimiter):
m/\\\|\\\/\/\^/
Yuck. Thankfully, situations like these are pretty rare.
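If you’d like proof that a pattern like this works, here’s a sketch that tests a fully escaped version (every forward slash escaped too, since ‘/’ is the delimiter) against exactly that sequence of characters. It also shows a common way to tame this kind of pattern: pick a different delimiter, such as braces, so the forward slashes no longer need escaping.

```perl
use strict;
use warnings;

# Backslash, pipe, backslash, two forward slashes, caret.
# (In single quotes, '\\' stands for one backslash.)
my $text = '\\|\\//^';

# Every metacharacter escaped, plus the forward slashes, because '/' is the delimiter.
my $with_slashes = ($text =~ m/\\\|\\\/\/\^/) ? 1 : 0;

# With braces as the delimiter, the forward slashes no longer need escaping.
my $with_braces = ($text =~ m{\\\|\\//\^}) ? 1 : 0;

print "slashes: $with_slashes, braces: $with_braces\n";
```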
Perl (version 5 onwards) has a rather handy function called quotemeta that escapes metacharacters for you, which also makes it a useful way to check your own escaping. Use it like this:
print quotemeta("L00k! @ all these \n[.]**zany characters\n");
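quotemeta puts a backslash in front of every character that isn’t a letter, digit or underscore. That covers all the metacharacters, plus a few characters that don’t strictly need it, such as spaces, ‘!’ and ‘@’ (the newlines get a backslash too, which is why each line of output ends with one). So the above example will display: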
L00k\!\ \@\ all\ these\ \
\[\.\]\*\*zany\ characters\
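In practice you’ll rarely print quotemeta’s output; more often you interpolate it into a pattern. This sketch (the sample line is made up) does that for the backslash, pipe, backslash, slash, slash, caret sequence described earlier, and also uses \Q...\E, which applies quotemeta to an interpolated variable right inside the pattern.

```perl
use strict;
use warnings;

# Backslash, pipe, backslash, two forward slashes, caret.
my $target  = '\\|\\//^';
my $escaped = quotemeta($target);    # backslash before every non-word character
print "$escaped\n";                  # prints \\\|\\\/\/\^

# A made-up line that contains the awkward sequence somewhere in the middle.
my $line = 'prefix \\|\\//^ suffix';

my $found_var = ($line =~ m/$escaped/)    ? 1 : 0;  # interpolate quotemeta's result
my $found_qe  = ($line =~ m/\Q$target\E/) ? 1 : 0;  # \Q...\E quotes it inline
my $found_raw = ($line =~ m/$target/)     ? 1 : 0;  # unescaped: backslashes and '^' act as regex syntax
print "quotemeta: $found_var, \\Q...\\E: $found_qe, raw: $found_raw\n";
```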
(Note: whatever character you use as the delimiter also has to be escaped if it appears literally inside your pattern. So if your match expression looks like this:
m!#!
then ‘!’ effectively joins the list of metacharacters, and a literal ‘!’ in the pattern needs to be written as ‘\!’.)
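Here’s a small sketch of that rule with a made-up record: with ‘!’ as the delimiter, a literal ‘!’ must be escaped, while ‘#’ and ‘/’ can be written as-is.

```perl
use strict;
use warnings;

my $line = 'Ref #1234! (urgent)';   # a made-up record

# '!' is the delimiter here, so a literal '!' in the pattern must be escaped.
my $bang = ($line =~ m!4\!!) ? 1 : 0;

# '#' has no special meaning in this pattern, so it is written as-is.
my $hash = ($line =~ m!#!) ? 1 : 0;

# '/' is not the delimiter here, so it doesn't need escaping either.
my $slash = ('a/b' =~ m!a/b!) ? 1 : 0;

print "bang: $bang, hash: $hash, slash: $slash\n";
```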
