Perl + regular expression and pattern matching
Pattern Matching and Regular Expressions
If you were paying attention, you noticed a huge loophole in the programs above: nothing prevents you from typing in a string when you’re supposed to be typing in a number. You can type in “dog” and “cat”, and the program will try to add “dog” and “cat” (which, if you’re curious, gives a result of zero, because Perl treats a string that doesn’t start with a number as 0 in arithmetic). You need some way to check that the person actually typed in numbers; then, if they didn’t, you can ask them again (with a looping control structure) until they get it right.
Welcome to the concepts of pattern matching and regular expressions, two of Perl’s powerful text-processing tools. Let’s start with a simple pattern first: one letter. If you want to test a variable to see if it contains the (lower-case) letter “z”, use this syntax:
    if ($x =~ /z/) {
        print "$x has a z in it!\n";
    }
Let’s take that apart: if is just like while, except it only checks once (that is, it won’t loop around again and again.) Like while, it will execute every command inside the curly brackets if the statement inside the parentheses is true.
The statement inside the parentheses works like this: =~ tests $x against the pattern between the two slashes; in this case, if there’s a z anywhere inside $x, then the statement is true.
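To see the match in action, here is a minimal sketch. The helper name has_z and the sample words are made up for illustration; they are not part of the programs above. Note that the pattern is case-sensitive, so a capital Z does not count.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string contains a lower-case z, else 0.
sub has_z {
    my ($x) = @_;
    return $x =~ /z/ ? 1 : 0;
}

# Try the pattern on a few sample words (chosen only for illustration).
for my $word ("pizza", "zebra", "Zebra", "cat") {
    my $verdict = has_z($word) ? "has a z in it" : "has no z in it";
    print "$word $verdict\n";
}
```

Only “pizza” and “zebra” should be reported as containing a z.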
Let’s up the ante, and match only if $x begins with the letter z:
    if ($x =~ /^z/) {
        print "$x begins with a z!\n";
    }
^z is a regular expression; the caret (^) stands for the beginning of the string. Thus, the matching statement has to find a z immediately following the beginning of the string in order to be true.
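A small sketch of the difference the caret makes, assuming a made-up helper called starts_with_z and a few sample words: a z in the middle or at the end of the string no longer counts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 only when the string starts with a lower-case z.
sub starts_with_z {
    my ($x) = @_;
    return $x =~ /^z/ ? 1 : 0;
}

# "pizza" and "jazz" contain a z, but not at the beginning.
for my $word ("zoo", "pizza", "jazz") {
    my $verdict = starts_with_z($word) ? "begins with a z" : "does not begin with a z";
    print "$word $verdict\n";
}
```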
How about words that begin with z and end with e? Use the regexp
/^z.*e$/
The $ stands for the end of the string; the period stands for “any single character” (strictly speaking, any character except a newline); combined with the asterisk, it means “zero or more characters.” Without the asterisk,
/^z.e$/
would mean “z followed by one character followed by e.”
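To compare the two patterns side by side, here is a short sketch. The helper which_match and the word list are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which of the two patterns match a word.
sub which_match {
    my ($word) = @_;
    my @hits;
    push @hits, 'z.*e' if $word =~ /^z.*e$/;   # z, anything (or nothing), then e
    push @hits, 'z.e'  if $word =~ /^z.e$/;    # z, exactly one character, then e
    return join(", ", @hits) || "neither";
}

for my $word ("zone", "zoe", "ze", "zebra") {
    print "$word: ", which_match($word), "\n";
}
```

Here “zoe” satisfies both patterns, “zone” and “ze” satisfy only the version with the asterisk, and “zebra” satisfies neither because it does not end in e.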
There are many other regular expressions. For instance,
/^z.+e$/
means “z followed by at least one character, followed by e.”
/^z\w*e$/
means “z followed by zero or more word characters (letters, digits, or underscores) followed by e”; that is, “z!e” wouldn’t match.
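The two patterns disagree on a couple of inputs, which the following sketch tries to make visible. The helper matching_patterns and the sample words are hypothetical, chosen only to illustrate the point.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the list of patterns from the text that match a word.
sub matching_patterns {
    my ($word) = @_;
    my @hits;
    push @hits, '^z.+e$'  if $word =~ /^z.+e$/;    # at least one character in between
    push @hits, '^z\w*e$' if $word =~ /^z\w*e$/;   # only word characters in between
    return @hits;
}

for my $word ("zone", "ze", "z!e") {
    my @hits = matching_patterns($word);
    print "$word: ", (@hits ? join(" and ", @hits) : "no match"), "\n";
}
```

“zone” matches both patterns; “ze” matches only the \w* version, since .+ needs at least one character between the z and the e; “z!e” matches only the .+ version, since ! is not a word character.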
So to make sure that somebody’s typing in numbers in our adding program, and not words, make the subroutine getnumber look like this:
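    sub getnumber {
        # Start with a value that contains non-digits so the loop runs at least once.
        my $number = "blah";
        while ($number =~ /\D/) {
            print "Enter a number ";
            $number = <>;
            chomp($number);    # remove the trailing newline (chop would remove any last character)
        }
        return $number;
    }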
“\D” is the regular expression for non-digits; if any character in $number is not 0-9, the expression matches, the loop condition is true, and you’ll get asked to enter a number again.
Note how we had to set $number to include a non-digit ($number = “blah”) to get inside the loop the first time around.
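One thing the \D test does not catch is an empty line: an empty string contains no non-digits, so it passes. If that matters, a different check can be swapped in that requires at least one digit and nothing else. The sketch below shows that alternative; the helper name is_whole_number is hypothetical and not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper (an alternative to the \D test): returns 1 only if the
# string consists of one or more digits and nothing else, otherwise 0.
sub is_whole_number {
    my ($text) = @_;
    return (defined $text && $text =~ /^\d+$/) ? 1 : 0;
}

# Sample inputs, as they would look after the trailing newline is removed.
for my $input ("42", "dog", "4x2", "") {
    my $verdict = is_whole_number($input) ? "accepted" : "rejected";
    print "'$input' is $verdict\n";
}
```

With a check like this, the loop could keep asking until the helper returns true, and there would be no need to seed $number with a dummy value first.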