One of the strongest points of the Perl programming language is its powerful and efficient pattern matching capability. This section gives a flavor of that capability. Pattern matching allows one to look for specific textual patterns inside a string. The patterns themselves are expressed as what are called regular expressions.
The following program prompts the user for a text file name and then prints those lines that have the five vowels in order. The vowels can be separated by zero or more intervening characters.
Program 1.18
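#!/usr/bin/perl
#file scanvowels1.pl
#Prompts for a file name and prints all those lines in
#that file that contain the vowels in order.
print "Please give a file name >> ";
$filename = <STDIN>;    #read the file name typed by the user
chomp $filename;        #remove the trailing newline
open (IN, $filename) or die "Cannot open $filename: $!";
while (<IN>){           #read one line at a time into $_
    if ($_ =~ m/a.*e.*i.*o.*u/){
        print $_;
    }
}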
The while loop reads every line of the text file. In any iteration, the current line is read as a string and is assigned to the special variable $_. The conditional of the if statement is a pattern matching operation. The syntax of a pattern matching operation is the following.
string-variable =~ m/regular-expression/[qualifier]
The pattern matching is performed on string-variable. In the example, pattern matching is performed on the value of $_ or the current line of input from the text file. regular-expression is the pattern that the program searches for in string-variable. The regular expression or the pattern is enclosed between m/ and /. Optionally there can be one or more qualifiers after the second /. In this example, there is no qualifier.
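As a minimal sketch of this syntax, the snippet below applies the same match to an ordinary scalar instead of $_, first with no qualifier and then with the i qualifier, which asks Perl to ignore letter case. The variable names and the sample string are illustrative assumptions and are not part of Program 1.18.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $line = "A FACETIOUS REMARK";

# No qualifier: the match is case-sensitive, so lowercase vowels are required.
my $plain = ($line =~ m/a.*e.*i.*o.*u/) ? "match" : "no match";

# With the i qualifier, letter case is ignored.
my $nocase = ($line =~ m/a.*e.*i.*o.*u/i) ? "match" : "no match";

print "without qualifier: $plain\n";
print "with i qualifier:  $nocase\n";
```

Because the sample string is entirely in upper case, only the second match succeeds.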
The syntax for stating a pattern is elaborate, and Chapter 4 is devoted entirely to it; this section gives only a glimpse. The regular-expression may have certain characters that must occur literally in the target string-variable. An example of a literal character is the first letter of the alphabet, a. The regular expression may also contain special characters such as . that do not stand for any specific literal character. In particular, . is a wild card character that stands for any single character other than the newline character. Then there are characters such as * that do not stand for any character at all, but specify the number of times the preceding character should occur. The regular expression in this example contains all three types of characters.
The pattern a.*e.*i.*o.*u that is being matched in this program is somewhat complex, but if we understand the parts, it turns out to be really simple. The first character in the pattern: a simply specifies that we are looking for one occurrence of the character a anywhere in string-variable. The next two characters in the pattern .* need to be considered together to make sense. This pair specifies that we are looking for zero or more occurrences of any character other than the newline character \n. The period (.) is a shorthand notation for any character other than \n. The quantity is specified by the asterisk (*). The asterisk means zero or more. So, the first three characters in the pattern: a.* instruct Perl to look for one a followed by zero or more non-newline characters.
The next nine characters in the pattern: e.*i.*o.* tell Perl to look for one e followed by zero or more intervening non-newline characters followed by one i followed by zero or more non-newline characters followed by one o followed finally by zero or more non-newline characters.
The final character in the pattern is u, which instructs Perl to look for one occurrence of the character u. The pattern doesn't care what follows this u.
Now, we see that the whole pattern taken together specifies that we are looking for the five vowels occurring in order.
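The reading of the pattern given above can be checked against a few single words. The following is a small sketch, not part of the original program; the sub name vowels_in_order and the word list are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if the five vowels occur in order in $text, 0 otherwise.
# The sub name is hypothetical.
sub vowels_in_order {
    my ($text) = @_;
    return ($text =~ m/a.*e.*i.*o.*u/) ? 1 : 0;
}

# Sample words chosen for illustration only.
foreach my $word ("facetious", "abstemious", "aeiou", "education", "sequoia") {
    my $verdict = vowels_in_order($word) ? "matches" : "does not match";
    print "$word $verdict\n";
}
```

The first three words match; aeiou shows that zero intervening characters are acceptable. The last two contain all five vowels but not in the required order, so they do not match.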
An example run with a specific file is given below. Only the first few of the many lines printed by the program are shown.
%vowels1.pl
Please give a file name >> pattern-matching.tex
looking for patterns in textual documents. We can write
in each of these files where one word or pattern is substituted
there are several thousand files in this directory structure. Suppose
In the first line, the vowels captured are the a and e of patterns, the i of in, and the o and u of documents:
looking for patterns in textual documents. We can write
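One way to observe which occurrences Perl selects is to put parentheses around each vowel and inspect Perl's built-in array @-, which records the starting offset of each parenthesized group after a successful match. The sketch below is an illustration only; the parentheses and the variable names are additions that do not appear in Program 1.18.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "looking for patterns in textual documents. We can write";

# Parentheses around each vowel let us ask where it was found.
my @offsets;
if ($line =~ m/(a).*(e).*(i).*(o).*(u)/) {
    # $-[$k] is the offset (counting from 0) where group $k begins.
    @offsets = map { $-[$_] } 1 .. 5;
    foreach my $pos (@offsets) {
        print substr($line, $pos, 1), " found at offset $pos\n";
    }
}
```

Offsets count characters from zero, so the reported positions can be checked by hand against the line.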
Why these specific occurrences of the vowels are selected is discussed in Chapter 4. It is clear that the five vowels occur in sequence. In this specific run, we counted the number of lines in the file pattern-matching.tex and the number of lines that satisfy the regular expression. There are 3964 lines in the file and 338 of these have the five vowels in sequence. Of course, there may be zero or more intervening characters, including other vowels, between two consecutive vowels of interest.
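Counts of this kind can be produced with two counters inside the reading loop. The following is a sketch under stated assumptions, not the code used for the run above: the sub name count_vowel_lines is hypothetical, and the input is a short in-memory string standing in for a real file so that the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts all lines and the lines matching the vowel pattern, reading from
# an already opened filehandle. The sub name is hypothetical.
sub count_vowel_lines {
    my ($fh) = @_;
    my ($total, $matched) = (0, 0);
    while (my $line = <$fh>) {
        $total++;
        $matched++ if $line =~ m/a.*e.*i.*o.*u/;
    }
    return ($total, $matched);
}

# Illustrative data: an in-memory "file" instead of pattern-matching.tex.
my $sample = "a sequence of facetious remarks\nno match here\nabstemious\n";
open(my $in, '<', \$sample) or die "Cannot open in-memory file: $!";
my ($total, $matched) = count_vowel_lines($in);
close($in);

print "$matched of $total lines have the five vowels in order\n";
```

To count lines in an actual file, the in-memory string would be replaced by a file name in the call to open. The sketch uses a lexical filehandle where Program 1.18 uses the bareword handle IN; either style works with the same loop.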