Introduction:

Perl is commonly used for file and text processing. Its regular expressions provide pattern matching and extraction, and they can be applied to any string data, not only to the contents of files. Perl’s regular expression support is extensive.

Requirements:

A Perl interpreter, such as ActiveState Perl. Perl may be installed by default on Linux.

Procedure:

To extract part of the data you are matching, capture it:

($firstStringToCapture, $secondStringToCapture, $thirdStringToCapture) = $data =~ /matchString(firstExtractString)matchString(secondExtractString)matchString(thirdExtractString)/;

Within the regular expression, each extraction is enclosed in parentheses ( ). Once you have captured a string, you can use it for later analysis or processing. A variable whose capture did not match remains undefined. You can check this with the defined() function:

if (defined($variable)) {
    print $variable;
}
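
The capture and defined() check above can be put together in a short script. This is a minimal sketch: the input string, the field layout, and the variable names are all assumed for illustration and do not come from the procedure itself.

```perl
use strict;
use warnings;

# Hypothetical input: the field layout is assumed for illustration only.
my $data = "user=alice id=42 host=example";

# In list context, a successful match returns the captured groups in order.
my ($user, $id, $host) = $data =~ /user=(\w+) id=(\d+) host=(\w+)/;

# A pattern that does not match returns an empty list, so $missing stays undefined.
my ($missing) = $data =~ /port=(\d+)/;

for my $value ($user, $id, $host, $missing) {
    if (defined($value)) {
        print "captured: $value\n";
    } else {
        print "no capture\n";
    }
}
```

Checking with defined() before using a captured value avoids "uninitialized value" warnings when the data does not have the expected shape.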

You can also refer to extractions with $1, $2, etc.:

$data =~ /matchString(firstExtractString)/; 
print $1;
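
Because $1, $2, and so on keep their values from the last successful match, it is usually safer to test the match before reading them. The following is a small sketch of that pattern; the sample string and the @found array are hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical input used only for illustration.
my $data = "order 1234 shipped to Berlin";

my @found;

# Test the match itself: $1 and $2 are only meaningful after a successful match.
if ($data =~ /order (\d+) shipped to (\w+)/) {
    @found = ($1, $2);    # copy the captures before running another match
    print "order=$1 city=$2\n";
} else {
    print "no match\n";
}
```

Copying the captures into ordinary variables right away keeps them safe from being overwritten by a later regular expression.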

Perl’s regular expression syntax extends well beyond the POSIX basic and extended forms. You can follow the closing delimiter of a regular expression with modifiers such as s and m to specify how the regular expression treats the data. For instance:

$data =~ /stringToMatch/s;

or

$data =~ /stringToMatch/m;

or both:

$data =~ /stringToMatch/sm;

s treats the data being matched as a single line. This means the . metacharacter will match the “\n” newline as a character; without s, . matches any character except “\n”. m treats the data as multiple lines. It does not change what . matches; it changes the anchors instead. Without m, ^ matches only at the start of the data and $ only at the end (or just before a final “\n”). With m, ^ and $ match the start and end of each line. Because s and m affect different metacharacters, they can be combined. If you would like to match the beginning and end of the whole data string while using m, you can use \A, \Z, and \z. These can be used as follows:

$data =~ /\AstringToMatch/m;

or

$data =~ /stringToMatch\Z/m;

or

$data =~ /stringToMatch\z/m;

\Z matches at the end of the data or just before a final “\n” newline; \z matches only at the very end.
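
The modifier and anchor behaviour described above can be checked with a short script. This is only a sketch: the two-line sample string and the variable names are assumptions chosen to make each difference visible.

```perl
use strict;
use warnings;

# Hypothetical two-line string with a trailing newline.
my $data = "first line\nsecond line\n";

# Without /s, "." stops at a newline; with /s it can cross it.
my $dot_plain = ($data =~ /first.*second/)  ? 1 : 0;
my $dot_s     = ($data =~ /first.*second/s) ? 1 : 0;

# Without /m, "^" matches only at the start of the string;
# with /m it also matches right after each embedded newline.
my $caret_plain = ($data =~ /^second/)  ? 1 : 0;
my $caret_m     = ($data =~ /^second/m) ? 1 : 0;

# \A always anchors at the start of the string, even under /m.
my $abs_start = ($data =~ /\Asecond/m) ? 1 : 0;

# \Z tolerates one trailing newline at the end of the string; \z does not.
my $end_Z    = ($data =~ /line\Z/)   ? 1 : 0;
my $end_z    = ($data =~ /line\z/)   ? 1 : 0;
my $end_z_nl = ($data =~ /line\n\z/) ? 1 : 0;

printf "dot: plain=%d s=%d\n", $dot_plain, $dot_s;
printf "caret: plain=%d m=%d\n", $caret_plain, $caret_m;
printf "\\A under m=%d\n", $abs_start;
printf "\\Z=%d \\z=%d \\n\\z=%d\n", $end_Z, $end_z, $end_z_nl;
```

Each flag is 1 when the pattern matched and 0 when it did not, so the printed lines show which combination of modifier and anchor accepts the sample string.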