Regular expressions 101

Filed under: htaccess, PHP, Misc. Tips

What are regular expressions?

A regular expression is a special string used to describe or match a search pattern or set of strings. Regular expressions use certain syntax rules as outlined in the section below, “Basic regular expressions syntax”.

Regular expression can be abbreviated as:

  • regexp
  • regex
  • regxp

Plural forms of regular expressions are:

  • regexps
  • regexes
  • regexen


When do you use them?

Many programming languages, such as PHP and Perl, support regular expressions for string manipulation. PHP functions such as preg_replace, preg_match and preg_split use regular expressions to search and manipulate strings of text (the older ereg functions were removed in PHP 7). The example below uses the PHP function preg_split to split the string “regular expressions are fun” at the letters “r” and “a”.

preg_split('/[ra]/','regular expressions are fun')
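
Running the call shows what comes back. The sketch below is a minimal illustration, assuming PHP 7 or later; the variable names are made up for the example. Note the empty strings, which appear where the string starts with a delimiter or where two delimiters sit side by side.

```php
<?php
// Split at every "r" or "a"; the set [ra] matches either letter.
$parts = preg_split('/[ra]/', 'regular expressions are fun');
print_r($parts);
// $parts is ['', 'egul', '', ' exp', 'essions ', '', 'e fun']

// The PREG_SPLIT_NO_EMPTY flag drops the empty pieces.
$nonEmpty = preg_split('/[ra]/', 'regular expressions are fun', -1, PREG_SPLIT_NO_EMPTY);
print_r($nonEmpty);
// $nonEmpty is ['egul', ' exp', 'essions ', 'e fun']
```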

Regular expressions are also used in .htaccess rewrite rules. They are particularly useful when redirecting one page to another. For example, the rewrite rule below redirects all .htm pages to the matching .html pages. The $ at the end of the pattern stops it from also matching URLs that already end in .html.

RewriteRule ^(.+)\.htm$ /$1.html [R]
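
The pattern can be tried out in PHP before it goes into .htaccess. This is only a sketch: it uses the rule's pattern with a trailing $ anchor, it assumes PHP's preg_replace behaves like the rewrite engine for this simple case, and the sample paths are made up.

```php
<?php
// The rewrite pattern written as a PCRE pattern for PHP.
// The $ anchor keeps paths that already end in .html from matching.
$pattern = '/^(.+)\.htm$/';

// $1 in the replacement is whatever the round brackets captured.
$redirected = preg_replace($pattern, '/$1.html', 'about.htm');   // '/about.html'
$untouched  = preg_replace($pattern, '/$1.html', 'about.html');  // 'about.html' (no match, returned as-is)

echo $redirected, "\n", $untouched, "\n";
```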

Many text editors use regular expressions to search and manipulate bodies of text. BBEdit by Bare Bones Software (Mac) and TextPad (Windows) are examples of such text editors.

Basic regular expressions syntax

The table below lists the common regular expression syntax and special characters.

Symbol     Symbol (in words)               Meaning
.          full stop                       match any single character
*          asterisk                        match zero or more of the previous symbol
+          plus                            match one or more of the previous symbol
?          question mark                   match zero or one of the previous symbol
\?         backslash followed by a symbol  match that symbol literally (here, a question mark)
^          caret                           match the start of a string
$          dollar                          match the end of a string
|          pipe                            match alternates
{x}        curly brackets                  match the previous symbol exactly x times (x must be a number)
[set]      square brackets                 match any one of the symbols inside the square brackets
[^set]     caret within square brackets    match any single character that is not inside the square brackets
(pattern)  round brackets                  group symbols and remember what the pattern matched as a special variable
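
Each symbol in the table can be checked with preg_match. The sketch below is illustrative only; the patterns and test strings are made up, and the array keys are just labels.

```php
<?php
// preg_match returns 1 when the pattern matches and 0 when it does not.
$checks = [
    'any character' => preg_match('/^c.t$/', 'cat'),        // 1: . matches "a"
    'zero or more'  => preg_match('/^ab*c$/', 'ac'),        // 1: * allows no "b" at all
    'one or more'   => preg_match('/^ab+c$/', 'ac'),        // 0: + needs at least one "b"
    'optional'      => preg_match('/^colou?r$/', 'color'),  // 1: ? makes "u" optional
    'escaped'       => preg_match('/^1\+1$/', '1+1'),       // 1: \+ is a literal plus
    'exact count'   => preg_match('/^a{3}$/', 'aaaa'),      // 0: {3} means exactly three
    'set'           => preg_match('/^[bc]at$/', 'bat'),     // 1: "b" is in the set
    'negated set'   => preg_match('/^[^bc]at$/', 'bat'),    // 0: "b" is excluded
    'alternates'    => preg_match('/^(cat|dog)$/', 'dog'),  // 1: second alternate matches
];
print_r($checks);

// Round brackets also capture: $m[1] holds what the group matched.
preg_match('/^(.+)@/', 'user@example.com', $m);
echo $m[1], "\n"; // user
```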

Examples of regular expressions

The following table shows examples of regular expressions.

Regexp        Meaning
.*            matches zero or more of any character
.+            matches one or more of any character
a+            matches one or more of a, such as “a”, “aaaa”, “aaaaaaaaaaaa”, but not “bbb”
[ab]+         matches “a”, “b”, or any length and combination of the two
\.s?html?     matches “.htm”, “.shtm”, “.html” or “.shtml”
/2005/(.+)    matches “/2005/” followed by one or more characters, and stores whatever follows “/2005/” in $1
hello|hi|hey  matches “hello”, “hi” or “hey”
[a-z]         matches any single lowercase letter
[^abc]        matches any single character that is NOT a, b or c
[^a-z]        matches any single character that is NOT a lowercase letter
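
A few of these patterns can be run in PHP to confirm the table. This is a sketch under assumptions: the sample strings are invented, and the ^ and $ anchors in the first pattern are added so that only whole extensions match.

```php
<?php
// \.s?html? from the table: optional "s" before "htm", optional "l" after it.
$extensions = ['.htm', '.shtm', '.html', '.shtml', '.php'];
$matched = array_values(preg_grep('/^\.s?html?$/', $extensions));
print_r($matched); // the four HTML-style extensions; ".php" is left out

// /2005/(.+) from the table; # is used as the delimiter so the slashes need no escaping.
preg_match('#/2005/(.+)#', '/2005/regular-expressions-101', $m);
echo $m[1], "\n"; // regular-expressions-101

// [^a-z] matches any single character that is not a lowercase letter.
$lowerOnly = preg_replace('/[^a-z]/', '', 'Hello, World 2005');
echo $lowerOnly, "\n"; // elloorld
```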

Useful regular expressions

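Use the following regular expression to match most common email addresses, for example when validating them in contact forms. The character classes list only uppercase letters, so apply it case-insensitively, and wrap it in ^ and $ when the whole field must be an email address. It is deliberately simple: it rejects top-level domains longer than four letters and addresses with a “+” before the @.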

[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}
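
In a contact form, the pattern might be wrapped as follows. This is a minimal sketch, not a complete validator: looks_like_email is a hypothetical helper name, the sample addresses are made up, and the anchors and the i and D modifiers are additions to the pattern above.

```php
<?php
// Hypothetical helper (not from the article): wraps the pattern for form validation.
function looks_like_email(string $value): bool
{
    // ^ and $ force the whole value to match; i ignores case;
    // D makes $ match only at the very end, not before a trailing newline.
    $pattern = '/^[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}$/iD';
    return preg_match($pattern, $value) === 1;
}

var_dump(looks_like_email('jane.doe@example.com'));  // bool(true)
var_dump(looks_like_email('not-an-email'));          // bool(false)
var_dump(looks_like_email('user@example.museum'));   // bool(false): top-level domain longer than 4 letters
```

PHP's built-in filter_var with FILTER_VALIDATE_EMAIL is an alternative when a stricter check is wanted.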
