Regular expressions 101
What are regular expressions?
A regular expression is a special string that describes a search pattern, that is, a set of strings to match. Regular expressions follow the syntax rules outlined in the section below, “Basic regular expressions syntax”.
“Regular expression” is commonly abbreviated as:
- regexp
- regex
- regxp
The plural, “regular expressions”, is commonly written as:
- regexps
- regexes
- regexen
Many programming languages, such as PHP and Perl, support regular expressions for string manipulation. PHP functions such as preg_replace, preg_match and preg_split use regular expressions to search and manipulate strings of text (the older ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0). The example below uses the PHP function preg_split to split the string “regular expressions are fun” at every letter “r” and “a”.
```php
$parts = preg_split('/[ra]/', 'regular expressions are fun');
// $parts is ["", "egul", "", " exp", "essions ", "", "e fun"]
```
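For readers working outside PHP, the same split can be sketched with Python's `re.split()`. This is an illustration in a different language, not part of PHP's API; for a simple character class such as `[ra]`, Python's `re` module and PHP's PCRE-based `preg_*` functions behave the same way. The filtered list mirrors what PHP's `PREG_SPLIT_NO_EMPTY` flag produces.

```python
import re

# Split at every "r" or "a", like preg_split('/[ra]/', ...) in PHP.
parts = re.split(r"[ra]", "regular expressions are fun")
print(parts)  # ['', 'egul', '', ' exp', 'essions ', '', 'e fun']

# Empty strings appear where the string starts with a delimiter or where two
# delimiters are adjacent (the "ar" in "regular" and the "ar" in "are").
non_empty = [p for p in parts if p]
print(non_empty)  # ['egul', ' exp', 'essions ', 'e fun']
```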
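Regular expressions are also used in Apache .htaccess rewrite rules, where they are particularly useful for redirecting one page to another. For example, the rule below redirects every .htm page to the .html page of the same name. The `$` anchor makes the pattern match only paths that end in “.htm”; without it, “page.html” would also match and be redirected to itself in an endless loop. The substitution `/$1.html` is not a regular expression, so its dot does not need escaping.

```apache
RewriteEngine On
RewriteRule ^(.+)\.htm$ /$1.html [R,L]
```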
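The effect of the `$` anchor can be checked outside Apache. The sketch below uses Python's `re` module to imitate how the rule's pattern is tested against a request path; `redirect_target` is a hypothetical helper, and it assumes the .htaccess behaviour of matching the path without its leading slash. It is a way to test the pattern, not a model of how mod_rewrite works internally.

```python
import re
from typing import Optional

UNANCHORED = r"^(.+)\.htm"   # pattern without the trailing $
ANCHORED = r"^(.+)\.htm$"    # pattern that only matches paths ending in .htm


def redirect_target(pattern: str, path: str) -> Optional[str]:
    """Hypothetical helper: the URL the rule would redirect to, or None."""
    m = re.search(pattern, path)
    if m is None:
        return None
    # "/$1.html" in the RewriteRule: $1 is the first captured group
    return "/" + m.group(1) + ".html"


for path in ["about.htm", "about.html"]:
    print(path, "->", redirect_target(UNANCHORED, path), "|", redirect_target(ANCHORED, path))
# about.htm -> /about.html | /about.html
# about.html -> /about.html | None   (unanchored: redirects to itself, a loop)
```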
Many text editors also use regular expressions to search and manipulate text. BBEdit by Bare Bones Software (Mac) and TextPad (Windows) are two examples of such editors.
Basic regular expressions syntax
The table below lists common regular expression syntax and special characters.
| Symbol | Symbol (in words) | Meaning |
|---|---|---|
| `.` | full stop | match any single character (except, by default, a newline) |
| `*` | asterisk | match zero or more of the previous symbol |
| `+` | plus | match one or more of the previous symbol |
| `?` | question mark | match zero or one of the previous symbol |
| `\` | backslash | escape a special character so it is matched literally, e.g. `\.` matches a full stop and `\?` a question mark |
| `^` | caret | match the start of a string |
| `$` | dollar | match the end of a string |
| `\|` | pipe | match either the pattern on its left or the pattern on its right (alternation) |
| `{x}` | curly brackets | match the previous symbol exactly x times (x must be a number); `{x,y}` matches it between x and y times, and `{x,}` at least x times |
| `[set]` | square brackets | match any one of the characters inside the brackets; a hyphen gives a range, as in `[a-z]` |
| `[^set]` | caret within square brackets | match any single character that is not inside the brackets |
| `(pattern)` | round brackets | group the pattern and capture the text it matched, for later use as `$1`, `$2`, and so on |
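A quick way to get a feel for these symbols is to test each one against a short string. The sketch below uses Python's `re` module as a stand-in for PHP's `preg_*` functions (the syntax in the table is the same in both); the patterns and sample strings are made up for illustration. `re.fullmatch()` succeeds only if the whole string matches, while `re.search()` looks for a match anywhere, which is where `^` and `$` matter.

```python
import re

# (pattern, text that the whole pattern should match)
examples = [
    (r"c.t", "cut"),         # .      any single character
    (r"ab*c", "ac"),         # *      zero or more "b"
    (r"ab+c", "abbbc"),      # +      one or more "b"
    (r"colou?r", "color"),   # ?      optional "u"
    (r"3\.14", "3.14"),      # \.     a literal full stop
    (r"cat|dog", "dog"),     # |      either alternative
    (r"a{3}", "aaa"),        # {3}    exactly three "a"
    (r"[bcr]at", "rat"),     # [set]  one of b, c or r
    (r"[^bcr]at", "hat"),    # [^set] any character except b, c or r
]
results = [bool(re.fullmatch(p, t)) for p, t in examples]
print(results)  # every entry is True

# ^ and $ tie a search to the start and end of the string
anchored_start = bool(re.search(r"^abc", "abcdef"))  # True
not_at_start = bool(re.search(r"^def", "abcdef"))    # False: "def" is not at the start
at_end = bool(re.search(r"def$", "abcdef"))          # True

# (pattern) captures the matched text; PHP replacement strings and Apache
# rewrite rules refer to the first group as $1
m = re.search(r"([0-9]+)-([0-9]+)", "pages 10-20")
print(m.group(1), m.group(2))  # 10 20
```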
Examples of regular expressions
The following table shows examples of regular expressions.
| Regexp | Meaning |
|---|---|
| `.*` | matches zero or more of any character |
| `.+` | matches one or more of any character |
| `a+` | matches one or more of “a”, such as “a”, “aaaa” or “aaaaaaaaaaaa”, but not “bbb” |
| `[ab]+` | matches “a”, “b”, or any non-empty combination of the two, such as “abba” |
| `\.s?html?` | matches “.htm”, “.shtm”, “.html” or “.shtml” |
| `/2005/(.+)` | matches “/2005/” followed by one or more characters, and stores the characters after “/2005/” in $1 |
| `hello\|hi\|hey` | matches “hello”, “hi” or “hey” |
| `[a-z]` | matches any single lowercase letter from a to z |
| `[^abc]` | matches any single character that is NOT a, b or c |
| `[^a-z]` | matches any single character that is NOT a lowercase letter from a to z |
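Several rows of this table can be checked directly. The sketch below again uses Python's `re` module as a stand-in for PHP (an assumption; the patterns themselves are taken unchanged from the table), and the sample strings such as “/2005/06/my-post” are invented for illustration. Python exposes the captured text as `group(1)` rather than `$1`.

```python
import re

# \.s?html? : of these candidates, only the four listed extensions match
candidates = [".htm", ".shtm", ".html", ".shtml", ".xhtml"]
ext_matches = [c for c in candidates if re.fullmatch(r"\.s?html?", c)]
print(ext_matches)  # ['.htm', '.shtm', '.html', '.shtml']

# /2005/(.+) : the text after "/2005/" is captured as group 1 ($1)
m = re.search(r"/2005/(.+)", "/2005/06/my-post")
print(m.group(1))  # 06/my-post

# hello|hi|hey : findall() returns every non-overlapping match
greetings = re.findall(r"hello|hi|hey", "hey there, hi!")
print(greetings)  # ['hey', 'hi']

# Without anchors, alternation also matches inside words: "hi" in "this"
in_word = re.findall(r"hello|hi|hey", "this")
print(in_word)  # ['hi']

# [^a-z] : every character that is not a lowercase letter
not_lower = re.findall(r"[^a-z]", "abC1 d")
print(not_lower)  # ['C', '1', ' ']

# [ab]+ needs at least one character
ab_ok = bool(re.fullmatch(r"[ab]+", "abba"))
ab_empty = bool(re.fullmatch(r"[ab]+", ""))
print(ab_ok, ab_empty)  # True False
```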
Useful regular expressions
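The following regular expression matches most everyday email addresses, which makes it useful for a basic check of the email field in contact forms. It allows “+” before the @ (as in “user+tag@example.com”), allows only letters, digits, dots and hyphens in the domain, and accepts top-level domains of two or more letters, since many, such as “.museum”, are longer than four. The `^` and `$` anchors require the whole input, not just part of it, to match.

```
^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$
```

Because the character classes list only uppercase letters, use the pattern with case-insensitive matching, for example the `i` modifier in PHP: `preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$/i', $email)`.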
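A minimal validation helper built on this idea might look like the sketch below. It is written in Python rather than PHP, `looks_like_email` is a hypothetical name, and it assumes the pattern `[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}`, which allows “+” before the @, keeps the domain to letters, digits, dots and hyphens, and accepts top-level domains of two or more letters. `re.fullmatch()` with `re.IGNORECASE` plays the role of the `^...$` anchors and the `i` modifier, and also rejects a trailing newline. Like any pattern of this size it is only an approximation: it still accepts some invalid addresses, such as ones with two dots in a row.

```python
import re

# Case-insensitive email pattern; fullmatch() requires the whole string to
# match, so no ^ or $ anchors are needed.
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}", re.IGNORECASE)


def looks_like_email(address: str) -> bool:
    """Hypothetical helper: True if the address has a plausible email shape."""
    return EMAIL_RE.fullmatch(address) is not None


samples = {
    "jane.doe@example.com": True,
    "user+tag@mail.example.org": True,
    "info@example.museum": True,      # top-level domain longer than four letters
    "JANE@EXAMPLE.COM": True,         # accepted thanks to re.IGNORECASE
    "jane..doe@example.com": True,    # accepted, although not a valid address
    "jane@example": False,            # no top-level domain
    "not-an-email": False,            # no @ at all
    "jane@example.com\n": False,      # trailing newline rejected by fullmatch()
}
for address, expected in samples.items():
    print(repr(address), looks_like_email(address), "expected:", expected)
```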

