Regular expression notation for HTTP Server
This topic provides a general overview of regular expression notation for the IBM® HTTP Server for i Web server.
Regular expression notation specifies a pattern of character strings. One or more regular expressions can be used to create a matching pattern. Certain characters (sometimes called wildcards) have special meanings. The following table describes commonly used pattern-matching notation.
Regular expression pattern matching
Pattern | Description |
---|---|
string | A string with no special characters matches any value that contains the string. |
[set] | Match a single character specified by the set of single characters within the square brackets. |
[a-z] | Match a character in the range specified within the square brackets. |
[^abc] | Match any single character not specified in the set of single characters within the square brackets. |
{n} | Match the preceding character exactly n times. |
{n,} | Match the preceding character at least n times. |
{n,m} | Match the preceding character at least n times, but no more than m times. |
^ | Match the start of the string. |
$ | Match the end of the string. |
. | Match any single character (except newline). |
* | Match zero or more of the preceding character. |
+ | Match one or more of the preceding character. |
? | Match zero or one of the preceding character. |
string1|string2 | Match string1 or string2. |
\ | Signifies an escape character. When preceding any of the characters that have special meaning, the escape character removes any special meaning from the character. For example, the backslash is useful to remove special meaning from a period in an IP address. |
(group) | Group characters in a regular expression. If a match is found, the first group can be accessed using $1, the second group using $2, and so on. |
(?<name>regex) | Named capturing group. Captures the text matched by "regex" into the group "name". The name can contain letters and numbers but must start with a letter. |
\1 through \9 | Backreference. Matches the same text that was matched by the 1st through 9th numbered capturing group. |
\10 through \99 | Backreference. Matches the same text that was matched by the 10th through 99th numbered capturing group. |
\w | Match an alphanumeric character or an underscore. |
\W | Match a character that is not an alphanumeric character or an underscore. |
\s | Match a white-space character. |
\S | Match a character that is not a white-space character. |
\t | Tab character. |
\n | Newline character. |
\r | Return character. |
\f | Form feed character. |
\v | Vertical tab character. |
\a | Bell character. |
\b | Match a word boundary. |
\B | Match a position that is not a word boundary. |
\0dd | Octal character, for example \076 matches character ">". Note: d must be between 0 and 7. |
\ddd | Octal character, for example \101 matches character "A". Note: d must be between 0 and 7. |
\o{ddd..} | Octal character, for example \o{123} matches character "S". Note: d must be between 0 and 7. |
\xnn | Hex character, for example \x41 matches character "A". |
\cx | Control character, for example \cJ matches newline character "\n". Note: x is any ASCII printing character. |
\d | Match a decimal digit. |
\D | Match a character that is not a decimal digit. |
\Q...\E | Escape sequence. Characters between \Q and \E are treated as literals. |
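The following is a minimal sketch of how groups, backreferences, shorthand classes, and escapes from the table behave. It uses Python's `re` module only as a convenient stand-in; it is not the engine that HTTP Server uses, and several notations in the table (for example `\Q...\E`, `\o{ddd..}`, `\cx`, and the `(?<name>regex)` spelling of named groups) are not accepted by it. The sample strings are illustrative assumptions, not values taken from the server.

```python
import re

# Numbered groups: the table's $1 and $2 correspond to group(1) and group(2) here.
m = re.search(r"^(\w+)-(\d+)$", "ibm-049")
first, second = m.group(1), m.group(2)

# Named group. Python spells it (?P<name>...); the table's (?<name>...) form
# is not accepted by Python's re module.
named = re.search(r"^(?P<host>\w+)\.example$", "www.example").group("host")

# Backreference: \1 must repeat exactly what group 1 captured.
doubled = re.search(r"(\w)\1", "hello").group(0)

# Shorthand classes and word boundaries.
digits = re.findall(r"\d+", "port 8080, id 42")
whole_word = re.search(r"\bibm\b", "my ibm box") is not None
inside_word = re.search(r"\bibm\b", "myibmbox") is not None

# Hex escape: \x41 is the character "A".
hex_a = re.search(r"\x41", "A") is not None

print(first, second, named, doubled, digits, whole_word, inside_word, hex_a)
```

Because `\.` matches only a literal period, the named-group pattern above would not match a string such as `wwwXexample`.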
Examples of regular expression pattern matching
Pattern | Examples of strings that match |
---|---|
ibm | ibm01, myibm, aibmbc |
^ibm$ | ibm |
^ibm0[0-4][0-9]$ | ibm000 through ibm049 |
ibm[3-8] | ibm3, myibm4, aibm5b |
^ibm | ibm01, ibm |
ibm$ | myibm, ibm, 3ibm |
ibm... | ibm123, myibmabc, aibm09bcd |
ibm*1 | ib1, ibm1, myibm1, aibm1abc, ibmmm12 |
^ibm0.. | ibm001, ibm099, ibm0abcd |
^ibm0..$ | ibm001, ibm099 |
10.2.1.9 | 10.2.1.9, 10.2.139.6, 10.231.98.6 |
^10\.2\.1\.9$ | 10.2.1.9 |
^10\.2\.1\.1[0-5]$ | 10.2.1.10, 10.2.1.11, 10.2.1.12, 10.2.1.13, 10.2.1.14, 10.2.1.15 |
^192\.168\..*\..*$ | (All addresses on class B subnet 192.168.0.0) |
^192\.168\.10\..*$ | (All addresses on class C subnet 192.168.10.0) |
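Several rows of the examples table can be checked mechanically. The sketch below again uses Python's `re` module as a stand-in for the server's engine, with `re.search` providing the unanchored "contains" behavior described for plain strings. The strings in the third column of `CASES` are added here as hypothetical counterexamples; they do not come from the table.

```python
import re

# (pattern, strings expected to match, strings expected not to match)
CASES = [
    (r"ibm", ["ibm01", "myibm", "aibmbc"], ["ib", "i-b-m"]),
    (r"^ibm$", ["ibm"], ["ibm01", "myibm"]),
    (r"^ibm0[0-4][0-9]$", ["ibm000", "ibm049"], ["ibm050", "ibm0491"]),
    (r"ibm$", ["myibm", "ibm", "3ibm"], ["ibm01"]),
    (r"^ibm0..$", ["ibm001", "ibm099"], ["ibm0abcd"]),
    # Unescaped periods match any character, so this pattern is looser than it looks.
    (r"10.2.1.9", ["10.2.1.9", "10.2.139.6", "10.231.98.6"], ["10.2.1.8"]),
    # Escaped periods and anchors restrict the match to one address.
    (r"^10\.2\.1\.9$", ["10.2.1.9"], ["10.2.139.6", "10.231.98.6"]),
    (r"^10\.2\.1\.1[0-5]$", ["10.2.1.10", "10.2.1.15"], ["10.2.1.16", "10.2.1.1"]),
]


def failures(cases):
    """Return (pattern, string, expected) for each string that disagrees with re.search."""
    bad = []
    for pattern, should_match, should_not_match in cases:
        for text in should_match:
            if re.search(pattern, text) is None:
                bad.append((pattern, text, True))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                bad.append((pattern, text, False))
    return bad


if __name__ == "__main__":
    for pattern, text, expected in failures(CASES):
        print(f"{pattern!r} vs {text!r}: expected match={expected}")
```

An empty result from `failures(CASES)` means every listed string behaved as expected under this stand-in engine. The two `10.2.1.9` rows show why escaping the periods matters when matching IP addresses.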