re_comp() — Compile regular expression
Standards
Standards / Extensions | C or C++ | Dependencies |
---|---|---|
XPG4.2 | both | |
Format
#define _XOPEN_SOURCE_EXTENDED 1
#include <re_comp.h>
char *re_comp(const char *string);
General description
The re_comp() function converts a regular expression string into an internal form suitable for pattern matching by re_exec().
The string parameter points to a character string that defines the source regular expression to be compiled.
If re_comp() is called with a NULL argument, the current regular expression remains unchanged.
- The re_comp() and re_exec() functions are supported at the thread level: re_exec() works properly only when it is issued from the same thread that called re_comp().
- The re_comp() and re_exec() functions are provided for historical reasons. These functions were part of the Legacy Feature in Single UNIX Specification, Version 2. They have been withdrawn and are not supported as part of Single UNIX Specification, Version 3. New applications should use the newer functions fnmatch(), glob(), regcomp(), and regexec(), which provide full internationalized regular expression functionality compatible with IEEE Std 1003.1-2001.
- The z/OS® UNIX implementation of the re_comp() function supports only the POSIX locale. Any other locales will yield unpredictable results.
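Because re_comp() and re_exec() were withdrawn from Single UNIX Specification, Version 3, and may not exist on other systems, code being migrated sometimes keeps their calling convention on top of regcomp() and regexec(). The following is a minimal sketch of such a layer, not the z/OS implementation. compat_re_comp() and compat_re_exec() are hypothetical names; the per-thread state uses C11 _Thread_local to mirror the same-thread rule above; and the re_exec()-style return values (1 for a match, 0 for no match, -1 when nothing has been compiled), the failure message text, and the choice to keep the previous expression after a failed compile are assumptions.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical re_comp()/re_exec()-style layer over POSIX regcomp() and
 * regexec(). Not the z/OS implementation: the return conventions and the
 * message text are assumptions made for illustration. */

/* Two slots per thread: a new pattern is compiled into the unused slot so
 * that a failed compile leaves the current expression intact. */
static _Thread_local regex_t slots[2];
static _Thread_local int current = -1;   /* -1: nothing compiled yet */

/* Returns NULL on success, or a pointer to an error message string. */
const char *compat_re_comp(const char *pattern)
{
    if (pattern == NULL) {
        /* NULL argument: keep the current expression unchanged. */
        return current >= 0 ? NULL : "No previous regular expression";
    }

    int next = (current == 0) ? 1 : 0;
    /* No REG_EXTENDED, so the pattern is parsed as a POSIX basic RE.
     * REG_NEWLINE keeps '.' and [^...] from matching a newline. */
    if (regcomp(&slots[next], pattern, REG_NEWLINE) != 0)
        return "Regular expression could not be compiled";

    if (current >= 0)
        regfree(&slots[current]);          /* release the old expression */
    current = next;
    return NULL;
}

/* Returns 1 if s contains a match for this thread's current expression,
 * -1 if this thread has compiled nothing, and 0 otherwise. */
int compat_re_exec(const char *s)
{
    if (current < 0)
        return -1;
    return regexec(&slots[current], s, 0, NULL, 0) == 0 ? 1 : 0;
}
```

A caller compiles once with compat_re_comp(pattern), treats a non-NULL result as an error message, and then calls compat_re_exec() from the same thread for each string to test.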
The re_comp() function supports simple regular expressions, which are defined below.
Simple regular expressions: A Simple Regular Expression (SRE) specifies a set of character strings. The simplest form of regular expression is a string of characters with no special meaning. A small set of special characters, known as metacharacters, do have special meaning when encountered in patterns.
- An ordinary character c (not a special character) is a one-character regular expression that matches itself.
- A backslash (\) followed by any special character (that is, \c where c is any special character) is a one-character regular expression that matches the special character itself. The special characters are:
- ., *, [, and \ (period, asterisk, left square bracket, and backslash, respectively) which are always special, except when they appear within square brackets ([]).
- ^ (caret or circumflex), which is special at the beginning of the entire regular expression, or when it immediately follows the left bracket of a pair of square brackets ([]).
- $ (dollar symbol), which is special at the end of the entire regular expression.
- The character used to bound (delimit) an entire regular expression, which is special for that regular expression.
Note: A backslash (\) followed by an ordinary character is a one-character regular expression that matches the ordinary character itself.
- A period (.) is a one-character RE that matches any character except newline.
- A non-empty string within square brackets ([string]) is a one-character RE that matches any one character in that string. Thus, [abc] matches any string that contains at least one a, b, or c.
If the caret symbol (^) is the first character of the string within square brackets (that is, [^string]), the one-character RE matches any character except newline and the remaining characters within the square brackets. Thus, [^abc] matches any single character other than a, b, c, or newline.
Ranges may be specified as c-c. Within square brackets, the hyphen (-) means "through": it indicates a range of consecutive ASCII characters. For example, [0-9] is equivalent to [0123456789].
The hyphen (-) can be used by itself, but only if it is the first character (after an initial ^, if any) or the last character within the square brackets.
The right square bracket (]) can be part of the string only if it is the first character within the square brackets (after an initial ^, if any). For example, the expression []a-d] matches either a right square bracket or one of the characters a through d.
- A one-character RE is a RE that matches whatever the one-character RE matches.
- A one-character RE followed by an asterisk (*) is a RE that matches zero or more occurrences of the one-character RE. For example, a*e matches each of the following: e, ae, aaaaae. The longest leftmost match is chosen.
- A one-character RE followed by \{m\}, \{m,\}, or \{m,u\} is a RE that matches a range of occurrences of the one-character RE. The nonnegative integer values enclosed in \{\} indicate the number of times to apply the preceding one-character RE: m is the minimum number and u is the maximum number, and u must be less than 256. If you specify only m (\{m\}), it indicates the exact number of times to apply the regular expression.
\{m,\} matches m or more occurrences of the expression, and \{m,u\} matches from m through u occurrences. The * (asterisk) operation is equivalent to \{0,\}.
The maximum number of occurrences is matched.
- REs can be concatenated. The concatenation of REs is a RE that matches the concatenation of the strings matched by each component of the RE.
- A RE enclosed between the character sequences \( and \) is a RE that matches whatever the unadorned RE matches. The \( and \) sequences themselves do not match any characters; they only delimit a subexpression that a later \n can refer to.
- The expression \n (where 1 <= n <= 9) matches the same string of characters that was matched by an expression enclosed between \( and \) earlier in the same regular expression. The subexpression it refers to is the one that begins with the nth occurrence of \(, counting from the left. For example, in the expression \(a\)r\(e\)\1, \1 matches a second a, so the whole expression matches area.
- A caret (^) at the beginning of an entire RE constrains that RE to match an initial segment of a line.
- A dollar symbol ($) at the end of an entire RE constrains that RE to match a final segment of a line. For example, the construct ^entire RE$ constrains the entire RE to match the entire line.
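The rules above can be tried out with the POSIX regcomp() and regexec() functions that the description recommends for new applications. This is a sketch, not a test of re_comp() itself: it assumes that POSIX basic regular expression (BRE) syntax, which regcomp() uses when REG_EXTENDED is not set, behaves like the simple regular expressions described here for these particular patterns. The names bre_matches(), struct bre_example, and count_failures() are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if subject contains a match for the basic RE pattern,
 * -1 if the pattern does not compile, and 0 otherwise. */
static int bre_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* No REG_EXTENDED: parse as a POSIX basic RE.
     * REG_NEWLINE keeps '.' and [^...] from matching a newline. */
    if (regcomp(&re, pattern, REG_NEWLINE) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

struct bre_example {
    const char *pattern;   /* C string literal: "a\\.c" is the RE a\.c */
    const char *subject;
    int expected;          /* 1 = match, 0 = no match */
};

static const struct bre_example examples[] = {
    { "a.c",                "abc",    1 },  /* . matches any character    */
    { "a\\.c",              "abc",    0 },  /* \. matches only a period   */
    { "a\\.c",              "a.c",    1 },
    { "[abc]",              "xxbxx",  1 },  /* any one of a, b, or c      */
    { "^[^abc]*$",          "xyz",    1 },  /* no a, b, or c in the line  */
    { "^[^abc]*$",          "xaz",    0 },
    { "^[0-9][0-9]*$",      "2024",   1 },  /* range 0 through 9          */
    { "[]a-d]",             "]",      1 },  /* ] first is literal         */
    { "[]a-d]",             "e",      0 },
    { "a*e",                "aaaaae", 1 },  /* zero or more a, then e     */
    { "^a\\{2,3\\}$",       "aaa",    1 },  /* 2 through 3 occurrences    */
    { "^a\\{2,3\\}$",       "aaaa",   0 },
    { "^a\\{2,\\}$",        "aaaaa",  1 },  /* 2 or more occurrences      */
    { "\\(a\\)r\\(e\\)\\1", "area",   1 },  /* \1 repeats the first group */
    { "^abc",               "xabc",   0 },  /* anchored at start of line  */
    { "abc$",               "xabc",   1 },  /* anchored at end of line    */
};

/* Counts the table rows whose actual result differs from the expected one. */
static int count_failures(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof examples / sizeof examples[0]; i++)
        if (bre_matches(examples[i].pattern, examples[i].subject)
                != examples[i].expected)
            failures++;
    return failures;
}
```

Each row pairs a pattern built from one of the rules above with a subject string and the expected outcome. In C source every backslash in a pattern must be doubled, so the RE \(a\)r\(e\)\1 is written "\\(a\\)r\\(e\\)\\1".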
Returned value
If the string pointed to by the string argument is successfully converted, re_comp() returns a NULL pointer.
If unsuccessful, re_comp() returns a pointer to a null-terminated error message string. The error messages are:
EDC7008E No previous regular expression
EDC7009E Regular expression too long
EDC7010E \(\) imbalance
EDC7011E \{\} imbalance
EDC7012E [] imbalance
EDC7013E Too many \(\) pairs.
EDC7014E Incorrect range values in \{\}
EDC7015E Back reference number in \digit incorrect
EDC7016E Incorrect endpoint in range expression
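Code migrated to regcomp() receives POSIX error codes rather than the messages listed above. The sketch below pairs some regcomp() error codes with the re_comp() message that describes the same kind of problem. The pairing is an assumption made for illustration, not a documented correspondence, and closest_re_comp_message() and compile_report() are hypothetical names. Like re_comp(), compile_report() returns NULL on success and a message string on failure.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical mapping from a regcomp() error code to the text of the
 * re_comp() message that describes a similar problem. The pairing is an
 * illustrative assumption, not an IBM-documented correspondence. */
static const char *closest_re_comp_message(int code)
{
    switch (code) {
    case 0:           return NULL;  /* success, as with re_comp() */
    case REG_EPAREN:  return "\\(\\) imbalance";
    case REG_EBRACE:  return "\\{\\} imbalance";
    case REG_EBRACK:  return "[] imbalance";
    case REG_BADBR:   return "Incorrect range values in \\{\\}";
    case REG_ESUBREG: return "Back reference number in \\digit incorrect";
    case REG_ERANGE:  return "Incorrect endpoint in range expression";
    default:          return "Regular expression could not be compiled";
    }
}

/* Compiles pattern as a POSIX basic RE and reports the outcome in the
 * re_comp() style: NULL on success, otherwise a message string. */
static const char *compile_report(const char *pattern)
{
    regex_t re;
    int code = regcomp(&re, pattern, 0);

    if (code == 0)
        regfree(&re);     /* only the outcome is needed here */
    return closest_re_comp_message(code);
}
```

For example, an unmatched \( such as compile_report("\\(a") should produce the \(\) imbalance message, because POSIX requires regcomp() to report REG_EPAREN for unbalanced \( and \).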