regcomp() — Compile Regular Expression


#include <regex.h>
int regcomp(regex_t *preg, const char *pattern, int cflags);

Language Level: XPG4

Threadsafe: Yes.

Locale Sensitive: The behavior of this function might be affected by the LC_CTYPE and LC_COLLATE categories of the current locale. This function is not available when LOCALETYPE(*CLD) is specified on the compilation command. For more information, see Understanding CCSIDs and Locales.


The regcomp() function compiles the source regular expression pointed to by pattern into an executable version and stores it in the location pointed to by preg. You can then use the regexec() function to compare the compiled regular expression to other strings.
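As a minimal sketch of the compile-once, match-many pattern described above, the helper below compiles a pattern a single time and reuses it for several strings. The name count_matching() and its parameters are hypothetical and are not part of <regex.h>; only regcomp(), regexec(), and regfree() are library calls.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): counts how many of the
   n strings contain a match for pattern, treated as a basic regular
   expression. Returns -1 if regcomp() rejects the pattern. */
int count_matching(const char *pattern, const char *const strings[], size_t n)
{
    regex_t preg;
    int     count = 0;
    size_t  i;

    /* Compile once; cflags of 0 selects the default behavior. */
    if (regcomp(&preg, pattern, 0) != 0) {
        return -1;
    }

    /* Reuse the compiled expression for every string. With nmatch set
       to 0, regexec() does not report match positions. */
    for (i = 0; i < n; i++) {
        if (regexec(&preg, strings[i], 0, NULL, 0) == 0) {
            count++;
        }
    }

    /* Release the storage that regcomp() allocated for preg. */
    regfree(&preg);
    return count;
}
```

For example, calling count_matching("s[a-z]mple", lines, 3) with lines holding "simple", "sample", and "staple" counts the first two strings only.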

The cflags flag defines the attributes of the compilation process:

cflag          Description String
REG_ALT_NL
  • When LOCALETYPE(*LOCALE) is specified, the newline character of the integrated file system will be matched by regular expressions.
  • When LOCALETYPE(*LOCALEUTF) is specified, the database newline character will be matched.
  If the REG_ALT_NL flag is not set, the default for LOCALETYPE(*LOCALE) is to match the database newline, and the default for LOCALETYPE(*LOCALEUTF) is to match the integrated file system newline.
  For UTF-8 and UTF-32, the newline character of the integrated file system and the database newline character are the same.
REG_EXTENDED Support extended regular expressions.
REG_NEWLINE Treat the newline character as a special end-of-line character; it then establishes the line boundaries matched by the ^ and $ patterns, and can only be matched within a string explicitly using \n. (If you omit this flag, the newline character is treated like any other character.)
REG_ICASE Ignore case in match.
REG_NOSUB Ignore the number of subexpressions specified in pattern. When you compare a string to the compiled pattern (using regexec()), the string must match the entire pattern. The regexec() function then returns a value that indicates only if a match was found; it does not indicate at what point in the string the match begins, or what the matching string is.
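
The effect of the portable flags can be seen by compiling the same pattern with different cflags values. The following is a sketch only: the helper name matches() is hypothetical, and REG_ALT_NL is left out because it is specific to this platform. REG_NOSUB is always added because the helper needs only a yes-or-no answer.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): compiles pattern with
   the given cflags and reports whether text contains a match.
   Returns 1 on a match, 0 on no match, -1 if regcomp() fails. */
int matches(const char *pattern, const char *text, int cflags)
{
    regex_t preg;
    int     rc;

    /* REG_NOSUB: only success or failure of the match is needed. */
    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0) {
        return -1;
    }

    /* With REG_NOSUB, the nmatch and pmatch arguments are ignored. */
    rc = regexec(&preg, text, 0, NULL, 0);
    regfree(&preg);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, matches("ab+c", "abbc", REG_EXTENDED) reports a match because + is a repetition operator in extended syntax, while the same call with cflags of 0 does not, because + is an ordinary character in a basic regular expression. Likewise, matches("hello", "HELLO", REG_ICASE) succeeds, and matches("^b$", "a\nb\nc", REG_NEWLINE) succeeds because ^ and $ then anchor at line boundaries inside the string.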

Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set orderings, which can be interpreted differently depending on the current locale. The functions regcomp(), regerror(), regexec(), and regfree() use regular expressions in a similar way to the UNIX awk, ed, grep, and egrep commands.

Return Value

If the regcomp() function is successful, it returns 0. Otherwise, it returns an error code that you can use in a call to the regerror() function, and the content of preg is undefined.
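
One way to turn the returned error code into readable text is to pass it to regerror(), sketched below. The helper name compile_or_explain() and its buffer parameters are hypothetical; the message text that regerror() produces is implementation-defined, so the sketch does not assume any particular wording.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): compiles pattern into
   *preg. On failure, stores the regerror() message text in errbuf and
   returns the nonzero regcomp() error code; on success, returns 0 and
   the caller must later call regfree(preg). */
int compile_or_explain(regex_t *preg, const char *pattern, int cflags,
                       char *errbuf, size_t errbuf_size)
{
    int rc = regcomp(preg, pattern, cflags);

    if (rc != 0) {
        /* The content of *preg is undefined here, so it must not be
           passed to regexec(); regerror() takes it together with rc
           only to build the message. */
        regerror(rc, preg, errbuf, errbuf_size);
    }
    return rc;
}
```

A caller would typically declare a buffer such as char msg[128], call compile_or_explain(&preg, pattern, 0, msg, sizeof msg), and print msg when the result is nonzero.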

Example that uses regcomp()

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   regex_t    preg;
   char       *string = "a very simple simple simple string";
   /* Basic regular expression: \( \) groups a subexpression and \1
      refers back to whatever that subexpression matched. */
   char       *pattern = "\\(sim[a-z]le\\) \\1";
   int        rc;
   size_t     nmatch = 2;
   regmatch_t pmatch[2];

   if (0 != (rc = regcomp(&preg, pattern, 0))) {
      printf("regcomp() failed, returning nonzero (%d)\n", rc);
      exit(EXIT_FAILURE);
   }

   if (0 != (rc = regexec(&preg, string, nmatch, pmatch, 0))) {
      printf("Failed to match '%s' with '%s', returning %d.\n",
             string, pattern, rc);
   }
   else {
      /* pmatch[0] describes the whole match; pmatch[1] describes the
         first subexpression. Offsets are cast to int for printf(). */
      printf("With the whole expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             (int)(pmatch[0].rm_eo - pmatch[0].rm_so), &string[pmatch[0].rm_so],
             (int)pmatch[0].rm_so, (int)pmatch[0].rm_eo - 1);
      printf("With the sub-expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &string[pmatch[1].rm_so],
             (int)pmatch[1].rm_so, (int)pmatch[1].rm_eo - 1);
   }

   regfree(&preg);
   return 0;

   /****************************************************************************
      The output should be similar to :

      With the whole expression, a matched substring "simple simple" is found
      at position 7 to 19.
      With the sub-expression, a matched substring "simple" is found
      at position 7 to 12.
   ****************************************************************************/
}

