re_comp() — Compile regular expression

Standards

Standards / Extensions    C or C++    Dependencies
XPG4.2                    both

Format

#define _XOPEN_SOURCE_EXTENDED 1
#include <re_comp.h>

char *re_comp(const char *string);

General description

Restriction: This function is not supported in AMODE 64.

The re_comp() function converts a regular expression string into an internal form suitable for pattern matching by re_exec().

The parameter string is a pointer to a character string defining a source regular expression to be compiled.

If re_comp() is called with a NULL argument, the current regular expression remains unchanged.

Strings passed to re_comp() must be terminated by a null byte and may include newline characters.
Notes:
  1. The re_comp() and re_exec() functions are supported at the thread level. Both must be called from the same thread to work properly.
  2. The re_comp() and re_exec() functions are provided for historical reasons. These functions were part of the Legacy Feature in Single UNIX Specification, Version 2. They have been withdrawn and are not supported as part of Single UNIX Specification, Version 3. New applications should use the newer functions fnmatch(), glob(), regcomp() and regexec(), which provide full internationalized regular expression functionality compatible with IEEE Std 1003.1-2001.

  3. The z/OS® UNIX implementation of the re_comp() function supports only the POSIX locale. Any other locales will yield unpredictable results.
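
Note 2 recommends regcomp() and regexec() for new applications. The following is a minimal sketch of how a re_comp()/re_exec() pair might be replaced, assuming a POSIX <regex.h> is available. The helper name bre_search and its parameters are hypothetical. Basic regular expressions (no REG_EXTENDED flag) are used because their syntax is closest to the simple regular expressions described on this page; corner-case behavior may still differ from re_comp().

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of any library.
 * Compiles pattern as a POSIX basic regular expression and searches text.
 * Returns 1 on a match, 0 on no match, and -1 if the pattern does not
 * compile, in which case errbuf receives a description of the problem. */
int bre_search(const char *pattern, const char *text,
               char *errbuf, size_t errbuf_size)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_NOSUB);

    if (rc != 0) {
        /* regcomp() reports failure with a nonzero code; regerror()
         * turns that code into text, much like the message string
         * that re_comp() returns on failure. */
        regerror(rc, &re, errbuf, errbuf_size);
        return -1;
    }

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);   /* the caller owns the compiled pattern and releases it */

    return rc == 0 ? 1 : 0;
}
```

Unlike re_comp(), which keeps a single compiled expression per thread, this approach keeps the compiled pattern in a regex_t object owned by the caller, so several patterns can be in use at once.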

The re_comp() function supports simple regular expressions, which are defined below.

Simple regular expressions: A Simple Regular Expression (SRE) specifies a set of character strings. The simplest form of regular expression is a string of characters with no special meaning. A small set of special characters, known as metacharacters, do have special meaning when encountered in patterns.

The following one-character regular expressions (RE) match a single character:
  1. An ordinary character c (not a special character) is a one-character regular expression that matches itself.
  2. A backslash (\) followed by any special character (that is, \c where c is any special character) is a one-character regular expression that matches the special character itself. The special characters are:
    1. ., *, [, and \ (period, asterisk, left square bracket, and backslash, respectively), which are always special, except when they appear within square brackets ([]).
    2. ^ (caret or circumflex), which is special at the beginning of the entire regular expression, or when it immediately follows the left bracket of a pair of square brackets ([]).
    3. $ (dollar symbol), which is special at the end of the regular expression.
    4. The character used to bound (delimit) an entire regular expression, which is special for that regular expression.
    Note: A backslash (\) followed by an ordinary character is a one-character regular expression that matches the ordinary character itself.
  3. A period (.) is a one-character RE that matches any character, except newline.
  4. A non-empty string within square brackets ([string]) is a one-character RE that matches any one character in that string. Thus, [abc], if compared to other strings, would match any that contains a, b, or c.

    If the caret symbol (^) is the first character of the string within square brackets (that is, [^string]), the one-character RE matches any character except newline and the remaining characters within the square brackets. Thus, [^abc] matches any single character other than a, b, c, or newline.

    Ranges may be specified as c-c. The hyphen symbol, within square brackets, means "through". It may be used to indicate a range of consecutive ASCII characters. For example, [0-9] is equivalent to [0123456789].

    The - (hyphen) can be used by itself, but only if it is the first (after an initial ^, if any), or last character in the expression.

    The right square bracket (]) can be used as part of the string but only if it is the first character within it (after an initial ^, if any). For example, the expression []a-d] matches either a right square bracket or one of the characters a through d.
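
The bracket-expression rules above can be tried out with a short sketch. Because re_comp() is not available everywhere, the sketch assumes POSIX regcomp() and regexec() with basic regular expressions as a stand-in, which follow the same bracket rules for these cases. The names contains_match and check_bracket_rules are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text contains a match for the basic
 * regular expression pattern, 0 if it does not, -1 on a compile error. */
int contains_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Illustrative checks of the bracket rules. Returns the number of checks
 * that did not behave as described, so 0 means every check agreed. */
int check_bracket_rules(void)
{
    int failures = 0;

    failures += contains_match("[abc]", "xbz") != 1;    /* b is in the set */
    failures += contains_match("[abc]", "xyz") != 0;    /* no a, b, or c */
    failures += contains_match("[^abc]", "cab") != 0;   /* only excluded characters */
    failures += contains_match("[^abc]", "cabd") != 1;  /* d is not excluded */
    failures += contains_match("[0-9]", "room 7") != 1; /* range 0 through 9 */
    failures += contains_match("[a-]", "x-y") != 1;     /* trailing - is literal */
    failures += contains_match("[]a-d]", "x]y") != 1;   /* leading ] is literal */
    failures += contains_match("[]a-d]", "xyz") != 0;   /* no ] and nothing in a-d */
    return failures;
}
```

Calling check_bracket_rules() is expected to return 0 on a conforming POSIX implementation.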

The following rules may be used to construct REs from one-character REs:
  1. A one-character RE is a RE that matches whatever the one-character RE matches.
  2. A one-character RE followed by an asterisk symbol (*) is a RE that matches 0 or more occurrences of the one-character RE. For example, a*e will match any of the following: e, ae, aaaaae. The longest leftmost match is chosen.
  3. A one-character RE followed by \{m\}, \{m,\}, or \{m,u\} is a RE that matches a range of occurrences of the one-character RE. Nonnegative integer values enclosed in \{\} indicate the number of times to apply the preceding one-character RE. m is the minimum number and u is the maximum number. u must be less than 256. If you specify only m, it indicates the exact number of times to apply the regular expression.

    \{m,\} is equivalent to \{m,u\} with no upper bound: it matches m or more occurrences of the expression. The * (asterisk) operation is equivalent to \{0,\}.

    The maximum number of occurrences is matched.

  4. REs can be concatenated. The concatenation of REs is a RE that matches the concatenation of the strings matched by each component of the RE.
  5. A RE enclosed between the character sequences \( and \) is a RE that matches whatever the unadorned RE matches. The \( and \) sequences are ignored.
  6. The expression \n (where 1 <= n <= 9) matches the same string of characters as was matched by an expression enclosed between \( and \) earlier in the same regular expression. The subexpression it specifies is the one beginning with the nth occurrence of \( counting from the left. For example, in the expression \(a\)r\(e\)\1, the \1 is equivalent to a, giving area.
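
The construction rules above (repetition, interval expressions, grouping, and back-references) can be illustrated with a short sketch. It assumes POSIX regcomp() and regexec() with basic regular expressions as a stand-in for re_comp() and re_exec(); the names match_length and check_construction_rules are hypothetical. Reporting the length of the match makes the "longest leftmost" and "maximum number of occurrences" behavior visible.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: searches text for the basic regular expression
 * pattern and returns the length of the leftmost match, or -1 if there
 * is no match or the pattern does not compile. */
int match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}

/* Illustrative checks. Returns the number of checks that did not behave
 * as described, so 0 means every check agreed. */
int check_construction_rules(void)
{
    int failures = 0;

    failures += match_length("a*e", "e") != 1;               /* zero a's, then e */
    failures += match_length("a*e", "aaaaae") != 6;          /* longest leftmost match */
    failures += match_length("xa\\{2,3\\}", "xaaaaa") != 4;  /* x plus at most 3 a's */
    failures += match_length("xa\\{2\\}", "xa") != -1;       /* exactly 2 a's required */
    failures += match_length("a\\{0,\\}e", "aae") != 3;      /* same as a*e */
    failures += match_length("\\(a\\)r\\(e\\)\\1", "area") != 4; /* \1 repeats a */
    failures += match_length("\\(a\\)r\\(e\\)\\1", "are") != -1; /* nothing left for \1 */
    return failures;
}
```

Note that each backslash in a pattern is doubled in the C string literals, so "\\(a\\)" in the source is the pattern \(a\).
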
An entire RE may be constrained to match only an initial segment or final segment of a line (or both).
  1. A caret (^) at the beginning of an entire RE constrains that RE to match an initial segment of a line.
  2. A dollar symbol ($) at the end of an entire RE constrains that RE to match a final segment of a line. For example, the construct ^entire RE$ constrains the entire RE to match the entire line.
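
The anchoring rules can be sketched the same way, again assuming POSIX regcomp() and regexec() with basic regular expressions as a stand-in for re_comp() and re_exec(). The names line_matches and check_anchor_rules are hypothetical, and each test string is treated as a single line.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if line contains a match for the basic
 * regular expression pattern, 0 if it does not, -1 on a compile error. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Illustrative checks. Returns the number of checks that did not behave
 * as described, so 0 means every check agreed. */
int check_anchor_rules(void)
{
    int failures = 0;

    failures += line_matches("^abc", "abcdef") != 1;  /* initial segment */
    failures += line_matches("^abc", "xabc") != 0;    /* abc is not at the start */
    failures += line_matches("def$", "abcdef") != 1;  /* final segment */
    failures += line_matches("def$", "defx") != 0;    /* def is not at the end */
    failures += line_matches("^abc$", "abc") != 1;    /* the entire line */
    failures += line_matches("^abc$", "abcd") != 0;   /* extra trailing character */
    failures += line_matches("a^b", "a^b") != 1;      /* ^ is ordinary when not first */
    return failures;
}
```
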

Returned value

If the string pointed to by the string argument is successfully converted, re_comp() returns a NULL pointer.

If unsuccessful, re_comp() returns a pointer to a null-terminated error message string.

The following re_comp() error messages are defined:
  EDC7008E No previous regular expression
  EDC7009E Regular expression too long
  EDC7010E \(\) imbalance
  EDC7011E \{\} imbalance
  EDC7012E [] imbalance
  EDC7013E Too many \(\) pairs.
  EDC7014E Incorrect range values in \{\}
  EDC7015E Back reference number in \digit incorrect
  EDC7016E Incorrect endpoint in range expression
Note: The error message string is not to be freed by the application. It will be freed when the thread terminates.

Related information