Standards
Standards / Extensions | C or C++ | Dependencies
XPG4.2 | C only |
Format
#define _XOPEN_SOURCE_EXTENDED 1
#include <libgen.h>
char *regcmp(const char *pattern[,...], (char *)0);
char *regex(const char *cmppat, const char *subject[,subexp,...]);
extern char *__loc1;
General description
Restriction: This function is not supported
in AMODE 64.
The regcmp() function concatenates regular
expression (RE) patterns specified by a list of one or more pattern arguments.
The end of this list must be delimited by a NULL pointer. The regcmp()
function then converts the concatenated RE pattern into an internal
form suitable for use by the pattern matching regex() function. If
conversion is successful, regcmp() returns a pointer to the converted
pattern. Otherwise, it returns a NULL pointer. The regcmp() function
uses malloc() to obtain storage for the converted pattern. It is the
application's responsibility to free unneeded space so allocated.
The regex() function executes a converted pattern cmppat against
a subject string. If cmppat matches all or part of the subject string,
the regex() function returns a pointer to the next unmatched character
in the subject string and sets the external variable __loc1 to point to
the first matched character in the subject string. If no match is found
between cmppat and the subject string, the regex() function returns a
NULL pointer.
The regcmp() and regex() functions
are supported in any locale. However, results are unpredictable if
they are not run in the same locale.
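The two pointers that regex() produces (the return value and __loc1) can be pictured with the following minimal sketch. It is an illustration only: it uses the POSIX regcomp() and regexec() functions rather than regcmp() and regex(), so that it can be compiled on systems that do not ship the legacy interfaces, and it assumes an extended RE, whose syntax is close to but not identical with the syntax described here (for example, extended REs have no $n tag). The helper name find_match is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libgen): search subject for the first
 * match of the extended RE pattern. On success, store the address of the
 * first matched character in *first (the role __loc1 plays for regex())
 * and return the address of the next unmatched character (the value
 * regex() would return). Return NULL if the pattern cannot be compiled
 * or if nothing matches.
 */
const char *find_match(const char *pattern, const char *subject,
                       const char **first)
{
    regex_t re;
    regmatch_t m[1];
    const char *next = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;                      /* pattern could not be compiled */

    if (regexec(&re, subject, 1, m, 0) == 0) {
        *first = subject + m[0].rm_so;    /* first matched character */
        next   = subject + m[0].rm_eo;    /* next unmatched character */
    }
    regfree(&re);                         /* release the compiled pattern */
    return next;
}
```

With the pattern [0-9]+ and the subject abc123def, the helper stores the address of the character 1 in *first and returns the address of the character d.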
Following are valid RE
symbols and their meaning to the regcmp() and regex() functions:
- Expression
- Meaning
- NUL
- Terminate RE pattern and text string
- c
- Any non-special character, c, is a one-character RE which matches
itself.
- \s
- A backslash (\) followed by a special character, s, is a one-character
RE which matches the special character itself.
The following characters
are special:
- period, ., asterisk, *, plus, +, dollar, $, left square bracket,
[, left brace, {, right brace, }, left parenthesis, (, right parenthesis,
), and backslash, \, are always special except when they appear
within square brackets ([]).
- caret (^) is special at the beginning of an entire RE (which
is another name for a pattern).
Note: A non-special character preceded by \ is a one-character
RE which matches the non-special character.
- yz
- Concatenation of REs y and z matches concatenation of strings
matched by y and z.
- .
- The period (.) special character RE matches any single character
except the <newline> character.
- ^
- The caret (^) at the beginning of an entire RE is an RE which
matches the beginning of a string. Thus, it anchors or limits
matches by the entire RE to the beginning of strings.
- $
- The dollar ($) at the end of an entire RE is an RE which matches only
the end of a string (delimited by the <NUL> character). Thus,
it anchors or limits matches by the entire RE to the end of
strings.
Note: \n (the C language designation for a <newline>
character) must be used in an entire RE to match any embedded or trailing <newline>
character in a text string.
- (...)
- Parentheses are used to delimit a sub-expression which matches
whatever the REs comprising the sub-expression would have matched
without the delimiting parentheses.
- (...)$n
- $n, where n is a digit between 0 and 9, inclusive, may be used
to tag a sub-expression. The tag tells the regex() function to return
the substring matched by the sub-expression at the address specified by
the (n+1)th argument after subject.
- *
- A one-character RE or sub-expression followed by an asterisk
(*) is a RE that matches zero or more occurrences of the one-character
RE or sub-expression. If there is any choice, the longest leftmost
string that permits a match is chosen.
- +
- A one-character RE or sub-expression followed by a plus
(+) is a RE that matches one or more occurrences of the one-character
RE or sub-expression. Whenever a choice exists, the RE matches as
many occurrences as possible.
- {m,n}
- A one-character RE or sub-expression followed by integer
values, m and n, enclosed in braces is a RE which matches repeated
occurrences of whatever the preceding one-character RE or sub-expression
matched. The value of m, which must be in the range 0 to 255, inclusive,
is the minimum number of occurrences required for a match. The value
of n, which, if specified, must also be in the range 0 to 255,
inclusive, is the maximum. The value of n, if specified, must be greater
than or equal to the value of m. The following brace expressions are
valid:
- {m}
- Matches exactly m occurrences of the preceding one-character RE
or sub-expression.
- {m,}
- Matches m or more occurrences of the preceding one-character RE
or sub-expression. There is no limit on the number of occurrences
which will be matched. The plus (+) and asterisk (*) operations are
equivalent to {1,} and {0,}, respectively.
- {m,n}
- Matches between m and n occurrences, inclusive.
Whenever a choice exists, the RE matches as many
occurrences as possible.
- [...]
- A non-empty list of characters enclosed by square brackets is
a one-character RE that matches any one character in the list.
- [^...]
- A non-empty list of characters preceded by a caret (^) enclosed
by square brackets is a one-character RE that matches any character except <newline>
and the characters in the list. The ^ has special meaning only if
it is the first character after the left bracket ([).
- [c1-c2]
- The hyphen (-) between two characters c1 and c2 within square
brackets designates the list of characters whose collating values
fall between the collating values of c1 and c2 in the current locale.
The collating value of c2 must be greater than or equal to that of c1. Also,
c2 may not be used as the ending point of one range and the starting
point of another range. In other words, c1-c2-c3 is invalid.
The hyphen (-) loses its special meaning if it occurs first or last in
the bracket expression or if it is used for c1 or c2.
The right bracket,
], does not terminate a bracket expression when it is the first character
within it (after an initial ^, if any). For example, the expression
[]0-9] matches a right bracket or a digit in the range 0-9, inclusive.
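The bracket expression rules above can be tried out with a small sketch like the one below. It is not based on regcmp(): it assumes the POSIX regcomp() and regexec() functions, whose bracket expressions treat a leading ], a leading ^, and a first or last hyphen in the way described here. The helper name bracket_matches is hypothetical, and the handling of <newline> by [^...] is not exercised because it can differ between the two interfaces.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the one-character RE bracket
 * (for example "[]0-9]") matches the single character c, 0 if it does
 * not, and -1 if the expression cannot be compiled.
 */
int bracket_matches(const char *bracket, char c)
{
    regex_t re;
    char subject[2];
    int rc;

    subject[0] = c;                       /* one-character subject string */
    subject[1] = '\0';

    if (regcomp(&re, bracket, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, bracket_matches("[]0-9]", ']') and bracket_matches("[]0-9]", '7') both return 1, while bracket_matches("[]0-9]", 'a') returns 0.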
Notes:
- Multiple duplication symbols applied to the same RE will be interpreted
in the following order of precedence:
  - *
  - +
  - {}
- RE order of precedence is as follows, from high to low:
  - escaped character \character
  - bracket expression [...]
  - sub-expression (...)
  - duplication * + {}
  - concatenation yz
  - anchors ^ $
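The effect of the duplication symbols, and of duplication binding more tightly than concatenation, can be observed by measuring how much of a subject string a pattern consumes. The following is a minimal sketch under stated assumptions: it uses POSIX regcomp() and regexec() with extended REs, in which *, +, {m,n}, and (...) behave as described above, rather than regcmp() itself. The helper name match_length is hypothetical.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the length of the leftmost match of the
 * extended RE pattern in subject, or -1 if there is no match or the
 * pattern cannot be compiled.
 */
long match_length(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, m, 0) == 0)
        len = (long)(m[0].rm_eo - m[0].rm_so);   /* characters consumed */
    regfree(&re);
    return len;
}
```

Against the subject aaaaaa, the pattern a{2,4} consumes 4 characters, the maximum the brace expression allows. Against the subject ababab, the pattern ab* consumes only 2 characters because * applies to b alone, whereas (ab)* consumes all 6.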
Note: The regcmp() and regex() functions are
provided for historical reasons. These functions were part of the
Legacy Feature in Single UNIX Specification,
Version 2. They have been withdrawn and are not supported as part
of Single UNIX Specification,
Version 3. New applications should use the newer functions fnmatch(),
glob(), regcomp(), and regexec(), which provide fully internationalized
regular expression functionality compatible with IEEE Std 1003.1-2001.
If
it is necessary to continue using these functions in an application
written for Single UNIX Specification,
Version 3, define the feature test macro _UNIX03_WITHDRAWN before
including any standard system headers. The macro exposes all interfaces
and symbols removed in Single UNIX Specification,
Version 3.
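For applications that move to the newer functions, a tagged sub-expression of the form (...)$0 has no direct equivalent, but regexec() reports the offsets of each parenthesized sub-expression. The sketch below shows one way the substring might be copied into a caller-supplied buffer, much as regex() does for the argument that follows subject. It is an illustration, not a drop-in replacement: the helper name match_first_group and its buffer-size parameter are hypothetical, and extended RE syntax is assumed.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: match the extended RE pattern against subject and
 * copy the text matched by the first parenthesized sub-expression into
 * buf (at most bufsize - 1 characters plus a terminating NUL).
 * Returns the address of the next unmatched character, or NULL if the
 * pattern cannot be compiled, nothing matches, or the sub-expression
 * did not take part in the match.
 */
const char *match_first_group(const char *pattern, const char *subject,
                              char *buf, size_t bufsize)
{
    regex_t re;
    regmatch_t m[2];      /* m[0]: whole match, m[1]: first sub-expression */
    const char *next = NULL;

    if (bufsize == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= bufsize)
            len = bufsize - 1;            /* truncate rather than overflow */
        memcpy(buf, subject + m[1].rm_so, len);
        buf[len] = '\0';
        next = subject + m[0].rm_eo;      /* next unmatched character */
    }
    regfree(&re);
    return next;
}
```

With the pattern ([A-Za-z][A-Za-z0-9]{0,7}), the subject 012Testing345, and a 9-byte buffer, the helper copies Testing3 into the buffer and returns a pointer to the trailing 45.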
Returned value
If the pattern formed by
concatenating the list of pattern arguments
is successfully converted, regcmp() returns a pointer to the converted
pattern. Otherwise, it returns a NULL pointer. If regcmp() is unable
to allocate storage for the converted pattern, it sets errno to ENOMEM.
If
regex() successfully matches the converted pattern cmppat to
all or part of the subject string, it returns
a pointer to the next unmatched character in subject.
Otherwise, it returns a NULL pointer.