Standards
Standards / Extensions | C or C++ | Dependencies
XPG4, XPG4.2 | both |
Format
#define INIT declarations
#define GETC() getc_code
#define PEEK() peek_code
#define UNGETC() ungetc_code
#define RETURN(ptr) return_code
#define ERROR(val) error_code
#define _XOPEN_SOURCE
#include <regexp.h>
char *compile(char *instring, char *expbuf, const char *endbuf, int eof);
General description
Restriction: This function is not supported in AMODE 64.
The compile() function takes as input a simple regular expression and
produces a compiled expression that can be used with the step() and
advance() functions.
The first parameter instring is never used explicitly by compile(). It
is a pointer to a character string defining a source regular expression.
It is useful for programs that pass down different pointers to input
characters. Programs which invoke functions to input characters or have
characters in an external array can pass down (char *)0 for this parameter.
expbuf is a pointer to the place where the compiled regular expression
will be placed.
endbuf points to one more than the highest address where the compiled
regular expression may be placed. If the compiled expression cannot fit
in (endbuf-expbuf) bytes, a call to ERROR(50) is made. (See "Returned
Value" below.)
eof is the character which marks the end of the regular expression.
The z/OS® UNIX services implementation of the compile() function does
not accept internationalized simple expressions as input. Internationalized
simple expressions (for example, [[=c=]] (an equivalence class)) may
yield unpredictable results.
Programs must have the following
five macros declared before the
#include <regexp.h> statement.
The macros GETC(), PEEKC() and UNGETC() operate on the regular expression
given as input to compile().
- GETC()
- This macro returns the value of the next character (byte) in the
regular expression pattern. Successive calls to GETC() should return
successive characters of the regular expression.
- PEEK()
- This macro returns the next character (byte) in the regular expression
pattern. Immediate successive calls to PEEK() should return the same
byte, which should also be the next character returned by GETC().
- UNGETC(c)
- This macro causes the argument c to
be returned by the next call to GETC(). No more than one character
is ever needed and this character is guaranteed to be the last character
read by GETC(). The value of the macro UNGETC() is always ignored.
- RETURN(ptr)
- This macro is used on normal exit of the compile() function. The
value of the argument ptr is a pointer to
the character after the last character of the compiled regular expression.
- ERROR(val)
- This macro is the abnormal return from compile(). The argument val is
an error number. (See "Returned Value" below for meanings.) This
call should never return.
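The contract described above can be illustrated with a minimal sketch of one conventional way to define the macros over a pointer into the pattern string. It deliberately does not include <regexp.h>. Instead, a hypothetical stand-in function, copy_pattern(), takes the same parameters as compile() and uses the macros where compile() would. The names sp, last_error, copy_pattern(), and lookahead_contract_holds() are assumptions made for this sketch only. Because this page spells the lookahead macro both PEEK() and PEEKC(), the sketch defines both.

```c
#include <stddef.h>

/* Hypothetical: remembers the value most recently passed to ERROR(). */
static int last_error = 0;

/* INIT declares the cursor that the other macros share. */
#define INIT        const char *sp = instring;
#define GETC()      (*sp++)      /* return the next byte and advance */
#define PEEKC()     (*sp)        /* return the next byte, do not advance */
#define PEEK()      PEEKC()      /* alternate spelling used on this page */
#define UNGETC(c)   (--sp)       /* step back over the byte just read */
#define RETURN(ptr) return (ptr)
#define ERROR(val)  do { last_error = (val); return NULL; } while (0)

/*
 * Stand-in for compile(), NOT the real compiler: it only copies the
 * pattern, up to the eof character, into expbuf.
 */
static char *copy_pattern(char *instring, char *expbuf,
                          const char *endbuf, int eof)
{
    INIT
    char *ep = expbuf;
    int c;

    /* Also stop at the terminating NUL so the sketch never overruns. */
    while ((c = GETC()) != eof && c != '\0') {
        if (ep >= endbuf)
            ERROR(50);           /* does not fit in (endbuf - expbuf) bytes */
        *ep++ = (char)c;
    }
    RETURN(ep);                  /* one past the last byte written */
}

/* Checks the GETC()/PEEK()/UNGETC() relationships described above. */
static int lookahead_contract_holds(char *instring)
{
    INIT
    int peeked = PEEK();
    int peeked_again = PEEK();   /* repeated peeks see the same byte */
    int first = GETC();          /* ... which GETC() then returns */
    int again;

    UNGETC(first);               /* push back the byte just read */
    again = GETC();              /* the next GETC() returns it again */
    return peeked == peeked_again && peeked == first && first == again;
}
```

Here ERROR() leaves the enclosing function instead of returning to its caller, which is one way to satisfy the requirement that the call never return. A program that prefers to stop on error could call a function that exits instead.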
Notes: - z/OS UNIX services
do not provide any default macros if the above user macros are not
provided.
- Each program that includes <regexp.h> must have
a #define statement for INIT. It is used for dependent declarations
and initializations. For example, it can be used to set a variable
to point to the beginning of the regular expression so that this variable
can be used in the declarations for GETC(), PEEK(), and UNGETC().
- The external variables cirf, sed,
and nbra are reserved.
- The application must provide the proper serialization for the
compile(), step(), and advance() functions if they are run under a
multithreaded environment.
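The serialization requirement in the last note could be met in several ways. The following is a minimal sketch of one of them, assuming POSIX threads are available: a single process-wide mutex held around each operation. The names regexp_lock, regexp_op, with_regexp_lock(), and count_op() are hypothetical. In a real program the callback would wrap a call to compile(), step(), or advance().

```c
#include <pthread.h>

/* Hypothetical process-wide lock guarding all regexp.h calls. */
static pthread_mutex_t regexp_lock = PTHREAD_MUTEX_INITIALIZER;

/* A unit of work that must run while the lock is held. */
typedef int (*regexp_op)(void *ctx);

/*
 * Runs op(ctx) with the lock held and returns its result,
 * or -1 if the lock could not be acquired.
 */
static int with_regexp_lock(regexp_op op, void *ctx)
{
    int rc;

    if (pthread_mutex_lock(&regexp_lock) != 0)
        return -1;
    rc = op(ctx);
    pthread_mutex_unlock(&regexp_lock);
    return rc;
}

/* Placeholder operation for demonstration: increments a counter. */
static int count_op(void *ctx)
{
    int *counter = ctx;
    return ++*counter;
}
```

The wrapper releases the lock only when the callback returns normally, so an ERROR() definition that leaves the callback by another route (for example, by terminating the thread) would need its own cleanup.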
Simple regular expressions
A Simple Regular Expression (SRE) specifies
a set of character strings. The simplest form of regular expression
is a string of characters with no special meaning. A small set of
special characters, known as metacharacters, do have special meaning
when encountered in patterns.
- Expression
- Meaning
- c
- The character c where c is
not a special character.
- \c
- The character c where c is
any special character. For example, a\.e matches only the
literal string a.e.
- ^
- The caret symbol matches the beginning of the string being compared.
- $
- The dollar symbol matches the end of the string.
- .
- The period symbol matches any one character.
- [string]
- A string within square brackets specifies any of the characters
in string. Thus, [abc],
if compared to other strings, would match any which contained a, b,
or c.
The ] (right bracket) can be used alone within a pair of
brackets, but only if it immediately follows either the opening left
bracket or [^.
Ranges may be specified
as c-c. The hyphen
symbol, within square brackets, means "through". It fills in the intervening
characters according to the collating sequence. For example, [a-z]
is equivalent to [abc…xyz]. If the end character in the range is lower
in the collating sequence than the start character, then only the range
start and range end characters are accepted in the search pattern.
For example, [9-1] is equivalent to [91]. Note that ranges in Simple
Regular Expressions are only valid if the LC_COLLATE category is set
to the C locale.
The - (hyphen) can be used by itself, but
only if it is the first or last character in the expression. For example,
the expression []a-f]
matches either the ] or one of the characters a through f.
- [^string]
- The caret symbol, when inside square brackets, negates the characters
within the square brackets. Thus, [^abc] matches any single
character other than a, b, or c.
Note: Characters ., *,
[, and \ (period, asterisk, left square bracket, and backslash, respectively)
have special meaning, except when they appear within square brackets
([]), or are preceded by \.
- *
- The asterisk symbol indicates zero or more occurrences of the preceding character.
For example, (a*e) will match any of the following: e,
ae, aae, aaae, .... The longest leftmost match is chosen.
- rx
- The occurrence of regular expression r followed
by the occurrence of regular expression x.
- \{m\} \{m,\} \{m,u\}
- Integer values enclosed in \{\} indicate the number of times to
apply the preceding regular expression. m is
the minimum number and u is the maximum
number. u must be less than 256. If you
specify only m, it indicates the exact number
of times to apply the regular expression.
\{m,\} is equivalent to \{m,255\}.
They both match m or more occurrences of
the expression. The * (asterisk) operation is equivalent to \{0,\}.
The maximum number of occurrences is matched.
- \(r\)
- The regular expression r. The \( and
\) sequences are ignored.
- \n
- When \n (where 1 <= n <=
9) appears in a concatenated regular expression, it stands for the
regular expression x, where x is
the nth regular expression enclosed in \(
and \) sequences that appeared earlier in the concatenated regular
expression. For example, in the pattern \(c\)onc\(ate\)n\2,
the \2 is equivalent to ate, giving concatenate.
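The range rule given for [string] above, including the reversed-range case such as [9-1], can be sketched as a small helper. This is an illustration of the rule as stated on this page, not the code used by compile(). It assumes the C locale, where collating order is taken to be byte order, and the name expand_range() and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Writes the characters denoted by the range lo-hi into out as a
 * NUL-terminated string and returns how many were written,
 * or 0 if out is too small.
 */
static size_t expand_range(unsigned char lo, unsigned char hi,
                           char *out, size_t outsize)
{
    size_t n = 0;

    if (hi < lo) {
        /* Reversed range: only the two endpoints are accepted. */
        if (outsize < 3)
            return 0;
        out[n++] = (char)lo;
        out[n++] = (char)hi;
    } else {
        size_t need = (size_t)(hi - lo) + 1;
        unsigned int c;

        if (outsize < need + 1)
            return 0;
        /* Forward range: fill in every intervening byte value. */
        for (c = lo; c <= hi; c++)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

With this helper, expand_range('a', 'e', ...) produces abcde, and expand_range('9', '1', ...) produces 91, matching the [9-1] example in the text.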
The character ^ at the beginning of an expression
permits a successful match only immediately after a newline or at
the beginning of each string to which a match is to be applied.
The character $ at the end of an expression requires a trailing newline.
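Several of the expressions above can be tried out with regcomp() and regexec(), the replacement functions this page recommends. The sketch below assumes that POSIX basic regular expression syntax (selected by passing 0 as the flags argument to regcomp()) is a close enough approximation of the simple regular expressions described here. The two syntaxes are not guaranteed to agree in every case, for example on reversed ranges such as [9-1]. The helper name bre_matches() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if text contains a match for the basic regular expression
 * pattern, 0 if it does not, and -1 if the pattern fails to compile.
 */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* A cflags value of 0 selects basic regular expression syntax. */
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, bre_matches("a*e", "aaae") and bre_matches("\\(c\\)onc\\(ate\\)n\\2", "concatenate") both return 1, while bre_matches("a\\.e", "abe") and bre_matches("[^abc]", "abc") return 0. Backslashes are doubled only because the patterns are written as C string literals.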
Notes: - The compile() function is physically embedded in the regexp.h header.
This header will be protected from multiple inclusions just like
other C headers.
- The compile(), step(), and advance() functions are provided for
historical reasons. These functions were part of the Legacy Feature
in Single UNIX Specification,
Version 2. They have been withdrawn and are not supported as part
of Single UNIX Specification,
Version 3. New applications should use the newer functions fnmatch(),
glob(), regcomp() and regexec(), which provide full internationalized
regular expression functionality compatible with IEEE Std 1003.1-2001.
Returned value
If successful, compile()
exits using the user-provided macro RETURN(ptr).
The value of the argument ptr is a pointer
to the character after the last character of the compiled regular
expression.
If unsuccessful, compile() exits using the user-provided
macro ERROR(val). The argument val is
an error number identifying the error. The following error numbers
are defined:
- Errcode
- Description String
- 11
- Range endpoint too large
- 16
- Bad number
- 25
- \digit out of range
- 36
- Illegal or missing delimiter
- 41
- No remembered search string
- 42
- \( \) imbalance
- 43
- Too many \(
- 44
- More than two numbers given in \{ \}
- 45
- } expected after \
- 46
- First number exceeds second in \{ \}
- 49
- [ ] imbalance
- 50
- Regular expression overflow
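A program's ERROR() macro typically needs to turn the error number into a message. The following sketch shows one way to do that with a lookup table built from the list above. The names compile_errors and compile_error_text(), and the fallback string for unlisted values, are assumptions made for this sketch and are not part of <regexp.h>.

```c
#include <stddef.h>

struct compile_error {
    int code;
    const char *text;
};

/* Codes and descriptions copied from the list above. */
static const struct compile_error compile_errors[] = {
    {11, "Range endpoint too large"},
    {16, "Bad number"},
    {25, "\\digit out of range"},
    {36, "Illegal or missing delimiter"},
    {41, "No remembered search string"},
    {42, "\\( \\) imbalance"},
    {43, "Too many \\("},
    {44, "More than two numbers given in \\{ \\}"},
    {45, "} expected after \\"},
    {46, "First number exceeds second in \\{ \\}"},
    {49, "[ ] imbalance"},
    {50, "Regular expression overflow"},
};

/* Returns the description for val, or a fallback for unlisted values. */
static const char *compile_error_text(int val)
{
    size_t i;

    for (i = 0; i < sizeof compile_errors / sizeof compile_errors[0]; i++) {
        if (compile_errors[i].code == val)
            return compile_errors[i].text;
    }
    return "Unknown error";      /* hypothetical fallback text */
}
```

An ERROR() definition could then pass its argument to compile_error_text() before reporting the failure and leaving compile().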