Brief Background
A regular expression is a character string in which some
characters are given special meaning for pattern matching.
Regular expressions have been in use since the early days of computing,
and provide a powerful and efficient way to parse, interpret,
search and replace text within an application.
Supported Syntax
Within a regular expression, the following characters have special meaning:
. matches any single character
\x matches the character x, even if x is a syntactical character
[abc] matches any character in the set a, b or c.
[^abc] matches any character not in the set a, b or c.
[a-z] matches any character in the range a to z, inclusive.
[a-zABC0-9] matches any character in the range a to z, 0 to 9 or A, B or C
\ is used to escape the characters ] and -, which have a syntactical meaning inside a character set: [\-\\] matches the characters - or \
!(A) matches whatever the expression A would not match/contain.
(A)|(B) matches whatever matches the expression A or matches the expression B.
(A)&(B) matches whatever matches the expression A and matches the expression B.
(A)&(!(B)) matches whatever matches the expression A but not B.
? matches the preceding expression or the null string (same as {0,1})
* matches the null string or any number of repetitions of the preceding expression (same as {0,*})
+ matches one or more repetitions of the preceding expression (same as {1,*})
{m} matches exactly m repetitions of the preceding expression
{m,n} matches between m and n repetitions of the preceding expression, inclusive
{m,*} matches m or more repetitions of the preceding expression
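The operators listed above can be tried out with a short sketch. It uses the JDK's java.util.regex package as a stand-in, which is an assumption: the jrexx classes themselves are not shown in this section, and the two dialects differ. In particular, java.util.regex writes the open-ended repetition {m,*} as {m,} and has no ! or & operators, so the sketch emulates (A)&(B) and (A)&(!(B)) for whole-string matches by combining two match results in Java code. The names SyntaxSketch, matchesBoth and matchesButNot are hypothetical.

```java
import java.util.regex.Pattern;

class SyntaxSketch {
    // Emulates (A)&(B) for whole-string matching: the input must match both expressions.
    static boolean matchesBoth(String a, String b, String input) {
        return Pattern.matches(a, input) && Pattern.matches(b, input);
    }

    // Emulates (A)&(!(B)) for whole-string matching: the input must match A but not B.
    static boolean matchesButNot(String a, String b, String input) {
        return Pattern.matches(a, input) && !Pattern.matches(b, input);
    }

    public static void main(String[] args) {
        // Character sets and ranges. Backslashes are doubled because this is Java source.
        System.out.println(Pattern.matches("[^abc]", "d"));
        System.out.println(Pattern.matches("[a-zABC0-9]{3}", "aB7"));
        System.out.println(Pattern.matches("[\\-\\\\]", "-"));

        // Repetition: ? is {0,1}, + is one or more, {2,} is the java.util.regex form of {2,*}.
        System.out.println(Pattern.matches("ab?c", "ac"));
        System.out.println(Pattern.matches("[a-z]+", "hello"));
        System.out.println(Pattern.matches("a{2,}", "a"));

        // Alternation, plus the emulated intersection and difference.
        System.out.println(Pattern.matches("(cat)|(dog)", "dog"));
        System.out.println(matchesBoth("[a-z]+", ".{3}", "abc"));
        System.out.println(matchesButNot("[a-z]+", "abc", "abc"));
    }
}
```

Combining two whole-string matches in code keeps the sketch simple. It is not equivalent to having & and ! available inside a larger expression, which is what the syntax above provides.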
Java Integration
In a Java environment, a regular expression operates on a string of
Unicode characters, represented either as an instance of
java.lang.String or as an array of the primitive
char type. This means that the unit of matching is a
Unicode character, not a single byte. Generally this will not present
problems in a Java program, because Java takes pains to ensure that
all textual data uses the Unicode standard.
Note: jrexx currently supports only Unicode characters in the
hexadecimal range 0000-00FF.
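To make the unit of matching concrete, the sketch below contrasts the number of char values in a string with the number of bytes in its UTF-8 encoding, and checks whether a string stays inside the 0000-00FF range mentioned in the note. This is not part of jrexx: the class CharRangeSketch and the method isInSupportedRange are hypothetical, shown only as one way a caller might validate input before handing it to a pattern.

```java
import java.nio.charset.StandardCharsets;

class CharRangeSketch {
    // Upper bound taken from the note above (hexadecimal 00FF).
    static final int MAX_SUPPORTED = 0x00FF;

    // Returns true if every char in the string lies in the range 0000-00FF.
    static boolean isInSupportedRange(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > MAX_SUPPORTED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // ends in e with an acute accent, code point 00E9

        // Four char values, but five bytes once encoded as UTF-8.
        System.out.println(word.length());
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length);

        // 00E9 is inside the supported range; the euro sign (20AC) is not.
        System.out.println(isInSupportedRange(word));
        System.out.println(isInSupportedRange("\u20AC"));
    }
}
```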
Because Java string processing takes care of certain escape sequences,
they are not implemented in jrexx. You should be
aware that the following escape sequences are handled by the Java
compiler if found in the Java source:
\b backspace
\f form feed
\n newline
\r carriage return
\t horizontal tab
\" double quote
\' single quote
\\ backslash
\xxx character, in octal (000-377)
\uxxxx Unicode character, in hexadecimal (0000-00FF)
In addition, note that the \u escape sequences are meaningful anywhere in a Java program, not merely within a singly- or doubly-quoted character string, and are converted prior to any of the other escape sequences. For example, the line