Regular expression constructs and characters

In the Eclipse tool, you can build rules in the Regular Expression Builder by selecting constructs and combining them. You can modify regular expressions, subexpressions, and the associated character classes in the Regular Expression Generator.

Character classes

Each character can be classified into one of the following character classes:

Table 1. Character classes
Character Class Shorthand Meaning
Any character . Any character symbol
Alphanumeric [0-9\p{L}] Any letter or digit
Non-Alphanumeric [^0-9\p{L}] Any character that is not a letter or a digit
Letter \p{L} Any letter
Lowercase letters \p{Ll} Any lowercase letter
Uppercase letters \p{Lu} Any uppercase letter
Digit \d Any digit (0-9)
White space \s Any white space character
Other [^0-9\s\p{L}] Any character that is not a letter, digit, or white space character

Some character classes comprise other character classes. For example, lowercase letters are also members of the Letter class.
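
The overlap between classes can be illustrated with a minimal sketch. It assumes that the shorthands in Table 1 behave as they do in java.util.regex, which is an assumption about the engine behind the tool. The class and method names are hypothetical and are not part of the Eclipse tool.

```java
import java.util.regex.Pattern;

// Hypothetical helper: maps a character to one of the Table 1 class names.
class CharacterClassDemo {
    private static final Pattern LOWERCASE = Pattern.compile("\\p{Ll}");
    private static final Pattern UPPERCASE = Pattern.compile("\\p{Lu}");
    private static final Pattern LETTER = Pattern.compile("\\p{L}");
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s");

    private static boolean is(Pattern shorthand, char c) {
        return shorthand.matcher(String.valueOf(c)).matches();
    }

    // Checks the most specific class first, because the classes overlap:
    // every lowercase or uppercase letter is also matched by \p{L}.
    static String classify(char c) {
        if (is(LOWERCASE, c)) return "Lowercase letters";
        if (is(UPPERCASE, c)) return "Uppercase letters";
        if (is(LETTER, c)) return "Letter";
        if (is(DIGIT, c)) return "Digit";
        if (is(WHITE_SPACE, c)) return "White space";
        return "Other"; // corresponds to [^0-9\s\p{L}]
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', 'Q', '7', ' ', '#'}) {
            System.out.println("'" + c + "' -> " + classify(c));
        }
    }
}
```

The order of the checks matters only for reporting: a lowercase letter such as a would also satisfy the Letter and Alphanumeric shorthands.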

Construct categories

Regular Expression Builder provides a list of construct categories. Select the categories that you need, then modify the rule by selecting constructs in each category. To select a construct, click its button or select its check box, whichever is provided.

The following tables describe the constructs in each category:
Table 2. Character category
Construct Matches
\\ The backslash character
\t The tab character
\n The newline (line feed) character
\r The carriage-return character
\e The escape character
(a literal space) The space character
Table 3. Character classes category
Construct Matches
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-z] a through z, inclusive (range)
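
As a short illustration of the three forms in Table 3, the following sketch tests single characters against a simple class, a negation, and a range. It assumes a java.util.regex-compatible engine. The class name BracketClassDemo and its method are illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the simple, negated, and range character classes.
class BracketClassDemo {
    // True if the whole input matches the given expression.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("[abc]", "b"));   // simple class
        System.out.println(isMatch("[^abc]", "d"));  // negation
        System.out.println(isMatch("[a-z]", "m"));   // range
        System.out.println(isMatch("[a-z]", "M"));   // outside the range
    }
}
```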
Table 4. Predefined character classes
Construct Matches
. Matches any character except the newline character (\n)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A white space character: [ \t\n\x0B\f\r], that is, a space, tab, newline, vertical tab, form feed, or carriage return
\S A non-white space character: [^\s]
\w A word character: [a-zA-Z_0-9], any alphanumeric character or the underscore
\W A non-word character: [^\w]
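
The predefined classes in Table 4 are often used to clean or split text. The following sketch shows three such uses, assuming a java.util.regex-compatible engine. The helper names are hypothetical.

```java
// Hypothetical helpers built on the predefined character classes.
class PredefinedClassDemo {
    // Removes every non-digit (\D), leaving only the digits.
    static String digitsOnly(String text) {
        return text.replaceAll("\\D", "");
    }

    // Splits on runs of white space (\s+) after trimming the ends.
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // True if the input consists only of word characters (\w),
    // that is, letters a-z and A-Z, digits, and the underscore.
    static boolean isWordOnly(String text) {
        return text.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 010-9999"));
        System.out.println(String.join(",", words("  alpha\tbeta\n gamma ")));
        System.out.println(isWordOnly("user_42"));
        System.out.println(isWordOnly("user-42"));
    }
}
```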
Table 5. Boundary matches
Construct Matches
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
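
Boundary matches consume no characters; they only constrain where a match can start or end. The sketch below assumes java.util.regex semantics, where ^ and $ anchor to the whole input unless multiline mode is enabled. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of \b, \B, ^, and $.
class BoundaryDemo {
    // Counts occurrences of a word that are delimited by word boundaries (\b).
    static int countWholeWord(String word, String text) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if the word occurs with a word character directly before it (\B).
    static boolean startsInsideWord(String word, String text) {
        return Pattern.compile("\\B" + Pattern.quote(word)).matcher(text).find();
    }

    // True if the input is a run of digits anchored at both ends.
    // In Java, $ also matches just before one trailing line terminator.
    static boolean isAllDigits(String text) {
        return Pattern.compile("^\\d+$").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("cat", "cat scatter cat."));
        System.out.println(startsInsideWord("cat", "scatter"));
        System.out.println(isAllDigits("12345"));
    }
}
```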
Table 6. Greedy quantifiers
Construct Matches
X? Subexpression X appears once or not at all
X* Subexpression X appears zero or more times
X+ Subexpression X appears one or more times
X{n} Subexpression X appears exactly n times
X{n,} Subexpression X appears at least n times
X{n,m} Subexpression X appears at least n but not more than m times
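
Greedy quantifiers take as many repetitions as the rest of the expression allows. The following sketch returns the first match for several quantified expressions so that the greedy behavior is visible. It assumes a java.util.regex-compatible engine, and the names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the greedy quantifiers in Table 6.
class QuantifierDemo {
    // Returns the first substring that matches, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("colou?r", "color colour")); // X?
        System.out.println(firstMatch("ab*", "abbbc"));            // X*
        System.out.println(firstMatch("<.+>", "<a><b>"));          // X+ takes the longest run
        System.out.println(firstMatch("\\d{3}", "12 3456"));       // X{n}
        System.out.println(firstMatch("\\d{2,}", "ab1234567cd"));  // X{n,}
        System.out.println(firstMatch("\\d{2,4}", "1234567"));     // X{n,m}
    }
}
```

Because the quantifiers are greedy, <.+> matches the whole of <a><b> rather than stopping at the first closing bracket.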
Table 7. Logical operators
Construct Matches
XY Subexpression X followed by subexpression Y
X|Y Subexpression X or subexpression Y
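
When the two logical operators are combined, sequence (XY) binds more tightly than alternation (X|Y). The sketch below uses an expression that means "two digits, or one letter" to show this. It assumes java.util.regex semantics, and the class name is illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical demo of sequence (XY) and alternation (X|Y).
class LogicalOperatorDemo {
    // Read as (\d\d) | (\p{L}): two digits in sequence, or a single letter.
    static final Pattern TWO_DIGITS_OR_LETTER = Pattern.compile("\\d\\d|\\p{L}");

    // True if the whole input matches one of the two alternatives.
    static boolean accepts(String input) {
        return TWO_DIGITS_OR_LETTER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("42")); // first alternative
        System.out.println(accepts("x"));  // second alternative
        System.out.println(accepts("4x")); // neither: not a digit followed by (digit or letter)
    }
}
```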
Table 8. Match flags
Construct Matches
?d Enables UNIX lines mode
?i Enables case-insensitive matching
?x Permits white space and comments in the pattern
?m Enables multiline mode
?s Enables dotall mode
?u Enables Unicode-aware case folding
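
The effect of each flag can be checked with a minimal sketch. It assumes that the flags are written in the java.util.regex embedded form, for example (?i) at the start of the expression; this is an assumption about how the tool emits them. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the embedded match flags in Table 8.
class MatchFlagDemo {
    // True if the expression, including any embedded flags, matches the whole input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the expression matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?i): letter case is ignored for ASCII characters.
        System.out.println(matchesAll("(?i)eclipse", "ECLIPSE"));
        // (?s): the dot also matches line terminators.
        System.out.println(matchesAll("(?s)a.b", "a\nb"));
        // (?m): ^ and $ match at every line, not only at the ends of the input.
        System.out.println(found("(?m)^b$", "a\nb\nc"));
        // (?x): white space and # comments in the pattern are ignored.
        System.out.println(matchesAll("(?x) \\d{3} - \\d{4}  # hypothetical number format", "555-0100"));
        // (?d): only the newline counts as a line terminator, so the dot matches a carriage return.
        System.out.println(matchesAll("(?d).", "\r"));
        // (?u) with (?i): case-insensitive matching extends beyond ASCII.
        System.out.println(matchesAll("(?iu)\u00e9", "\u00c9"));
    }
}
```

In this sketch, ?u has a visible effect only together with ?i, because it changes how case-insensitive matching treats non-ASCII letters.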