In the Eclipse tool, you can build rules in
the Regular Expression Builder by selecting constructs and combining
them. You can modify regular expressions, subexpressions, and the
associated character classes in the Regular Expression Generator.
Character classes
Each character belongs to one or more of the following character classes:
Table 1. Character classes
| Character class | Shorthand | Meaning |
| --- | --- | --- |
| Any character | `.` | Any character |
| Alphanumeric | `[0-9\p{L}]` | Any letter or digit |
| Non-Alphanumeric | `[^0-9\p{L}]` | Any character that is not a letter or a digit |
| Letter | `\p{L}` | Any letter |
| Lowercase letters | `\p{Ll}` | Any lowercase letter |
| Uppercase letters | `\p{Lu}` | Any uppercase letter |
| Digit | `\d` | Any digit (0-9) |
| White space | `\s` | Any white space character |
| Other | `[^0-9\s\p{L}]` | Any character that is not a letter, digit, or white space character |
Some character classes contain other character classes. For example, every lowercase letter is also a member of the Letter class.
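The following Java sketch tests a few sample characters against each shorthand in Table 1 and reports every class that matches, which also shows how the classes nest. It assumes the tool compiles these shorthands with java.util.regex unchanged (the syntax matches Java's, but the documentation does not say so); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Shorthands from Table 1, assumed to be passed to java.util.regex as-is.
    static final Map<String, Pattern> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("Any character", Pattern.compile("."));
        CLASSES.put("Alphanumeric", Pattern.compile("[0-9\\p{L}]"));
        CLASSES.put("Non-Alphanumeric", Pattern.compile("[^0-9\\p{L}]"));
        CLASSES.put("Letter", Pattern.compile("\\p{L}"));
        CLASSES.put("Lowercase letters", Pattern.compile("\\p{Ll}"));
        CLASSES.put("Uppercase letters", Pattern.compile("\\p{Lu}"));
        CLASSES.put("Digit", Pattern.compile("\\d"));
        CLASSES.put("White space", Pattern.compile("\\s"));
        CLASSES.put("Other", Pattern.compile("[^0-9\\s\\p{L}]"));
    }

    // Returns the name of every class whose pattern matches the single character c.
    static List<String> classify(char c) {
        List<String> result = new ArrayList<>();
        String s = String.valueOf(c);
        for (Map.Entry<String, Pattern> e : CLASSES.entrySet()) {
            if (e.getValue().matcher(s).matches()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', 'Q', '7', ' ', '#'}) {
            System.out.println("'" + c + "' -> " + classify(c));
        }
    }
}
```

For `'a'` this prints `[Any character, Alphanumeric, Letter, Lowercase letters]`: a lowercase letter is also a letter, an alphanumeric character, and "any character".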
Construct categories
The Regular Expression Builder provides a list of construct categories. Select the category that you need, and then modify the rule by selecting constructs in that category. To select a construct, click its button or select its check box, whichever the builder provides.
The constructs in each category are described in the following tables:
Table 2. Character category
| Construct | Matches |
| --- | --- |
| `\\` | The backslash character |
| `\t` | The tab character |
| `\n` | The newline (line feed) character |
| `\r` | The carriage-return character |
| `\e` | The escape character |
| (a single space) | The space character |
Table 3. Character classes category
| Construct | Matches |
| --- | --- |
| `[abc]` | a, b, or c (simple class) |
| `[^abc]` | Any character except a, b, or c (negation) |
| `[a-z]` | a through z, inclusive (range) |
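The three forms in Table 3 can be checked with a minimal Java sketch, assuming java.util.regex semantics; `matchesWhole` is a hypothetical helper that requires the whole input to match.

```java
import java.util.regex.Pattern;

class CharClassCategoryDemo {
    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[abc]", "b"));   // true  (simple class)
        System.out.println(matchesWhole("[^abc]", "b"));  // false (negation)
        System.out.println(matchesWhole("[^abc]", "x"));  // true
        System.out.println(matchesWhole("[a-z]", "m"));   // true  (range)
        System.out.println(matchesWhole("[a-z]", "M"));   // false (ranges are case-sensitive by default)
    }
}
```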
Table 4. Predefined character classes
| Construct | Matches |
| --- | --- |
| `.` | Any character except a line terminator such as the newline character (`\n`) |
| `\d` | A digit: `[0-9]` |
| `\D` | A non-digit: `[^0-9]` |
| `\s` | A white space character: `[ \t\n\x0B\f\r]`, that is, a space, tab, newline, vertical tab, form feed, or carriage return |
| `\S` | A non-white space character: `[^\s]` |
| `\w` | A word character: `[a-zA-Z_0-9]`, that is, an ASCII letter, digit, or underscore |
| `\W` | A non-word character: `[^\w]` |
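A minimal Java sketch of the predefined classes in Table 4, assuming java.util.regex semantics. It highlights two details that often surprise users: `\w` includes the underscore but, by default, only ASCII letters, and `.` does not match a newline unless dotall mode is enabled. The class name is hypothetical.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("\\d+", "2024"));        // true
        System.out.println(whole("\\D+", "abc"));         // true
        System.out.println(whole("\\s", "\t"));           // true: tab is white space
        System.out.println(whole("\\w+", "user_name1"));  // true: underscore is a word character
        System.out.println(whole("\\w", "\u00E9"));       // false: e with acute accent is not ASCII
        System.out.println(whole("a.b", "a\nb"));         // false: . does not match a newline by default
    }
}
```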
Table 5. Boundary matches
| Construct | Matches |
| --- | --- |
| `^` | The beginning of a line |
| `$` | The end of a line |
| `\b` | A word boundary |
| `\B` | A non-word boundary |
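The boundary constructs in Table 5 are easiest to see by collecting every match. This Java sketch assumes java.util.regex semantics, where `^` and `$` match at each line only in multiline mode and otherwise match only at the start and end of the whole input. `findAll` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects every substring of the input that the pattern finds, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String words = "cat concatenate cat";
        // \b: "cat" only as a whole word, so the "cat" inside "concatenate" is skipped.
        System.out.println(findAll(Pattern.compile("\\bcat\\b"), words)); // [cat, cat]
        // \B: "cat" only where it is embedded inside a longer word.
        System.out.println(findAll(Pattern.compile("\\Bcat\\B"), words)); // [cat]

        String lines = "first\nsecond";
        // Without multiline mode, ^ and $ refer to the whole input, so nothing matches.
        System.out.println(findAll(Pattern.compile("^\\w+$"), lines)); // []
        // With multiline mode, ^ and $ match at the start and end of each line.
        System.out.println(findAll(Pattern.compile("^\\w+$", Pattern.MULTILINE), lines)); // [first, second]
    }
}
```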
Table 6. Greedy quantifiers
| Construct | Matches |
| --- | --- |
| `X?` | Subexpression X appears once or not at all |
| `X*` | Subexpression X appears zero or more times |
| `X+` | Subexpression X appears one or more times |
| `X{n}` | Subexpression X appears exactly n times |
| `X{n,}` | Subexpression X appears at least n times |
| `X{n,m}` | Subexpression X appears at least n but not more than m times |
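A minimal Java sketch of the quantifiers in Table 6, assuming java.util.regex semantics. The last two calls show what "greedy" means in practice: each quantifier consumes as much input as it can while still allowing the overall match to succeed. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(whole("colou?r", "color"));   // true: u? allows zero occurrences
        System.out.println(whole("ab*c", "ac"));         // true: b* allows zero occurrences
        System.out.println(whole("\\d{3}", "123"));      // true: exactly three digits
        System.out.println(whole("\\d{3}", "1234"));     // false
        System.out.println(whole("\\d{2,}", "12345"));   // true: at least two digits
        System.out.println(whole("\\d{2,4}", "12345"));  // false: more than four digits
        // Greedy behavior: the quantifier takes as much input as possible.
        System.out.println(firstMatch("a+", "caaat"));       // aaa
        System.out.println(firstMatch("<.*>", "<b>x</b>"));  // <b>x</b>, not just <b>
    }
}
```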
Table 7. Logical operators
| Construct | Matches |
| --- | --- |
| `XY` | Subexpression X followed by subexpression Y |
| `X\|Y` | Subexpression X or subexpression Y |
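A minimal Java sketch of the logical operators in Table 7, assuming java.util.regex semantics. It also shows that `|` binds more loosely than concatenation, so `gra|ey` means "`gra` or `ey`", not "`gr`, then `a` or `e`, then `y`". The class name is hypothetical.

```java
import java.util.regex.Pattern;

class LogicalOperatorDemo {
    // True only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // XY: a digit followed by a lowercase letter; order matters.
        System.out.println(whole("\\d[a-z]", "7k"));  // true
        System.out.println(whole("\\d[a-z]", "k7"));  // false
        // X|Y: either alternative.
        System.out.println(whole("cat|dog", "dog"));  // true
        // | applies to the whole subexpression on each side.
        System.out.println(whole("gra|ey", "grey"));  // false
        System.out.println(whole("gra|ey", "ey"));    // true
    }
}
```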
Table 8. Match flags
| Construct | Effect |
| --- | --- |
| `?d` | Enables UNIX lines mode |
| `?i` | Enables case-insensitive matching |
| `?x` | Permits white space and comments in the pattern |
| `?m` | Enables multiline mode |
| `?s` | Enables dotall mode |
| `?u` | Enables Unicode-aware case folding |
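In java.util.regex these flags are written as embedded flag groups at the start of a pattern, for example `(?i)`, and several can be combined, as in `(?iu)`. The Java sketch below assumes the tool's `?d`, `?i`, `?x`, `?m`, `?s`, and `?u` map to these Java flags; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchFlagDemo {
    // True only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // ?d: only the newline counts as a line terminator, so . can match a carriage return.
        System.out.println(whole("a.b", "a\rb"));      // false
        System.out.println(whole("(?d)a.b", "a\rb"));  // true
        // ?i: case-insensitive matching.
        System.out.println(whole("(?i)hello", "HeLLo"));  // true
        // ?x: white space in the pattern is ignored and # starts a comment.
        System.out.println(whole("(?x) \\d{3} - \\d{4}  # local number", "555-1234"));  // true
        // ?m: ^ and $ match at the start and end of each line.
        String text = "first\nsecond\nthird";
        System.out.println(found("^second$", text));      // false
        System.out.println(found("(?m)^second$", text));  // true
        // ?s: . also matches line terminators.
        System.out.println(whole("a.b", "a\nb"));      // false
        System.out.println(whole("(?s)a.b", "a\nb"));  // true
        // ?u together with ?i: case folding is no longer limited to US-ASCII.
        System.out.println(whole("(?i)\u00e9", "\u00c9"));   // false
        System.out.println(whole("(?iu)\u00e9", "\u00c9"));  // true
    }
}
```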