This class compiles regular expressions over the ChunkedSentenceTokens in
a sentence into an NFA. The pattern language has a lot of redundancy in its
expressiveness, largely because it also supports regular-expression
matching on the fields of each token. This is not necessary, but it is an
optimization and a shorthand (e.g. <pos="NNPS?"> is equivalent to
<pos="NNP" | pos="NNPS"> and to (?:<pos="NNP"> | <pos="NNPS">)).
Here are some equivalent examples:
1. <pos="JJ">* <pos="NNP.">+
2. <pos="JJ">* <pos="NNPS?">+
3. <pos="JJ">* <pos="NNP" | pos="NNPS">+
4. <pos="JJ">* (?:<pos="NNP"> | <pos="NNPS">)+
Note that (3) and (4) are not preferred, for efficiency reasons. The regex
OR (as in example (4)) should be used only to alternate between sequences
of more than one ChunkedSentenceToken.
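The equivalence between styles (2) and (3)/(4) can be illustrated with a small sketch. It is not this class's implementation and it does not build an NFA: it is a hand-written greedy scan over a plain list of POS tags. The class EquivalentPatternsSketch and its methods fieldRegex, anyOf, and matchEnd are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names belong to the library.
final class EquivalentPatternsSketch {

    // Style (2): one field-level regex, as in <pos="NNPS?">, compiled once.
    static Predicate<String> fieldRegex(String regex) {
        Pattern p = Pattern.compile(regex);
        return pos -> p.matcher(pos).matches();
    }

    // Styles (3) and (4): explicit alternatives, each compared in turn.
    static Predicate<String> anyOf(String... tags) {
        List<String> alternatives = Arrays.asList(tags);
        return alternatives::contains;
    }

    // Greedy scan for <pos="JJ">* NOUN+ beginning at index start.
    // Returns the exclusive end index of the match, or -1 if nothing matches.
    // A greedy scan is sufficient here only because no tag satisfies both
    // the adjective test and the noun test, so no backtracking is needed.
    static int matchEnd(List<String> posTags, int start, Predicate<String> noun) {
        int i = start;
        while (i < posTags.size() && posTags.get(i).equals("JJ")) {
            i++;
        }
        int firstNoun = i;
        while (i < posTags.size() && noun.test(posTags.get(i))) {
            i++;
        }
        return i > firstNoun ? i : -1;
    }
}
```

Passing fieldRegex("NNPS?") or anyOf("NNP", "NNPS") as the noun test gives the same match on any tag sequence; the difference between the two styles is in how much work each token test does, not in what is matched.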
The regular expressions support named groups (<name>: ... ), unnamed
groups (?: ... ), and capturing groups ( ... ). The operators allowed are
+, ?, *, and |. The logic expressions (which describe a single
ChunkedSentenceToken) allow grouping "( ... )", negation "!",
disjunction "|", and conjunction "&".
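One way to picture a logic expression is as a predicate on a single token, built from field tests combined with "!", "|", and "&". The sketch below uses java.util.function.Predicate to mirror those three operators. SketchToken, its string and pos fields, and LogicExpressionSketch are hypothetical stand-ins, not the library's ChunkedSentenceToken or its parser, and the pattern spelling in the comment is an assumed rendering of the syntax described above.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical token with two fields; not the library's ChunkedSentenceToken.
final class SketchToken {
    final String string;
    final String pos;

    SketchToken(String string, String pos) {
        this.string = string;
        this.pos = pos;
    }
}

// Hypothetical illustration of how "!", "|", and "&" compose field tests.
final class LogicExpressionSketch {

    // A field test on the POS tag, such as pos="NNPS?".
    static Predicate<SketchToken> pos(String regex) {
        Pattern p = Pattern.compile(regex);
        return t -> p.matcher(t.pos).matches();
    }

    // A field test on the token text, such as string="The".
    static Predicate<SketchToken> string(String regex) {
        Pattern p = Pattern.compile(regex);
        return t -> p.matcher(t.string).matches();
    }

    // Assumed spelling: <(pos="NNP" | pos="NNPS") & !string="The">
    static Predicate<SketchToken> properNounButNotThe() {
        return pos("NNP").or(pos("NNPS"))    // "|" becomes or(...)
                .and(string("The").negate()); // "&" becomes and(...), "!" becomes negate()
    }
}
```

Because the whole expression is evaluated against one token at a time, it stays inside a single pair of angle brackets; alternation across several tokens is the job of the regex-level "|" operator.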
- Parameters:
regex -
- Returns: