public class LayeredTokenPattern extends Object
A class that defines a regular expression over the tokens appearing in a
LayeredSequence object.
For example, suppose we want to find parts of sentences that match the
pattern "DT cow", where "DT" is the part-of-speech tag representing a
determiner. Assume that sentences are represented as LayeredSequence
objects, where the words layer has the name "word" and the part-of-speech
layer has the name "pos". Then the above pattern can be constructed by
calling new LayeredTokenPattern("DT_pos cow_word"). Given a test sentence sent, the
matcher(LayeredSequence) method returns a
LayeredTokenMatcher object that gives access to the matched ranges
and groups.
The patterns are expressed using the standard Pattern
language, but with the following changes.
The basic unit of match is not a character, but instead a token. A token
consists of two parts: a value and a layer name. A token is expressed using
an underscore to separate the two. For example, Foo_bar will match
when the token Foo appears on the layer with the name bar. In
the example above, the token DT_pos will match the word-POS pair
(w, p) when p = DT. The value of w is allowed to
be anything. Currently there is no way to match the value of multiple layers
at once (e.g. match all occurrences of "bank" that are nouns).
The value of a token can only have characters from this set:
[a-zA-Z0-9\\-.,:;?!"'`]. The layer name can only have characters from
this set: [a-zA-Z0-9\\-].
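The character sets above can be checked with an ordinary regular expression. The following is a minimal sketch, not part of this library: the class name TokenSyntax and its parse method are hypothetical, and it assumes that both the value and the layer name must be non-empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the library): checks the value_layer token syntax. */
final class TokenSyntax {
    // Character classes copied from the description above. In Java source the
    // backslash is doubled, so "\\-" denotes a literal hyphen in the regex.
    static final String VALUE_CHARS = "[a-zA-Z0-9\\-.,:;?!\"'`]";
    static final String LAYER_CHARS = "[a-zA-Z0-9\\-]";

    // A token is a value, an underscore, and a layer name (both assumed non-empty).
    static final Pattern TOKEN =
        Pattern.compile("(" + VALUE_CHARS + "+)_(" + LAYER_CHARS + "+)");

    /** Returns {value, layer} if s is a well-formed token, or null otherwise. */
    static String[] parse(String s) {
        Matcher m = TOKEN.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "DT_pos", "cow_word", "._pos", "cow" }) {
            String[] t = parse(s);
            System.out.println(s + " -> "
                + (t == null ? "not a token" : "value=" + t[0] + ", layer=" + t[1]));
        }
    }
}
```

Because the underscore belongs to neither character set, a well-formed token contains exactly one underscore, so splitting a token into value and layer name is unambiguous.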
When expressing a pattern, tokens must be space separated.
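To illustrate what it means for the unit of match to be a token rather than a character, here is one possible way to run such patterns on top of java.util.regex. This is only a sketch under stated assumptions and is not necessarily how this class encodes its patterns: the class TokenRegexSketch, the "<layer=value|...>" encoding, and all method names are hypothetical. It assumes layer values use only the token characters listed above, so the delimiters <, >, = and | never occur in them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical encoding of token patterns as plain regexes. */
final class TokenRegexSketch {
    // Finds value_layer tokens inside a pattern string.
    private static final Pattern TOKEN = Pattern.compile(
        "([a-zA-Z0-9\\-.,:;?!\"'`]+)_([a-zA-Z0-9\\-]+)");

    /** Encodes each position i as "<name=value|name=value|>"; layers[k][i] is layer k at i. */
    static String encode(String[] layerNames, String[][] layers) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < layers[0].length; i++) {
            sb.append('<');
            for (int k = 0; k < layerNames.length; k++) {
                sb.append(layerNames[k]).append('=').append(layers[k][i]).append('|');
            }
            sb.append('>');
        }
        return sb.toString();
    }

    /** Rewrites each token into a fragment that consumes exactly one encoded position. */
    static Pattern compile(String tokenPattern) {
        Matcher m = TOKEN.matcher(tokenPattern);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(tokenPattern, last, m.start());   // copy operators such as ^ ( ) +
            sb.append("(?:<(?:[^|>]*\\|)*")              // skip any earlier layer entries
              .append(Pattern.quote(m.group(2) + "=" + m.group(1) + "|"))
              .append("[^>]*>)");                        // skip the rest of this position
            last = m.end();
        }
        sb.append(tokenPattern.substring(last));
        // Spaces only separate tokens, so they are dropped from the final regex.
        return Pattern.compile(sb.toString().replace(" ", ""));
    }

    /** Converts a character offset in the encoded string to a token index. */
    static int tokenIndex(String encoded, int charOffset) {
        int count = 0;
        for (int i = 0; i < charOffset; i++) {
            if (encoded.charAt(i) == '<') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = { "word", "pos" };
        String[][] layers = {
            { "John", "lives", "in", "New", "York", "." },
            { "NNP", "VBZ", "IN", "NNP", "NNP", "." }
        };
        String encoded = encode(names, layers);
        Matcher m = compile("^(NNP_pos+) lives_word in_word (NNP_pos+) ._pos$")
            .matcher(encoded);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                System.out.println("group " + g + ": tokens ["
                    + tokenIndex(encoded, m.start(g)) + ", "
                    + tokenIndex(encoded, m.end(g)) + ")");
            }
        }
    }
}
```

In this sketch every token fragment is wrapped in a non-capturing group, so a quantifier such as + applies to the whole token, and each fragment tests only one layer of a position, which mirrors the restriction that a token cannot constrain several layers at once.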
In the following examples pos refers to a part-of-speech layer, and
word refers to a word layer.
^John_word lives_word in_word NNP_pos+ - matches sentences that start
with "John lives in", followed by at least one proper noun.

^(NNP_pos+) lives_word in_word (NNP_pos+) ._pos$ - matches sentences
that start with at least one proper noun, followed by "lives in", followed by
at least one proper noun, and ending with a period. Captures the two
proper nouns as groups (see LayeredTokenMatcher).

| Constructor and Description |
|---|
| LayeredTokenPattern(String patternString) - Constructs a new instance from the given String pattern. |
| Modifier and Type | Method and Description |
|---|---|
| Pattern | getEncodedPattern() |
| LayeredTokenMatcher | matcher(LayeredSequence seq) - Returns a matcher object, which can be used to scan seq for any subsequences that match this pattern. |
| String | toString() |
public LayeredTokenPattern(String patternString) throws SequenceException
Parameters: patternString
Throws: SequenceException - if unable to compile patternString

public LayeredTokenMatcher matcher(LayeredSequence seq) throws SequenceException
Parameters: seq
Throws: SequenceException - if unable to create a matcher over seq

public Pattern getEncodedPattern()
Returns the Pattern that this LayeredTokenPattern was compiled into.

Copyright © 2010-2013 University of Washington CSE. All Rights Reserved.