java.lang.Object
  extended by edu.washington.cs.knowitall.sequence.LayeredTokenPattern
public class LayeredTokenPattern
A class that defines a regular expression over the tokens appearing in a
LayeredSequence object.
For example, suppose we want to find parts of sentences that match the
pattern "DT cow", where "DT" is the part-of-speech tag representing a
determiner. Assume that sentences are represented as LayeredSequence
objects, where the word layer has the name "word" and the part-of-speech
layer has the name "pos". Then the above pattern can be constructed by
calling new LayeredTokenPattern("DT_pos cow_word"). Given a test sentence
sent, calling matcher(sent) returns a LayeredTokenMatcher object that gives
access to the matching ranges and groups.
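The semantics of the "DT_pos cow_word" example can be illustrated without the knowitall library. The sketch below is hypothetical: it does not use LayeredSequence or LayeredTokenMatcher, and it assumes a sentence can be modeled as a list of per-position maps from layer name to value. It handles only a plain sequence of tokens (no regex operators) and reports the start index of each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, library-free illustration of matching a sequence of literal layer tokens. */
public class LayeredSketch {

    /** A pattern token such as "DT_pos": value "DT" expected on layer "pos". */
    record Token(String value, String layer) {
        // Assumes a well-formed token containing exactly one underscore.
        static Token parse(String s) {
            int i = s.lastIndexOf('_');
            return new Token(s.substring(0, i), s.substring(i + 1));
        }

        /** True if the position carries this value on this layer; all other layers are ignored. */
        boolean matches(Map<String, String> position) {
            return value.equals(position.get(layer));
        }
    }

    /** Returns the start index of every window of the sentence whose positions match the tokens in order. */
    public static List<Integer> findStarts(List<Map<String, String>> sentence, String pattern) {
        List<Token> tokens = new ArrayList<>();
        for (String t : pattern.trim().split("\\s+")) {
            tokens.add(Token.parse(t));
        }
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + tokens.size() <= sentence.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < tokens.size() && ok; k++) {
                ok = tokens.get(k).matches(sentence.get(start + k));
            }
            if (ok) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> sent = List.of(
            Map.of("word", "The", "pos", "DT"),
            Map.of("word", "cow", "pos", "NN"),
            Map.of("word", "saw", "pos", "VBD"),
            Map.of("word", "a", "pos", "DT"),
            Map.of("word", "cow", "pos", "NN"));
        System.out.println(findStarts(sent, "DT_pos cow_word")); // [0, 3]
    }
}
```

Running main prints [0, 3]: positions 0 and 3 carry DT on the pos layer and are followed by the word cow. The word on the DT position is never inspected, which mirrors the (w, p) behavior described below.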
Patterns are expressed using the standard java.util.regex.Pattern syntax,
with the following changes.
The basic unit of match is not a character but a token. A token consists of
two parts, a value and a layer name, written with an underscore between them.
For example, Foo_bar matches when the value Foo appears on the layer named
bar. In the example above, the token DT_pos matches any word/POS pair (w, p)
with p = DT; the value of w may be anything. Currently there is no way to
match values on multiple layers at once (e.g., match all occurrences of
"bank" that are nouns).
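The value of a token may contain only characters from the set
[a-zA-Z0-9\-.,:;?!"'`], that is, letters, digits, and the characters
- . , : ; ? ! " ' and `. The layer name may contain only characters from the
set [a-zA-Z0-9\-], that is, letters, digits, and -.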
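The two character sets are enough to split a token unambiguously: neither contains an underscore, so a well-formed token has exactly one. A minimal sketch of a validator built from those sets follows; the class TokenSyntax and its parse method are hypothetical names, not part of this API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator for a single "value_layer" token, using the documented character sets. */
public class TokenSyntax {

    // Value characters: letters, digits, and - . , : ; ? ! " ' `
    private static final String VALUE_CHARS = "[a-zA-Z0-9\\-.,:;?!\"'`]";
    // Layer-name characters: letters, digits, and -
    private static final String LAYER_CHARS = "[a-zA-Z0-9\\-]";

    // Neither set contains '_', so the underscore separating value and layer is unique.
    private static final Pattern TOKEN =
        Pattern.compile("(" + VALUE_CHARS + "+)_(" + LAYER_CHARS + "+)");

    /** Returns {value, layer}, or throws IllegalArgumentException if the token is malformed. */
    public static String[] parse(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed token: " + token);
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parse("DT_pos")));     // DT | pos
        System.out.println(String.join(" | ", parse("._pos")));      // . | pos
        System.out.println(String.join(" | ", parse("can't_word"))); // can't | word
    }
}
```

A string such as Foo_bar_baz is rejected, because the second underscore cannot belong to either the value or the layer name.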
When expressing a pattern, tokens must be space separated.
In the following examples, pos refers to a part-of-speech layer and word
refers to a word layer.

- ^John_word lives_word in_word NNP_pos+ matches sentences that start with
"John lives in" followed by at least one proper noun.
- ^(NNP_pos+) lives_word in_word (NNP_pos+) ._pos$ matches sentences that
start with at least one proper noun, followed by "lives in", followed by at
least one proper noun, and ending with a period. The two proper-noun runs are
captured as groups (see LayeredTokenMatcher).
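Because patterns reuse the operators of java.util.regex.Pattern, one natural strategy is to serialize the sequence into a string and translate each token into a regex that matches exactly one serialized position. The following is a hypothetical sketch of that idea; it is not necessarily the encoding that getEncodedPattern() returns. Assumptions: each position is written as <#layer=value#...#> (safe because #, =, < and > appear in neither character set), a position is modeled as a Map<String, String>, and non-capturing groups such as (?:...) in the pattern are not supported, since ?: is itself legal value text. Token indices are recovered by counting < characters before a match offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical encoding of layered-token patterns into java.util.regex (not the library's actual scheme). */
public class TokenRegexSketch {

    // A "value_layer" token built from the documented character sets.
    private static final Pattern TOKEN =
        Pattern.compile("([a-zA-Z0-9\\-.,:;?!\"'`]+)_([a-zA-Z0-9\\-]+)");

    /** Serializes each position as <#layer=value#...#>, with layers in sorted order. */
    public static String encodeSequence(List<Map<String, String>> seq) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> position : seq) {
            sb.append("<#");
            for (Map.Entry<String, String> e : new TreeMap<>(position).entrySet()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append('#');
            }
            sb.append('>');
        }
        return sb.toString();
    }

    /** Replaces each token with a group matching one encoded position; other syntax passes through. */
    public static Pattern encodePattern(String patternString) {
        Matcher m = TOKEN.matcher(patternString);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Keep operators, anchors and parentheses between tokens; drop the separating spaces.
            regex.append(patternString.substring(last, m.start()).replaceAll("\\s+", ""));
            String field = "#" + m.group(2) + "=" + m.group(1) + "#";
            // [^<>]* cannot cross a position boundary, so this matches exactly one position.
            regex.append("(?:<[^<>]*").append(Pattern.quote(field)).append("[^<>]*>)");
            last = m.end();
        }
        regex.append(patternString.substring(last).replaceAll("\\s+", ""));
        return Pattern.compile(regex.toString());
    }

    /** Converts a character offset in the encoded string into a token index. */
    public static int tokenIndex(String encoded, int charOffset) {
        int count = 0;
        for (int i = 0; i < charOffset; i++) {
            if (encoded.charAt(i) == '<') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Mary", "NNP"}, {"Smith", "NNP"}, {"lives", "VBZ"}, {"in", "IN"},
            {"New", "NNP"}, {"York", "NNP"}, {".", "."}};
        List<Map<String, String>> sent = new ArrayList<>();
        for (String[] r : rows) {
            sent.add(Map.of("word", r[0], "pos", r[1]));
        }
        String encoded = encodeSequence(sent);
        Matcher m = encodePattern("^(NNP_pos+) lives_word in_word (NNP_pos+) ._pos$").matcher(encoded);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                System.out.println("group " + g + ": tokens ["
                    + tokenIndex(encoded, m.start(g)) + ", " + tokenIndex(encoded, m.end(g)) + ")");
            }
        }
    }
}
```

Running main prints "group 1: tokens [0, 2)" and "group 2: tokens [4, 6)", i.e. "Mary Smith" and "New York" for the second example pattern. Wrapping each token in a non-capturing group is what lets quantifiers such as + apply to a whole token rather than to its last character.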
| Constructor Summary | |
|---|---|
| LayeredTokenPattern(String patternString) | Constructs a new instance from the given String pattern. |
| Method Summary | |
|---|---|
| Pattern | getEncodedPattern() |
| LayeredTokenMatcher | matcher(LayeredSequence seq) Returns a matcher object, which can be used to scan seq for any subsequences that match this pattern. |
| String | toString() |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
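public LayeredTokenPattern(String patternString)
                    throws SequenceException
Constructs a new instance from the given String pattern.
Parameters:
patternString - the pattern to compile
Throws:
SequenceException - if unable to compile patternString

| Method Detail |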
|---|
public String toString()
Overrides:
toString in class Object
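public LayeredTokenMatcher matcher(LayeredSequence seq)
                            throws SequenceException
Returns a matcher object, which can be used to scan seq for any subsequences
that match this pattern.
Parameters:
seq - the sequence to scan
Throws:
SequenceException - if unable to create a matcher over seq

public Pattern getEncodedPattern()
Returns:
the java.util.regex.Pattern that this LayeredTokenPattern was compiled into.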