edu.washington.cs.knowitall.sequence
Class LayeredTokenPattern

java.lang.Object
  extended by edu.washington.cs.knowitall.sequence.LayeredTokenPattern

public class LayeredTokenPattern
extends Object

A class that defines a regular expression over the tokens appearing in a LayeredSequence object.

For example, suppose we want to find parts of sentences that match the pattern "DT cow", where "DT" is the part-of-speech tag for a determiner. Assume that sentences are represented as LayeredSequence objects in which the word layer is named "word" and the part-of-speech layer is named "pos". The pattern can then be constructed by calling new LayeredTokenPattern("DT_pos cow_word"). Given a test sentence sent, the matcher(LayeredSequence) method returns a LayeredTokenMatcher object that gives access to the matched ranges and groups.

The patterns are expressed using the standard Pattern language, with the following changes:

The basic unit of match is not a character but a token. A token consists of two parts, a value and a layer name, separated by an underscore. For example, Foo_bar matches when the value Foo appears on the layer named bar. In the example above, the token DT_pos matches the word-POS pair (w, p) when p = DT; the value of w can be anything. Currently there is no way to match the values of multiple layers at once (e.g. to match all occurrences of "bank" that are nouns).

The value of a token can only have characters from this set: [a-zA-Z0-9\\-.,:;?!"'`]. The layer name can only have characters from this set: [a-zA-Z0-9\\-].
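The two character sets can be turned into a quick syntax check. The sketch below is a hypothetical helper, not part of this class: it assumes a token is one or more value characters, an underscore, and one or more layer-name characters, and it does not account for any regular-expression operators the real pattern parser may accept around a token.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the library): checks the token syntax described above. */
final class TokenSyntaxSketch {
    // Character sets taken from the description; "\\-" in Java source is a literal hyphen.
    private static final String VALUE_CHARS = "[a-zA-Z0-9\\-.,:;?!\"'`]";
    private static final String LAYER_CHARS = "[a-zA-Z0-9\\-]";

    // Assumption: a token is value characters, one underscore, then layer-name characters.
    private static final Pattern TOKEN =
            Pattern.compile(VALUE_CHARS + "+_" + LAYER_CHARS + "+");

    /** Returns true if the whole string has the form value_layer. */
    static boolean isWellFormedToken(String token) {
        return TOKEN.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedToken("DT_pos"));   // true
        System.out.println(isWellFormedToken("cow_word")); // true
        System.out.println(isWellFormedToken("cow"));      // false: no layer name
        System.out.println(isWellFormedToken("a_b_word")); // false: '_' is not a value character
    }
}
```

Because the underscore is in neither set, it can only act as the separator, which is why a value such as a_b is rejected.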

When expressing a pattern, tokens must be space separated.

In the following examples pos refers to a part-of-speech layer, and word refers to a word layer.
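With these two layers, the pattern DT_pos cow_word from the description above can be read as a scan over two parallel arrays. The sketch below is a hand-written illustration of that matching behaviour only. It does not call or reproduce the library, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of what "DT_pos cow_word" matches; not the library's implementation. */
final class DtCowSketch {
    /** Returns the start index of every place where a DT tag is followed by the word "cow". */
    static List<Integer> findDtCow(String[] word, String[] pos) {
        if (word.length != pos.length) {
            throw new IllegalArgumentException("layers must have the same length");
        }
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + 1 < word.length; i++) {
            // DT_pos: only the pos layer is checked at i; the word there may be anything.
            // cow_word: only the word layer is checked at i + 1.
            if (pos[i].equals("DT") && word[i + 1].equals("cow")) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String[] word = {"The", "cow", "saw", "a", "cow"};
        String[] pos  = {"DT", "NN", "VBD", "DT", "NN"};
        System.out.println(findDtCow(word, pos)); // prints [0, 3]
    }
}
```

Each token constrains exactly one layer at its position, which matches the stated limitation that the values of several layers cannot be tested at once.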

Author:
afader

Constructor Summary
LayeredTokenPattern(String patternString)
          Constructs a new instance from the given String pattern.
 
Method Summary
 Pattern getEncodedPattern()
          Returns the character-level pattern that this LayeredTokenPattern was compiled into.
 LayeredTokenMatcher matcher(LayeredSequence seq)
          Returns a matcher object, which can be used to scan seq for any subsequences that match this pattern.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

LayeredTokenPattern

public LayeredTokenPattern(String patternString)
                    throws SequenceException
Constructs a new instance from the given String pattern.

Parameters:
patternString - the pattern to compile, written as space-separated tokens such as "DT_pos cow_word"
Throws:
SequenceException - if unable to compile patternString

Method Detail

toString

public String toString()
Overrides:
toString in class Object

matcher

public LayeredTokenMatcher matcher(LayeredSequence seq)
                            throws SequenceException
Returns a matcher object, which can be used to scan seq for any subsequences that match this pattern.

Parameters:
seq - the sequence to scan for matches
Returns:
the matcher
Throws:
SequenceException - if unable to create a matcher over seq

getEncodedPattern

public Pattern getEncodedPattern()
Returns:
the character-level pattern that this LayeredTokenPattern was compiled into.
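This page does not describe how the character-level pattern is built. As a rough intuition only, the sketch below shows one general technique for running token patterns on top of java.util.regex: give each distinct token value its own character, encode the sequence as a string, and translate the token pattern into the same alphabet. It handles a single layer and at most 26 distinct values, all names in it are hypothetical, and it should not be taken as this library's actual encoding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-layer illustration; the library's real encoding may differ entirely. */
final class CharEncodingSketch {
    private final Map<String, Character> codes = new HashMap<>();

    /** Assigns each distinct token value its own letter, starting at 'a' (toy limit: 26 values). */
    private char codeFor(String value) {
        Character c = codes.get(value);
        if (c == null) {
            c = (char) ('a' + codes.size());
            codes.put(value, c);
        }
        return c;
    }

    /** Encodes one layer of a sequence as a string with exactly one character per token. */
    String encodeSequence(String[] layer) {
        StringBuilder sb = new StringBuilder();
        for (String value : layer) {
            sb.append(codeFor(value));
        }
        return sb.toString();
    }

    /** Translates a space-separated token pattern such as "DT JJ* NN" into a character-level regex. */
    Pattern encodePattern(String tokenPattern) {
        StringBuilder sb = new StringBuilder();
        for (String part : tokenPattern.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            // Split off an optional trailing quantifier so it applies to the encoded character.
            String quantifier = "";
            char last = part.charAt(part.length() - 1);
            if (part.length() > 1 && (last == '*' || last == '+' || last == '?')) {
                quantifier = String.valueOf(last);
                part = part.substring(0, part.length() - 1);
            }
            sb.append(codeFor(part)).append(quantifier);
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        CharEncodingSketch enc = new CharEncodingSketch();
        String[] pos = {"VBD", "DT", "JJ", "JJ", "NN"};
        String text = enc.encodeSequence(pos);
        Matcher m = enc.encodePattern("DT JJ* NN").matcher(text);
        if (m.find()) {
            // Character offsets equal token indexes because each token is one character.
            System.out.println(m.start() + ".." + m.end()); // prints 1..5
        }
    }
}
```

The one-character-per-token choice is what lets match offsets in the encoded string be read directly as token positions in the original sequence.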


Copyright © 2010-2012 University of Washington CSE. All Rights Reserved.