edu.washington.cs.knowitall.nlp
Class ChunkedSentence

java.lang.Object
  extended by edu.washington.cs.knowitall.sequence.SimpleLayeredSequence
      extended by edu.washington.cs.knowitall.sequence.BIOLayeredSequence
          extended by edu.washington.cs.knowitall.nlp.ChunkedSentence
All Implemented Interfaces:
LayeredSequence
Direct Known Subclasses:
ChunkedExtraction

public class ChunkedSentence
extends BIOLayeredSequence

An immutable class that represents a tokenized, POS-tagged, and noun-phrase chunked sentence.

Author:
afader

Field Summary
static String NP_LAYER
          The layer name for the NP chunk tags.
protected  com.google.common.collect.ImmutableList<edu.washington.cs.knowitall.commonlib.Range> offsets
           
static String POS_LAYER
          The layer name for the part of speech tags.
static String TOKEN_LAYER
          The layer name for the tokens.
 
Constructor Summary
ChunkedSentence(ChunkedSentence sent)
          Constructs a new instance that is a copy of the given sentence.
ChunkedSentence(com.google.common.collect.ImmutableList<edu.washington.cs.knowitall.commonlib.Range> offsets, com.google.common.collect.ImmutableList<String> tokens, com.google.common.collect.ImmutableList<String> posTags, com.google.common.collect.ImmutableList<String> npChunkTags)
          Constructs a new instance using the given tokens, POS tags, and NP chunk tags, each of which must have the same length.
ChunkedSentence(List<edu.washington.cs.knowitall.commonlib.Range> offsets, List<String> tokens, List<String> posTags, List<String> npChunkTags)
           
ChunkedSentence(List<String> tokens, List<String> posTags, List<String> npChunkTags)
           
ChunkedSentence(edu.washington.cs.knowitall.commonlib.Range[] offsets, String[] tokens, String[] posTags, String[] npChunkTags)
           
ChunkedSentence(String[] tokens, String[] posTags, String[] npChunkTags)
          Constructs a new instance using the given tokens, POS tags, and NP chunk tags, each of which must have the same length.
 
Method Summary
 ChunkedSentence clone()
          Returns a copy of this object.
 String getChunkTag(int i)
           
 com.google.common.collect.ImmutableList<String> getChunkTags()
           
 com.google.common.collect.ImmutableList<String> getChunkTags(int start, int length)
           
 com.google.common.collect.ImmutableList<String> getChunkTags(edu.washington.cs.knowitall.commonlib.Range range)
           
 String getChunkTagsAsString()
           
 com.google.common.collect.ImmutableCollection<edu.washington.cs.knowitall.commonlib.Range> getNpChunkRanges()
           
 com.google.common.collect.ImmutableList<edu.washington.cs.knowitall.commonlib.Range> getOffsets()
           
 String getOffsetsAsString()
           
 String getPosTag(int i)
           
 com.google.common.collect.ImmutableList<String> getPosTags()
           
 com.google.common.collect.ImmutableList<String> getPosTags(int start, int length)
           
 com.google.common.collect.ImmutableList<String> getPosTags(edu.washington.cs.knowitall.commonlib.Range range)
           
 String getPosTagsAsString()
           
 String getPosTagsAsString(int start, int length)
           
 String getPosTagsAsString(edu.washington.cs.knowitall.commonlib.Range range)
           
 edu.washington.cs.knowitall.commonlib.Range getRange()
           
 ChunkedSentence getSubSequence(int start, int length)
          Returns a new ChunkedSentence object that starts at the given start index and has the given length.
 ChunkedSentence getSubSequence(edu.washington.cs.knowitall.commonlib.Range range)
          Returns a new ChunkedSentence object covering the given range.
 String getToken(int i)
           
 edu.washington.cs.knowitall.commonlib.Range getTokenRange(int charStart, int charEnd)
          Converts a character range within the string returned by getTokensAsString() into the bounding token range.
 com.google.common.collect.ImmutableList<String> getTokens()
           
 com.google.common.collect.ImmutableList<String> getTokens(int start, int length)
           
 com.google.common.collect.ImmutableList<String> getTokens(edu.washington.cs.knowitall.commonlib.Range range)
           
 String getTokensAsString()
           
 String getTokensAsString(int start, int length)
           
 String getTokensAsString(edu.washington.cs.knowitall.commonlib.Range range)
           
 String toOpenNlpFormat()
           
 String toString()
           
 
Methods inherited from class edu.washington.cs.knowitall.sequence.BIOLayeredSequence
addSpanLayer, addSpanLayerRanges, getSpans, getSpans, getSubSequence, getSubSequence, isSpanLayer
 
Methods inherited from class edu.washington.cs.knowitall.sequence.SimpleLayeredSequence
addLayer, addLayer, addLayer, equals, get, getLayer, getLayerAsString, getLayerAsString, getLayerAsString, getLayerNames, getLength, getNumLayers, hashCode, hasLayer
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

TOKEN_LAYER

public static final String TOKEN_LAYER
The layer name for the tokens.

See Also:
Constant Field Values

POS_LAYER

public static final String POS_LAYER
The layer name for the part of speech tags.

See Also:
Constant Field Values

NP_LAYER

public static final String NP_LAYER
The layer name for the NP chunk tags.

See Also:
Constant Field Values

offsets

protected final com.google.common.collect.ImmutableList<edu.washington.cs.knowitall.commonlib.Range> offsets

Constructor Detail

ChunkedSentence

public ChunkedSentence(String[] tokens,
                       String[] posTags,
                       String[] npChunkTags)
                throws SequenceException
Constructs a new instance using the given tokens, POS tags, and NP chunk tags, each of which must have the same length. The NP chunks should be expressed using the standard B-NP, I-NP, O tags.

Parameters:
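tokens - the tokens of the sentence
posTags - the part-of-speech tag of each token
npChunkTags - the NP chunk tag (B-NP, I-NP, or O) of each token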
Throws:
SequenceException - if the layers are of different lengths, or if unable to interpret npChunkTags
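The constraints above (equal layer lengths, B-NP/I-NP/O tags) can be checked before calling the constructor. The following is a minimal, self-contained sketch; it is not the library's validation code. The class name ChunkLayerCheckSketch and the method isWellFormed are hypothetical, and rejecting an I-NP that does not follow a B-NP or I-NP is an assumption about what "unable to interpret npChunkTags" covers.

```java
final class ChunkLayerCheckSketch {

    /**
     * Returns true if the three layers have equal length and the chunk tags
     * form a well-formed B-NP / I-NP / O sequence (hypothetical helper).
     */
    static boolean isWellFormed(String[] tokens, String[] posTags, String[] npChunkTags) {
        // All layers must describe the same number of positions.
        if (tokens.length != posTags.length || tokens.length != npChunkTags.length) {
            return false;
        }
        boolean insideChunk = false;
        for (String tag : npChunkTags) {
            switch (tag) {
                case "B-NP":
                    insideChunk = true;   // a new chunk begins here
                    break;
                case "I-NP":
                    if (!insideChunk) {
                        return false;     // assumption: I-NP must continue an open chunk
                    }
                    break;
                case "O":
                    insideChunk = false;  // outside any chunk
                    break;
                default:
                    return false;         // assumption: no other tags are accepted
            }
        }
        return true;
    }
}
```

A caller could run such a check on tagger output first and only then construct the ChunkedSentence, handling SequenceException for anything the sketch does not anticipate.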

ChunkedSentence

public ChunkedSentence(edu.washington.cs.knowitall.commonlib.Range[] offsets,
                       String[] tokens,
                       String[] posTags,
                       String[] npChunkTags)
                throws SequenceException
Throws:
SequenceException

ChunkedSentence

public ChunkedSentence(com.google.common.collect.ImmutableList<edu.washington.cs.knowitall.commonlib.Range> offsets,
                       com.google.common.collect.ImmutableList<String> tokens,
                       com.google.common.collect.ImmutableList<String> posTags,
                       com.google.common.collect.ImmutableList<String> npChunkTags)
                throws SequenceException
Constructs a new instance using the given tokens, POS tags, and NP chunk tags, each of which must have the same length. The NP chunks should be expressed using the standard B-NP, I-NP, O tags.

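Parameters:
offsets - the offset range of each token
tokens - the tokens of the sentence
posTags - the part-of-speech tag of each token
npChunkTags - the NP chunk tag (B-NP, I-NP, or O) of each token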
Throws:
SequenceException - if the layers are of different lengths, or if unable to interpret npChunkTags

ChunkedSentence

public ChunkedSentence(List<String> tokens,
                       List<String> posTags,
                       List<String> npChunkTags)
                throws SequenceException
Throws:
SequenceException

ChunkedSentence

public ChunkedSentence(List<edu.washington.cs.knowitall.commonlib.Range> offsets,
                       List<String> tokens,
                       List<String> posTags,
                       List<String> npChunkTags)
                throws SequenceException
Throws:
SequenceException

ChunkedSentence

public ChunkedSentence(ChunkedSentence sent)
Constructs a new instance that is a copy of the given sentence.

Parameters:
sent - the sentence to copy

Method Detail

getOffsets

public com.google.common.collect.ImmutableList<edu.washington.cs.knowitall.commonlib.Range> getOffsets()

getRange

public edu.washington.cs.knowitall.commonlib.Range getRange()

getSubSequence

public ChunkedSentence getSubSequence(edu.washington.cs.knowitall.commonlib.Range range)
Returns a new ChunkedSentence object covering the given range.

Overrides:
getSubSequence in class BIOLayeredSequence
Parameters:
range - the range of token indexes to include
Returns:
a new ChunkedSentence containing the positions in range
getSubSequence

public ChunkedSentence getSubSequence(int start,
                                      int length)
Returns a new ChunkedSentence object that starts at the given start index and has the given length.

Overrides:
getSubSequence in class BIOLayeredSequence
Parameters:
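start - the index of the first position to include
length - the number of positions to include
Returns:
a new ChunkedSentence containing length positions starting at start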

getTokenRange

public edu.washington.cs.knowitall.commonlib.Range getTokenRange(int charStart,
                                                                 int charEnd)
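Converts a character range within the string returned by getTokensAsString() into the bounding token range.

Parameters:
charStart - the start of the character range
charEnd - the end of the character range
Returns:
the range of tokens that bounds the given character range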
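
The conversion described above can be illustrated with a self-contained sketch. It is not the library's implementation: the class TokenRangeSketch and method boundingTokenRange are hypothetical, the result is returned as a {firstToken, tokenCount} pair instead of a Range, and the sketch assumes that charEnd is exclusive and that tokens are joined by single spaces (as getTokensAsString() is documented to do).

```java
final class TokenRangeSketch {

    /**
     * Returns {firstToken, tokenCount} for the smallest run of tokens that
     * overlaps characters [charStart, charEnd) of the space-joined tokens.
     */
    static int[] boundingTokenRange(String[] tokens, int charStart, int charEnd) {
        if (charStart < 0 || charEnd <= charStart) {
            throw new IllegalArgumentException("empty or negative character range");
        }
        int first = -1;
        int last = -1;
        int pos = 0; // character offset at which the current token starts
        for (int i = 0; i < tokens.length; i++) {
            int tokenStart = pos;
            int tokenEnd = pos + tokens[i].length(); // exclusive
            // The token is part of the result if it overlaps the character range.
            if (tokenStart < charEnd && tokenEnd > charStart) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
            pos = tokenEnd + 1; // skip the single joining space
        }
        if (first < 0) {
            throw new IllegalArgumentException("character range covers no token");
        }
        return new int[] {first, last - first + 1};
    }
}
```

For the tokens "The", "quick", "fox" (joined as "The quick fox"), the character range 5 to 11 touches "quick" and "fox", so the sketch returns first token 1 and count 2.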

clone

public ChunkedSentence clone()
Returns a copy of this object.

Overrides:
clone in class BIOLayeredSequence

getTokens

public com.google.common.collect.ImmutableList<String> getTokens()
Returns:
an unmodifiable list over the tokens of this sentence.

getPosTags

public com.google.common.collect.ImmutableList<String> getPosTags()
Returns:
an unmodifiable list over the POS tags of this sentence.

getTokens

public com.google.common.collect.ImmutableList<String> getTokens(int start,
                                                                 int length)
Parameters:
start -
length -
Returns:
the first length tokens starting at index start.

getTokens

public com.google.common.collect.ImmutableList<String> getTokens(edu.washington.cs.knowitall.commonlib.Range range)
Parameters:
range -
Returns:
the tokens at the indexes given by range.

getPosTags

public com.google.common.collect.ImmutableList<String> getPosTags(int start,
                                                                  int length)
Parameters:
start -
length -
Returns:
the first length POS tags starting at index start.

getPosTags

public com.google.common.collect.ImmutableList<String> getPosTags(edu.washington.cs.knowitall.commonlib.Range range)
Parameters:
range -
Returns:
the POS tags at the indexes given by range.

getNpChunkRanges

public com.google.common.collect.ImmutableCollection<edu.washington.cs.knowitall.commonlib.Range> getNpChunkRanges()
Returns:
an immutable collection of the ranges of the NP chunks in this sentence.
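
To make the relationship between BIO chunk tags and chunk ranges concrete, here is a minimal sketch that derives ranges from a tag array. It is an illustration only, not the code behind getNpChunkRanges(): the class NpChunkRangesSketch is hypothetical, each range is represented as a {start, length} int pair instead of the library's Range type, and throwing IllegalArgumentException on a stray I-NP is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class NpChunkRangesSketch {

    /** Returns one {start, length} pair per NP chunk, in sentence order. */
    static List<int[]> npChunkRanges(String[] chunkTags) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // start index of the currently open chunk, or -1 if none
        for (int i = 0; i < chunkTags.length; i++) {
            String tag = chunkTags[i];
            boolean continues = tag.equals("I-NP") && start >= 0;
            if (!continues && start >= 0) {
                // The open chunk ends just before position i.
                ranges.add(new int[] {start, i - start});
                start = -1;
            }
            if (tag.equals("B-NP")) {
                start = i; // a new chunk begins here
            } else if (tag.equals("I-NP") && start < 0) {
                throw new IllegalArgumentException("I-NP without a preceding B-NP at index " + i);
            }
        }
        if (start >= 0) {
            // Close a chunk that runs to the end of the sentence.
            ranges.add(new int[] {start, chunkTags.length - start});
        }
        return ranges;
    }
}
```

With the tags B-NP, I-NP, O, B-NP this yields two chunks: one starting at index 0 with length 2 and one starting at index 3 with length 1.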

getChunkTags

public com.google.common.collect.ImmutableList<String> getChunkTags()
Returns:
an unmodifiable list over the NP chunk tags of this sentence.

getChunkTags

public com.google.common.collect.ImmutableList<String> getChunkTags(int start,
                                                                    int length)
Parameters:
start -
length -
Returns:
the first length NP chunk tags starting at index start.

getChunkTags

public com.google.common.collect.ImmutableList<String> getChunkTags(edu.washington.cs.knowitall.commonlib.Range range)
Parameters:
range -
Returns:
the NP chunk tags at the indexes given by range.

getOffsetsAsString

public String getOffsetsAsString()

getTokensAsString

public String getTokensAsString()
Returns:
the tokens of this sentence joined by spaces.

getTokensAsString

public String getTokensAsString(int start,
                                int length)
Parameters:
start -
length -
Returns:
length tokens starting at start, joined by spaces.

getTokensAsString

public String getTokensAsString(edu.washington.cs.knowitall.commonlib.Range range)
Parameters:
range -
Returns:
the tokens at the indexes of range, joined by spaces.

getPosTagsAsString

public String getPosTagsAsString(int start,
                                 int length)
Parameters:
start -
length -
Returns:
length POS tags starting at start, joined by spaces.

getPosTagsAsString

public String getPosTagsAsString(edu.washington.cs.knowitall.commonlib.Range range)
Parameters:
range -
Returns:
the POS tags at the indexes of range, joined by spaces.

getPosTagsAsString

public String getPosTagsAsString()
Returns:
the POS tags of this sentence, joined by spaces.

getChunkTagsAsString

public String getChunkTagsAsString()
Returns:
the NP chunk tags of this sentence (in B-NP, I-NP, O format), joined by spaces.

toString

public String toString()
Overrides:
toString in class Object
Returns:
the tokens of this sentence joined by spaces

toOpenNlpFormat

public String toOpenNlpFormat()
Returns:
the tokens, POS tags, and NP chunk tags of this sentence in Open NLP format (square brackets around chunks, then token/tag).
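
As a rough picture of a bracketed "token/tag" rendering, the sketch below formats three parallel arrays into one string. It does not reproduce the library's output exactly: the class OpenNlpFormatSketch is hypothetical, and the "[NP" label, the spacing, and the placement of the closing bracket are all assumptions based only on the one-line description above.

```java
final class OpenNlpFormatSketch {

    /** Renders tokens as token/tag, wrapping each NP chunk in "[NP ...]". */
    static String format(String[] tokens, String[] posTags, String[] chunkTags) {
        StringBuilder sb = new StringBuilder();
        boolean open = false; // true while an NP bracket is open
        for (int i = 0; i < tokens.length; i++) {
            boolean inside = chunkTags[i].equals("I-NP") && open;
            if (open && !inside) {
                sb.append(']'); // the previous chunk ended before this token
                open = false;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (chunkTags[i].equals("B-NP")) {
                sb.append("[NP "); // assumed bracket label
                open = true;
            }
            sb.append(tokens[i]).append('/').append(posTags[i]);
        }
        if (open) {
            sb.append(']'); // close a chunk that ends the sentence
        }
        return sb.toString();
    }
}
```

Under these assumptions, the tokens The, dog, barked with POS tags DT, NN, VBD and chunk tags B-NP, I-NP, O render as "[NP The/DT dog/NN] barked/VBD".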

getToken

public String getToken(int i)
Parameters:
i -
Returns:
the token at index i

getPosTag

public String getPosTag(int i)
Parameters:
i -
Returns:
the part-of-speech tag at index i

getChunkTag

public String getChunkTag(int i)
Parameters:
i -
Returns:
the chunk tag at index i


Copyright © 2010-2012 University of Washington CSE. All Rights Reserved.