edu.washington.cs.knowitall.nlp
Class ChunkedDocumentReader

java.lang.Object
  extended by edu.washington.cs.knowitall.nlp.ChunkedDocumentReader

public class ChunkedDocumentReader
extends Object

A class for converting raw text into ChunkedDocument objects. The behavior of this class depends on two parameters: a SentenceExtractor object, which converts a String into a list of String sentences; and a SentenceChunker object, which converts a String sentence into a ChunkedSentence object.
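The two-stage design described above can be pictured with a minimal, self-contained sketch. The Extractor and Chunker interfaces below are hypothetical stand-ins for SentenceExtractor and SentenceChunker, not the library's own types, and the toy stages (splitting on periods, tokenizing on whitespace) are assumptions chosen only so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReaderPipelineSketch {
    // Hypothetical stand-in for SentenceExtractor: a String becomes a list of sentences.
    interface Extractor { List<String> extract(String text); }

    // Hypothetical stand-in for SentenceChunker: a sentence becomes a processed form
    // (here just a list of tokens instead of a ChunkedSentence).
    interface Chunker { List<String> chunk(String sentence); }

    // Runs the two stages in order: split the text into sentences, then process each one.
    static List<List<String>> read(String text, Extractor extractor, Chunker chunker) {
        List<List<String>> document = new ArrayList<>();
        for (String sentence : extractor.extract(text)) {
            document.add(chunker.chunk(sentence));
        }
        return document;
    }

    public static void main(String[] args) {
        // Toy stages: split after sentence-final periods, then tokenize on whitespace.
        Extractor extractor = text -> Arrays.asList(text.split("(?<=\\.)\\s+"));
        Chunker chunker = sentence -> Arrays.asList(sentence.split("\\s+"));
        System.out.println(read("Dogs bark. Cats meow.", extractor, chunker));
        // prints [[Dogs, bark.], [Cats, meow.]]
    }
}
```

Because each stage sits behind its own interface, either one can be swapped without touching the other, which is presumably why the class accepts both objects as constructor parameters.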

Author:
afader

Constructor Summary
ChunkedDocumentReader()
          Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor, and OpenNlpSentenceChunker as the default sentence chunker.
ChunkedDocumentReader(SentenceChunker sentChunker)
          Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor.
ChunkedDocumentReader(SentenceExtractor sentExtractor)
          Uses OpenNlpSentenceChunker as the default sentence chunker.
ChunkedDocumentReader(SentenceExtractor sentExtractor, SentenceChunker sentChunker)
          Uses the given sentence extractor and sentence chunker.
 
Method Summary
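 SentenceChunker getSentenceChunker()
          Returns the object responsible for converting a String sentence to a ChunkedSentence object.
 SentenceExtractor getSentenceExtractor()
          Returns the object responsible for converting a String to String sentences.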
 ChunkedDocument readDocument(File file)
          Reads a document from the given file, using File.getAbsolutePath() as the id of the document.
 ChunkedDocument readDocument(InputStream input, String id)
          Reads a document from the input, assigning it the given id.
 ChunkedDocument readDocument(String docStr, String id)
          Reads a document from the given string, assigning it the given id.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ChunkedDocumentReader

public ChunkedDocumentReader(SentenceExtractor sentExtractor,
                             SentenceChunker sentChunker)
                      throws IOException
Parameters:
sentExtractor - the object responsible for converting a String to String sentences
sentChunker - the object responsible for converting a String sentence to a ChunkedSentence object
Throws:
IOException

ChunkedDocumentReader

public ChunkedDocumentReader(SentenceExtractor sentExtractor)
                      throws IOException
Uses OpenNlpSentenceChunker as the default sentence chunker.

Parameters:
sentExtractor - the object responsible for converting a String to String sentences
Throws:
IOException

ChunkedDocumentReader

public ChunkedDocumentReader(SentenceChunker sentChunker)
                      throws IOException
Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor.

Parameters:
sentChunker - the object responsible for converting a String sentence to a ChunkedSentence object
Throws:
IOException

ChunkedDocumentReader

public ChunkedDocumentReader()
                      throws IOException
Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor, and OpenNlpSentenceChunker as the default sentence chunker.

Throws:
IOException

Method Detail

getSentenceExtractor

public SentenceExtractor getSentenceExtractor()
Returns:
the object responsible for converting a String to String sentences

getSentenceChunker

public SentenceChunker getSentenceChunker()
Returns:
the object responsible for converting a String sentence to a ChunkedSentence object

readDocument

public ChunkedDocument readDocument(InputStream input,
                                    String id)
                             throws ExtractorException
Reads a document from the input, assigning it the given id.

Parameters:
input - the stream to read the document text from
id - the id to assign to the document
Returns:
the document
Throws:
ExtractorException

readDocument

public ChunkedDocument readDocument(File file)
                             throws ExtractorException
Reads a document from the given file, using File.getAbsolutePath() as the id of the document.

Parameters:
file - the file to read the document from
Returns:
the document
Throws:
ExtractorException
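
Since the document id comes from File.getAbsolutePath(), it helps to see what that call produces. The sketch below uses only java.io.File; the class name DocumentIdSketch, the helper idFor, and the path corpus/article.txt are hypothetical and serve purely as illustration.

```java
import java.io.File;

public class DocumentIdSketch {
    // Returns the string that File.getAbsolutePath() yields for the given path.
    static String idFor(String path) {
        return new File(path).getAbsolutePath();
    }

    public static void main(String[] args) {
        // A relative path is resolved against the current working directory,
        // so the printed id depends on where the program is launched.
        System.out.println(idFor("corpus/article.txt"));
    }
}
```

Note that getAbsolutePath() does not normalize the path: segments such as ".." are kept and symbolic links are not resolved. Two different paths to the same file can therefore yield two different ids; if stable ids matter, one option is to call readDocument(String docStr, String id) or readDocument(InputStream input, String id) with an id of your own choosing.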

readDocument

public ChunkedDocument readDocument(String docStr,
                                    String id)
                             throws ExtractorException
Reads a document from the given string, assigning it the given id.

Parameters:
docStr - the text of the document
id - the id to assign to the document
Returns:
the document
Throws:
ExtractorException - if unable to run sentence extractor


Copyright © 2010-2012 University of Washington CSE. All Rights Reserved.