public class ChunkedDocumentReader extends Object

Reads ChunkedDocument objects. The
behavior of this class depends on two parameters: a SentenceExtractor
object, which converts a String into a list of String sentences; and a
SentenceChunker object, which converts a String sentence into a
ChunkedSentence object.

| Constructor and Description |
|---|
| ChunkedDocumentReader(): Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor, and OpenNlpSentenceChunker as the default sentence chunker. |
| ChunkedDocumentReader(SentenceChunker sentChunker): Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor. |
| ChunkedDocumentReader(SentenceExtractor sentExtractor): Uses OpenNlpSentenceChunker as the default sentence chunker. |
| ChunkedDocumentReader(SentenceExtractor sentExtractor, SentenceChunker sentChunker) |
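
The four constructors differ only in which of the two collaborators they default. The following is a minimal sketch of that pattern, not the library's implementation: `SentenceExtractorSketch`, `SentenceChunkerSketch`, `ReaderSketch`, and the methods `extract`, `chunk`, and `read` are hypothetical stand-ins invented for illustration, and the naive regex and whitespace splitters merely take the place of the real default extractor and OpenNlpSentenceChunker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for SentenceExtractor and SentenceChunker.
// The method names are assumptions; the real interfaces may differ.
interface SentenceExtractorSketch {
    List<String> extract(String text);
}

interface SentenceChunkerSketch {
    List<String> chunk(String sentence);
}

final class ReaderSketch {
    private final SentenceExtractorSketch extractor;
    private final SentenceChunkerSketch chunker;

    // Assumed default: split after '.', '!' or '?' followed by whitespace.
    static SentenceExtractorSketch defaultExtractor() {
        return text -> {
            List<String> sentences = new ArrayList<>();
            for (String s : text.split("(?<=[.!?])\\s+")) {
                if (!s.trim().isEmpty()) {
                    sentences.add(s.trim());
                }
            }
            return sentences;
        };
    }

    // Assumed default: one "chunk" per whitespace-separated token.
    static SentenceChunkerSketch defaultChunker() {
        return sentence -> Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Each shorter constructor fills in a default and delegates to the full one.
    ReaderSketch() {
        this(defaultExtractor(), defaultChunker());
    }

    ReaderSketch(SentenceChunkerSketch chunker) {
        this(defaultExtractor(), chunker);
    }

    ReaderSketch(SentenceExtractorSketch extractor) {
        this(extractor, defaultChunker());
    }

    ReaderSketch(SentenceExtractorSketch extractor, SentenceChunkerSketch chunker) {
        this.extractor = extractor;
        this.chunker = chunker;
    }

    // Two-stage pipeline: text -> sentences -> chunks per sentence.
    List<List<String>> read(String text) {
        List<List<String>> chunked = new ArrayList<>();
        for (String sentence : extractor.extract(text)) {
            chunked.add(chunker.chunk(sentence));
        }
        return chunked;
    }

    public static void main(String[] args) {
        System.out.println(new ReaderSketch().read("Hello world. How are you?"));
    }
}
```

Passing a typed variable rather than a bare lambda to the one-argument constructors avoids an ambiguous overload, since both stand-in interfaces are functional interfaces taking a String.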
| Modifier and Type | Method and Description |
|---|---|
| SentenceChunker | getSentenceChunker() |
| SentenceExtractor | getSentenceExtractor() |
| ChunkedDocument | readDocument(File file): Reads a document from the given file, using File.getAbsolutePath() as the id of the document. |
| ChunkedDocument | readDocument(InputStream input, String id): Reads a document from the input, assigning it the given id. |
| ChunkedDocument | readDocument(String docStr, String id): Reads a document from the given string, assigning it the given id. |
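
The three readDocument overloads accept a File, an InputStream, or a String, and the File variant takes its id from File.getAbsolutePath(). One plausible way to arrange such overloads is to have each delegate to the String version, as sketched below. This is an illustration under stated assumptions, not the library's code: `ReadOverloadsSketch` and `Doc` are hypothetical names, the sentence splitter is a naive regex, UTF-8 is an assumed encoding, and `UncheckedIOException` stands in for the ExtractorException that the real methods declare.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

final class ReadOverloadsSketch {
    /** Hypothetical stand-in for ChunkedDocument: an id plus its sentences. */
    static final class Doc {
        final String id;
        final List<String> sentences;

        Doc(String id, List<String> sentences) {
            this.id = id;
            this.sentences = sentences;
        }
    }

    /** String overload: does the actual work. */
    static Doc readDocument(String docStr, String id) {
        List<String> sentences = new ArrayList<>();
        // Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
        for (String s : docStr.split("(?<=[.!?])\\s+")) {
            if (!s.trim().isEmpty()) {
                sentences.add(s.trim());
            }
        }
        return new Doc(id, sentences);
    }

    /** InputStream overload: buffers the stream as UTF-8 (assumed), then delegates. */
    static Doc readDocument(InputStream input, String id) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return readDocument(new String(buf.toByteArray(), StandardCharsets.UTF_8), id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** File overload: uses the file's absolute path as the document id. */
    static Doc readDocument(File file) {
        try (InputStream in = Files.newInputStream(file.toPath())) {
            return readDocument(in, file.getAbsolutePath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("doc", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "First sentence. Second one!".getBytes(StandardCharsets.UTF_8));
        Doc doc = readDocument(tmp);
        System.out.println(doc.id + " -> " + doc.sentences);
    }
}
```

Funnelling every overload through the String version keeps the sentence-splitting logic in one place, so the id is the only thing that varies by input type.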
public ChunkedDocumentReader(SentenceExtractor sentExtractor, SentenceChunker sentChunker) throws IOException

Parameters:
sentExtractor - the object responsible for converting a String to String sentences
sentChunker - the object responsible for converting a String sentence to a ChunkedSentence object

Throws:
IOException

public ChunkedDocumentReader(SentenceExtractor sentExtractor) throws IOException

Uses OpenNlpSentenceChunker as the default sentence chunker.

Parameters:
sentExtractor - the object responsible for converting a String to String sentences

Throws:
IOException

public ChunkedDocumentReader(SentenceChunker sentChunker) throws IOException

Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor.

Parameters:
sentChunker - the object responsible for converting a String sentence to a ChunkedSentence object

Throws:
IOException

public ChunkedDocumentReader() throws IOException

Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor, and OpenNlpSentenceChunker as the default sentence chunker.

Throws:
IOException

public SentenceExtractor getSentenceExtractor()

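public SentenceChunker getSentenceChunker()

Returns:
the object responsible for converting a String sentence to a ChunkedSentence object

public ChunkedDocument readDocument(InputStream input, String id) throws ExtractorException

Reads a document from the input, assigning it the given id.

Parameters:
input
id

Throws:
ExtractorException

public ChunkedDocument readDocument(File file) throws ExtractorException

Reads a document from the given file, using File.getAbsolutePath() as the id of the document.

Parameters:
file

Throws:
ExtractorException

public ChunkedDocument readDocument(String docStr, String id) throws ExtractorException

Reads a document from the given string, assigning it the given id.

Parameters:
docStr
id

Throws:
ExtractorException - if unable to run sentence extractor

Copyright © 2010-2013 University of Washington CSE. All Rights Reserved.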