java.lang.Object
  edu.washington.cs.knowitall.nlp.ChunkedDocumentReader
public class ChunkedDocumentReader
A class for converting raw text into ChunkedDocument objects. The
behavior of this class depends on two parameters: a SentenceExtractor
object, which converts a String into a list of String sentences; and a
SentenceChunker object, which converts a String sentence into a
ChunkedSentence object.
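
The two-stage design described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in written for this example: `PipelineSketch`, its nested `SentenceExtractor` and `SentenceChunker` interfaces, and the `read` method are not the library's actual types, and a plain list of tokens stands in for the richer `ChunkedSentence` object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {
    // Hypothetical stand-in: converts a document String into sentence Strings.
    interface SentenceExtractor {
        List<String> extract(String text);
    }

    // Hypothetical stand-in: converts one sentence into a list of tokens.
    // (An assumption for illustration; a real chunker produces a ChunkedSentence.)
    interface SentenceChunker {
        List<String> chunk(String sentence);
    }

    // Runs the extractor over the document, then the chunker over each sentence.
    static List<List<String>> read(String doc, SentenceExtractor extractor,
                                   SentenceChunker chunker) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : extractor.extract(doc)) {
            sentences.add(chunker.chunk(sentence));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Naive extractor: split after sentence-final punctuation.
        SentenceExtractor extractor =
                text -> Arrays.asList(text.split("(?<=[.!?])\\s+"));
        // Naive chunker: split on whitespace.
        SentenceChunker chunker = s -> Arrays.asList(s.split("\\s+"));
        System.out.println(read("Dogs bark. Cats meow.", extractor, chunker));
    }
}
```

Keeping the two stages behind separate interfaces is what lets either one be swapped independently, which is the point of the four constructors listed below.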
| Constructor Summary | |
|---|---|
| ChunkedDocumentReader() | Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor, and OpenNlpSentenceChunker as the default sentence chunker. |
| ChunkedDocumentReader(SentenceChunker sentChunker) | Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor. |
| ChunkedDocumentReader(SentenceExtractor sentExtractor) | Uses OpenNlpSentenceChunker as the default sentence chunker. |
| ChunkedDocumentReader(SentenceExtractor sentExtractor, SentenceChunker sentChunker) | |
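
The way the shorter constructors fall back to defaults can be sketched with a telescoping-constructor pattern. This is only an illustration under assumptions: the nested types `Extractor`, `Chunker`, `DefaultHtmlExtractor`, `DefaultChunker`, and `Reader` are hypothetical stand-ins, and whether the real class delegates between its constructors in this way is not stated in the documentation.

```java
public class DefaultsSketch {
    // Hypothetical stand-ins for the two collaborators.
    interface Extractor {}
    interface Chunker {}

    // Hypothetical defaults, standing in for the default HTML sentence
    // extractor and the OpenNLP-based chunker named in the table above.
    static final class DefaultHtmlExtractor implements Extractor {}
    static final class DefaultChunker implements Chunker {}

    static final class Reader {
        final Extractor extractor;
        final Chunker chunker;

        // Fully specified: the caller supplies both collaborators.
        Reader(Extractor extractor, Chunker chunker) {
            this.extractor = extractor;
            this.chunker = chunker;
        }

        // Only the extractor is supplied; the chunker falls back to the default.
        Reader(Extractor extractor) {
            this(extractor, new DefaultChunker());
        }

        // Only the chunker is supplied; the extractor falls back to the default.
        Reader(Chunker chunker) {
            this(new DefaultHtmlExtractor(), chunker);
        }

        // Nothing supplied; both collaborators are defaults.
        Reader() {
            this(new DefaultHtmlExtractor(), new DefaultChunker());
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(new Extractor() {});
        System.out.println(reader.chunker instanceof DefaultChunker);
    }
}
```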
| Method Summary | |
|---|---|
| SentenceChunker | getSentenceChunker() |
| SentenceExtractor | getSentenceExtractor() |
| ChunkedDocument | readDocument(File file): Reads a document from the given file, using File.getAbsolutePath() as the id of the document. |
| ChunkedDocument | readDocument(InputStream input, String id): Reads a document from the input, assigning it the given id. |
| ChunkedDocument | readDocument(String docStr, String id): Reads a document from the given string, assigning it the given id. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public ChunkedDocumentReader(SentenceExtractor sentExtractor,
                             SentenceChunker sentChunker)
                      throws IOException
Parameters:
sentExtractor - the object responsible for converting a String to String sentences
sentChunker - the object responsible for converting a String sentence to a ChunkedSentence object
Throws:
IOException

public ChunkedDocumentReader(SentenceExtractor sentExtractor)
                      throws IOException
Uses OpenNlpSentenceChunker as the default sentence chunker.
Parameters:
sentExtractor - the object responsible for converting a String to String sentences
Throws:
IOException

public ChunkedDocumentReader(SentenceChunker sentChunker)
                      throws IOException
Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor.
Parameters:
sentChunker - the object responsible for converting a String sentence to a ChunkedSentence object
Throws:
IOException

public ChunkedDocumentReader()
                      throws IOException
Uses the object returned by DefaultObjects.getDefaultHtmlSentenceExtractor() as the default sentence extractor, and OpenNlpSentenceChunker as the default sentence chunker.
Throws:
IOException

| Method Detail |
|---|
public SentenceExtractor getSentenceExtractor()

public SentenceChunker getSentenceChunker()
Returns:
the object responsible for converting a String sentence to a ChunkedSentence object

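public ChunkedDocument readDocument(InputStream input,
                                    String id)
                             throws ExtractorException
Reads a document from the input, assigning it the given id.
Parameters:
input - the input to read the document from
id - the id to assign to the document
Throws:
ExtractorException
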
public ChunkedDocument readDocument(File file)
                             throws ExtractorException
Reads a document from the given file, using File.getAbsolutePath() as the id of the document.
Parameters:
file - the file to read the document from
Throws:
ExtractorException
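
Because the document id is taken from File.getAbsolutePath(), a file opened through a relative path still gets an absolute id that depends on the current working directory. The sketch below uses only `java.io.File` from the JDK; the class `DocumentIdSketch` and its `idFor` helper are hypothetical names introduced for this example and are not part of the library.

```java
import java.io.File;

public class DocumentIdSketch {
    // Mirrors the documented rule: the id is the file's absolute path.
    // (Hypothetical helper; the library computes this internally.)
    static String idFor(File file) {
        return file.getAbsolutePath();
    }

    public static void main(String[] args) {
        // A relative path is resolved against the current working directory.
        File relative = new File("docs/example.txt");
        System.out.println(idFor(relative));
    }
}
```

If ids need to be stable across machines or working directories, the readDocument overloads that take an explicit id are likely the better fit.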
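
public ChunkedDocument readDocument(String docStr,
                                    String id)
                             throws ExtractorException
Reads a document from the given string, assigning it the given id.
Parameters:
docStr - the string to read the document from
id - the id to assign to the document
Throws:
ExtractorException - if unable to run the sentence extractor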