java.lang.Object
edu.washington.cs.knowitall.extractor.Extractor<String,String>
edu.washington.cs.knowitall.extractor.SentenceExtractor
edu.washington.cs.knowitall.extractor.HtmlSentenceExtractor
public class HtmlSentenceExtractor
extends SentenceExtractor
An Extractor class for extracting sentences (as String objects) from a
String containing HTML. It is backed by an OpenNLP SentenceDetector object
and uses the code in HtmlUtils to extract plain text from the HTML.
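The two stages described above (HTML to plain text, then sentence detection) can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class HtmlSentenceSketch and its methods are hypothetical, a regular expression stands in for HtmlUtils, and java.text.BreakIterator stands in for the OpenNLP SentenceDetector. It is not the code this class actually runs.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not part of edu.washington.cs.knowitall.
public class HtmlSentenceSketch {

    // Crude stand-in for HtmlUtils: replace tags with spaces, collapse whitespace.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Stand-in for the OpenNLP SentenceDetector: split plain text at
    // the sentence boundaries reported by the JDK's BreakIterator.
    static List<String> detectSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Mirrors the shape of extractCandidates(String htmlBlock): HTML in, sentences out.
    static List<String> extractCandidates(String htmlBlock) {
        return detectSentences(stripTags(htmlBlock));
    }

    public static void main(String[] args) {
        String html = "<p>Hello world. How are <b>you</b> today?</p>";
        for (String sentence : extractCandidates(html)) {
            System.out.println(sentence);
        }
    }
}
```

A real HTML-to-text step also has to handle entities, scripts, and block-level boundaries, which this regular expression ignores.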
| Constructor Summary | |
|---|---|
| HtmlSentenceExtractor() | Constructs a new HtmlSentenceExtractor object using the default OpenNLP SentenceDetector object, as returned by DefaultObjects.getDefaultSentenceDetector(). |
| HtmlSentenceExtractor(opennlp.tools.sentdetect.SentenceDetector detector) | Constructs a new HtmlSentenceExtractor object using the given OpenNLP SentenceDetector object. |
| Method Summary | | |
|---|---|---|
| protected Collection<String> | extractCandidates(String htmlBlock) | Runs the OpenNLP SentenceDetector object on the given String source, and returns an Iterable object over the detected sentences. |
| static void | main(String[] args) | Extracts sentences from HTML passed via standard input, or through a file given as an argument to the program. |
| Methods inherited from class edu.washington.cs.knowitall.extractor.SentenceExtractor |
|---|
| getSentenceDetector |
| Methods inherited from class edu.washington.cs.knowitall.extractor.Extractor |
|---|
| addMapper, compose, extract, getMappers |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public HtmlSentenceExtractor(opennlp.tools.sentdetect.SentenceDetector detector)
Constructs a new HtmlSentenceExtractor object using the given OpenNLP SentenceDetector
object.
Parameters:
detector -
public HtmlSentenceExtractor()
throws IOException
Constructs a new HtmlSentenceExtractor object using the default OpenNLP
SentenceDetector object, as returned by DefaultObjects.getDefaultSentenceDetector().
Throws:
IOException
| Method Detail |
|---|
protected Collection<String> extractCandidates(String htmlBlock)
Description copied from class: SentenceExtractor
Runs the OpenNLP SentenceDetector object on the given String source,
and returns an Iterable object over the detected sentences.
Overrides:
extractCandidates in class SentenceExtractor
Parameters:
htmlBlock - the source to extract from.
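public static void main(String[] args)
throws Exception
Extracts sentences from HTML passed via standard input, or through a file given
as an argument to the program. Removes brackets using the BracketsRemover mapper class,
and filters sentences using the SentenceEndFilter, SentenceStartFilter, and
SentenceLengthFilter mapper classes. Prints the resulting sentences to standard output,
one sentence per line.
Parameters:
args -
Throws:
Exception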
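The map-then-filter flow that main describes can be sketched as follows. Everything here is an assumption made for illustration: the class SentencePipelineSketch is hypothetical, and the rules chosen for each step (drop bracketed spans, require an uppercase first character, require terminal punctuation, bound the token count to 3..40) are guesses at what BracketsRemover, SentenceStartFilter, SentenceEndFilter, and SentenceLengthFilter might do, not their documented behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; the real mapper classes may apply different rules.
public class SentencePipelineSketch {

    // Assumed brackets-removal rule: delete [..], (..) and {..} spans.
    static String removeBrackets(String s) {
        return s.replaceAll("\\[[^\\]]*\\]|\\([^)]*\\)|\\{[^}]*\\}", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Assumed start rule: the sentence begins with an uppercase character.
    static boolean startsWell(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }

    // Assumed end rule: the sentence ends with '.', '!' or '?'.
    static boolean endsWell(String s) {
        return !s.isEmpty() && ".!?".indexOf(s.charAt(s.length() - 1)) >= 0;
    }

    // Assumed length rule: whitespace-separated token count within [min, max].
    static boolean lengthOk(String s, int minTokens, int maxTokens) {
        int n = s.split("\\s+").length;
        return n >= minTokens && n <= maxTokens;
    }

    // Apply the mapper first, then the three filters, as main is described to do.
    static List<String> clean(List<String> sentences) {
        return sentences.stream()
                .map(SentencePipelineSketch::removeBrackets)
                .filter(SentencePipelineSketch::startsWell)
                .filter(SentencePipelineSketch::endsWell)
                .filter(s -> lengthOk(s, 3, 40))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "Smith (born 1950) lives in Seattle.",
                "click here to read more",
                "Home.");
        // One surviving sentence per line, matching the output format of main.
        clean(raw).forEach(System.out::println);
    }
}
```

In the library itself, the inherited addMapper method is presumably how such steps are attached to an extractor; the stream pipeline above only imitates that ordering.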