public class WikipediaSentenceSource extends SentenceSource
A sentence source backed by a Wikipedia XML dump, i.e. pages-articles.xml.bz2 from
http://download.wikimedia.org/backup-index.html, e.g.
http://download.wikimedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2.

| Constructor and Description |
|---|
| WikipediaSentenceSource(InputStream xmlInput, Language language) |
| WikipediaSentenceSource(InputStream xmlInput, Language language, Pattern filter) |

| Modifier and Type | Method and Description |
|---|---|
| String | getSource() |
| boolean | hasNext() |
| Sentence | next(): Return the next sentence. |

Inherited methods: acceptSentence, remove, toString

public WikipediaSentenceSource(InputStream xmlInput, Language language)

public WikipediaSentenceSource(InputStream xmlInput, Language language, Pattern filter)

public boolean hasNext()
hasNext in interface Iterator<Sentence>
hasNext in class SentenceSource

public Sentence next()
Return the next sentence. (Description from SentenceSource.)
next in interface Iterator<Sentence>
next in class SentenceSource

public String getSource()
getSource in class SentenceSource
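
The signatures above describe an Iterator over sentences, optionally narrowed by a Pattern filter. The following sketch illustrates that contract with standard-library classes only. It is not the library's implementation: DemoSource, LineSource, and collect are hypothetical stand-ins, sentences are plain String values rather than Sentence objects, the input is a line-per-sentence stream rather than a Wikipedia XML dump, and the filter semantics (keep a sentence when the pattern is found in it) are an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

class SentenceSourceDemo {

    /** Hypothetical stand-in that mirrors the method names listed for SentenceSource. */
    abstract static class DemoSource implements Iterator<String> {
        private final Pattern filter; // null means "accept every sentence"

        DemoSource(Pattern filter) {
            this.filter = filter;
        }

        abstract String getSource();

        // Assumption: a sentence passes when no filter is set or the pattern is found in it.
        boolean acceptSentence(String sentence) {
            return filter == null || filter.matcher(sentence).find();
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("read-only source");
        }
    }

    /** Toy source: treats every non-blank input line as one sentence. */
    static final class LineSource extends DemoSource {
        private final BufferedReader reader;
        private String lookahead; // next accepted sentence, or null if none is buffered

        LineSource(InputStream input, Pattern filter) {
            super(filter);
            this.reader = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
        }

        @Override
        public boolean hasNext() {
            try {
                // Read ahead until an accepted sentence is buffered or the input ends.
                while (lookahead == null) {
                    String line = reader.readLine();
                    if (line == null) {
                        return false;
                    }
                    line = line.trim();
                    if (!line.isEmpty() && acceptSentence(line)) {
                        lookahead = line;
                    }
                }
                return true;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String sentence = lookahead;
            lookahead = null; // consume the buffered sentence
            return sentence;
        }

        @Override
        String getSource() {
            return "demo";
        }
    }

    /** Drains a LineSource over the given text and returns the accepted sentences. */
    static List<String> collect(String input, Pattern filter) {
        InputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
        Iterator<String> source = new LineSource(in, filter);
        List<String> result = new ArrayList<>();
        while (source.hasNext()) {
            result.add(source.next());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The dog barks.\n\nThe cat sleeps.\nA bird sings.\n";
        // Prints: [The cat sleeps., A bird sings.]
        System.out.println(collect(text, Pattern.compile("cat|bird")));
    }
}
```

Buffering one sentence in hasNext() keeps next() trivial and lets the filter skip any number of rejected sentences before reporting whether more input remains. Whether the real class reads ahead in the same way is not stated on this page.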