Class WikipediaSentenceSource

java.lang.Object
org.languagetool.dev.dumpcheck.SentenceSource
org.languagetool.dev.dumpcheck.WikipediaSentenceSource
All Implemented Interfaces:
Iterator<Sentence>

public class WikipediaSentenceSource extends SentenceSource
Provides access to the sentences of a Wikipedia XML dump. Conversion exceptions are logged to STDERR and otherwise ignored, so a failed conversion does not stop the iteration. To get an XML dump, download the pages-articles.xml.bz2 file for the desired wiki from http://download.wikimedia.org/backup-index.html, for example http://download.wikimedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2 for the German Wikipedia.
Since:
2.4
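
The "log to STDERR and otherwise ignore" behavior described above can be illustrated with a small, self-contained sketch. It does not use LanguageTool and is not the actual implementation of WikipediaSentenceSource: the class names SkippingSentenceIterator and SkippingSentenceIteratorDemo, the String element type, and the toy converter that rejects text starting with "{{broken" are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical stand-in (not a LanguageTool class): iterates over raw items,
 * converts each one, and skips items whose conversion throws.
 */
class SkippingSentenceIterator implements Iterator<String> {
    private final Iterator<String> rawItems;
    private final Function<String, String> converter;
    private String next; // the converted item waiting to be returned, or null

    SkippingSentenceIterator(Iterator<String> rawItems, Function<String, String> converter) {
        this.rawItems = rawItems;
        this.converter = converter;
    }

    @Override
    public boolean hasNext() {
        fillNext();
        return next != null;
    }

    @Override
    public String next() {
        fillNext();
        if (next == null) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
    }

    /** Advances until one item converts successfully or the input is exhausted. */
    private void fillNext() {
        while (next == null && rawItems.hasNext()) {
            String raw = rawItems.next();
            try {
                next = converter.apply(raw);
            } catch (RuntimeException e) {
                // Log to STDERR and carry on with the next item.
                System.err.println("Could not convert '" + raw + "', skipping: " + e);
            }
        }
    }
}

class SkippingSentenceIteratorDemo {
    /** Toy converter (an assumption): rejects input that starts with "{{broken". */
    static String convert(String raw) {
        if (raw.startsWith("{{broken")) {
            throw new IllegalArgumentException("unbalanced template");
        }
        return raw.trim();
    }

    /** Collects everything that survives conversion, in input order. */
    static List<String> collect(List<String> rawItems) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = new SkippingSentenceIterator(
                rawItems.iterator(), SkippingSentenceIteratorDemo::convert);
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        raw.add(" First sentence. ");
        raw.add("{{broken");
        raw.add("Second sentence.");
        System.out.println(collect(raw));
    }
}
```

One consequence of this design is that a caller cannot tell from the iteration alone that items were dropped; anyone who needs to detect skipped content has to watch STDERR.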