Package org.languagetool.dev.dumpcheck
Class SentenceSourceIndexer
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.languagetool.dev.dumpcheck.SentenceSourceIndexer
- All Implemented Interfaces:
AutoCloseable, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
Creates a Lucene index of a SentenceSource.
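Lucene is not part of the JDK, so the sketch below swaps in a plain in-memory inverted index. It shows what "indexing sentences" means: each token points to the ids of the sentences that contain it. `SentenceIndexSketch`, `add` and `search` are hypothetical names and are not LanguageTool or Lucene APIs. Tokens are split on non-letter, non-digit characters (`\p{L}`, `\p{N}`) so that German umlauts stay inside words.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a Lucene index: token -> ids of sentences containing it. */
public class SentenceIndexSketch {
    private final List<String> sentences = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stores the sentence and records its id under every lowercased token. */
    public int add(String sentence) {
        int id = sentences.size();
        sentences.add(sentence);
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                // TreeSet keeps ids sorted, i.e. in insertion order
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Returns all sentences that contain the given token. */
    public List<String> search(String token) {
        List<String> hits = new ArrayList<>();
        Set<Integer> ids = postings.getOrDefault(token.toLowerCase(Locale.ROOT), Collections.emptySet());
        for (int id : ids) {
            hits.add(sentences.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        SentenceIndexSketch index = new SentenceIndexSketch();
        index.add("Der Hund schläft.");
        index.add("Die Katze schläft nicht.");
        index.add("Der Hund bellt.");
        System.out.println(index.search("hund")); // [Der Hund schläft., Der Hund bellt.]
    }
}
```

A real Lucene index adds analyzers, stored fields and on-disk segments, but the lookup idea is the same.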
Performance examples (Dell XPS 13 9360):
German Wikipedia and Tatoeba with POS tags: 22,000 sentences per minute
German Wikipedia and Tatoeba without POS tags: 2.4 million sentences per minute
- Since:
2.4
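The quoted performance figures convert into per-second rates as follows: 22,000 sentences per minute is about 367 sentences per second, and 2.4 million per minute is 40,000 per second, so indexing without POS tags is roughly 109 times faster. The sketch below does this arithmetic and estimates wall-clock time for a corpus size that is purely hypothetical (10 million sentences), not a figure from this documentation.

```java
/** Converts the quoted sentences-per-minute figures into rates and time estimates. */
public class IndexThroughput {
    static final double WITH_POS_PER_MIN = 22_000;
    static final double WITHOUT_POS_PER_MIN = 2_400_000;

    /** Minutes needed to index the given number of sentences at the given rate. */
    static double minutesFor(long sentences, double perMinute) {
        return sentences / perMinute;
    }

    public static void main(String[] args) {
        long corpus = 10_000_000L; // hypothetical corpus size, not a measured figure
        System.out.printf("with POS tags:    %.1f sentences/s, %.0f min%n",
                WITH_POS_PER_MIN / 60, minutesFor(corpus, WITH_POS_PER_MIN));
        System.out.printf("without POS tags: %.1f sentences/s, %.1f min%n",
                WITHOUT_POS_PER_MIN / 60, minutesFor(corpus, WITHOUT_POS_PER_MIN));
        System.out.printf("speed-up without POS tagging: %.0fx%n",
                WITHOUT_POS_PER_MIN / WITH_POS_PER_MIN);
    }
}
```

The rates come from a single machine, so such estimates only indicate the order of magnitude on other hardware.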
Field Summary
Fields: MAX_DOC_COUNT_VALUE, MAX_DOC_COUNT_FIELD, MAX_DOC_COUNT_FIELD_VAL
Method Summary
Methods: close, main
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
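Because the class extends SAX's `DefaultHandler`, it can receive parser callbacks such as `startElement`, `characters` and `endElement`. How `SentenceSourceIndexer` itself uses them is not documented here; the sketch below only illustrates the general pattern with the JDK's own SAX parser. The class name `TextElementCollector` and the `<text>` element it collects are hypothetical examples.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical DefaultHandler subclass that collects the text of every <text> element. */
public class TextElementCollector extends DefaultHandler {
    private final List<String> texts = new ArrayList<>();
    private StringBuilder current; // non-null only while inside <text>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("text".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one element's text in several chunks, so append rather than assign.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("text".equals(qName) && current != null) {
            texts.add(current.toString());
            current = null;
        }
    }

    /** Parses an XML string and returns the collected <text> contents. */
    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextElementCollector handler = new TextElementCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<pages><page><text>Erster Satz.</text></page>"
                + "<page><text>Zweiter Satz.</text></page></pages>";
        System.out.println(parse(xml)); // [Erster Satz., Zweiter Satz.]
    }
}
```

The factory here is not namespace-aware, so the element name arrives in `qName`; a streaming handler like this keeps memory use flat even for very large XML dumps.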
Field Details
MAX_DOC_COUNT_VALUE
- See Also:
Constant Field Values
MAX_DOC_COUNT_FIELD
- See Also:
Constant Field Values
MAX_DOC_COUNT_FIELD_VAL
- See Also:
Constant Field Values
Method Details
close
- Specified by:
close in interface AutoCloseable
- Throws:
Exception
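Since the class implements `AutoCloseable` and `close()` may throw `Exception`, a caller would normally use try-with-resources so the index is closed even when indexing fails. The real constructor of `SentenceSourceIndexer` is not documented here, so the sketch uses a hypothetical `HypotheticalIndexer` stand-in; only the language mechanism is the point.

```java
import java.util.ArrayList;
import java.util.List;

/** Demonstrates that try-with-resources calls close() even when the body throws. */
public class CloseDemo {
    static final List<String> log = new ArrayList<>();

    /** Hypothetical stand-in for an AutoCloseable indexer; not the real API. */
    static class HypotheticalIndexer implements AutoCloseable {
        void index(String sentence) {
            if (sentence.isEmpty()) {
                throw new IllegalArgumentException("empty sentence");
            }
            log.add("indexed: " + sentence);
        }

        @Override
        public void close() throws Exception {
            // A real indexer would commit and release its index writer here.
            log.add("closed");
        }
    }

    static void run(String... sentences) {
        try (HypotheticalIndexer indexer = new HypotheticalIndexer()) {
            for (String s : sentences) {
                indexer.index(s);
            }
        } catch (Exception e) {
            // Runs after close(): resources are closed before the catch block executes.
            log.add("failed: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        run("Ein Satz.", "");
        System.out.println(log); // [indexed: Ein Satz., closed, failed: empty sentence]
    }
}
```

Because `close()` is declared to throw `Exception`, the try statement must catch or propagate `Exception`, which matches the `throws Exception` clauses on both methods of this class.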
main
- Throws:
Exception