public class AnalyzingSentenceTokenizer
extends org.apache.lucene.analysis.Tokenizer
| Constructor and Description |
|---|
| AnalyzingSentenceTokenizer(org.apache.lucene.util.AttributeFactory factory, boolean removeBadSentences, org.apache.lucene.analysis.CharArraySet stopWords, float commaWordThreshold, float maxStopwordRatio, int minSentenceLength) Construct a token stream processing the given input using the given AttributeFactory. |
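The constructor's filtering parameters (removeBadSentences, commaWordThreshold, maxStopwordRatio, minSentenceLength) suggest a set of per-sentence heuristics. The following is a minimal, stdlib-only sketch of how such heuristics could behave. It is not the AnalyzingSentenceTokenizer implementation: the class `SentenceFilterSketch`, all of its method names, the whitespace-based word splitting, and the definition of "comma density" as commas per word are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical illustration only; NOT the actual tokenizer logic. */
final class SentenceFilterSketch {

    /** Splits on whitespace, strips surrounding punctuation, lowercases. */
    static List<String> words(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .map(w -> w.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "")
                           .toLowerCase(Locale.ROOT))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    /** Fraction of words that are stopwords (0 for an empty sentence). */
    static float stopwordRatio(List<String> words, Set<String> stopWords) {
        if (words.isEmpty()) {
            return 0f;
        }
        long hits = words.stream().filter(stopWords::contains).count();
        return (float) hits / words.size();
    }

    /** ASSUMPTION: "comma density" is modeled as commas per word. */
    static float commaDensity(String sentence) {
        int wordCount = words(sentence).size();
        if (wordCount == 0) {
            return 0f;
        }
        long commas = sentence.chars().filter(c -> c == ',').count();
        return (float) commas / wordCount;
    }

    /**
     * Sentences shorter than minSentenceLength are never analyzed and are
     * always kept; otherwise a sentence is dropped when its stopword ratio
     * exceeds maxStopwordRatio and removeBadSentences is true.
     */
    static boolean keep(String sentence, Set<String> stopWords,
                        boolean removeBadSentences,
                        float maxStopwordRatio, int minSentenceLength) {
        List<String> w = words(sentence);
        if (!removeBadSentences || w.size() < minSentenceLength) {
            return true;
        }
        return stopwordRatio(w, stopWords) <= maxStopwordRatio;
    }

    /** Splits at commas only when the comma density exceeds the threshold. */
    static List<String> subSentences(String sentence, float commaWordThreshold) {
        if (commaDensity(sentence) <= commaWordThreshold) {
            return List.of(sentence.trim());
        }
        return Arrays.stream(sentence.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```

Under these assumptions, a sentence whose words are mostly stopwords is dropped only when removeBadSentences is true and the sentence has at least minSentenceLength words. The real class may count words, commas, and stopwords differently.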
| Modifier and Type | Method and Description |
|---|---|
| void | end() Sets the final offset, does not reset internal state. |
| boolean | incrementToken() |
| protected boolean | incrementTokenInternal() |
| void | reset() Method is called after the input has been set. |
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public AnalyzingSentenceTokenizer(org.apache.lucene.util.AttributeFactory factory,
boolean removeBadSentences,
org.apache.lucene.analysis.CharArraySet stopWords,
float commaWordThreshold,
float maxStopwordRatio,
int minSentenceLength)
Parameters:
factory - the factory.
removeBadSentences - if true, sentences with too many stopwords are filtered out.
stopWords - the stopwords.
commaWordThreshold - the threshold that defines the "comma density" that, if exceeded, causes a sentence to be split into sub-sentences that are analyzed individually.
maxStopwordRatio - if the ratio of stopwords exceeds this threshold, the sentence is filtered out.
minSentenceLength - a sentence must contain at least this many words, otherwise it is not analyzed and always emitted.

public void end()
throws IOException
Overrides: end in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public void reset()
throws IOException
Overrides: reset in class org.apache.lucene.analysis.Tokenizer
Throws: IOException

public final boolean incrementToken()
throws IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Returns: true to indicate to the caller to read the current attribute state and false to indicate the end of the token stream.
Throws: IOException

protected boolean incrementTokenInternal()
throws IOException
Returns: true if the current attribute state should be emitted
Throws: IOException

Copyright © 2020 solr.cool. All rights reserved.