Class NoPunctuationTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class NoPunctuationTokenizer
    extends org.apache.lucene.analysis.util.CharTokenizer
    A tokenizer that treats every character as a token character except whitespace and the following punctuation: . , ; ? ! : [ ] ( ) { } " '
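    The character rule described above can be illustrated with a minimal, Lucene-free sketch. It is not the actual implementation: the class name NoPunctuationSketch, the tokenize helper, the use of Character.isWhitespace, and the exact contents of PUNCTS are assumptions taken from the description, and CharTokenizer behaviour such as offsets, normalization, and token length limits is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the real NoPunctuationTokenizer.
class NoPunctuationSketch {
    // Assumed contents of PUNCTS, copied from the class description.
    static final char[] PUNCTS = {
        '.', ',', ';', '?', '!', ':', '[', ']', '(', ')', '{', '}', '"', '\''
    };

    // A code point is a token character unless it is whitespace or listed in PUNCTS.
    // Using Character.isWhitespace here is an assumption about what "whitespace" means.
    static boolean isTokenChar(int c) {
        if (Character.isWhitespace(c)) {
            return false;
        }
        for (char p : PUNCTS) {
            if (c == p) {
                return false;
            }
        }
        return true;
    }

    // Splits text into maximal runs of token characters, which is the
    // general idea behind a CharTokenizer-style tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, a-b, x]
        System.out.println(tokenize("Hello, world! (a-b) [x]"));
    }
}
```

    Note that characters outside the list, such as the hyphen in "a-b", stay inside tokens under this rule.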
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.State
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static char[] PUNCTS  
      • Fields inherited from class org.apache.lucene.analysis.Tokenizer

        input
      • Fields inherited from class org.apache.lucene.analysis.TokenStream

        DEFAULT_TOKEN_ATTRIBUTE_FACTORY
      • Fields inherited from class org.apache.lucene.util.AttributeSource

        DEFAULT_ATTRIBUTE_FACTORY
    • Method Summary

      Modifier and Type Method Description
      protected boolean isTokenChar(int c)
      • Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer

        end, incrementToken, normalize, reset
      • Methods inherited from class org.apache.lucene.analysis.Tokenizer

        close, correctOffset, setReader
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Field Detail

      • PUNCTS

        public static final char[] PUNCTS
    • Constructor Detail

      • NoPunctuationTokenizer

        public NoPunctuationTokenizer(Reader input)
      • NoPunctuationTokenizer

        public NoPunctuationTokenizer(org.apache.lucene.util.AttributeFactory factory,
                                      Reader input)
    • Method Detail

      • isTokenChar

        protected boolean isTokenChar(int c)
        Specified by:
        isTokenChar in class org.apache.lucene.analysis.util.CharTokenizer