lux.index.analysis
Class XmlTextTokenStream

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by lux.index.analysis.TextOffsetTokenStream
              extended by lux.index.analysis.XmlTextTokenStream
All Implemented Interfaces:
Closeable

public final class XmlTextTokenStream
extends TextOffsetTokenStream

Extracts tokens from an s9api XML document tree (XdmNode) so they can be consumed by Lucene classes that accept a TokenStream, such as the indexer and the highlighter.
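The following JDK-only sketch shows the extraction step in rough form. It walks an XML tree in document order and collects the content of each text node. It uses the W3C DOM as a stand-in for Saxon's `XdmNode`. The class `TextNodeCollector` is hypothetical and is not part of Lux. The sketch also skips whitespace-only nodes, which is its own choice rather than documented behavior of this class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of Lux): uses the JDK's W3C DOM instead of s9api XdmNode.
final class TextNodeCollector {
    private TextNodeCollector() {}

    // Parses an XML string and returns the text of every text or CDATA node
    // that is not whitespace-only, in document order.
    static List<String> collect(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        List<String> texts = new ArrayList<>();
        walk(doc.getDocumentElement(), texts);
        return texts;
    }

    // Depth-first, pre-order traversal visits nodes in document order.
    private static void walk(Node node, List<String> texts) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue();
            if (!text.trim().isEmpty()) {
                texts.add(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, texts);
        }
    }
}
```

For example, `TextNodeCollector.collect("<doc><title>Hello world</title><p>Lucene <b>rocks</b></p></doc>")` returns `[Hello world, Lucene , rocks]`. Mixed content such as `<p>Lucene <b>rocks</b></p>` yields one entry per text node.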


Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State

Field Summary
protected Reader charStream
protected Iterator<net.sf.saxon.s9api.XdmNode> contentIter
protected net.sf.saxon.s9api.XdmNode curNode
protected static net.sf.saxon.s9api.XdmSequenceIterator EMPTY
protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt

Constructor Summary
XmlTextTokenStream(String fieldName, org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.analysis.TokenStream wrapped, net.sf.saxon.s9api.XdmNode doc, Offsets offsets)
          Creates a TokenStream returning tokens drawn from the text content of the document.

Method Summary
org.apache.lucene.analysis.TokenStream getWrappedTokenStream()
boolean incrementToken()
protected boolean incrementWrappedTokenStream()
void reset()
void reset(Reader reader)
protected void setWrappedTokenStream(org.apache.lucene.analysis.TokenStream wrapped)

Methods inherited from class lux.index.analysis.TextOffsetTokenStream
resetTokenizer

Methods inherited from class org.apache.lucene.analysis.TokenStream
close, end

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

Field Detail

curNode

protected net.sf.saxon.s9api.XdmNode curNode

contentIter

protected Iterator<net.sf.saxon.s9api.XdmNode> contentIter

termAtt

protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt

charStream

protected Reader charStream

EMPTY

protected static final net.sf.saxon.s9api.XdmSequenceIterator EMPTY

Constructor Detail

XmlTextTokenStream

public XmlTextTokenStream(String fieldName,
                          org.apache.lucene.analysis.Analyzer analyzer,
                          org.apache.lucene.analysis.TokenStream wrapped,
                          net.sf.saxon.s9api.XdmNode doc,
                          Offsets offsets)
Creates a TokenStream returning tokens drawn from the text content of the document.

Parameters:
fieldName - nominally, the name of the field being analyzed; it is passed to the analyzer whenever the token stream is reset at a node boundary
analyzer - specifies what text processing to apply to node text
wrapped - a TokenStream generated by the analyzer
doc - tokens will be drawn from all of the text in this document
offsets - if provided, character offsets are captured in this object. In theory this could be used for faster highlighting, but until that is proven, this should always be null.
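The sketch below gives a concrete picture of what capturing character offsets can mean. Each token is recorded with its half-open `[start, end)` range in the concatenated text of all nodes. A highlighter needs that range to map a match back to the source. `OffsetRecorder` and `Span` are hypothetical names. The actual `Offsets` class is not documented on this page and may store something different. Tokenization here is plain whitespace splitting, not a Lucene analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Lux's Offsets class): whitespace-tokenizes a sequence of
// text-node strings and records each token's [start, end) character offsets
// relative to the concatenation of all node texts.
final class OffsetRecorder {
    // One recorded token: its term text and its half-open character range.
    static final class Span {
        final String term;
        final int start;
        final int end;

        Span(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    private OffsetRecorder() {}

    static List<Span> record(List<String> nodeTexts) {
        List<Span> spans = new ArrayList<>();
        int base = 0; // offset of the current node's first character in the concatenated text
        for (String text : nodeTexts) {
            int i = 0;
            while (i < text.length()) {
                // Skip whitespace preceding the next token.
                while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
                int start = i;
                while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
                if (i > start) {
                    spans.add(new Span(text.substring(start, i), base + start, base + i));
                }
            }
            base += text.length(); // the next node starts where this one ends
        }
        return spans;
    }
}
```

Each node is tokenized on its own. So with the node texts `"Hello world"`, `"Lucene "`, `"rocks"`, the words `world` and `Lucene` come out as separate tokens (`world[6,11)`, `Lucene[11,17)`). This holds even though no whitespace separates them in the concatenated text.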

Method Detail

reset

public void reset()
           throws IOException
Overrides:
reset in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

reset

public void reset(Reader reader)
           throws IOException
Throws:
IOException

incrementToken

public boolean incrementToken()
                       throws IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

getWrappedTokenStream

public org.apache.lucene.analysis.TokenStream getWrappedTokenStream()
Returns:
the underlying stream of text tokens, to which this stream adds XML-related attributes.

setWrappedTokenStream

protected void setWrappedTokenStream(org.apache.lucene.analysis.TokenStream wrapped)

incrementWrappedTokenStream

protected boolean incrementWrappedTokenStream()
                                       throws IOException
Throws:
IOException
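The method and field lists suggest a delegation pattern. In that reading, `incrementToken` pulls terms from a wrapped analyzer stream (see `getWrappedTokenStream`). When that stream runs dry, the next node's text, drawn from something like `contentIter`, is fed in through `reset(Reader)`. The JDK-only sketch below models that control flow under this assumption. `WhitespaceTokens` and `NodeTextTokens` are hypothetical stand-ins for a Lucene tokenizer and for this class; they are not Lux or Lucene APIs. The real `incrementWrappedTokenStream` may work differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Lucene tokenizer (not a Lucene API):
// reset(Reader) supplies new input and incrementToken() advances to the
// next whitespace-delimited term.
final class WhitespaceTokens {
    private String input = "";
    private int pos;
    private String term;

    void reset(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        input = sb.toString();
        pos = 0;
        term = null;
    }

    boolean incrementToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return false;
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) pos++;
        term = input.substring(start, pos);
        return true;
    }

    String term() { return term; }
}

// Hypothetical model of the outer stream (not Lux's implementation): it
// delegates to the wrapped tokenizer and, whenever that runs dry, moves to
// the next node's text and resets the wrapped tokenizer with it.
final class NodeTextTokens {
    private final Iterator<String> contentIter; // one entry per text node, in document order
    private final WhitespaceTokens wrapped;

    NodeTextTokens(List<String> nodeTexts, WhitespaceTokens wrapped) {
        this.contentIter = nodeTexts.iterator();
        this.wrapped = wrapped;
    }

    boolean incrementToken() {
        while (!wrapped.incrementToken()) {
            if (!contentIter.hasNext()) {
                return false; // every node has been consumed
            }
            wrapped.reset(new StringReader(contentIter.next())); // node boundary
        }
        return true;
    }

    String term() { return wrapped.term(); }
}
```

A consumer drives this sketch the way Lucene consumers drive a TokenStream, calling `incrementToken()` until it returns `false`. With the node texts `"Hello world"`, `""`, `"Lucene "`, `"rocks"`, it yields `Hello`, `world`, `Lucene`, `rocks`, and the loop passes over the empty node.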


Copyright © 2013. All Rights Reserved.