lux.index.analysis
Class XmlTextTokenStream
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
lux.index.analysis.TextOffsetTokenStream
lux.index.analysis.XmlTextTokenStream
- All Implemented Interfaces:
- Closeable
public final class XmlTextTokenStream
- extends TextOffsetTokenStream
Extracts tokens from an s9api XML document tree (XdmNode) in order to make them
available to Lucene classes that accept TokenStreams, like the indexer and highlighter.
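To make the idea concrete, here is a minimal sketch of pulling tokens out of the text nodes of an XML tree in document order. It is an analogy, not the Lux implementation. It uses the JDK's DOM API (`org.w3c.dom`) in place of Saxon's s9api `XdmNode`, and a lowercase whitespace split in place of a Lucene `Analyzer`. The class name `XmlTextTokensSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical illustration: tokenize all text content of an XML document, in document order. */
final class XmlTextTokensSketch {

    static List<String> tokens(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    // Depth-first walk: only text and CDATA content is tokenized;
    // attribute values, comments and processing instructions are skipped.
    private static void collect(Node node, List<String> out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Stand-in for an Analyzer: split on whitespace and lowercase.
            for (String w : node.getNodeValue().trim().split("\\s+")) {
                if (!w.isEmpty()) out.add(w.toLowerCase(Locale.ROOT));
            }
            return;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokens("<doc><title>Hello World</title><p>XML <b>text</b> tokens</p></doc>"));
        // prints [hello, world, xml, text, tokens]
    }
}
```

Because each text node is tokenized separately in this sketch, a word split by markup, such as `te<b>xt</b>`, comes out as two tokens.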
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Field Summary
- protected Reader charStream
- protected Iterator<net.sf.saxon.s9api.XdmNode> contentIter
- protected net.sf.saxon.s9api.XdmNode curNode
- protected static net.sf.saxon.s9api.XdmSequenceIterator EMPTY
- protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt
Constructor Summary
XmlTextTokenStream(String fieldName,
org.apache.lucene.analysis.Analyzer analyzer,
org.apache.lucene.analysis.TokenStream wrapped,
net.sf.saxon.s9api.XdmNode doc,
Offsets offsets)
Creates a TokenStream returning tokens drawn from the text content of the document.
Methods inherited from class org.apache.lucene.analysis.TokenStream
close, end
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
curNode
protected net.sf.saxon.s9api.XdmNode curNode
contentIter
protected Iterator<net.sf.saxon.s9api.XdmNode> contentIter
termAtt
protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt
charStream
protected Reader charStream
EMPTY
protected static final net.sf.saxon.s9api.XdmSequenceIterator EMPTY
XmlTextTokenStream
public XmlTextTokenStream(String fieldName,
org.apache.lucene.analysis.Analyzer analyzer,
org.apache.lucene.analysis.TokenStream wrapped,
net.sf.saxon.s9api.XdmNode doc,
Offsets offsets)
- Creates a TokenStream returning tokens drawn from the text content of the document.
- Parameters:
fieldName - nominally, the field to be analyzed; the analyzer receives this name when the token stream is reset at node boundaries
analyzer - specifies what text processing to apply to node text
wrapped - a TokenStream generated by the analyzer
doc - tokens will be drawn from all of the text in this document
offsets - if provided, character offsets are captured in this object. In theory this can be used for faster highlighting, but until that is proven, this should always be null.
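The `fieldName` and `analyzer` parameters describe analysis that restarts at every node boundary. The sketch below models that behavior without Lucene, under stated assumptions: a `BiFunction` stands in for the `Analyzer` (it receives the field name and one node's text), a list of strings stands in for the text nodes of `doc`, and offsets are not tracked, which corresponds to passing `null` as recommended above. `PerNodeTokens` and `lowercaseWords` are hypothetical names, not part of Lux.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Hypothetical model: yields tokens node by node, re-running the analysis at each node boundary. */
final class PerNodeTokens implements Iterator<String> {
    private final String fieldName;
    // Stand-in for an Analyzer: (fieldName, nodeText) -> tokens for that node.
    private final BiFunction<String, String, List<String>> analyze;
    private final Iterator<String> nodeTexts;  // text of each node, in document order
    private Iterator<String> current = Collections.emptyIterator();

    PerNodeTokens(String fieldName, BiFunction<String, String, List<String>> analyze, List<String> nodeTexts) {
        this.fieldName = fieldName;
        this.analyze = analyze;
        this.nodeTexts = nodeTexts.iterator();
    }

    /** Example analysis: ignores the field name, splits on whitespace, lowercases. */
    static List<String> lowercaseWords(String field, String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) out.add(w.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    @Override
    public boolean hasNext() {
        // When the current node's tokens are used up, analyze the next node (skipping empty ones).
        while (!current.hasNext() && nodeTexts.hasNext()) {
            current = analyze.apply(fieldName, nodeTexts.next()).iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    public static void main(String[] args) {
        PerNodeTokens it = new PerNodeTokens("body", PerNodeTokens::lowercaseWords,
                Arrays.asList("Hello World", "  ", "More text"));
        while (it.hasNext()) System.out.println(it.next());  // hello, world, more, text (one per line)
    }
}
```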
reset
public void reset()
throws IOException
- Overrides:
reset in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
reset
public void reset(Reader reader)
throws IOException
- Throws:
IOException
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
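`reset()` and `incrementToken()` take part in the general Lucene `TokenStream` consumer workflow: call `reset()` once before the first `incrementToken()`, loop until `incrementToken()` returns `false`, then call `end()` and `close()`. Since Lucene and Saxon are outside the JDK, the following is a self-contained model of that call order only. `MiniTokenStream`, `WordStream` and `ConsumeDemo` are hypothetical classes, `term()` stands in for reading a `CharTermAttribute`, and the `IllegalStateException` is how this model (not necessarily Lucene) reports a missing `reset()`.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of the TokenStream lifecycle; not the Lucene API. */
abstract class MiniTokenStream implements Closeable {
    protected String term;          // stands in for CharTermAttribute
    private boolean ready = false;  // true between reset() and close()

    public void reset() { ready = true; }

    public final boolean incrementToken() {
        if (!ready) throw new IllegalStateException("reset() must be called before incrementToken()");
        return advance();
    }

    /** Produce the next token into {@code term}, or return false when exhausted. */
    protected abstract boolean advance();

    public void end() { term = null; }

    @Override
    public void close() { ready = false; }

    public String term() { return term; }
}

/** Whitespace tokenizer over a fixed string. */
final class WordStream extends MiniTokenStream {
    private final String[] words;
    private int pos;

    WordStream(String text) {
        String t = text.trim();
        this.words = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    @Override
    public void reset() { super.reset(); pos = 0; }

    @Override
    protected boolean advance() {
        if (pos >= words.length) return false;
        term = words[pos++];
        return true;
    }
}

final class ConsumeDemo {
    /** The consumer loop: reset, iterate until false, end, then close (via try-with-resources). */
    static List<String> consume(MiniTokenStream ts) {
        List<String> terms = new ArrayList<>();
        try (MiniTokenStream s = ts) {
            s.reset();
            while (s.incrementToken()) terms.add(s.term());
            s.end();
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume(new WordStream("alpha beta gamma")));  // [alpha, beta, gamma]
    }
}
```

Against real Lucene, the same loop would read each term through a `CharTermAttribute` obtained with `addAttribute(CharTermAttribute.class)` before calling `reset()`.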
getWrappedTokenStream
public org.apache.lucene.analysis.TokenStream getWrappedTokenStream()
- Returns:
- the underlying stream of text tokens, to which this stream adds further XML-related attributes.
setWrappedTokenStream
protected void setWrappedTokenStream(org.apache.lucene.analysis.TokenStream wrapped)
incrementWrappedTokenStream
protected boolean incrementWrappedTokenStream()
throws IOException
- Throws:
IOException
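The wrapped stream supplies plain text tokens, and this class layers XML-related attributes on top of them. A minimal decorator sketch of that arrangement follows. It is hypothetical throughout: the source does not say which attributes are added, so the name of the enclosing element is used purely as an example, and the method names echo the ones documented above but with simplified, JDK-only signatures.

```java
import java.util.Arrays;
import java.util.Iterator;

/**
 * Hypothetical decorator, not the Lux implementation: the wrapped iterator supplies
 * plain text tokens, and this class carries an extra XML-related attribute
 * (here, an example element name) alongside each one.
 */
final class ElementTaggingStream {
    private Iterator<String> wrapped;  // underlying text tokens
    private String elementName;        // example XML-related attribute
    private String term;

    ElementTaggingStream(Iterator<String> wrapped, String elementName) {
        setWrappedTokenStream(wrapped, elementName);
    }

    /** Swap in the token source for another node, e.g. at a node boundary. */
    void setWrappedTokenStream(Iterator<String> wrapped, String elementName) {
        this.wrapped = wrapped;
        this.elementName = elementName;
    }

    Iterator<String> getWrappedTokenStream() { return wrapped; }

    /** Advance the wrapped source; the element name stays attached to each term it yields. */
    boolean incrementToken() {
        if (!wrapped.hasNext()) return false;
        term = wrapped.next();
        return true;
    }

    String term() { return term; }

    String elementName() { return elementName; }

    public static void main(String[] args) {
        ElementTaggingStream s = new ElementTaggingStream(Arrays.asList("hello", "world").iterator(), "title");
        while (s.incrementToken()) System.out.println(s.elementName() + ": " + s.term());
        s.setWrappedTokenStream(Arrays.asList("body", "text").iterator(), "p");
        while (s.incrementToken()) System.out.println(s.elementName() + ": " + s.term());
        // prints: title: hello / title: world / p: body / p: text (one per line)
    }
}
```

In this sketch, swapping the inner source through `setWrappedTokenStream` lets one outer object span several nodes while each node's text gets its own token source.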
Copyright © 2013. All Rights Reserved.