lux.index.analysis
Class ElementTokenStream
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
lux.index.analysis.TextOffsetTokenStream
lux.index.analysis.ElementTokenStream
- All Implemented Interfaces:
- Closeable
public final class ElementTokenStream
- extends TextOffsetTokenStream
A TokenStream that extracts text from a Saxon document model (XdmNode) and emits
one token per "word" for each element that contains that word, so a word nested inside several elements yields one token per enclosing element.
TODO: control over element transparency
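The "one token per word per containing element" behavior described above can be illustrated with a small, self-contained sketch. It deliberately uses only the JDK's DOM parser, not Lux, Lucene, or Saxon, and the `element:word` token form, the lowercasing, the whitespace splitting, and the class name `ElementWordTokens` are all assumptions made for illustration rather than this class's actual output format.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical illustration only; not part of lux.index.analysis. */
class ElementWordTokens {

    /** Returns one "element:word" entry per word per enclosing element. */
    static List<String> tokens(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Assumed tokenization: split on whitespace and lowercase.
            for (String word : node.getNodeValue().trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // Emit the word once for every ancestor element, innermost first.
                for (Node a = node.getParentNode(); a instanceof Element; a = a.getParentNode()) {
                    out.add(a.getNodeName() + ":" + word.toLowerCase(Locale.ROOT));
                }
            }
            return;
        }
        // Visit children in document order.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }
}
```

For the input `<a><b>Hello world</b> bye</a>`, `ElementWordTokens.tokens` returns `[b:hello, a:hello, b:world, a:world, a:bye]`: the words inside `b` appear once for `b` and once for its ancestor `a`, while `bye` appears only for `a`.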
| Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
| org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State |
Field Summary
| protected Reader | charStream |
| protected Iterator<net.sf.saxon.s9api.XdmNode> | contentIter |
| protected net.sf.saxon.s9api.XdmNode | curNode |
| protected static net.sf.saxon.s9api.XdmSequenceIterator | EMPTY |
| protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute | termAtt |
Constructor Summary
| ElementTokenStream(String fieldName, org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.analysis.TokenStream wrapped, net.sf.saxon.s9api.XdmNode doc, Offsets offsets) |
| Methods inherited from class org.apache.lucene.analysis.TokenStream |
| close, end |
| Methods inherited from class org.apache.lucene.util.AttributeSource |
| addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState |
curNode
protected net.sf.saxon.s9api.XdmNode curNode
contentIter
protected Iterator<net.sf.saxon.s9api.XdmNode> contentIter
termAtt
protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt
charStream
protected Reader charStream
EMPTY
protected static final net.sf.saxon.s9api.XdmSequenceIterator EMPTY
ElementTokenStream
public ElementTokenStream(String fieldName,
org.apache.lucene.analysis.Analyzer analyzer,
org.apache.lucene.analysis.TokenStream wrapped,
net.sf.saxon.s9api.XdmNode doc,
Offsets offsets)
updateNodeAtts
protected void updateNodeAtts()
reset
public void reset()
throws IOException
- Overrides:
reset in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
reset
public void reset(Reader reader)
throws IOException
- Throws:
IOException
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
getWrappedTokenStream
public org.apache.lucene.analysis.TokenStream getWrappedTokenStream()
- Returns:
- the underlying stream of text tokens, to which this stream adds XML-related attributes.
setWrappedTokenStream
protected void setWrappedTokenStream(org.apache.lucene.analysis.TokenStream wrapped)
incrementWrappedTokenStream
protected boolean incrementWrappedTokenStream()
throws IOException
- Throws:
IOException
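The relationship between `getWrappedTokenStream`, `setWrappedTokenStream`, and `incrementWrappedTokenStream` reads as a delegation pattern: an outer stream advances an inner stream of plain text tokens and adds element information to each one. The sketch below shows that pattern in plain Java. `MiniTokenStream`, `WordStream`, and `ElementDecoratingStream` are hypothetical stand-ins, not Lucene or Lux classes, and the `element:word` term format is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream; not Lucene's TokenStream. */
interface MiniTokenStream {
    boolean incrementToken();
    String term();
}

/** Inner stream: yields whitespace-separated words from a string. */
class WordStream implements MiniTokenStream {
    private final Iterator<String> words;
    private String current;

    WordStream(String text) {
        List<String> list = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                list.add(w);
            }
        }
        words = list.iterator();
    }

    public boolean incrementToken() {
        if (!words.hasNext()) {
            return false;
        }
        current = words.next();
        return true;
    }

    public String term() {
        return current;
    }
}

/** Outer stream: delegates to a wrapped stream and decorates each term. */
class ElementDecoratingStream implements MiniTokenStream {
    private MiniTokenStream wrapped;
    private final String elementName;
    private String current;

    ElementDecoratingStream(MiniTokenStream wrapped, String elementName) {
        this.wrapped = wrapped;
        this.elementName = elementName;
    }

    MiniTokenStream getWrappedTokenStream() {
        return wrapped;
    }

    void setWrappedTokenStream(MiniTokenStream wrapped) {
        this.wrapped = wrapped;
    }

    /** Advances only the inner stream. */
    boolean incrementWrappedTokenStream() {
        return wrapped.incrementToken();
    }

    /** Advances the inner stream, then adds the element name to its term. */
    public boolean incrementToken() {
        if (!incrementWrappedTokenStream()) {
            return false;
        }
        current = elementName + ":" + wrapped.term();
        return true;
    }

    public String term() {
        return current;
    }
}
```

Keeping the inner stream behind a getter and setter lets the text analysis (here, naive whitespace splitting) be swapped without touching the decoration logic.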
Copyright © 2013. All Rights Reserved.