public abstract class XmlTokenStreamBase
extends org.apache.lucene.analysis.TokenStream
This is the root of a set of XML-aware TokenStream classes that work by selecting text one node at a time from an XML document and passing that text to the wrapped TokenStream. The wrapped TokenStream is re-used for each text node. The outermost link in the chain will be a TokenFilter that applies a sequence of structure-related Attributes to each text token (i.e., a list of QNames, although it can be any kind of structural attribute that should be composed with each text token).
The token stream topology is: this(this.wrapped(this.tokenizer)). For example, for the element-text field we have ElementTokenStream (a subclass of this class):
ElementTokenStream (QNameTokenFilter (LowerCaseFilter (StandardTokenizer)))
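The nesting above can be illustrated with a minimal, Lucene-free sketch. Everything in it is hypothetical: `SimpleTokenStream`, `WhitespaceTokens`, `LowerCaseTokens`, and `NodeAtATimeStream` are stand-ins invented for illustration, not classes from this library or from Lucene. It only shows the idea of an outer stream that walks text nodes one at a time and re-uses a single inner chain for each node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token stream: next() returns a term, or null when exhausted.
interface SimpleTokenStream {
    void reset(String text);
    String next();
}

// Innermost link (stand-in for a tokenizer): splits the current text on whitespace.
final class WhitespaceTokens implements SimpleTokenStream {
    private Iterator<String> terms = Collections.emptyIterator();

    public void reset(String text) {
        List<String> parts = new ArrayList<>();
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) {
                parts.add(s);
            }
        }
        terms = parts.iterator();
    }

    public String next() {
        return terms.hasNext() ? terms.next() : null;
    }
}

// Middle link (stand-in for a filter): lower-cases each term from its input.
final class LowerCaseTokens implements SimpleTokenStream {
    private final SimpleTokenStream input;

    LowerCaseTokens(SimpleTokenStream input) {
        this.input = input;
    }

    public void reset(String text) {
        input.reset(text);
    }

    public String next() {
        String term = input.next();
        return term == null ? null : term.toLowerCase(Locale.ROOT);
    }
}

// Outer link: feeds one text node at a time to the same wrapped chain.
final class NodeAtATimeStream {
    private final Iterator<String> textNodes;
    private final SimpleTokenStream wrapped;

    NodeAtATimeStream(List<String> textNodes, SimpleTokenStream wrapped) {
        this.textNodes = textNodes.iterator();
        this.wrapped = wrapped;
    }

    String next() {
        while (true) {
            String term = wrapped.next();
            if (term != null) {
                return term;
            }
            if (!textNodes.hasNext()) {
                return null;               // no more nodes: the whole stream is exhausted
            }
            wrapped.reset(textNodes.next()); // re-use the wrapped chain for the next node
        }
    }

    // Convenience helper: builds outer(filter(tokenizer)) and drains it into a list.
    static List<String> collect(List<String> textNodes) {
        NodeAtATimeStream outer =
            new NodeAtATimeStream(textNodes, new LowerCaseTokens(new WhitespaceTokens()));
        List<String> out = new ArrayList<>();
        for (String t = outer.next(); t != null; t = outer.next()) {
            out.add(t);
        }
        return out;
    }
}
```

In this sketch an empty or whitespace-only text node simply yields no tokens, and the outer stream moves on to the next node.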
We can't follow the standard Lucene pattern of using an Analyzer as a factory for a TokenStream: we want to be able to extend any arbitrary textual Analyzer, but the constraints of the Analyzer class design prevent it from being extended in a straightforward manner. Thus we essentially have an outer (XML) stream wrapping an inner (text) stream.
FIXME: make the constructor protected; allow construction only through static builders defined on each derived class. This will enable us to hide the complexity of wrapping the token stream, which is the same pattern for each of these; only the classes vary. But we can't do the work in the constructor due to Java structural issues.

| Modifier and Type | Field and Description |
|---|---|
| protected Reader | charStream |
| protected Iterator<net.sf.saxon.s9api.XdmNode> | contentIter |
| protected net.sf.saxon.s9api.XdmNode | curNode |
| protected ElementVisibility | defVis |
| protected HashMap<net.sf.saxon.s9api.QName,ElementVisibility> | eltVis |
| protected static net.sf.saxon.s9api.XdmSequenceIterator | EMPTY |
| protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute | termAtt |
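The FIXME in the class description proposes protected constructors plus static builders on each derived class. A minimal sketch of that pattern follows. All names here (`OuterStreamSketch`, `ElementStreamSketch`, `setWrapped`, `create`, `describe`) are hypothetical, and plain strings stand in for the wrapped TokenStream chain; this is not the library's actual code, only one way the proposal could look.

```java
// Hypothetical stand-in for the base class; not the real XmlTokenStreamBase.
abstract class OuterStreamSketch {
    private String wrapped = "";           // stand-in for the wrapped TokenStream

    protected OuterStreamSketch() { }      // constructor does no wiring

    protected void setWrapped(String wrapped) {
        this.wrapped = wrapped;
    }

    // Renders the topology as text, e.g. Outer(Filter(Tokenizer)).
    String describe() {
        return getClass().getSimpleName() + "(" + wrapped + ")";
    }
}

// Hypothetical derived class that can only be built through its static builder.
final class ElementStreamSketch extends OuterStreamSketch {
    private ElementStreamSketch() { }

    // The builder hides the two-step "construct, then wrap" sequence from callers.
    static ElementStreamSketch create(String innerChain) {
        ElementStreamSketch stream = new ElementStreamSketch();
        stream.setWrapped("QNameTokenFilter(" + innerChain + ")");
        return stream;
    }
}
```

Callers would then write `ElementStreamSketch.create("LowerCaseFilter(StandardTokenizer)")` and never see the wrapping step, which is the same for every derived class apart from the filter that is applied.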
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| ElementVisibility | getDefaultVisibility() |
| ElementVisibility | getElementVisibility(net.sf.saxon.s9api.QName qname) |
| org.apache.lucene.analysis.TokenStream | getWrappedTokenStream() |
| boolean | incrementToken() |
| protected boolean | incrementWrappedTokenStream() |
| void | reset() |
| void | reset(Reader reader) |
| void | setDefaultVisibility(ElementVisibility vis) |
| void | setElementVisibility(net.sf.saxon.s9api.QName qname, ElementVisibility vis) Sets the visibility of elements with the given name. |
| protected void | setWrappedTokenStream(org.apache.lucene.analysis.TokenStream wrapped) |
Inherited methods: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

protected net.sf.saxon.s9api.XdmNode curNode
protected Iterator<net.sf.saxon.s9api.XdmNode> contentIter
protected org.apache.lucene.analysis.tokenattributes.CharTermAttribute termAtt
protected Reader charStream
protected ElementVisibility defVis
protected HashMap<net.sf.saxon.s9api.QName,ElementVisibility> eltVis
protected static final net.sf.saxon.s9api.XdmSequenceIterator EMPTY
public void reset()
throws IOException
Overrides: reset in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public void close()
throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Overrides: close in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public void reset(Reader reader)
throws IOException
Throws: IOException

public boolean incrementToken()
throws IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: IOException

public org.apache.lucene.analysis.TokenStream getWrappedTokenStream()

protected void setWrappedTokenStream(org.apache.lucene.analysis.TokenStream wrapped)

protected boolean incrementWrappedTokenStream()
throws IOException
Throws: IOException

public ElementVisibility getElementVisibility(net.sf.saxon.s9api.QName qname)
Parameters: qname - the name of an element as an s9api QName

public void setElementVisibility(net.sf.saxon.s9api.QName qname,
ElementVisibility vis)
Parameters: qname - the name of an element as an s9api QName
vis - the visibility of the element's content from the perspective of containing elements.
visibility.

public ElementVisibility getDefaultVisibility()
ElementVisibility.OPAQUE.

public void setDefaultVisibility(ElementVisibility vis)
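The visibility accessors pair a per-element map (`eltVis`) with a default (`defVis`). A minimal sketch of how such a lookup could behave is below. It rests on stated assumptions: `VisibilitySketch` and `VisibilityTableSketch` are hypothetical stand-ins, `TRANSPARENT` is an illustrative value not confirmed by this page, plain strings replace s9api QNames, and falling back to the default for unmapped names is an assumed behaviour rather than documented behaviour. Only `OPAQUE` as the initial default is taken from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ElementVisibility; TRANSPARENT is illustrative only.
enum VisibilitySketch { OPAQUE, TRANSPARENT }

// Hypothetical table mirroring the eltVis / defVis fields.
final class VisibilityTableSketch {
    private final Map<String, VisibilitySketch> eltVis = new HashMap<>();
    private VisibilitySketch defVis = VisibilitySketch.OPAQUE;

    void setElementVisibility(String qname, VisibilitySketch vis) {
        eltVis.put(qname, vis);
    }

    void setDefaultVisibility(VisibilitySketch vis) {
        defVis = vis;
    }

    VisibilitySketch getDefaultVisibility() {
        return defVis;
    }

    // Assumption: names without an explicit entry use the default visibility.
    VisibilitySketch effectiveVisibility(String qname) {
        return eltVis.getOrDefault(qname, defVis);
    }
}
```

Under these assumptions, setting a visibility for one element name leaves every other element on the default, and changing the default affects only elements without an explicit entry.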
Copyright © 2013. All Rights Reserved.