public abstract class TextOffsetTokenStream extends XmlTokenStreamBase
This TokenStream records the offsets (character positions in the original text) of every token. It records the start offset of each text node and, whenever the length of the serialized XML differs from the length of the text, the offset just after the discrepancy. For example, if a character entity (like &amp;amp;) occurs in the XML, it is translated to "&" in the text, and a character offset is recorded for the character just following the "&".
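The offset bookkeeping described above can be illustrated with a minimal, self-contained sketch. It is not this class's implementation and does not use the library's Offsets type. The class name OffsetSketch, the method recordOffsets, and the int[] {textOffset, xmlOffset} pair layout are all hypothetical, and only the five predefined XML entities are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the documented API.
class OffsetSketch {

    /**
     * Decodes the serialized content of one text node into 'text' and returns
     * {textOffset, xmlOffset} pairs: one for the start of the node, and one
     * just after every entity, where the two lengths drift apart.
     */
    static List<int[]> recordOffsets(String serialized, int xmlStart, StringBuilder text) {
        String[] entities = {"&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};
        char[] decoded = {'&', '<', '>', '"', '\''};
        List<int[]> offsets = new ArrayList<>();

        // Start offset of the text node.
        offsets.add(new int[] {text.length(), xmlStart});

        int i = 0;
        while (i < serialized.length()) {
            boolean matched = false;
            for (int e = 0; e < entities.length; e++) {
                if (serialized.startsWith(entities[e], i)) {
                    text.append(decoded[e]);
                    i += entities[e].length();
                    // Offset of the character just following the decoded entity.
                    offsets.add(new int[] {text.length(), xmlStart + i});
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // Ordinary character: text and XML advance in lockstep.
                text.append(serialized.charAt(i));
                i++;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        List<int[]> offsets = recordOffsets("a &amp; b", 10, text);
        System.out.println(text);
        for (int[] pair : offsets) {
            System.out.println("text " + pair[0] + " -> xml " + pair[1]);
        }
    }
}
```

Recording a pair only where the lengths diverge keeps the table small: a token's position in the original XML can then be recovered from the nearest preceding pair plus the distance from it.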
Fields inherited from class XmlTokenStreamBase: charStream, contentIter, curNode, defVis, eltVis, EMPTY, termAtt

| Constructor and Description |
|---|
| TextOffsetTokenStream(String fieldName, org.apache.lucene.analysis.Analyzer analyzer, org.apache.lucene.analysis.TokenStream wrapped, net.sf.saxon.s9api.XdmNode doc, Offsets offsets) |

| Modifier and Type | Method and Description |
|---|---|
| protected boolean | resetTokenizer(CharSequence text) |

Methods inherited from class XmlTokenStreamBase: close, getDefaultVisibility, getElementVisibility, getWrappedTokenStream, incrementToken, incrementWrappedTokenStream, reset, reset, setDefaultVisibility, setElementVisibility, setWrappedTokenStream

Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

protected boolean resetTokenizer(CharSequence text)

Copyright © 2013. All Rights Reserved.