java.lang.Object
  lux.index.XmlIndexer
public class XmlIndexer
Indexes XML documents. The constructor accepts a set of flags that define the set of fields known to XmlIndexer. The fields are represented by instances of XmlField. Instances of XmlField are immutable; they hold no data and serve merely as markers. Additional fields can be added using addField().

A field may be associated with a StAXHandler; the indexer is responsible for feeding the handlers with StAX (XML) events. Some fields may share the same handler. The association between field and handler is implicit: the field calls an XmlIndexer getter to retrieve the handler. This class is not thread-safe.

This is all kind of a mess, and not readily extensible. To add a new type of field (a new XmlField instance), you have to modify the indexer, which has knowledge of all the possible fields. This is not a good design. Also, not every combination of indexing options will actually work; we need to consider which things one might actually want to turn on and off. We could make each field act as a StAXHandler factory, but for efficiency some fields share the same handler instance. For now, we leave things as they are; we'll refactor as we add more fields.

Indexing is triggered by a call to indexDocument(). read(InputStream) parses the document and gathers the values, which are retrieved by calling XmlField.getFieldValues(XmlIndexer) for each field.
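The field/handler arrangement described above can be sketched with the JDK's StAX API alone. This is a minimal illustration, not the Lux implementation: the names SharedHandlerSketch, EventHandler, FieldMarker and ElementNameCollector are all hypothetical stand-ins. It shows stateless field markers that look up their handler through a getter on the indexer, and an indexer that feeds each distinct handler once per event even when several fields share it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical names throughout; none of these types are part of Lux.
class SharedHandlerSketch {

    /** Receives each StAX event as the parser advances. */
    interface EventHandler {
        void handleEvent(XMLStreamReader reader, int eventType);
    }

    /** A stateless marker: it holds no data and only knows where its handler lives. */
    interface FieldMarker {
        EventHandler handlerFor(SharedHandlerSketch indexer);
    }

    /** Example handler that records the local name of every element it sees. */
    static final class ElementNameCollector implements EventHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void handleEvent(XMLStreamReader reader, int eventType) {
            if (eventType == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
    }

    private final ElementNameCollector collector = new ElementNameCollector();
    private final List<FieldMarker> fields = new ArrayList<>();

    /** The getter that fields call to find their (possibly shared) handler. */
    ElementNameCollector getCollector() {
        return collector;
    }

    void addField(FieldMarker field) {
        fields.add(field);
    }

    /** Parses the XML text and feeds every event to each distinct handler. */
    void read(String xml) {
        // A set removes duplicates, so a handler shared by two fields
        // still receives each event exactly once.
        Set<EventHandler> handlers = new LinkedHashSet<>();
        for (FieldMarker field : fields) {
            handlers.add(field.handlerFor(this));
        }
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int eventType = reader.next();
                for (EventHandler handler : handlers) {
                    handler.handleEvent(reader, eventType);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Registering two fields that both return getCollector() and then calling read on a small document leaves each element name in the collector once, which is the reason for sharing a handler instance rather than giving every field its own.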
| Constructor Summary | |
|---|---|
XmlIndexer()
Make a new instance with default options. |
|
XmlIndexer(IndexConfiguration config)
Make a new instance with the given configuration. |
|
XmlIndexer(IndexConfiguration indexConfig,
Compiler compiler)
Make a new instance with the given options and Compiler. |
|
XmlIndexer(long options)
Make a new instance with the given options. |
|
| Method Summary | |
|---|---|
org.apache.lucene.document.Document |
createLuceneDocument()
|
net.sf.saxon.s9api.XdmValue |
evaluateXPath(String xpath)
Primarily for internal use. |
IndexConfiguration |
getConfiguration()
|
byte[] |
getDocumentBytes()
|
String |
getDocumentText()
|
XmlPathMapper |
getPathMapper()
Primarily for internal use. |
SaxonDocBuilder |
getSaxonDocBuilder()
Primarily for internal use. |
String |
getURI()
|
net.sf.saxon.s9api.XdmNode |
getXdmNode()
|
net.sf.saxon.s9api.XPathCompiler |
getXPathCompiler()
Primarily for internal use. |
void |
index(InputStream xml,
String inputUri)
Index the document read from the stream, caching field values to be written to the Lucene index. |
void |
index(net.sf.saxon.om.NodeInfo doc,
String inputUri)
Index the document supplied as a Saxon NodeInfo, caching field values to be written to the Lucene index. |
void |
index(Reader xml,
String inputUri)
Index the document read from the Reader, caching field values to be written to the Lucene index. |
void |
indexDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
InputStream xmlStream)
Index and write a document to the Lucene index. |
void |
indexDocument(org.apache.lucene.index.IndexWriter indexWriter,
String path,
net.sf.saxon.om.NodeInfo node)
Index and write a document to the Lucene index. |
void |
indexDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
String xml)
Index and write a document to the Lucene index. |
protected void |
init()
Initialize the indexer; an extension of the constructors. |
org.apache.lucene.index.IndexWriter |
newIndexWriter(org.apache.lucene.store.Directory dir)
Constructs a new Lucene IndexWriter for the given index directory supplied with the proper analyzers for each field. |
void |
reset()
Clear out internal storage cached by index() when indexing a document. |
void |
storeDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
byte[] bytes)
Store the given bytes as a document without attempting to parse or index them. |
void |
storeDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
InputStream input)
Fully read the stream and store it as a document without attempting to parse or index it. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public XmlIndexer()
public XmlIndexer(IndexConfiguration config)
config - the index configuration to use
public XmlIndexer(long options)
options - the index configuration options to use
public XmlIndexer(IndexConfiguration indexConfig,
Compiler compiler)
indexConfig - the index configuration options to use
compiler - the indexer will make XPath that is compatible with this compiler
| Method Detail |
|---|
protected void init()
public org.apache.lucene.index.IndexWriter newIndexWriter(org.apache.lucene.store.Directory dir)
throws IOException
dir - the directory where the index is stored
IOException - if there is a problem with the index
public net.sf.saxon.s9api.XPathCompiler getXPathCompiler()
public net.sf.saxon.s9api.XdmValue evaluateXPath(String xpath)
throws net.sf.saxon.s9api.SaxonApiException
xpath - an xpath expression to evaluate
net.sf.saxon.s9api.SaxonApiException - if there is an error during compilation or evaluation
public void index(InputStream xml,
String inputUri)
throws XMLStreamException
xml - the document, as a byte-based InputStream
inputUri - the uri to assign to the document
XMLStreamException
public void index(Reader xml,
String inputUri)
throws XMLStreamException
xml - the document, as a character-based Reader
inputUri - the uri to assign to the document
XMLStreamException
public void index(net.sf.saxon.om.NodeInfo doc,
String inputUri)
throws XMLStreamException
doc - the document (or element) as a Saxon NodeInfo
inputUri - the uri to assign to the document
XMLStreamException
public void reset()
public String getURI()
public net.sf.saxon.s9api.XdmNode getXdmNode()
public String getDocumentText()
public byte[] getDocumentBytes()
storeDocument(IndexWriter, String, InputStream)
was called.
public void indexDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
String xml)
throws XMLStreamException,
IOException
indexWriter - the Lucene IndexWriter for the index to write to
docUri - the uri to assign to the document; any scheme will be stripped: only the path is stored in the index
xml - the text of an xml document to index
XMLStreamException - if there is an error parsing the document
IOException - if there is an error writing to the index
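To make the docUri note concrete, here is a minimal sketch of reducing a URI to its path with java.net.URI. It is an assumption about what "stripping the scheme" could look like, not the logic XmlIndexer actually uses; DocUriSketch and pathOnly are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; not part of Lux.
class DocUriSketch {

    /**
     * Returns the path component of docUri, dropping any scheme.
     * Falls back to the input unchanged when it cannot be parsed
     * as a URI or has no path component (for example, "mailto:" URIs).
     */
    static String pathOnly(String docUri) {
        try {
            String path = new URI(docUri).getPath();
            return path != null ? path : docUri;
        } catch (URISyntaxException e) {
            return docUri;
        }
    }
}
```

Under this sketch, "file:///docs/a.xml" and "/docs/a.xml" both reduce to "/docs/a.xml", so they would name the same stored document. Note that java.net.URI also drops an authority component, which may or may not match the indexer's behavior.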
public void indexDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
InputStream xmlStream)
throws XMLStreamException,
IOException
indexWriter - the Lucene IndexWriter for the index to write to
docUri - the uri to assign to the document; any scheme will be stripped: only the path is stored in the index
xmlStream - a stream from which the text of an xml document is to be read
XMLStreamException - if there is an error parsing the document
IOException - if there is an error writing to the index
public void storeDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
InputStream input)
throws IOException
indexWriter - the Lucene IndexWriter for the index to write to
docUri - the uri to assign to the document; any scheme will be stripped: only the path is stored in the index
input - the stream to read the document from
IOException - if there is an error writing to the index
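The method summary describes this overload as fully reading the stream before storing it. A minimal sketch of that draining step, using only the JDK, might look as follows; StreamBytesSketch and readFully are hypothetical names, and how XmlIndexer buffers the stream internally is not stated in this documentation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; not part of Lux.
class StreamBytesSketch {

    /** Reads the stream to its end and returns everything read as a byte array. */
    static byte[] readFully(InputStream input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int count;
            // read() returns -1 once the end of the stream is reached
            while ((count = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The resulting array is the kind of value the byte[] overload of storeDocument accepts directly.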
public void storeDocument(org.apache.lucene.index.IndexWriter indexWriter,
String docUri,
byte[] bytes)
throws IOException
indexWriter - the Lucene IndexWriter for the index to write to
docUri - the uri to assign to the document; any scheme will be stripped: only the path is stored in the index
bytes - the document bytes to store
IOException - if there is an error writing to the index
public void indexDocument(org.apache.lucene.index.IndexWriter indexWriter,
String path,
net.sf.saxon.om.NodeInfo node)
throws XMLStreamException,
IOException
indexWriter - the Lucene IndexWriter for the index to write to
path - the uri to assign to the document
node - an xml document to index, as a Saxon NodeInfo
XMLStreamException - if there is an error parsing the document
IOException - if there is an error writing to the index
public org.apache.lucene.document.Document createLuceneDocument()
Document created from the field values stored in this indexer. The document is ready to be inserted into Lucene via IndexWriter.addDocument(java.lang.Iterable<? extends org.apache.lucene.index.IndexableField>).
public SaxonDocBuilder getSaxonDocBuilder()
SaxonDocBuilder used by the indexer to construct XdmNodes.
public XmlPathMapper getPathMapper()
XmlPathMapper used by the indexer to gather node paths.
public IndexConfiguration getConfiguration()