public class ParsingReader extends Reader
Uses a Parser to parse the content from a given input stream. A ContentHandler and a pipe are used to convert the
push-based SAX event stream to the pull-based character stream defined by the Reader interface.
Based on an implementation from the Tika source. This version adds functionality for markup output.

| Modifier and Type | Field and Description |
|---|---|
| `protected org.apache.tika.parser.ParseContext` | `context`<br>The parse context. |
| `protected ContentHandler` | `handler`<br>Receives SAX events. |
| `protected InputStream` | `input`<br>The binary stream being parsed. |
| `protected org.apache.tika.metadata.Metadata` | `metadata`<br>Metadata associated with the document being parsed. |
| `protected org.apache.tika.parser.Parser` | `parser`<br>Parser instance used for parsing the given binary stream. |
| `protected Reader` | `reader`<br>Buffered read end of the pipe. |
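
The push-to-pull conversion described in the class summary can be pictured with plain JDK classes. The following is a minimal sketch, not the ParsingReader source: `PipeSketch`, `textHandler`, `open`, and `drain` are hypothetical names, and a JDK SAX parser over an XML string stands in for a Tika parser over a binary stream. A background thread pushes SAX character events into the write end of a pipe, while the caller pulls characters from the buffered read end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is not the ParsingReader implementation.
class PipeSketch {

    // Handler that pushes every SAX character event into the given writer.
    static DefaultHandler textHandler(Writer writer) {
        return new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                try {
                    writer.write(ch, start, length);
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        };
    }

    // Starts a background SAX parse and returns the buffered read end of the pipe.
    static Reader open(String xml) throws IOException {
        PipedReader pipedReader = new PipedReader();
        PipedWriter pipedWriter = new PipedWriter(pipedReader);
        Thread worker = new Thread(() -> {
            // Closing the writer signals end-of-stream to the reading side.
            try (Writer writer = pipedWriter) {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xml)), textHandler(writer));
            } catch (Exception e) {
                // A fuller version would record the failure and rethrow it from read().
            }
        });
        worker.setDaemon(true);
        worker.start();
        return new BufferedReader(pipedReader);
    }

    // Pulls all characters from the reader with read(char[], int, int).
    static String drain(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            text.append(buffer, 0, n);
        }
        return text.toString();
    }
}
```

Calling `PipeSketch.drain(PipeSketch.open(xml))` returns the concatenated character data of the document. The pipe decouples the two sides, so the parser can run at its own pace while the consumer reads incrementally.
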
| Constructor and Description |
|---|
| `ParsingReader(InputStream input)`<br>Creates a reader for the content of the given binary stream. |
| `ParsingReader(InputStream input, String name)`<br>Creates a reader for the content of the given binary stream with the given name. |
| `ParsingReader(org.apache.tika.parser.Parser parser, InputStream input, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)` |
| `ParsingReader(org.apache.tika.parser.Parser parser, InputStream input, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context, java.util.function.Function<Writer,ContentHandler> handler)`<br>Creates a reader for the content of the given binary stream with the given document metadata. |
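
The last constructor accepts a `java.util.function.Function<Writer,ContentHandler>`, that is, a factory that builds the SAX handler around the write end of the pipe. The sketch below shows what such factories might look like using JDK classes only. `HandlerFactories`, `PLAIN_TEXT`, `SIMPLE_MARKUP`, and `render` are hypothetical names, and the markup variant is a deliberately naive assumption (no attribute output, no character escaping), not the handler this class actually ships with.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler factories; illustrative only.
class HandlerFactories {

    // Writes a string, converting I/O failures to SAX failures.
    private static void emit(Writer writer, String s) throws SAXException {
        try {
            writer.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Factory for a handler that forwards character data only.
    static final Function<Writer, ContentHandler> PLAIN_TEXT = writer -> new DefaultHandler() {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            emit(writer, new String(ch, start, length));
        }
    };

    // Factory for a handler that also echoes element tags (naive: no attributes, no escaping).
    static final Function<Writer, ContentHandler> SIMPLE_MARKUP = writer -> new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            emit(writer, "<" + qName + ">");
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            emit(writer, "</" + qName + ">");
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            emit(writer, new String(ch, start, length));
        }
    };

    // Applies a factory to a StringWriter and feeds it SAX events from the given XML.
    static String render(String xml, Function<Writer, ContentHandler> factory) throws Exception {
        StringWriter out = new StringWriter();
        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xmlReader.setContentHandler(factory.apply(out));
        xmlReader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

Passing the factory rather than a ready-made handler lets the reader create the pipe first and hand its writer to the handler afterwards, which is presumably why the constructor takes a function.
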
| Modifier and Type | Method and Description |
|---|---|
| `void` | `close()`<br>Closes the read end of the pipe. |
| `void` | `parse()`<br>Parses the given binary stream and writes the text content to the write end of the pipe. |
| `int` | `read(char[] buffer, int off, int len)`<br>Reads parsed text from the pipe connected to the parsing thread. |

protected final org.apache.tika.parser.Parser parser
protected final Reader reader
protected final InputStream input
protected final org.apache.tika.metadata.Metadata metadata
protected final org.apache.tika.parser.ParseContext context
protected final ContentHandler handler

public ParsingReader(InputStream input) throws IOException
Parameters:
input - binary stream
Throws:
IOException - if the document cannot be parsed

public ParsingReader(InputStream input, String name) throws IOException
Parameters:
input - binary stream
name - document name
Throws:
IOException - if the document cannot be parsed

public ParsingReader(org.apache.tika.parser.Parser parser,
InputStream input,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context)
throws IOException
Throws:
IOException

public ParsingReader(org.apache.tika.parser.Parser parser,
InputStream input,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context,
java.util.function.Function<Writer,ContentHandler> handler)
throws IOException
close() method is called on this reader.
Parameters:
parser - parser instance
input - binary stream
metadata - document metadata
context - parsing context
Throws:
IOException - if the document cannot be parsed

public int read(char[] buffer,
int off,
int len)
throws IOException
Specified by:
read in class Reader
Parameters:
buffer - character buffer
off - start offset within the buffer
len - maximum number of characters to read
Throws:
IOException - if the parsing thread has failed or if for some reason the pipe does not work properly

public void close()
throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
close in class Reader
Throws:
IOException - if the pipe cannot be closed

public void parse()
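
The `read` contract above says an IOException is thrown if the parsing thread has failed. One way such behavior can be arranged is sketched below, under stated assumptions: `FailureAwareReader` and its `Task` interface are hypothetical stand-ins, and this is not claimed to be how ParsingReader is implemented. The worker records its failure before closing the write end of the pipe, and `read` rethrows that failure as an IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical sketch of surfacing a background failure through read().
class FailureAwareReader extends Reader {

    // Work that writes text to the pipe; stands in for a parse task.
    interface Task {
        void run(Writer writer) throws Exception;
    }

    private final Reader reader;         // buffered read end of the pipe
    private volatile Throwable failure;  // set by the worker if the task fails

    FailureAwareReader(Task task) throws IOException {
        PipedReader pipedReader = new PipedReader();
        PipedWriter pipedWriter = new PipedWriter(pipedReader);
        this.reader = new BufferedReader(pipedReader);
        Thread worker = new Thread(() -> {
            try {
                task.run(pipedWriter);
            } catch (Throwable t) {
                // Record the failure before closing, so a reader that sees
                // end-of-stream can also see the failure.
                failure = t;
            } finally {
                try {
                    pipedWriter.close();
                } catch (IOException ignored) {
                    // Nothing useful can be done if the write end fails to close.
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Rethrows a recorded worker failure as an IOException.
    private void rethrowFailure() throws IOException {
        Throwable t = failure;
        if (t instanceof IOException) {
            throw (IOException) t;
        } else if (t != null) {
            throw new IOException("Background task failed", t);
        }
    }

    @Override
    public int read(char[] buffer, int off, int len) throws IOException {
        rethrowFailure();
        int n = reader.read(buffer, off, len);
        if (n == -1) {
            // The failure may have been recorded while this call was blocked.
            rethrowFailure();
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

A caller would typically wrap such a reader in try-with-resources and loop on `read` until it returns -1, treating an IOException as a signal that the text read so far may be incomplete.
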
Copyright © 2018 The International Consortium of Investigative Journalists. All rights reserved.