| Package | Description |
|---|---|
| org.icij.extract | |
| org.icij.extract.document | |
| org.icij.extract.extractor | |
| org.icij.extract.parser | |
| org.icij.extract.queue | |
| org.icij.extract.report | |
| org.icij.spewer | |

| Modifier and Type | Field and Description |
|---|---|
| protected BlockingQueue<TikaDocument> | Scanner.queue |

| Constructor and Description |
|---|
| Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue) |
| Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue, SealableLatch latch) |
| Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue, SealableLatch latch, Notifiable notifiable)<br>Creates a Scanner that sends all results straight to the underlying BlockingQueue on a single thread. |
| ScannerVisitor(Path path, BlockingQueue<TikaDocument> queue, DocumentFactory factory, Options<String> options)<br>Instantiate a new task for scanning the given path. |

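
The constructor description above says a Scanner sends its results straight to a BlockingQueue on a single thread. The following is a minimal sketch of that pattern using only the standard library. `QueueScanSketch` and its `scan` and `shutdown` methods are hypothetical names, not the library's Scanner API, and plain `Path` values stand in for `TikaDocument`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical stand-in for a scanner; NOT the org.icij.extract Scanner.
final class QueueScanSketch {
    private final BlockingQueue<Path> queue;
    // A single worker thread, so queued results keep the order of the walk.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    QueueScanSketch(BlockingQueue<Path> queue) {
        this.queue = queue;
    }

    // Walk the tree on the worker thread and put every regular file on the queue.
    // The returned future resolves to the number of files queued.
    Future<Long> scan(Path root) {
        return executor.submit(() -> {
            long queued = 0;
            try (Stream<Path> paths = Files.walk(root)) {
                Iterator<Path> files = paths.filter(Files::isRegularFile).iterator();
                while (files.hasNext()) {
                    queue.put(files.next()); // blocks while a bounded queue is full
                    queued++;
                }
            }
            return queued;
        });
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Because `put` blocks on a full bounded queue, a slow consumer naturally throttles the scan in this sketch; whether the library behaves the same way is not stated on this page.
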
| Modifier and Type | Class and Description |
|---|---|
| class | EmbeddedTikaDocument |

| Modifier and Type | Method and Description |
|---|---|
| TikaDocument | DocumentFactory.create(Path path) |
| TikaDocument | DocumentFactory.create(Path path, BasicFileAttributes attributes) |
| TikaDocument | DocumentFactory.create(Path path, long size) |
| TikaDocument | DocumentFactory.create(Path path, org.apache.tika.metadata.Metadata metadata) |
| TikaDocument | DocumentFactory.create(String path) |
| TikaDocument | DocumentFactory.create(String id, Path path) |
| TikaDocument | DocumentFactory.create(String id, Path path, long size) |
| TikaDocument | DocumentFactory.create(String id, Path path, org.apache.tika.metadata.Metadata metadata) |
| TikaDocument | DocumentFactory.create(String id, String path) |
| TikaDocument | DocumentFactory.create(URL url) |

| Modifier and Type | Method and Description |
|---|---|
| String | Identifier.generate(TikaDocument tikaDocument)<br>Generate an identifier for a root tikaDocument. |
| String | PathDigestIdentifier.generate(TikaDocument document) |
| String | DigestIdentifier.generate(TikaDocument tikaDocument) |
| String | PathIdentifier.generate(TikaDocument tikaDocument) |
| String | Identifier.hash(TikaDocument tikaDocument)<br>Generate or retrieve (from metadata) a hash digest of the tikaDocument's underlying file data. |
| String | AbstractIdentifier.hash(TikaDocument tikaDocument) |

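
Identifier.hash is described above as generating (or retrieving from metadata) a hash digest of the underlying file data. The sketch below illustrates only the "generate" half with `java.security.MessageDigest`. The class name `FileDigestSketch`, the choice of algorithm by the caller, and the lowercase hex encoding are all assumptions; this page does not say which algorithm or encoding the library uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; NOT the library's AbstractIdentifier.
final class FileDigestSketch {
    // Stream the file through the digest so large files are never held in memory.
    static String hash(Path path, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        try (InputStream input = Files.newInputStream(path)) {
            int read;
            while ((read = input.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        // Encode the raw digest bytes as lowercase hexadecimal (an assumed format).
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

A call such as `FileDigestSketch.hash(path, "SHA-256")` returns a 64-character hex string; caching that value in metadata, as the description hints, would avoid rereading the file.
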
| Modifier and Type | Method and Description |
|---|---|
| void | DocumentConsumer.accept(TikaDocument tikaDocument)<br>Consume a file. |
| Reader | Extractor.extract(TikaDocument tikaDocument)<br>This method will wrap the given TikaDocument in a TikaInputStream and return a Reader which can be used to initiate extraction on demand. |
| void | Extractor.extract(TikaDocument tikaDocument, Spewer spewer)<br>Extract and spew content from a document. |
| void | Extractor.extract(TikaDocument tikaDocument, Spewer spewer, Reporter reporter)<br>Extract and spew content from a document. |
| protected Reader | Extractor.extract(TikaDocument tikaDocument, org.apache.tika.io.TikaInputStream input)<br>Create a pull-parser from the given TikaInputStream. |

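
The single-argument Extractor.extract is described above as returning a Reader that initiates extraction on demand. One way to picture "on demand" is a Reader that does no work until the first read. The sketch below shows that idea with a plain file; `LazyReaderSketch` and `isOpened` are hypothetical names, and opening a file stands in for the parsing the library would actually start.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of deferred work; NOT the library's parsing reader.
final class LazyReaderSketch extends Reader {
    private final Path path;
    private Reader delegate; // stays null until the first read

    LazyReaderSketch(Path path) {
        this.path = path;
    }

    // Reports whether the underlying source has been opened yet.
    boolean isOpened() {
        return delegate != null;
    }

    @Override
    public int read(char[] buffer, int offset, int length) throws IOException {
        if (delegate == null) {
            // The expensive step happens here, on the first pull, not in the constructor.
            delegate = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        }
        return delegate.read(buffer, offset, length);
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

Constructing the reader is cheap in this sketch, so a caller can create many readers and pay the cost only for the ones it actually consumes.
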
| Constructor and Description |
|---|
| EmbeddingHTMLParsingReader(TikaDocument parent, String open, String close, org.apache.tika.parser.Parser parser, org.apache.tika.io.TikaInputStream input, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) |

| Modifier and Type | Method and Description |
|---|---|
| Future<Long> | DocumentQueueDrainer.drain(TikaDocument poison)<br>Like DocumentQueueDrainer.drain() except that draining will stop when the given poison pill is returned from the queue. |

| Constructor and Description |
|---|
| DocumentQueueDrainer(DocumentQueue queue, java.util.function.Consumer<TikaDocument> consumer)<br>Create a new drainer that will drain documents from the given queue to the given consumer on a single thread. |

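
The drainer entries above describe draining a queue to a consumer on a single thread, stopping when a poison pill comes off the queue and returning a `Future<Long>`. A minimal sketch of that pattern follows. `PoisonDrainSketch` is a hypothetical generic class, not the library's DocumentQueueDrainer, and treating the `Long` as the number of items consumed is an assumption.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stand-in; NOT the org.icij.extract DocumentQueueDrainer.
final class PoisonDrainSketch<T> {
    private final BlockingQueue<T> queue;
    private final Consumer<T> consumer;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    PoisonDrainSketch(BlockingQueue<T> queue, Consumer<T> consumer) {
        this.queue = queue;
        this.consumer = consumer;
    }

    // Take items until the poison pill appears; the pill itself is not consumed.
    // The future resolves to the number of items handed to the consumer (assumed meaning).
    Future<Long> drain(T poison) {
        return executor.submit(() -> {
            long consumed = 0;
            while (true) {
                T item = queue.take(); // blocks until an item is available
                if (item.equals(poison)) {
                    return consumed;
                }
                consumer.accept(item);
                consumed++;
            }
        });
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Anything queued after the pill is left in place, which lets a producer signal "stop here" without closing the queue.
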
| Modifier and Type | Method and Description |
|---|---|
| boolean | Reporter.check(TikaDocument tikaDocument, ExtractionStatus result)<br>Check an extraction result. |
| boolean | HashMapReportMap.fastPut(TikaDocument key, Report value) |
| boolean | ReportMap.fastPut(TikaDocument key, Report value)<br>Allow implementations to define a faster method for putting values into the map that doesn't require the previous value to be returned. |
| Report | Reporter.report(TikaDocument tikaDocument)<br>Check the extraction result of a given tikaDocument. |
| void | Reporter.save(TikaDocument tikaDocument, ExtractionStatus status)<br>Save the extraction status for the given tikaDocument. |
| void | Reporter.save(TikaDocument tikaDocument, ExtractionStatus status, Exception exception)<br>Save the extraction status and optional exception for the given tikaDocument. |
| void | Reporter.save(TikaDocument tikaDocument, Report report)<br>Save the extraction report for the given tikaDocument. |
| boolean | Reporter.skip(TikaDocument tikaDocument)<br>Check whether a path should be skipped. |

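
ReportMap.fastPut is described above as a put that does not have to return the previous value. The sketch below shows one way such a contract could be layered on `java.util.Map`. `FastPutMapSketch` and `HashFastPutMapSketch` are hypothetical names, and the meaning given to the boolean (true when the key was new) is an assumption, since this page does not document the return value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; NOT the library's ReportMap.
interface FastPutMapSketch<K, V> extends Map<K, V> {
    // Default fallback: delegate to put() and report whether the key was new.
    // A remote or disk-backed implementation could override this to avoid
    // fetching and returning the old value at all.
    default boolean fastPut(K key, V value) {
        return put(key, value) == null;
    }
}

// In-memory implementation that simply inherits the default fastPut.
final class HashFastPutMapSketch<K, V> extends HashMap<K, V> implements FastPutMapSketch<K, V> {
    private static final long serialVersionUID = 1L;
}
```

For an in-memory map the saving is negligible; the distinction matters mostly when the previous value would have to travel over a network or be deserialized.
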
| Modifier and Type | Method and Description |
|---|---|
| TikaDocument[] | Spewer.write(Path path) |

| Modifier and Type | Method and Description |
|---|---|
| void | RESTSpewer.write(TikaDocument tikaDocument, Reader reader) |
| void | FileSpewer.write(TikaDocument tikaDocument, Reader reader) |
| void | PrintStreamSpewer.write(TikaDocument tikaDocument, Reader reader) |
| abstract void | Spewer.write(TikaDocument tikaDocument, Reader reader) |
| void | RESTSpewer.writeMetadata(TikaDocument tikaDocument) |
| void | FileSpewer.writeMetadata(TikaDocument tikaDocument) |
| void | PrintStreamSpewer.writeMetadata(TikaDocument tikaDocument) |
| abstract void | Spewer.writeMetadata(TikaDocument tikaDocument) |

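
The table above lists an abstract Spewer with `write` and `writeMetadata`, plus concrete REST, file, and print-stream variants. The sketch below shows that shape with standard-library types only. `SpewerSketch` and `PrintStreamSpewerSketch` are hypothetical classes; a `String` id and a `Map<String, String>` stand in for `TikaDocument` and its metadata, and the "key: value" output format is an assumption.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.util.Map;

// Hypothetical base class mirroring the two abstract methods listed above.
abstract class SpewerSketch {
    abstract void write(String id, Reader reader) throws IOException;

    abstract void writeMetadata(Map<String, String> metadata) throws IOException;
}

// Hypothetical print-stream variant; NOT the library's PrintStreamSpewer.
final class PrintStreamSpewerSketch extends SpewerSketch {
    private final PrintStream stream;

    PrintStreamSpewerSketch(PrintStream stream) {
        this.stream = stream;
    }

    // Copy extracted text from the reader to the stream in fixed-size chunks.
    @Override
    void write(String id, Reader reader) throws IOException {
        char[] buffer = new char[1024];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            stream.print(new String(buffer, 0, read));
        }
        stream.flush();
    }

    // Print each metadata entry on its own line as "key: value" (assumed format).
    @Override
    void writeMetadata(Map<String, String> metadata) {
        metadata.forEach((key, value) -> stream.println(key + ": " + value));
        stream.flush();
    }
}
```

Keeping the output target behind the abstract class means an extraction loop can stay the same whether text goes to a console, a file, or a remote endpoint.
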
Copyright © 2018. All rights reserved.