public class DocumentConsumer extends ExecutorProxy implements java.util.function.Consumer<TikaDocument>
Documents are consumed through accept(TikaDocument). All tasks are sent to a
work-stealing thread pool.
The parallelism of the thread pool is defined in the call to the constructor.
A task is defined as both the extraction from a file and the output of extracted data.
Completion is only considered successful if both parts of the task complete with no exceptions.
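As a rough illustration of the two-part task described above, the sketch below submits each document to a work-stealing pool and marks it successful only when both parts finish without an exception. Everything in it is a hypothetical stand-in rather than the library's own code: `TwoPartConsumerSketch`, the `Step` interface, and the `statuses` map (playing the role of a reporter) are assumptions, and a plain `String` replaces `TikaDocument`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in, not the library's DocumentConsumer.
final class TwoPartConsumerSketch implements Consumer<String> {

    /** Hypothetical stand-in for one half of a task (extraction or output). */
    interface Step {
        void run(String document) throws Exception;
    }

    private final Step extract;
    private final Step output;
    private final ExecutorService executor;

    // Stand-in for a reporter: final status per document, true meaning success.
    final Map<String, Boolean> statuses = new ConcurrentHashMap<>();

    TwoPartConsumerSketch(Step extract, Step output, int parallelism) {
        this.extract = extract;
        this.output = output;
        this.executor = Executors.newWorkStealingPool(parallelism);
    }

    @Override
    public void accept(String document) {
        executor.execute(() -> {
            try {
                extract.run(document); // part one: extraction
                output.run(document);  // part two: output of the extracted data
                statuses.put(document, true);
            } catch (Exception e) {
                // A failure in either part fails the whole task.
                statuses.put(document, false);
            }
        });
    }

    /** Stops accepting tasks and waits for the submitted ones to finish. */
    boolean shutdownAndAwait(long timeoutSeconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

A caller would pass each document to `accept`, call `shutdownAndAwait`, and then read `statuses` to see which tasks completed both parts.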
The final status of each task is saved to the reporter, if any is set.

| Modifier and Type | Field and Description |
|---|---|
| protected Extractor | extractor |
| protected Spewer | spewer |

Inherited fields: executor

| Constructor and Description |
|---|
| DocumentConsumer(Spewer spewer, Extractor extractor): Create a new consumer with the default pool size, as returned by defaultPoolSize(). |
| DocumentConsumer(Spewer spewer, Extractor extractor, ExecutorService executor): Create a new consumer that submits tasks to the given Executor. |
| DocumentConsumer(Spewer spewer, Extractor extractor, int poolSize): Create a new consumer with the given pool size. |

| Modifier and Type | Method and Description |
|---|---|
| void | accept(TikaDocument tikaDocument): Consume a file. |
| static int | defaultPoolSize(): Returns the default thread pool size, which is the number of available processors minus 1, or 1, whichever is greater. |
| Reporter | getReporter(): Get the reporter. |
| void | setReporter(Reporter reporter): Set the reporter. |

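The default pool size rule stated for defaultPoolSize() (available processors minus 1, but never less than 1) could be computed along the following lines. This is only a sketch of the documented rule, not the library's source: `PoolSizeSketch` and `poolSizeFor` are hypothetical names introduced so the rule can be checked for any processor count.

```java
// Hypothetical stand-in illustrating the documented rule; not the library's code.
final class PoolSizeSketch {

    /** Processors minus one, but never less than one. */
    static int poolSizeFor(int availableProcessors) {
        return Math.max(availableProcessors - 1, 1);
    }

    /** Applies the rule to the machine the JVM is running on. */
    static int defaultPoolSize() {
        return poolSizeFor(Runtime.getRuntime().availableProcessors());
    }
}
```

Under this rule a single-core machine still gets one worker thread, while larger machines leave one processor free for the thread that feeds the consumer.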
Methods inherited from class ExecutorProxy: awaitTermination, shutdown, shutdownNow
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.function.Consumer: andThen

public DocumentConsumer(Spewer spewer, Extractor extractor, ExecutorService executor)
Create a new consumer that submits tasks to the given Executor.
Parameters:
spewer - the Spewer used to write extracted text and metadata
extractor - the Extractor used to extract from files
executor - the executor used to run consuming tasks

public DocumentConsumer(Spewer spewer, Extractor extractor, int poolSize)
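Create a new consumer with the given pool size. Tasks run on a BlockingThreadPoolExecutor, which means that calls
to accept(org.icij.extract.document.TikaDocument) will block when the thread pool is full of running tasks.
Parameters:
spewer - the Spewer used to write extracted text and metadata
extractor - the Extractor used to extract from files
poolSize - the fixed size of the thread pool used to consume documents

public static int defaultPoolSize()
Returns the default thread pool size, which is the number of available processors minus 1, or 1, whichever is greater.
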
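The blocking submission described for the pool-size constructor can be approximated with a fixed thread pool guarded by a semaphore, as in the sketch below. This is an assumption about one possible design, not a description of how BlockingThreadPoolExecutor is implemented: `BlockingSubmitSketch` and its methods are hypothetical names. It also shows one way an interrupt while waiting could surface as a RejectedExecutionException.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in; the library's BlockingThreadPoolExecutor may work differently.
final class BlockingSubmitSketch {
    private final ExecutorService pool;
    private final Semaphore permits; // one permit per worker thread

    BlockingSubmitSketch(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
        this.permits = new Semaphore(poolSize);
    }

    /** Blocks until a worker is free, then hands the task to the pool. */
    void submitBlocking(Runnable task) {
        try {
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RejectedExecutionException("interrupted while waiting for a free thread", e);
        }
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    permits.release(); // free the slot even if the task throws
                }
            });
        } catch (RejectedExecutionException e) {
            permits.release(); // the task never started, so give the permit back
            throw e;
        }
    }

    /** Stops accepting tasks and waits for the submitted ones to finish. */
    boolean shutdownAndAwait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this design the producer is throttled to the speed of the workers, so the number of pending tasks never grows beyond the pool size.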
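public void setReporter(Reporter reporter)
Set the reporter.
Parameters:
reporter - the reporter that the final status of each task is saved to

public Reporter getReporter()
Get the reporter.
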
public void accept(TikaDocument tikaDocument)
Consume a file.
If a BlockingThreadPoolExecutor is being used (the default when no
ExecutorService is passed to the constructor) then this method will block until a thread becomes
available. Otherwise the behaviour is similar to Executor.execute(Runnable), causing the task
to be put in a queue.
Specified by:
accept in interface java.util.function.Consumer<TikaDocument>
Parameters:
tikaDocument - the TikaDocument to consume
Throws:
RejectedExecutionException - if unable to queue the consumer task for execution, including when the
current thread is interrupted.

Copyright © 2018. All rights reserved.