public class Scanner extends ExecutorProxy
When scan(java.nio.file.Path) is called, the job is put in an unbounded queue and executed in serial. This makes sense as
it is usually the file system that is the bottleneck, not the CPU, so parallelization won't help.
The scan(java.nio.file.Path) method is non-blocking, which is useful for creating parallelized producer-consumer setups, where
files are processed as they're scanned.
Encountered documents are put in a given queue. This is a classic producer, putting elements in a queue which are
then extracted by a consumer.
The queue should be bounded, to avoid the scanner filling up memory, but the bound should be high enough to create a
significant buffer between the scanner and the consumer.
Documents are pushed into the queue synchronously; if the queue is bounded, each push waits until space becomes available.
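
The producer-consumer arrangement described above can be sketched with JDK classes alone. This is a minimal illustration, not the Scanner implementation: the class name BoundedQueueSketch, the POISON end marker and the string "documents" are all hypothetical stand-ins. It shows a single producer thread blocking on a full bounded queue while the caller drains it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedQueueSketch {

    // Hypothetical end-of-stream marker; not part of the Scanner API.
    private static final String POISON = "__END__";

    // Produces `count` items on one background thread and consumes them on the calling thread.
    public static List<String> run(int count, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        ExecutorService producer = Executors.newSingleThreadExecutor();

        producer.execute(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put("document-" + i); // waits while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.shutdown(); // no further jobs; the submitted one still runs

        List<String> consumed = new ArrayList<>();
        try {
            while (true) {
                String item = queue.take(); // waits while the queue is empty
                if (POISON.equals(item)) {
                    break;
                }
                consumed.add(item);
            }
            producer.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3).size());
    }
}
```

A capacity well below the item count, as in run(10, 3), forces the producer to wait for the consumer, which is the back-pressure a bounded queue is meant to provide.
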
This implementation is thread-safe.

| Modifier and Type | Field and Description |
|---|---|
| protected BlockingQueue<TikaDocument> | queue |

Fields inherited from class ExecutorProxy: executor

| Constructor and Description |
|---|
| Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue) |
| Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue, SealableLatch latch) |
| Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue, SealableLatch latch, Notifiable notifiable): Creates a Scanner that sends all results straight to the underlying BlockingQueue on a single thread. |

| Modifier and Type | Method and Description |
|---|---|
| Scanner | configure(Options<String> options): Configure the scanner with the given options. |
| ScannerVisitor | createScannerVisitor(Path path) |
| void | exclude(String pattern): Add a glob pattern for excluding files and directories. |
| boolean | followSymLinks(): Check whether symlinks will be followed. |
| void | followSymLinks(boolean followLinks): Set whether symlinks should be followed. |
| SealableLatch | getLatch(): Get the latch. |
| int | getMaxDepth(): Get the currently set maximum depth to recurse when scanning. |
| long | getNumberOfFiles(Path path) |
| boolean | ignoreHiddenFiles(): Check whether hidden files will be ignored. |
| void | ignoreHiddenFiles(boolean ignoreHiddenFiles): Set whether hidden files should be ignored. |
| boolean | ignoreSystemFiles(): Check whether system files will be ignored. |
| void | ignoreSystemFiles(boolean ignoreSystemFiles): Set whether system files should be ignored. |
| void | include(String pattern): Add a glob pattern for including files. |
| long | queued() |
| Future<Path> | scan(Path path): Queue a scanning job. |
| List<Future<Path>> | scan(Path[] paths): Submit all of the given paths to the scanner for execution, returning a list of Future objects representing those tasks. |
| List<Future<Path>> | scan(String[] paths) |
| void | setMaxDepth(int maxDepth): Set the maximum depth to recurse when scanning. |


Methods inherited from class ExecutorProxy: awaitTermination, shutdown, shutdownNow

protected final BlockingQueue<TikaDocument> queue

public Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue)
public Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue, SealableLatch latch)
public Scanner(DocumentFactory factory, BlockingQueue<TikaDocument> queue, SealableLatch latch, Notifiable notifiable)
Creates a Scanner that sends all results straight to the underlying BlockingQueue on a single thread.
Parameters:
queue - results from the scanner will be put on this queue
latch - signalled when a document is queued
notifiable - receives notifications when new file documents are queued

public Scanner configure(Options<String> options)
Parameters:
options - options for configuring the scanner

public void include(String pattern)
Parameters:
pattern - the glob pattern

public void exclude(String pattern)
Parameters:
pattern - the glob pattern

public void followSymLinks(boolean followLinks)
Parameters:
followLinks - whether to follow symlinks

public boolean followSymLinks()

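
How include and exclude glob patterns might combine can be illustrated with the JDK's own glob support. The sketch below is an assumption-laden stand-in, not the Scanner's matching logic: the class GlobFilterSketch is hypothetical, and the precedence it applies (exclusions win, and an empty include list accepts everything) is assumed rather than documented here.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobFilterSketch {

    private final FileSystem fs = FileSystems.getDefault();
    private final List<PathMatcher> includes = new ArrayList<>();
    private final List<PathMatcher> excludes = new ArrayList<>();

    // Register a glob pattern for paths that should be accepted.
    public void include(String pattern) {
        includes.add(fs.getPathMatcher("glob:" + pattern));
    }

    // Register a glob pattern for paths that should be rejected.
    public void exclude(String pattern) {
        excludes.add(fs.getPathMatcher("glob:" + pattern));
    }

    // Assumed precedence: any exclude match rejects the path; otherwise the
    // path is accepted if no includes are set or at least one include matches.
    public boolean accepts(Path path) {
        for (PathMatcher matcher : excludes) {
            if (matcher.matches(path)) {
                return false;
            }
        }
        if (includes.isEmpty()) {
            return true;
        }
        for (PathMatcher matcher : includes) {
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GlobFilterSketch filter = new GlobFilterSketch();
        filter.include("**/*.pdf");
        filter.exclude("**/tmp/**");
        System.out.println(filter.accepts(Paths.get("docs/report.pdf")));
        System.out.println(filter.accepts(Paths.get("docs/tmp/report.pdf")));
    }
}
```

In JDK glob syntax, ** crosses directory boundaries while * does not, so a pattern such as **/*.pdf only matches paths that contain at least one directory component.
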
public void ignoreHiddenFiles(boolean ignoreHiddenFiles)
Parameters:
ignoreHiddenFiles - whether to ignore hidden files

public boolean ignoreHiddenFiles()

public void ignoreSystemFiles(boolean ignoreSystemFiles)
Parameters:
ignoreSystemFiles - whether to ignore system files

public boolean ignoreSystemFiles()

public void setMaxDepth(int maxDepth)
Parameters:
maxDepth - maximum depth

public int getMaxDepth()

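
The effect of a maximum depth and a symlink setting on a directory walk can be sketched with java.nio.file.Files.walkFileTree. This is only an illustration using JDK classes: DepthLimitedWalkSketch, listFiles and demo are hypothetical names, and the way the Scanner itself counts depth levels may differ from the JDK convention shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class DepthLimitedWalkSketch {

    // Collects regular files under root, descending at most maxDepth levels.
    public static List<Path> listFiles(Path root, int maxDepth, boolean followLinks) {
        Set<FileVisitOption> options = followLinks
                ? EnumSet.of(FileVisitOption.FOLLOW_LINKS)
                : EnumSet.noneOf(FileVisitOption.class);
        List<Path> found = new ArrayList<>();
        try {
            Files.walkFileTree(root, options, maxDepth, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    // Directories sitting exactly at maxDepth are also passed here, so filter them out.
                    if (attrs.isRegularFile()) {
                        found.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    // Builds a small temporary tree (a.txt and sub/b.txt) and counts files at depths 1 and 2.
    public static int[] demo() {
        try {
            Path root = Files.createTempDirectory("walk-sketch");
            Files.createFile(root.resolve("a.txt"));
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.createFile(sub.resolve("b.txt"));
            return new int[] {
                listFiles(root, 1, false).size(),
                listFiles(root, 2, false).size()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

With the JDK convention, a depth of 1 reaches only the entries directly inside the starting directory, so the file in the subdirectory is found only once the depth is raised to 2.
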
public SealableLatch getLatch()
public long queued()
public Future<Path> scan(Path path)
Queue a scanning job. This method does not block; call ExecutorProxy.awaitTermination(long, TimeUnit) to block.
Parameters:
path - the path to scan
Returns:
Future that can be used to wait on the result or cancel.

public ScannerVisitor createScannerVisitor(Path path)

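
The queue-then-wait pattern of scan(Path) can be sketched with a plain single-threaded ExecutorService. The example is a stand-in under stated assumptions: SerialScanJobsSketch and run are hypothetical, each "job" merely records its path instead of walking a directory, and the JDK executor is used in place of ExecutorProxy.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SerialScanJobsSketch {

    // Queues one job per root without blocking, then waits for all of them.
    // Returns the order in which the jobs actually ran.
    public static List<String> run(List<String> roots) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        List<Future<Path>> futures = new ArrayList<>();

        for (String root : roots) {
            Path path = Paths.get(root);
            // Placeholder job: a real scan would walk the tree rooted at path.
            Callable<Path> job = () -> {
                order.add(path.toString());
                return path;
            };
            futures.add(executor.submit(job)); // returns immediately
        }

        executor.shutdown(); // stop accepting jobs; queued ones still run in order
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("jobs did not finish in time");
            }
            for (Future<Path> future : futures) {
                future.get(); // surfaces any exception thrown inside a job
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(run(java.util.Arrays.asList("a", "b", "c")));
    }
}
```

Because the executor has a single worker thread, jobs run in submission order, which mirrors the serial execution described for the scanner.
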
public List<Future<Path>> scan(Path[] paths)
Submit all of the given paths to the scanner for execution, returning a list of Future objects representing those tasks.
Returns:
Future for each path scanned
See Also:
scan(Path)

public List<Future<Path>> scan(String[] paths)
See Also:
scan(Path[])

public long getNumberOfFiles(Path path) throws IOException
Throws:
IOException

Copyright © 2018 The International Consortium of Investigative Journalists. All rights reserved.