Package org.pipecraft.pipes.terminal
Class AsyncSharderPipe<T>
- java.lang.Object
-
- org.pipecraft.pipes.terminal.TerminalPipe
-
- org.pipecraft.pipes.terminal.AsyncSharderPipe<T>
-
- Type Parameters:
T - The data type of the input items
- All Implemented Interfaces:
Closeable, AutoCloseable, BasePipe
- Direct Known Subclasses:
AsyncSharderByHashPipe
public class AsyncSharderPipe<T> extends TerminalPipe
A terminal pipe that receives an async pipe as input and splits its contents into multiple files, according to sharding criteria applied to individual items. The async input allows high throughput through parallel writes to different files. Writing is done on the threads provided by the input pipe. close() may be called from any thread after start() has been invoked. Note that this implementation keeps all shard files open at the same time, so make sure the system can handle that number of open files.
- Author:
Eyal Schneider
-
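The behavior described above (parallel writes, one file per shard id, all shard files open until the end) can be illustrated with a stand-alone sketch that uses only the JDK. It does not use or reproduce the Pipecraft implementation: the class ShardingSketch, its methods, and the use of a parallel stream in place of the input pipe's threads are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-alone sketch; not part of org.pipecraft.
class ShardingSketch {

    // Writes each item to the file named after its shard id and returns per-shard item counts.
    static Map<String, Integer> shard(List<String> items,
                                      Function<String, String> shardSelector,
                                      Path folder) throws IOException {
        Map<String, BufferedWriter> writers = new ConcurrentHashMap<>();
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        try {
            // The parallel stream stands in for the threads an async input would supply.
            items.parallelStream().forEach(item -> {
                String shardId = shardSelector.apply(item);
                // Open the shard file lazily, exactly once per shard id.
                BufferedWriter writer = writers.computeIfAbsent(shardId, id -> {
                    try {
                        return Files.newBufferedWriter(folder.resolve(id));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                // Writes to one shard are serialized; different shards proceed in parallel.
                synchronized (writer) {
                    try {
                        writer.write(item);
                        writer.newLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                counts.merge(shardId, 1, Integer::sum);
            });
        } finally {
            // Every shard file stays open until this point, then all are closed.
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
        return new TreeMap<>(counts);
    }

    // Shards three words by their first letter into a temporary folder.
    static Map<String, Integer> demo() {
        try {
            Path folder = Files.createTempDirectory("shards");
            List<String> items = Arrays.asList("apple", "avocado", "banana");
            return shard(items, item -> item.substring(0, 1), folder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because one writer per shard id remains open for the whole run, the number of distinct shard ids in this sketch is bounded by the process's open-file limit, which mirrors the warning in the class description.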
-
Constructor Summary
Constructors
- AsyncSharderPipe(AsyncPipe<T> input, EncoderFactory<? super T> encoderFactory, Function<? super T,String> shardSelectorFunction, File folder): Constructor that uses the default file write options.
- AsyncSharderPipe(AsyncPipe<T> input, EncoderFactory<? super T> encoderFactory, Function<? super T,String> shardSelectorFunction, File folder, FileWriteOptions writeOptions): Constructor.
-
Method Summary
Methods
- void close()
- float getProgress()
- Map<String,Integer> getShardSizes()
- void start(): Performs pre-processing prior to item flow through the pipe.
-
-
-
Constructor Detail
-
AsyncSharderPipe
public AsyncSharderPipe(AsyncPipe<T> input, EncoderFactory<? super T> encoderFactory, Function<? super T,String> shardSelectorFunction, File folder, FileWriteOptions writeOptions)
Constructor
- Parameters:
input - The input pipe
encoderFactory - The encoder factory to use for writing items into the different shards
shardSelectorFunction - Given an item, selects the corresponding shard id. Files will use this id as a name. Must not return null for any non-null input!
folder - The folder where to place all shards. Must exist.
writeOptions - Specifies how the shard files should be written
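Since the shard id returned by shardSelectorFunction is used as a file name and must never be null, a selector may want to sanitize its result. The following is a minimal sketch of such a function, assuming items are comma-separated text lines whose first field is the sharding key. The class ShardSelectorSketch, the line format, and the "unknown" fallback are hypothetical and not part of the library.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-alone sketch; not part of org.pipecraft.
class ShardSelectorSketch {

    // Derives a shard id from the first comma-separated field of a line.
    // The result is never null and contains only characters safe for a file name.
    static String shardIdOf(String line) {
        int comma = line.indexOf(',');
        String key = (comma < 0 ? line : line.substring(0, comma))
                .trim()
                .toLowerCase(Locale.ROOT);
        // Replace anything outside [a-z0-9_] so the id cannot contain path separators.
        String safe = key.replaceAll("[^a-z0-9_]", "_");
        // Fall back to a fixed id rather than returning an empty file name.
        return safe.isEmpty() ? "unknown" : safe;
    }

    // The same logic in the Function<? super T, String> shape the constructor expects.
    static final Function<String, String> SELECTOR = ShardSelectorSketch::shardIdOf;

    public static void main(String[] args) {
        System.out.println(SELECTOR.apply("US,42"));
        System.out.println(SELECTOR.apply(",42"));
    }
}
```

A function of this shape could be passed as the shardSelectorFunction argument for an AsyncPipe<String> input. Whether the library performs any sanitizing of its own is not stated in this page.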
-
AsyncSharderPipe
public AsyncSharderPipe(AsyncPipe<T> input, EncoderFactory<? super T> encoderFactory, Function<? super T,String> shardSelectorFunction, File folder)
Constructor that uses the default file write options.
- Parameters:
input - The input pipe
encoderFactory - The encoder factory to use for writing items into the different shards
shardSelectorFunction - Given an item, selects the corresponding shard id. Files will use this id as a name. Must not return null for any non-null input!
folder - The folder where to place all shards. Must exist.
-
-
Method Detail
-
close
public void close() throws IOException
- Throws:
IOException
-
start
public void start() throws PipeException, InterruptedException
Description copied from interface: BasePipe
Performs pre-processing prior to item flow through the pipe. Implementations must call the same method for all their input pipes before accessing their items. This is typically done here.
- Throws:
PipeException - In case of pipe errors in this pipe or somewhere upstream.
InterruptedException - In case the operation has been interrupted by another thread.
-
getShardSizes
public Map<String,Integer> getShardSizes()
- Returns:
- The counts of items written to each shard. Call this method only after start() has been called and completed successfully.
-
getProgress
public float getProgress()
- Specified by:
getProgress in interface BasePipe
- Overrides:
getProgress in class TerminalPipe
- Returns:
- The pipe flow progress, as a floating point number between 0.0 and 1.0. Important implementation rules:
1) Calling this method before the start() call has completed isn't allowed and has undefined behavior.
2) Implementations should make a best effort to provide an estimate of the progress this pipe has made (0.0 - 1.0).
3) When the pipe is fully consumed, getProgress() should return 1.0.
4) Results must be monotonic, i.e. the results of consecutive calls must never decrease.
5) Thread safety: progress may be updated by some threads and monitored by others, so implementations must be thread safe.
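Rules 2 to 5 above can be met with a single atomic counter that only ever grows. The sketch below shows one way to do that under the assumption that the total item count is known in advance. The class ProgressSketch and its methods are hypothetical; this is not how AsyncSharderPipe is documented to compute progress.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-alone sketch; not part of org.pipecraft.
class ProgressSketch {
    private final long total;
    // Only ever increases, so the reported progress never decreases.
    private final AtomicLong processed = new AtomicLong();

    ProgressSketch(long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        this.total = total;
    }

    // Called by worker threads after each item is handled.
    void itemProcessed() {
        processed.incrementAndGet();
    }

    // Called once the input is fully consumed, so that progress reports 1.0.
    void complete() {
        processed.accumulateAndGet(total, Math::max);
    }

    // Safe to call from a monitoring thread; the result is clamped to [0.0, 1.0].
    float getProgress() {
        return (float) Math.min(1.0, (double) processed.get() / total);
    }

    public static void main(String[] args) {
        ProgressSketch progress = new ProgressSketch(4);
        progress.itemProcessed();
        System.out.println(progress.getProgress());
    }
}
```

Clamping guards against an estimated total that turns out too low, and complete() covers the opposite case where the estimate was too high.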
-
-