Package org.pipecraft.pipes.terminal
Class SharderBySeqPipe<T>
java.lang.Object
  org.pipecraft.pipes.terminal.TerminalPipe
    org.pipecraft.pipes.terminal.CompoundTerminalPipe
      org.pipecraft.pipes.terminal.SharderBySeqPipe<T>
- Type Parameters:
T - The input items' data type
- All Implemented Interfaces:
Closeable, AutoCloseable, BasePipe
public class SharderBySeqPipe<T> extends CompoundTerminalPipe
A terminal pipe that splits the contents of the input pipe into shard files, using a criterion that breaks the input into disjoint contiguous sequences. Unlike other sharder pipes, this implementation assumes that input items are already grouped by target shard, so it can work one file at a time and avoids keeping many files open simultaneously. Note that if a sequence maps to a shard that has already been processed, that shard's file is overwritten.
- Author:
Eyal Schneider
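The one-file-at-a-time behavior and the overwrite caveat described above can be illustrated with a minimal plain-Java sketch. It does not use the pipecraft classes and is not the library's implementation: the class `SeqShardingSketch`, its methods, the `"prefix:value"` item format, and the use of `java.util.function.Function` in place of `FailableFunction` are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration class; not part of org.pipecraft.
class SeqShardingSketch {

  // Writes every contiguous run of items sharing a shard id to folder/<shardId>,
  // keeping at most one file open at any time.
  static void shardBySequence(
      List<String> items, Function<String, String> shardSelector, Path folder) {
    BufferedWriter writer = null;
    String currentShard = null;
    try {
      try {
        for (String item : items) {
          String shard = shardSelector.apply(item);
          if (!shard.equals(currentShard)) {
            // A new sequence starts: close the previous shard file first.
            if (writer != null) {
              writer.close();
            }
            // With no open options, newBufferedWriter creates the file or truncates
            // an existing one, so a reappearing shard id overwrites its earlier file.
            writer = Files.newBufferedWriter(folder.resolve(shard), StandardCharsets.UTF_8);
            currentShard = shard;
          }
          writer.write(item);
          writer.newLine();
        }
      } finally {
        if (writer != null) {
          writer.close();
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Shards a small input in which shard "a" appears in two separate sequences,
  // then reads the resulting shard files back.
  static Map<String, List<String>> demo() {
    try {
      Path folder = Files.createTempDirectory("shards");
      List<String> items = Arrays.asList("a:1", "a:2", "b:1", "a:3");
      // Assumed item format: the shard id is the text before the colon.
      shardBySequence(items, s -> s.substring(0, s.indexOf(':')), folder);
      Map<String, List<String>> contents = new LinkedHashMap<>();
      for (String shard : Arrays.asList("a", "b")) {
        contents.put(shard, Files.readAllLines(folder.resolve(shard)));
      }
      return contents;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this sketch, shard `a` occurs in two non-adjacent sequences, so its file ends up holding only the items of the second sequence. Input that is fully grouped by shard beforehand avoids that loss.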
Constructor Summary
Constructors
Constructor and Description
SharderBySeqPipe(Pipe<T> input, EncoderFactory<? super T> encoderFactory, FailableFunction<? super T,String,PipeException> shardSelectorFunction, File folder)
    Constructor. Uses default file write options.
SharderBySeqPipe(Pipe<T> input, EncoderFactory<? super T> encoderFactory, FailableFunction<? super T,String,PipeException> shardSelectorFunction, File folder, FileWriteOptions fileWriteOptions)
    Constructor.
Method Summary
Modifier and Type        Method
protected TerminalPipe   createPipeline()
Map<String,Integer>      getShardSizes()
Methods inherited from class org.pipecraft.pipes.terminal.CompoundTerminalPipe
close, getProgress, start
Constructor Detail
SharderBySeqPipe
public SharderBySeqPipe(Pipe<T> input, EncoderFactory<? super T> encoderFactory, FailableFunction<? super T,String,PipeException> shardSelectorFunction, File folder, FileWriteOptions fileWriteOptions)
Constructor
- Parameters:
input - The input pipe
encoderFactory - The encoder factory to use for writing items into the different shards
shardSelectorFunction - Given an item, selects the corresponding shard id. Files will use this id as a name. Must not return null for any non-null input!
folder - The folder in which to place all shards. Must exist.
fileWriteOptions - Defines how files should be written
SharderBySeqPipe
public SharderBySeqPipe(Pipe<T> input, EncoderFactory<? super T> encoderFactory, FailableFunction<? super T,String,PipeException> shardSelectorFunction, File folder)
Constructor. Uses default file write options.
- Parameters:
input - The input pipe
encoderFactory - The encoder factory to use for writing items into the different shards
shardSelectorFunction - Given an item, selects the corresponding shard id. Files will use this id as a name. May be stateful. Must not return null for any non-null input!
folder - The folder in which to place all shards. Must exist.
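Because the shard selector may be stateful, it can derive the shard id from how many items it has seen rather than from the item itself, which naturally yields contiguous sequences. The following is a minimal sketch of that idea, not the library's own code. `RollingShardSelector` and the `part-<n>` naming are hypothetical, and `java.util.function.Function` stands in for `FailableFunction`, whose method signature is not shown here. It also assumes the selector is called exactly once per item, in input order, from a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stateful selector: starts a new shard every itemsPerShard items.
// Not thread-safe; assumes one call per item, in input order.
class RollingShardSelector<T> implements Function<T, String> {
  private final int itemsPerShard;
  private long seen = 0;

  RollingShardSelector(int itemsPerShard) {
    if (itemsPerShard <= 0) {
      throw new IllegalArgumentException("itemsPerShard must be positive");
    }
    this.itemsPerShard = itemsPerShard;
  }

  @Override
  public String apply(T item) {
    // The shard id depends only on the running item count, never null.
    String shardId = "part-" + (seen / itemsPerShard);
    seen++;
    return shardId;
  }

  // Applies the selector to five items with two items per shard.
  static List<String> demo() {
    RollingShardSelector<String> selector = new RollingShardSelector<>(2);
    List<String> ids = new ArrayList<>();
    for (String item : Arrays.asList("x", "y", "z", "w", "v")) {
      ids.add(selector.apply(item));
    }
    return ids;
  }
}
```

A selector of this kind never returns to an earlier shard id, so no shard file would be overwritten by a later sequence.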
Method Detail
createPipeline
protected TerminalPipe createPipeline() throws PipeException, InterruptedException
- Specified by:
createPipeline in class CompoundTerminalPipe
- Returns:
A new terminal pipeline to represent the logic of this pipe
- Throws:
PipeException - In case of a pipeline creation error
InterruptedException - If the thread is interrupted