Package org.pipecraft.pipes.terminal
Class SharderByHashPipe<T>
- java.lang.Object
  - org.pipecraft.pipes.terminal.TerminalPipe
    - org.pipecraft.pipes.terminal.SharderByItemPipe<T>
      - org.pipecraft.pipes.terminal.SharderByHashPipe<T>
- Type Parameters:
  T - The input items' data type
- All Implemented Interfaces:
  Closeable, AutoCloseable, BasePipe
public class SharderByHashPipe<T> extends SharderByItemPipe<T>
A terminal pipe that splits the contents of the input pipe into multiple files, according to a hash of some feature of each item. The original order is preserved within each shard. Note that this implementation keeps all shard files open at the same time, so make sure the system can handle this number of open files.
- Author:
  Eyal Schneider
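The sharding behavior described above can be illustrated with a minimal, self-contained sketch that does not use the library. It assumes the shard index is the non-negative remainder of the feature's hashCode() divided by the shard count. This page does not state the actual hash scheme, so treat that as an assumption. The class and method names (HashShardingSketch, shardFor, shard) are hypothetical, and in-memory lists stand in for the shard files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class HashShardingSketch {
    // Assumption: shard index = non-negative remainder of the feature's hashCode().
    // The real pipe may use a different hash function.
    public static int shardFor(Object feature, int shardCount) {
        return Math.floorMod(feature.hashCode(), shardCount);
    }

    // Distributes items into shardCount lists. Items are appended in input
    // order, so the original relative order is preserved inside each shard.
    public static <T> List<List<T>> shard(
            List<T> items, Function<? super T, ?> featureSelector, int shardCount) {
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (T item : items) {
            // As with featureSelectorFunction, the selected feature must not be null.
            Object feature = featureSelector.apply(item);
            shards.get(shardFor(feature, shardCount)).add(item);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> items = List.of("apple", "banana", "avocado", "blueberry", "cherry");
        // Feature: the first letter, so words sharing a first letter share a shard.
        List<List<String>> shards = shard(items, s -> s.substring(0, 1), 3);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i));
        }
    }
}
```

Items with equal features always land in the same shard, which is the property that makes hash sharding useful for later per-key processing.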
Constructor Summary
Constructors
- SharderByHashPipe(Pipe<T> input, EncoderFactory<T> encoderFactory, FailableFunction<? super T,?,PipeException> featureSelectorFunction, int shardCount, File folder, FileWriteOptions writeOptions)
  Constructor
- SharderByHashPipe(Pipe<T> input, EncoderFactory<T> encoderFactory, FailableFunction<? super T,?,PipeException> featureSelectorFunction, Function<Integer,String> fileNameFunction, int shardCount, File folder, FileWriteOptions writeOptions)
  Constructor
-
Method Summary
-
Methods inherited from class org.pipecraft.pipes.terminal.SharderByItemPipe
close, getShardSizes, start
-
Methods inherited from class org.pipecraft.pipes.terminal.TerminalPipe
getProgress
Constructor Detail
-
SharderByHashPipe
public SharderByHashPipe(Pipe<T> input, EncoderFactory<T> encoderFactory, FailableFunction<? super T,?,PipeException> featureSelectorFunction, Function<Integer,String> fileNameFunction, int shardCount, File folder, FileWriteOptions writeOptions)
Constructor
- Parameters:
  input - The input pipe
  encoderFactory - The encoder factory to use for writing items into the different shards
  featureSelectorFunction - Given an item, selects some feature from it to be hashed and used for shard selection. Must not return null for any non-null input!
  fileNameFunction - Given a shard id, returns the corresponding file name
  shardCount - The required number of shards.
  folder - The folder where to place all shards. Must exist. The files will be named according to fileNameFunction.
  writeOptions - Specify how the shard files should be written
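As an illustration of the fileNameFunction parameter, the following sketch shows one possible Function<Integer,String> that maps a shard id to a file name. The naming scheme (a "shard-" prefix, zero padding, and a ".txt" extension) and the class name ShardFileNames are hypothetical choices, not something the library prescribes.

```java
import java.util.Locale;
import java.util.function.Function;

class ShardFileNames {
    // Hypothetical naming scheme: fixed prefix, zero-padded shard id, fixed extension.
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    public static final Function<Integer, String> FILE_NAME_FUNCTION =
            shardId -> String.format(Locale.ROOT, "shard-%04d.txt", shardId);

    public static void main(String[] args) {
        int shardCount = 3;
        // Shard ids are assumed to run from 0 to shardCount-1, matching the
        // default names documented for the other constructor.
        for (int id = 0; id < shardCount; id++) {
            System.out.println(id + " -> " + FILE_NAME_FUNCTION.apply(id));
        }
    }
}
```

Zero padding keeps the shard files in numeric order when a directory listing is sorted lexicographically.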
-
SharderByHashPipe
public SharderByHashPipe(Pipe<T> input, EncoderFactory<T> encoderFactory, FailableFunction<? super T,?,PipeException> featureSelectorFunction, int shardCount, File folder, FileWriteOptions writeOptions)
Constructor
- Parameters:
  input - The input pipe
  encoderFactory - The encoder factory to use for writing items into the different shards
  featureSelectorFunction - Given an item, selects some feature from it to be hashed and used for shard selection. Must not return null for any non-null input!
  shardCount - The required number of shards.
  folder - The folder where to place all shards. Must exist. The files will be named "0", "1", "2", ..., up to shardCount-1.
  writeOptions - Specify how the shard files should be written
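Because the target folder must already exist, a caller may want to create it before constructing the pipe. The sketch below uses only the JDK and makes no library calls. It shows one way to prepare the folder and to list the default file names "0" through shardCount-1 that this constructor is documented to produce. The class and method names (ShardFolderSketch, prepareFolder, defaultShardFiles) are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class ShardFolderSketch {
    // Creates the folder (and any missing parents) if it does not exist yet.
    public static File prepareFolder(File folder) throws IOException {
        Files.createDirectories(folder.toPath());
        return folder;
    }

    // Lists the shard files expected under the default naming: "0", "1", ..., shardCount-1.
    public static List<File> defaultShardFiles(File folder, int shardCount) {
        List<File> files = new ArrayList<>();
        for (int id = 0; id < shardCount; id++) {
            files.add(new File(folder, Integer.toString(id)));
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location under the system temp directory.
        File folder = prepareFolder(new File(System.getProperty("java.io.tmpdir"), "shards-sketch"));
        for (File f : defaultShardFiles(folder, 4)) {
            System.out.println(f.getPath());
        }
    }
}
```

Since all shard files are kept open at the same time, shardCount should also stay below the process's open-file limit.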