Class HashReductorPipe<I,O>
- java.lang.Object
  - org.pipecraft.pipes.sync.inter.CompoundPipe<O>
    - org.pipecraft.pipes.sync.inter.reduct.HashReductorPipe<I,O>
- Type Parameters:
I - The data type of items in the input pipe
O - The data type of output items
- All Implemented Interfaces:
Closeable, AutoCloseable, BasePipe, Pipe<O>
- Direct Known Subclasses:
DedupPipe
public class HashReductorPipe<I,O> extends CompoundPipe<O>
Scans the input pipe and performs a reduction operation on families of items based on some discriminating property. The discrimination logic and reduction logic are provided by the caller. Unlike SequenceReductorPipe, the input need not be sorted for this pipe to work properly. This pipe makes use of temporary disk space. The configuration of partitionCount is critical for bounding memory usage: the more partitions are used, the less memory is required. It is recommended to set this number to {estimated total input data volume} / {max memory allowed for this pipe to use}.
- Author:
- Eyal Schneider
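The sizing recommendation above can be sketched as a small helper. This is only an illustration under stated assumptions: the class name PartitionCountEstimator and its estimate method are hypothetical and not part of the library, both quantities are assumed to be in bytes, and rounding up to at least one partition is a choice made here rather than something the documentation prescribes.

```java
public final class PartitionCountEstimator {
    private PartitionCountEstimator() {}

    /**
     * Hypothetical helper: applies the recommended ratio
     * {estimated total input data volume} / {max memory allowed},
     * rounded up, and never returns less than 1.
     */
    public static int estimate(long totalInputBytes, long maxMemoryBytes) {
        if (totalInputBytes < 0 || maxMemoryBytes <= 0) {
            throw new IllegalArgumentException(
                "totalInputBytes must be >= 0 and maxMemoryBytes must be > 0");
        }
        // Ceiling division written so that it cannot overflow.
        long partitions = totalInputBytes / maxMemoryBytes
            + (totalInputBytes % maxMemoryBytes == 0 ? 0 : 1);
        // Clamp into the valid range for an int partition count.
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, partitions));
    }

    public static void main(String[] args) {
        long totalInput = 50L * 1024 * 1024 * 1024; // 50 GiB of estimated input
        long maxMemory = 512L * 1024 * 1024;        // 512 MiB memory budget
        System.out.println(estimate(totalInput, maxMemory)); // prints 100
    }
}
```

The resulting value would be passed as the partitionCount constructor argument. Since the pipe keeps one open file per partition, a very large result may also need to be checked against the open-file limit of the process.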
Constructor Summary
Constructors
Constructor  Description
HashReductorPipe(Pipe<I> input, CodecFactory<I> inputCodec, int partitionCount, File tmpFolder, ReductorConfig<I,?,?,O> reductorConfig)  Constructor
Method Summary
Modifier and Type    Method              Description
void                 close()
protected Pipe<O>    createPipeline()
Methods inherited from class org.pipecraft.pipes.sync.inter.CompoundPipe
getProgress, next, peek, start
Constructor Detail
HashReductorPipe
public HashReductorPipe(Pipe<I> input, CodecFactory<I> inputCodec, int partitionCount, File tmpFolder, ReductorConfig<I,?,?,O> reductorConfig)
Constructor
- Parameters:
input - The input pipe to wrap
inputCodec - A codec allowing writing/reading input records
partitionCount - The number of partitions to split the input into. Assuming a good hash function on item keys, and assuming that the families defined by the discriminator are even in size, the caller can assume the partitions are more or less balanced in size. This number determines the amount of memory to be used, so it should be set with caution. The more partitions are used, the less total memory is required. However, note that for each partition the class maintains an open file on disk.
tmpFolder - The folder in which to store temporary data
reductorConfig - The reduction configuration
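The balance assumption described for partitionCount can be illustrated with a minimal sketch of hash partitioning. This is not the library's implementation: the class HashPartitionSketch, its methods, and the in-memory lists standing in for the on-disk partition files are all assumptions made for illustration. The point it shows is that every item of a family (same discriminator key) lands in the same partition, so each partition can be reduced independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

public final class HashPartitionSketch {
    private HashPartitionSketch() {}

    /** Maps a key to a partition index in [0, partitionCount). */
    public static int partitionOf(Object key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(key), partitionCount);
    }

    /**
     * Splits items into partitionCount buckets by the hash of their
     * discriminator key. In-memory lists are a stand-in for temporary files.
     */
    public static <I, K> List<List<I>> partition(
            List<I> items, Function<I, K> discriminator, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        List<List<I>> partitions = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
        for (I item : items) {
            K key = discriminator.apply(item);
            partitions.get(partitionOf(key, partitionCount)).add(item);
        }
        return partitions;
    }
}
```

If one family is much larger than the others, its partition grows accordingly, which is why the even-family-size assumption matters for the memory bound.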
Method Detail
createPipeline
protected Pipe<O> createPipeline() throws PipeException, InterruptedException
- Specified by:
createPipeline in class CompoundPipe<O>
- Returns:
- A new pipeline to represent the logic of this pipe
- Throws:
PipeException - In case of a pipeline creation error
InterruptedException - In case that the thread is interrupted
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class CompoundPipe<O>
- Throws:
IOException