Class HashJoinPipe<K,L,R>
- java.lang.Object
  - org.pipecraft.pipes.sync.inter.CompoundPipe<JoinRecord<K,L,R>>
    - org.pipecraft.pipes.sync.inter.join.HashJoinPipe<K,L,R>
- Type Parameters:
  - K - The type of the key used for matching records
  - L - The type of left side records
  - R - The type of right side records
- All Implemented Interfaces:
  Closeable, AutoCloseable, BasePipe, Pipe<JoinRecord<K,L,R>>
public class HashJoinPipe<K,L,R> extends CompoundPipe<JoinRecord<K,L,R>>
A pipe performing a join operation between a 'left' pipe of type L and a list of 'right' pipes of type R. In contrast to SortedJoinPipe, the left and right pipes don't have to be ordered. This pipe uses a grace hash join approach: every input pipe is split into partitions by key hash, the partitions are written to temporary storage, and the join is then performed one partition at a time. The caller must therefore define the partitioning (see partitionCount in the constructors) carefully in order to prevent OOM errors. Duplicates are allowed. The output type of this pipe is JoinRecord, which consists of the key, the left matches, and the right matches. The join can work in LEFT, INNER, FULL_INNER, or OUTER mode; see JoinMode for more details.
- Author:
- Eyal Schneider
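The grace hash join idea mentioned in the class description can be illustrated with a small, self-contained sketch. It does not use pipecraft at all: the names GraceHashJoinSketch, Group, partition, and innerJoin are hypothetical, in-memory lists stand in for the temporary partition files, and only an inner join is shown. It assumes a key is assigned to a partition by its hash code modulo the partition count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical in-memory illustration of a grace hash join; not the pipecraft implementation. */
public class GraceHashJoinSketch {

    /** Hypothetical stand-in for a JoinRecord: one key with all of its left and right matches. */
    static final class Group<K, L, R> {
        final K key;
        final List<L> lefts = new ArrayList<>();
        final List<R> rights = new ArrayList<>();

        Group(K key) { this.key = key; }

        @Override public String toString() { return key + " " + lefts + " " + rights; }
    }

    /** Maps a key to a partition index; floorMod keeps negative hash codes in range. */
    static int partitionOf(Object key, int partitionCount) {
        return Math.floorMod(Objects.hashCode(key), partitionCount);
    }

    /** Phase 1: split one input into partitions by key hash (a real join would spill these to disk). */
    static <T, K> List<List<T>> partition(List<T> items, Function<T, K> keyOf, int partitionCount) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            parts.add(new ArrayList<>());
        }
        for (T item : items) {
            parts.get(partitionOf(keyOf.apply(item), partitionCount)).add(item);
        }
        return parts;
    }

    /** Phase 2: join partition p of the left side only with partition p of the right side. */
    static <K, L, R> List<Group<K, L, R>> innerJoin(List<L> left, Function<L, K> leftKey,
                                                     List<R> right, Function<R, K> rightKey,
                                                     int partitionCount) {
        List<List<L>> leftParts = partition(left, leftKey, partitionCount);
        List<List<R>> rightParts = partition(right, rightKey, partitionCount);
        List<Group<K, L, R>> out = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            // Only this pair of partitions is held in memory at a time.
            Map<K, Group<K, L, R>> groups = new LinkedHashMap<>();
            for (L l : leftParts.get(p)) {
                groups.computeIfAbsent(leftKey.apply(l), k -> new Group<>(k)).lefts.add(l);
            }
            for (R r : rightParts.get(p)) {
                groups.computeIfAbsent(rightKey.apply(r), k -> new Group<>(k)).rights.add(r);
            }
            for (Group<K, L, R> g : groups.values()) {
                // Inner join: keep only keys seen on both sides.
                if (!g.lefts.isEmpty() && !g.rights.isEmpty()) {
                    out.add(g);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> users = List.of("1:alice", "2:bob", "3:carol");
        List<String> orders = List.of("1:book", "1:pen", "3:lamp", "4:cup");
        Function<String, String> key = s -> s.substring(0, s.indexOf(':'));
        for (Group<String, String, String> g : innerJoin(users, key, orders, key, 4)) {
            System.out.println(g);
        }
    }
}
```

Because a key always hashes to the same partition on both sides, partition p of one input can only match partition p of the other, which is why only one partition per input needs to be in memory at a time.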
Constructor Summary
- HashJoinPipe(List<? extends Pipe<R>> rightPipes, FailableFunction<R,K,PipeException> rightKeyExtractor, int partitionCount, CodecFactory<R> rightCodec, File tmpFolder)
  A constructor for the case of no left pipe.
- HashJoinPipe(Pipe<L> leftPipe, FailableFunction<L,K,PipeException> leftKeyExtractor, List<? extends Pipe<R>> rightPipes, FailableFunction<R,K,PipeException> rightKeyExtractor, JoinMode joinMode, int partitionCount, CodecFactory<L> leftCodec, CodecFactory<R> rightCodec, File tmpFolder)
  Constructor for a left pipe and a list of right pipes.
- HashJoinPipe(Pipe<L> leftPipe, FailableFunction<L,K,PipeException> leftKeyExtractor, Pipe<R> rightPipe, FailableFunction<R,K,PipeException> rightKeyExtractor, JoinMode joinMode, int partitionCount, CodecFactory<L> leftCodec, CodecFactory<R> rightCodec, File tmpFolder)
  Constructor to be used when there's a single right pipe.
Method Summary
- void close()
- protected Pipe<JoinRecord<K,L,R>> createPipeline()
Methods inherited from class org.pipecraft.pipes.sync.inter.CompoundPipe
getProgress, next, peek, start
Constructor Detail
HashJoinPipe
public HashJoinPipe(Pipe<L> leftPipe, FailableFunction<L,K,PipeException> leftKeyExtractor, List<? extends Pipe<R>> rightPipes, FailableFunction<R,K,PipeException> rightKeyExtractor, JoinMode joinMode, int partitionCount, CodecFactory<L> leftCodec, CodecFactory<R> rightCodec, File tmpFolder)
Constructor.
- Parameters:
  - leftPipe - The left side pipe in the join operation
  - leftKeyExtractor - The extractor of the key from the data type of the left pipe
  - rightPipes - The list of right side pipes. The order is important: it determines the ids given to the pipes in the iteration outputs (see JoinRecord).
  - rightKeyExtractor - The extractor of the key from the data type of the right pipes
  - joinMode - The policy for performing the join. See JoinMode.
  - partitionCount - The number of partitions every pipe should be split into. Assuming a good hash function on item keys, the caller can assume the partitions are more or less even in size. In the worst case, the same partition of all pipes will be loaded into memory at the same time. This number therefore determines the amount of memory used, so it should be chosen with caution.
  - leftCodec - An encoder/decoder factory for items of the left pipe. Used for the intermediate storage needed by the hash join.
  - rightCodec - An encoder/decoder factory for items of the right pipes. Used for the intermediate storage needed by the hash join.
  - tmpFolder - The folder to use for temporary storage of pipe contents
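The partitionCount guidance can be turned into a rough estimate. The helper below is hypothetical and not part of pipecraft; it assumes partitions are about even and, following the stated worst case, that the same partition of every input pipe is in memory at once, so peak memory is roughly the total input size divided by partitionCount. The overheadFactor parameter is an assumed multiplier for how much larger decoded records are in memory than their encoded size; it should be measured for the actual record types rather than guessed.

```java
/** Hypothetical back-of-the-envelope partition sizing; not part of pipecraft. */
public class PartitionSizing {

    /**
     * Smallest partition count whose estimated peak memory stays within the budget.
     * Estimated peak ~ (sum of input sizes * overheadFactor) / partitionCount.
     */
    static int minPartitionCount(long[] inputBytes, long memoryBudgetBytes, double overheadFactor) {
        if (memoryBudgetBytes <= 0 || overheadFactor <= 0) {
            throw new IllegalArgumentException("budget and overhead factor must be positive");
        }
        long total = 0;
        for (long b : inputBytes) {
            total += b; // worst case: partition p of every pipe is loaded together
        }
        double needed = total * overheadFactor / memoryBudgetBytes;
        return Math.max(1, (int) Math.ceil(needed));
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Left pipe ~40 GiB, two right pipes ~10 GiB each, 2 GiB budget, assumed 3x in-memory overhead.
        int n = minPartitionCount(new long[] {40 * gib, 10 * gib, 10 * gib}, 2 * gib, 3.0);
        System.out.println(n); // 90
    }
}
```

Rounding up, and treating the result as a lower bound, leaves room for partitions that come out somewhat larger than average.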
HashJoinPipe
public HashJoinPipe(Pipe<L> leftPipe, FailableFunction<L,K,PipeException> leftKeyExtractor, Pipe<R> rightPipe, FailableFunction<R,K,PipeException> rightKeyExtractor, JoinMode joinMode, int partitionCount, CodecFactory<L> leftCodec, CodecFactory<R> rightCodec, File tmpFolder)
Constructor to be used when there's a single right pipe.
- Parameters:
  - leftPipe - The left side pipe in the join operation
  - leftKeyExtractor - The extractor of the key from the data type of the left pipe
  - rightPipe - The right side pipe
  - rightKeyExtractor - The extractor of the key from the data type of the right pipe
  - joinMode - The policy for performing the join. See JoinMode.
  - partitionCount - The number of partitions every pipe should be split into. Assuming a good hash function on item keys, the caller can assume the partitions are more or less even in size. In the worst case, the same partition of all pipes will be loaded into memory at the same time. This number therefore determines the amount of memory used, so it should be chosen with caution.
  - leftCodec - An encoder/decoder factory for items of the left pipe. Used for the intermediate storage needed by the hash join.
  - rightCodec - An encoder/decoder factory for items of the right pipe. Used for the intermediate storage needed by the hash join.
  - tmpFolder - The folder to use for temporary storage of pipe contents
HashJoinPipe
public HashJoinPipe(List<? extends Pipe<R>> rightPipes, FailableFunction<R,K,PipeException> rightKeyExtractor, int partitionCount, CodecFactory<R> rightCodec, File tmpFolder)
A constructor for the case of no left pipe. Assumes an OUTER join among the right pipes.
- Parameters:
  - rightPipes - The list of right side pipes. The order is important: it determines the ids given to the pipes in the iteration outputs (see JoinRecord).
  - rightKeyExtractor - The extractor of the key from the data type of the right pipes
  - partitionCount - The number of partitions every pipe should be split into. Assuming a good hash function on item keys, the caller can assume the partitions are more or less even in size. In the worst case, the same partition of all pipes will be loaded into memory at the same time. This number therefore determines the amount of memory used, so it should be chosen with caution.
  - rightCodec - An encoder/decoder factory for items of the right pipes. Used for the intermediate storage needed by the hash join.
  - tmpFolder - The folder to use for temporary storage of pipe contents
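For the no-left-pipe constructor, the sketch below illustrates what an OUTER join among several right inputs means and how list order can serve as an input id. It is a hypothetical illustration, not pipecraft code: partitioning and temporary storage are omitted, and the per-key output (a list of match lists indexed by input position) only loosely mirrors JoinRecord, whose actual accessors are not described on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical multi-way OUTER join among inputs identified by list position; not pipecraft code. */
public class MultiWayOuterJoinSketch {

    /** Every key seen in any input is emitted; slot i holds the matches from input i (possibly empty). */
    static <K, R> Map<K, List<List<R>>> outerJoin(List<List<R>> inputs, Function<R, K> keyOf) {
        Map<K, List<List<R>>> result = new LinkedHashMap<>(); // insertion order keeps output deterministic
        for (int id = 0; id < inputs.size(); id++) {
            for (R item : inputs.get(id)) {
                List<List<R>> slots = result.computeIfAbsent(keyOf.apply(item), k -> newSlots(inputs.size()));
                slots.get(id).add(item);
            }
        }
        return result;
    }

    /** One empty match list per input. */
    private static <R> List<List<R>> newSlots(int n) {
        List<List<R>> slots = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            slots.add(new ArrayList<>());
        }
        return slots;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(
            List.of("a:1", "b:2"),   // input id 0
            List.of("b:3", "c:4"));  // input id 1
        Function<String, String> key = s -> s.substring(0, 1);
        outerJoin(inputs, key).forEach((k, slots) -> System.out.println(k + " -> " + slots));
    }
}
```

Reordering the inputs changes which slot each input's matches land in, which is why the order of rightPipes matters.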
Method Detail
createPipeline
protected Pipe<JoinRecord<K,L,R>> createPipeline() throws PipeException, InterruptedException
- Specified by:
  createPipeline in class CompoundPipe<JoinRecord<K,L,R>>
- Returns:
  A new pipeline to represent the logic of this pipe
- Throws:
  - PipeException - In case of a pipeline creation error
  - InterruptedException - In case the thread is interrupted
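The createPipeline() entry, together with the start(), next(), and peek() methods inherited from CompoundPipe, suggests a template-method design: a subclass describes its logic by assembling a pipeline, and the base class runs it. The sketch below models that pattern with a deliberately minimal, hypothetical SimplePipe interface. It is not the pipecraft Pipe API, and the assumption that start() builds the pipeline and later calls delegate to it is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the compound-pipe pattern; SimplePipe is not the pipecraft Pipe API. */
public class CompoundPipeSketch {

    /** Assumed minimal contract: next() and peek() return null once the pipe is exhausted. */
    interface SimplePipe<T> {
        void start();
        T next();
        T peek();
    }

    /** Base class: builds the inner pipeline in start() and delegates item access to it. */
    abstract static class SimpleCompoundPipe<T> implements SimplePipe<T> {
        private SimplePipe<T> inner;

        /** Subclasses express their logic by assembling other pipes. */
        protected abstract SimplePipe<T> createPipeline();

        @Override public void start() {
            inner = createPipeline();
            inner.start();
        }

        @Override public T next() { return inner.next(); }

        @Override public T peek() { return inner.peek(); }
    }

    /** A source pipe over an in-memory list. */
    static final class ListPipe<T> implements SimplePipe<T> {
        private final List<T> items;
        private int pos;

        ListPipe(List<T> items) { this.items = items; }

        @Override public void start() { pos = 0; }

        @Override public T next() { return pos < items.size() ? items.get(pos++) : null; }

        @Override public T peek() { return pos < items.size() ? items.get(pos) : null; }
    }

    /** Example compound pipe whose whole pipeline is a list source. */
    static final class GreetingPipe extends SimpleCompoundPipe<String> {
        @Override protected SimplePipe<String> createPipeline() {
            return new ListPipe<>(List.of("hello", "world"));
        }
    }

    /** Starts a pipe and collects all of its items. */
    static <T> List<T> drain(SimplePipe<T> pipe) {
        pipe.start();
        List<T> out = new ArrayList<>();
        for (T item = pipe.next(); item != null; item = pipe.next()) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new GreetingPipe())); // [hello, world]
    }
}
```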
close
public void close() throws IOException
- Specified by:
  close in interface AutoCloseable
- Specified by:
  close in interface Closeable
- Overrides:
  close in class CompoundPipe<JoinRecord<K,L,R>>
- Throws:
  IOException