Class LookupJoinPipe<K,​L,​R>

  • Type Parameters:
    K - The type of the key used for matching records. Must be suitable for use as a key in a hash-based data structure.
    L - The type of left side records
    R - The type of right side records
    All Implemented Interfaces:
    Closeable, AutoCloseable, BasePipe, Pipe<JoinRecord<K,​L,​R>>

    public class LookupJoinPipe<K,​L,​R>
    extends Object
    implements Pipe<JoinRecord<K,​L,​R>>
    A pipe performing a join operation between a 'left' pipe of type L and a list of 'right' pipes of type R. This join is only applicable when we can guarantee that the contents of all the right side pipes fit entirely in memory. Under these conditions, this pipe will be much more efficient than the HashJoinPipe alternative, which makes use of temporary disk space.
    Note that this pipe assumes that the left pipe has unique entries (with respect to the join key). If there are repetitions the join will still work, but it will not conform to the standard join semantics of the other join pipes (SortedJoinPipe and HashJoinPipe), where each join record covers all key matches from the left side and the right side. This pipe always produces at most one left side match in each join record.
    No specific item order is required in the input pipes. The output type of this pipe is JoinRecord, which consists of the key, the left matches and the right matches. The join can work in LEFT/INNER/FULL_INNER/OUTER mode. See JoinMode for more details.
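    The general technique described above (index the right side in memory, then stream the left side and look up each key) can be sketched in plain Java. This is an illustration only, not the LookupJoinPipe implementation: the class LookupJoinSketch, the Joined holder and the leftJoin method are hypothetical names, plain lists stand in for pipes, a single right-side input is used, and a LEFT-style policy (every left item is emitted) is assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-alone sketch; it does not use or reproduce the library's classes.
public class LookupJoinSketch {

    /** One output row: the key, a single left item, and all right items sharing that key. */
    public static final class Joined<K, L, R> {
        public final K key;
        public final L left;
        public final List<R> rights;

        Joined(K key, L left, List<R> rights) {
            this.key = key;
            this.left = left;
            this.rights = rights;
        }
    }

    /** Assumed LEFT-style join: each left item yields one row, with zero or more right matches. */
    public static <K, L, R> List<Joined<K, L, R>> leftJoin(
            List<L> left, Function<L, K> leftKey,
            List<R> right, Function<R, K> rightKey) {
        // Build phase: load the whole right side into a hash index (it must fit in memory).
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: stream the left side in any order and look up each key.
        List<Joined<K, L, R>> out = new ArrayList<>();
        for (L l : left) {
            K key = leftKey.apply(l);
            List<R> matches = index.getOrDefault(key, Collections.<R>emptyList());
            out.add(new Joined<>(key, l, matches));
        }
        return out;
    }
}
```

    Because each left item is probed independently, a repeated left key in this sketch produces two separate rows rather than one row holding both left matches, which mirrors the deviation from standard join semantics noted above.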
    Author:
    Eyal Schneider
    • Constructor Detail

      • LookupJoinPipe

        public LookupJoinPipe​(Pipe<L> leftPipe,
                              FailableFunction<L,​K,​PipeException> leftKeyExtractor,
                              List<? extends Pipe<R>> rightPipes,
                              FailableFunction<R,​K,​PipeException> rightKeyExtractor,
                              JoinMode joinMode)
        Constructor
        Parameters:
        leftPipe - The left side pipe in the join operation
        leftKeyExtractor - The extractor of the key from the data type of the left pipe
        rightPipes - The list of right side pipes. The order is important, and determines the ids given to the pipes in the iteration outputs (See JoinRecord).
        rightKeyExtractor - The extractor of the key from the data type of the right pipes
        joinMode - The policy for performing the join. See JoinMode.
      • LookupJoinPipe

        public LookupJoinPipe​(Pipe<L> leftPipe,
                              FailableFunction<L,​K,​PipeException> leftKeyExtractor,
                              Pipe<R> rightPipe,
                              FailableFunction<R,​K,​PipeException> rightKeyExtractor,
                              JoinMode joinMode)
        Constructor. To be used when there is a single right pipe.
        Parameters:
        leftPipe - The left side pipe in the join operation
        leftKeyExtractor - The extractor of the key from the data type of the left pipe
        rightPipe - The right side pipe
        rightKeyExtractor - The extractor of the key from the data type of the right pipe
        joinMode - The policy for performing the join. See JoinMode.
      • LookupJoinPipe

        public LookupJoinPipe​(List<? extends Pipe<R>> rightPipes,
                              FailableFunction<R,​K,​PipeException> rightKeyExtractor)
        Constructor for the case of no left pipe. Assumes join mode OUTER among the right pipes.
        Parameters:
        rightPipes - The list of right side pipes. The order is important, and determines the ids given to the pipes in the iteration outputs (See JoinRecord).
        rightKeyExtractor - The extractor of the key from the data type of the right pipes
    • Method Detail

      • start

        public void start()
                   throws PipeException,
                          InterruptedException
        Description copied from interface: BasePipe
        Performs pre-processing prior to item flow through the pipe. Implementations must call the same method for all their input pipes before accessing their items. This is typically done here.
        Specified by:
        start in interface BasePipe
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream.
        InterruptedException - If the operation has been interrupted by another thread.
      • next

        public JoinRecord<K,​L,​R> next()
                                           throws PipeException,
                                                  InterruptedException
        Specified by:
        next in interface Pipe<K>
        Returns:
        The next item in this pipe output, or null if the output end has been reached. May be a blocking operation.
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream while trying to prepare next item to return.
        InterruptedException - If the operation has been interrupted by another thread.
      • peek

        public JoinRecord<K,​L,​R> peek()
        Specified by:
        peek in interface Pipe<K>
        Returns:
        The next item in the pipe's output. Does not remove it, so the next call to next() will return it.
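        The contract of next() and peek() described above can be illustrated with a small stand-alone class. This is a sketch under assumptions, not library code: PeekableSource and drain are hypothetical names, a list stands in for the upstream data, and returning null from peek() at the end of the output is an assumption (the documentation only states it for next()).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in showing null-terminated iteration with a one-item lookahead.
public class PeekableSource<T> {
    private final Iterator<T> it;
    private T lookahead; // the item the next call to next() will return, or null at the end

    public PeekableSource(List<T> items) {
        this.it = items.iterator();
        advance();
    }

    private void advance() {
        lookahead = it.hasNext() ? it.next() : null;
    }

    /** Returns the upcoming item without consuming it. */
    public T peek() {
        return lookahead;
    }

    /** Returns the upcoming item and moves on; null signals the end of the output. */
    public T next() {
        T result = lookahead;
        advance();
        return result;
    }

    /** Typical consumer loop: keep calling next() until it returns null. */
    public static <T> List<T> drain(PeekableSource<T> source) {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = source.next()) != null) {
            out.add(item);
        }
        return out;
    }
}
```

        Since null marks the end of the output, a design like this cannot carry null as a regular item.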
      • getProgress

        public float getProgress()
        Specified by:
        getProgress in interface BasePipe
        Returns:
        The pipe flow progress, as a floating point number between 0.0 and 1.0. Important implementation rules:
        1) Calling this method before the start() call is complete isn't allowed and has undefined behavior.
        2) Implementations should make a best effort to provide an estimate of the progress this pipe has made (0.0 - 1.0).
        3) When the pipe is fully consumed, getProgress() should return 1.0.
        4) Results must be monotonic, i.e. results of consecutive calls may never decrease.
        5) Thread safety: progress may be maintained by some thread/s but monitored by other threads. Implementations must be thread safe.
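        One way to satisfy the rules above is an atomic counter that only grows, divided by a total known in advance. The sketch below is a hypothetical helper (ProgressTracker is not part of the library) and assumes the total item count is known when the tracker is created, which may not hold for every pipe.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: thread-safe, non-decreasing progress in the range 0.0 to 1.0.
public class ProgressTracker {
    private final long total;                          // assumed to be known up front
    private final AtomicLong done = new AtomicLong();  // only ever incremented

    public ProgressTracker(long total) {
        this.total = total;
    }

    /** Called by the producing thread after each item has been handled. */
    public void itemProcessed() {
        done.incrementAndGet();
    }

    /** Safe to call from a monitoring thread; never decreases and is capped at 1.0. */
    public float getProgress() {
        if (total <= 0) {
            return 1.0f; // nothing to do, so the work counts as complete
        }
        return Math.min(1.0f, (float) done.get() / total);
    }
}
```

        The counter never goes down and the result is clamped, so consecutive readings cannot decrease even if more items arrive than were expected.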