Class AsyncEnqueuingSharderPipe<T>

  • Type Parameters:
    T - The item data type
    All Implemented Interfaces:
    Closeable, AutoCloseable, BasePipe

    public class AsyncEnqueuingSharderPipe<T>
    extends TerminalPipe
    A terminal pipe that receives an async pipe as input and shards the contents of the input pipe into multiple queues, according to a sharding criterion based on item values. When there are relatively few shards, this is a good alternative to AsyncSharderPipe (when used as an intermediate step), because it doesn't involve disk IO.

    In case of an error, the start() method unblocks and exits with the exception, as required by the spec. In addition, queue consumers will read a special error marker placed by this class. Similarly, completion is reported by sending a completion marker to all queues.

    The implementation allows any thread to call close() after start() has been invoked. In case of a premature close, no markers (error/success) are sent to the output queues, meaning that the caller is responsible for releasing them.

    Caveats:
    1. This implementation fills multiple queues, so it's recommended to use bounded queues and to be aware of their total memory consumption.
    2. To prevent a deadlock, the caller must not run the queue consumers and the start() method on the same thread. Alternatively, one can use the asyncStart() method.
    3. Queue consumers should not try to drain some queues before others using blocking calls, since this will result in a deadlock.
    4. Queue consumers should be aware of the reserved error and successful completion markers, and handle them differently from a standard item.
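
    As an illustration of caveat 4, here is a minimal consumer sketch. It does not use this class at all: the class name MarkerAwareConsumer and the method drain are hypothetical, and comparing markers by reference (==) is an assumption based on the constructor parameters being described as "reserved reference" values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper, not part of the library.
final class MarkerAwareConsumer {
    /**
     * Takes items from the queue until one of the two reserved markers shows up.
     * Returns the items read before the success marker; throws if the error marker is read.
     */
    static <T> List<T> drain(BlockingQueue<T> queue, T successMarker, T errorMarker)
            throws InterruptedException {
        List<T> items = new ArrayList<>();
        while (true) {
            T item = queue.take(); // blocks until the producer enqueues something
            if (item == successMarker) { // reference comparison, not equals() (assumption)
                return items;
            }
            if (item == errorMarker) {
                throw new IllegalStateException("Producer reported an error");
            }
            items.add(item);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // new String(...) yields references that no regular item can share.
        String success = new String("SUCCESS");
        String error = new String("ERROR");
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(8); // bounded, per caveat 1
        queue.put("a");
        queue.put("SUCCESS"); // same text, different reference: treated as a regular item
        queue.put(success);
        System.out.println(drain(queue, success, error));
    }
}
```

    Using dedicated marker objects keeps a regular item from being mistaken for a marker, even when its value happens to be equal.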
    Author:
    Eyal Schneider
    • Constructor Detail

      • AsyncEnqueuingSharderPipe

        public AsyncEnqueuingSharderPipe(AsyncPipe<T> input,
                                         List<? extends BlockingQueue<T>> queues,
                                         Function<? super T, Integer> selectorFunction,
                                         T successMarker,
                                         T errorMarker)
        Constructor
        Parameters:
        input - The input pipe
        queues - The queues to write to. The order indicates their identities used by the selector function.
        selectorFunction - Given an item, selects the index of the queue to write the item to. Must return an integer between 0 and queues.size() - 1 (inclusive).
        successMarker - A special (reserved reference) item value used for indicating a successful completion to queue consumers.
        errorMarker - A special (reserved reference) item value used for indicating an error to queue consumers.
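
        A selector function that satisfies the range requirement could look like the sketch below. The class ShardSelectors and the method byHash are hypothetical names, and this is not necessarily how the hash-based constructor of this class shards items; it only shows one way to keep the returned index within 0 and queues.size() - 1.

```java
import java.util.function.Function;

// Hypothetical helper, not part of the library.
final class ShardSelectors {
    /** Maps an item to a queue index in [0, shardCount - 1] using its hash code. */
    static <T> Function<T, Integer> byHash(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod never returns a negative result for a positive modulus,
        // unlike the % operator applied to a negative hash code.
        return item -> Math.floorMod(item.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> selector = byHash(4);
        // Integer.hashCode() is the value itself, so -7 hashes to -7.
        System.out.println(selector.apply(-7)); // prints 1 (whereas -7 % 4 is -3)
    }
}
```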
      • AsyncEnqueuingSharderPipe

        public AsyncEnqueuingSharderPipe(AsyncPipe<T> input,
                                         List<? extends BlockingQueue<T>> queues,
                                         T successMarker,
                                         T errorMarker)
        Constructor. Uses hash-based sharding into queues.
        Parameters:
        input - The input pipe
        queues - The queues to write to. The order indicates their identities used by the selector function.
        successMarker - A special (reserved reference) item value used for indicating a successful completion to queue consumers.
        errorMarker - A special (reserved reference) item value used for indicating an error to queue consumers.
    • Method Detail

      • start

        public void start()
                   throws PipeException,
                          InterruptedException
        Description copied from interface: BasePipe
        Performs pre-processing prior to item flow through the pipe. Implementations must call the same method for all their input pipes before accessing their items. This is typically done here.
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream.
        InterruptedException - In case that the operation has been interrupted by another thread.
      • asyncStart

        public Future<Void> asyncStart()
        A special async version of the standard start() method. The caller may run this method before starting the queue consumers or vice versa, without risking a deadlock. The returned future can be used to detect pipe completion, and to get the exception, if any.
        Returns:
        the future representing the completion of this terminal pipe
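
        The threading pattern this method enables can be sketched without the library, as below. Everything here is a stand-in: AsyncStartSketch, its run method, the length-based sharding rule and the "DONE" marker are hypothetical, and the background task only imitates a sharding pipe. The sketch shows the producer running on its own thread, one consumer per bounded queue, and a Future<Void> used to observe completion or failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in, not part of the library.
final class AsyncStartSketch {
    /** Shards a small input into two bounded queues and returns the per-queue item counts. */
    static int[] run() throws Exception {
        String done = new String("DONE"); // success marker, unique by reference
        List<String> input = List.of("a", "bb", "c", "dd", "e", "ff", "g");
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            queues.add(new ArrayBlockingQueue<>(2)); // small capacity to force blocking
        }
        // One thread for the producer plus one per consumer, so nobody waits for a free thread.
        ExecutorService pool = Executors.newFixedThreadPool(queues.size() + 1);
        try {
            // Stand-in for asyncStart(): the sharding loop runs on its own thread.
            Future<Void> completion = pool.submit(() -> {
                for (String item : input) {
                    queues.get(item.length() % queues.size()).put(item); // blocks when full
                }
                for (BlockingQueue<String> q : queues) {
                    q.put(done); // completion is signalled on every queue
                }
                return null;
            });
            // One consumer per queue, so no queue is drained before another.
            List<Future<Integer>> counts = new ArrayList<>();
            for (BlockingQueue<String> q : queues) {
                counts.add(pool.submit(() -> {
                    int n = 0;
                    while (q.take() != done) { // reference comparison against the marker
                        n++;
                    }
                    return n;
                }));
            }
            completion.get(); // a producer failure would surface here as ExecutionException
            int[] result = new int[queues.size()];
            for (int i = 0; i < result.length; i++) {
                result[i] = counts.get(i).get();
            }
            return result;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] sizes = run();
        System.out.println(sizes[0] + " " + sizes[1]); // prints "3 4"
    }
}
```

        Because the producer and each consumer run on separate threads, a full queue only blocks the producer until that queue's consumer catches up.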
      • getShardSizes

        public int[] getShardSizes()
        Returns:
        The counts of items written to each shard, as an array. Item i corresponds to queue #i. Call this method only after start() has been called and completed successfully.
      • getProgress

        public float getProgress()
        Specified by:
        getProgress in interface BasePipe
        Overrides:
        getProgress in class TerminalPipe
        Returns:
        The pipe flow progress, as a floating-point number between 0.0 and 1.0. Important implementation rules:
        1) Calling this method before the start() call is complete isn't allowed and results in undefined behavior.
        2) Implementations should make a best effort to provide an estimate of the progress this pipe has made (0.0 - 1.0).
        3) When the pipe is fully consumed, getProgress() should return 1.0.
        4) Results must be monotonic, i.e. results of consecutive calls must never decrease.
        5) Thread safety: progress may be maintained by some threads but monitored by others. Implementations must be thread safe.
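
        One way to meet rules 3 to 5 is sketched below. The class MonotonicProgress is hypothetical and assumes the total item count is known up front, which may not hold for a real pipe; it is not taken from this class's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of the library.
final class MonotonicProgress {
    private final long total;
    private final AtomicLong done = new AtomicLong(); // updated by worker threads

    MonotonicProgress(long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        this.total = total;
    }

    /** Called by the processing thread(s) after each item. */
    void itemProcessed() {
        done.incrementAndGet();
    }

    /**
     * Safe to call from any monitoring thread. The counter only grows and the
     * total is fixed, so consecutive results never decrease; the value is capped at 1.0.
     */
    float getProgress() {
        return (float) Math.min(1.0, (double) done.get() / total);
    }

    public static void main(String[] args) {
        MonotonicProgress progress = new MonotonicProgress(4);
        progress.itemProcessed();
        progress.itemProcessed();
        System.out.println(progress.getProgress()); // prints 0.5
    }
}
```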