Class IntermediateSharderBySeqPipe<T>

    • Constructor Detail

      • IntermediateSharderBySeqPipe

        public IntermediateSharderBySeqPipe(Pipe<T> input,
                                            EncoderFactory<? super T> encoderFactory,
                                            FailableFunction<? super T,String,PipeException> shardSelectorFunction,
                                            File folder,
                                            FileWriteOptions fileWriteOptions)
        Constructor
        Parameters:
        input - The input pipe
        encoderFactory - The encoder factory to use for writing items into the different shards
        shardSelectorFunction - Given an item, selects the corresponding shard id. Files will use this id as their name. May be stateful. Must not return null for any non-null input.
        folder - The folder in which to place all shards. Must exist.
        fileWriteOptions - Define how files should be written
      • IntermediateSharderBySeqPipe

        public IntermediateSharderBySeqPipe(Pipe<T> input,
                                            EncoderFactory<? super T> encoderFactory,
                                            FailableFunction<? super T,String,PipeException> shardSelectorFunction,
                                            File folder)
        Constructor. Uses default file write options.
        Parameters:
        input - The input pipe
        encoderFactory - The encoder factory to use for writing items into the different shards
        shardSelectorFunction - Given an item, selects the corresponding shard id. Files will use this id as their name. Must not return null for any non-null input.
        folder - The folder in which to place all shards. Must exist.
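        The shardSelectorFunction contract above (never null for a non-null item, result used as a file name) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the class name ShardSelectorSketch, the "key,value" item layout, the character whitelist, and the "empty" fallback id are hypothetical and not part of this API.

```java
import java.util.Objects;

// Hypothetical helper, not part of the documented library.
class ShardSelectorSketch {
    // Derives a shard id from the text before the first comma of a "key,value" line.
    // The item layout and the sanitizing rule are illustrative assumptions.
    static String shardIdFor(String line) {
        Objects.requireNonNull(line, "line");
        int comma = line.indexOf(',');
        String key = comma >= 0 ? line.substring(0, comma) : line;
        // The id becomes a file name, so replace characters that are risky in file names.
        String safe = key.replaceAll("[^A-Za-z0-9._-]", "_");
        // Never return null; fall back to a fixed id when the key is empty.
        return safe.isEmpty() ? "empty" : safe;
    }
}
```

        Assuming FailableFunction is a single-method interface, such a helper could presumably be passed as a lambda, for example item -> ShardSelectorSketch.shardIdFor(item).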
    • Method Detail

      • start

        public void start()
                   throws PipeException,
                          InterruptedException
        Description copied from interface: BasePipe
        Performs pre-processing prior to item flow through the pipe. Implementations must call the same method for all their input pipes before accessing their items. This is typically done here.
        Specified by:
        start in interface BasePipe
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream.
        InterruptedException - In case that the operation has been interrupted by another thread.
      • next

        public T next()
               throws PipeException,
                      InterruptedException
        Specified by:
        next in interface Pipe<T>
        Returns:
        The next item in this pipe output, or null if the output end has been reached. May be a blocking operation.
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream while trying to prepare next item to return.
        InterruptedException - In case that the operation has been interrupted by another thread.
      • peek

        public T peek()
        Specified by:
        peek in interface Pipe<T>
        Returns:
        The next item in the pipe's output. Does not remove it, so the next call to next() will return it.
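        The relationship between next() and peek() described above can be sketched with a minimal list-backed stand-in. ListBackedPipeSketch is hypothetical and does not implement the real Pipe interface; in particular, having peek() return null at the end of the output is an assumption of this sketch, not something stated here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in that only mimics the next()/peek() contract.
class ListBackedPipeSketch<T> {
    private final Iterator<T> source;
    private T lookahead; // null once the end has been reached

    ListBackedPipeSketch(List<T> items) {
        this.source = items.iterator();
        this.lookahead = source.hasNext() ? source.next() : null;
    }

    // Returns the upcoming item without consuming it.
    T peek() {
        return lookahead;
    }

    // Returns the upcoming item and advances; null signals the end of the output.
    T next() {
        T current = lookahead;
        lookahead = source.hasNext() ? source.next() : null;
        return current;
    }

    // Drains the remaining items with the "loop until null" idiom.
    static <T> List<T> drain(ListBackedPipeSketch<T> pipe) {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = pipe.next()) != null) {
            out.add(item);
        }
        return out;
    }
}
```

        With a real pipe, start() would have to complete before the first next() call, and both checked exceptions would need handling.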
      • getShardSizes

        public Map<String,Integer> getShardSizes()
        Returns:
        The counts of items written to each shard. Call this method only after the pipe terminates.
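        To make the shape of the returned map concrete, the following sketch tallies items per shard id in the same Map<String,Integer> form. It is not the class's implementation; ShardSizeTallySketch and its use of java.util.function.Function in place of FailableFunction are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper showing a shard id -> item count tally.
class ShardSizeTallySketch {
    static <T> Map<String, Integer> countPerShard(List<T> items, Function<T, String> selector) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (T item : items) {
            // merge() inserts 1 for a new shard id, otherwise adds 1 to the existing count.
            sizes.merge(selector.apply(item), 1, Integer::sum);
        }
        return sizes;
    }
}
```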
      • getProgress

        public float getProgress()
        Specified by:
        getProgress in interface BasePipe
        Returns:
        The pipe flow progress, as a floating-point number between 0.0 and 1.0. Important implementation rules:
        1) Calling this method before the start() call has completed isn't allowed and has undefined behavior.
        2) Implementations should make a best effort to estimate the progress this pipe has made (0.0 - 1.0).
        3) When the pipe is fully consumed, getProgress() should return 1.0.
        4) Results must be monotonic, i.e. the results of consecutive calls may never decrease.
        5) Thread safety: progress may be maintained by some threads but monitored by others. Implementations must be thread safe.
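        One way an implementation might satisfy rules 3 to 5 is to derive progress from a counter that only ever grows. The sketch below assumes the total item count is known up front, which may not hold for a real pipe; ProgressTrackerSketch is a hypothetical name and is not how this class is known to compute its progress.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker: thread safe, non-decreasing, capped at 1.0.
class ProgressTrackerSketch {
    private final long totalItems;
    private final AtomicLong processed = new AtomicLong();

    ProgressTrackerSketch(long totalItems) {
        if (totalItems <= 0) {
            throw new IllegalArgumentException("totalItems must be positive");
        }
        this.totalItems = totalItems;
    }

    // Called by the worker thread after each item; the counter never decreases.
    void itemProcessed() {
        processed.incrementAndGet();
    }

    // Safe to call from a monitoring thread. Because the counter only grows,
    // consecutive results never decrease, and the cap keeps them at most 1.0.
    float getProgress() {
        long done = Math.min(processed.get(), totalItems);
        return (float) done / totalItems;
    }
}
```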