Class SortPipe<T>

  • Type Parameters:
    T - The items data type
    All Implemented Interfaces:
    Closeable, AutoCloseable, BasePipe, Pipe<T>

    public class SortPipe<T>
    extends Object
    implements Pipe<T>
    An intermediate pipe that sorts the items of an input pipe, without deduplication. The number of items held in memory at once is limited. When the input exceeds that limit, local disk is used for storing and reading intermediate results.
    Author:
    Eyal Schneider
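
    The memory-limited behavior described above corresponds to the general external merge sort technique. The following is a minimal sketch of that technique using only the JDK, not the actual SortPipe implementation. The class name ExternalSortSketch and its methods are hypothetical, items are assumed to be strings without line breaks, and temp files are written uncompressed, whereas the real class takes encoder/decoder factories and a compression setting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Illustrative sketch only (hypothetical class, not part of the library).
final class ExternalSortSketch {

    // Current head item of one sorted run, plus the reader it came from.
    private static final class Head {
        final String item;
        final BufferedReader reader;
        Head(String item, BufferedReader reader) { this.item = item; this.reader = reader; }
    }

    // Sorts the input while buffering at most maxItemsInMemory items during the read phase.
    // Duplicates are preserved. Sorted items are pushed to the sink in order.
    static void sort(Iterator<String> input, int maxItemsInMemory, Path tmpFolder,
                     Comparator<String> comparator, Consumer<String> sink) {
        if (maxItemsInMemory < 1) throw new IllegalArgumentException("maxItemsInMemory must be >= 1");
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try {
            while (input.hasNext()) {
                buffer.add(input.next());
                if (buffer.size() >= maxItemsInMemory) {
                    runs.add(spill(buffer, tmpFolder, comparator));
                    buffer.clear();
                }
            }
            if (runs.isEmpty()) {
                // Everything fit in memory: no disk access needed.
                buffer.sort(comparator);
                buffer.forEach(sink);
            } else {
                if (!buffer.isEmpty()) runs.add(spill(buffer, tmpFolder, comparator));
                merge(runs, comparator, sink);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path run : runs) run.toFile().delete(); // best-effort temp file cleanup
        }
    }

    // Sorts the buffer and writes it as one run file, one item per line.
    private static Path spill(List<String> buffer, Path tmpFolder, Comparator<String> comparator)
            throws IOException {
        buffer.sort(comparator);
        Path run = Files.createTempFile(tmpFolder, "run", ".txt");
        Files.write(run, buffer);
        return run;
    }

    // K-way merge: keeps only one item per run in memory at any time.
    private static void merge(List<Path> runs, Comparator<String> comparator, Consumer<String> sink)
            throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        try {
            PriorityQueue<Head> heads =
                new PriorityQueue<>((a, b) -> comparator.compare(a.item, b.item));
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) heads.add(new Head(first, reader));
            }
            while (!heads.isEmpty()) {
                Head smallest = heads.poll();
                sink.accept(smallest.item);
                String next = smallest.reader.readLine();
                if (next != null) heads.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
        }
    }
}
```

    In this sketch the memory bound applies to the read phase. During the merge phase only one item per run file is held in memory.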
    • Constructor Detail

      • SortPipe

        public SortPipe​(Pipe<T> input,
                        int maxItemsInMemory,
                        File tmpFolder,
                        EncoderFactory<? super T> encoderFactory,
                        DecoderFactory<T> decoderFactory,
                        Comparator<T> comparator,
                        Compression tempFilesCompression)
        Constructor
        Parameters:
        input - The input pipe
        maxItemsInMemory - The maximum number of items to be accumulated in memory at once.
        tmpFolder - The folder where to store intermediate results if needed. Temp files are deleted automatically by this pipe.
        encoderFactory - Specifies how to serialize items to disk (in case temp disk space is required)
        decoderFactory - Specifies how to deserialize items from disk (in case temp disk space is required).
        comparator - The comparator defining order relation on type T.
        tempFilesCompression - What compression should be used for temp files
      • SortPipe

        public SortPipe​(Pipe<T> input,
                        int maxItemsInMemory,
                        File tmpFolder,
                        EncoderFactory<? super T> encoderFactory,
                        DecoderFactory<T> decoderFactory,
                        Comparator<T> comparator)
        Constructor. Uses compression on temp files.
        Parameters:
        input - The input pipe
        maxItemsInMemory - The maximum number of items to be accumulated in memory at once.
        tmpFolder - The folder where to store intermediate results if needed. Temp files are deleted automatically by this pipe.
        encoderFactory - Specifies how to serialize items to disk (in case temp disk space is required)
        decoderFactory - Specifies how to deserialize items from disk (in case temp disk space is required).
        comparator - The comparator defining order relation on type T
      • SortPipe

        public SortPipe​(Pipe<T> input,
                        int maxItemsInMemory,
                        File tmpFolder,
                        CodecFactory<T> codecFactory,
                        Comparator<T> comparator)
        Constructor. Uses compression on temp files.
        Parameters:
        input - The input pipe
        maxItemsInMemory - The maximum number of items to be accumulated in memory at once.
        tmpFolder - The folder where to store intermediate results if needed. Temp files are deleted automatically by this pipe.
        codecFactory - Specifies how to serialize/deserialize items to/from disk (in case temp disk space is required).
        comparator - The comparator defining order relation on type T
      • SortPipe

        public SortPipe​(Pipe<T> input,
                        int maxItemsInMemory,
                        CodecFactory<T> codecFactory,
                        Comparator<T> comparator)
        Constructor. Uses compression on temp files and the system default temp folder.
        Parameters:
        input - The input pipe
        maxItemsInMemory - The maximum number of items to be accumulated in memory at once.
        codecFactory - Specifies how to serialize/deserialize items to/from disk (in case temp disk space is required).
        comparator - The comparator defining order relation on type T
      • SortPipe

        public SortPipe​(Pipe<T> input,
                        Comparator<T> comparator)
        Constructor. Use this constructor to disable disk usage and sort in memory only. Use it with care, and only when the input pipe's data is expected to fit entirely in memory.
        Parameters:
        input - The input pipe
        comparator - The comparator defining order relation on type T
    • Method Detail

      • next

        public T next()
               throws PipeException,
                      InterruptedException
        Specified by:
        next in interface Pipe<T>
        Returns:
        The next item in this pipe output, or null if the output end has been reached. May be a blocking operation.
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream while trying to prepare next item to return.
        InterruptedException - If the operation has been interrupted by another thread.
      • peek

        public T peek()
               throws PipeException
        Specified by:
        peek in interface Pipe<T>
        Returns:
        The next item in the pipe's output. Does not remove it, so the next call to next() will return it.
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream while trying to prepare next item to return.
      • start

        public void start()
                   throws PipeException,
                          InterruptedException
        Description copied from interface: BasePipe
        Performs pre-processing prior to item flow through the pipe. Implementations must call the same method for all their input pipes before accessing their items. This is typically done here.
        Specified by:
        start in interface BasePipe
        Throws:
        PipeException - In case of pipe errors in this pipe or somewhere up-stream.
        InterruptedException - If the operation has been interrupted by another thread.
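
        The start(), next() and peek() contracts above can be illustrated with a small in-memory sorter. This is only a sketch under stated assumptions: MiniPipe, ListSource and InMemorySortSketch are hypothetical stand-ins, not the library's Pipe<T> types. Checked exceptions (PipeException, InterruptedException) are omitted for brevity, and returning null from peek() at the end of the output is an assumption, since the text above only specifies that behavior for next().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for the pipe contract; not the library interface.
interface MiniPipe<T> {
    void start();   // must be called before items are accessed
    T next();       // next item, or null once the output end is reached
    T peek();       // next item without consuming it (null at end: an assumption)
}

// Hypothetical source pipe backed by a list.
final class ListSource<T> implements MiniPipe<T> {
    private final List<T> items;
    private int pos;
    private boolean started;

    ListSource(List<T> items) { this.items = items; }

    public void start() { started = true; }

    public T next() {
        if (!started) throw new IllegalStateException("start() was not called");
        return pos < items.size() ? items.get(pos++) : null;
    }

    public T peek() {
        if (!started) throw new IllegalStateException("start() was not called");
        return pos < items.size() ? items.get(pos) : null;
    }
}

// Sorts entirely in memory, in the spirit of the two-argument constructor.
final class InMemorySortSketch<T> implements MiniPipe<T> {
    private final MiniPipe<T> input;
    private final Comparator<? super T> comparator;
    private final List<T> sorted = new ArrayList<>();
    private int pos;

    InMemorySortSketch(MiniPipe<T> input, Comparator<? super T> comparator) {
        this.input = input;
        this.comparator = comparator;
    }

    public void start() {
        input.start(); // start the input pipe before reading any of its items
        for (T item = input.next(); item != null; item = input.next()) {
            sorted.add(item);
        }
        sorted.sort(comparator); // duplicates are kept
    }

    public T next() { return pos < sorted.size() ? sorted.get(pos++) : null; }

    public T peek() { return pos < sorted.size() ? sorted.get(pos) : null; }
}
```

        A consumer of such a pipe calls start() once and then calls next() until it returns null. peek() can be used at any point to inspect the upcoming item without advancing.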
      • getProgress

        public float getProgress()
        Specified by:
        getProgress in interface BasePipe
        Returns:
        The pipe flow progress, as a floating-point number between 0.0 and 1.0. Important implementation rules:
        1) Calling this method before the start() call is complete isn't allowed and has undefined behavior.
        2) Implementations should make a best effort to provide an estimate of the progress this pipe has made (0.0 - 1.0).
        3) When the pipe is fully consumed, getProgress() should return 1.0.
        4) Results must be monotonic, i.e. the results of consecutive calls may never decrease.
        5) Thread safety: progress may be maintained by some threads but monitored by other threads. Implementations must be thread safe.
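
        One way to satisfy the monotonicity and thread-safety rules above is to store only the highest estimate reported so far in an atomic variable. The sketch below is a hypothetical helper, not part of the library, and makes no claim about how SortPipe tracks progress internally.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a thread-safe progress holder whose reported value never decreases.
final class MonotonicProgress {
    // Bit pattern of the highest estimate seen so far. For non-negative floats,
    // the int bit patterns are ordered the same way as the float values.
    private final AtomicInteger bestBits = new AtomicInteger(Float.floatToIntBits(0.0f));

    // Called by the working thread with a new estimate; lower estimates are ignored.
    void report(float estimate) {
        if (Float.isNaN(estimate)) return; // ignore invalid estimates
        float clamped = Math.max(0.0f, Math.min(1.0f, estimate));
        bestBits.accumulateAndGet(Float.floatToIntBits(clamped), Math::max);
    }

    // Called once the pipe is fully consumed.
    void complete() { report(1.0f); }

    // Safe to call from a monitoring thread.
    float getProgress() { return Float.intBitsToFloat(bestBits.get()); }
}
```

        Clamping to the 0.0 - 1.0 range keeps every stored value non-negative, which is what makes comparing the raw bit patterns valid here.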