Class SlidingWindowQuantileDigest

  • All Implemented Interfaces:
    ConcurrentQuantileEstimator

    public class SlidingWindowQuantileDigest
    extends Object
    implements ConcurrentQuantileEstimator
    High-throughput, thread-safe quantile and CDF estimation over a sliding window of time.

    This structure maintains a live digest and a fixed-capacity journal. The live digest handles all add() operations. Periodically, the live digest is appended to the journal as a new entry and then reset. If the journal is at capacity, the oldest entry is removed to make room for the new one. The capacity of the journal therefore determines the length of the sliding window:

    length_of_sliding_window_in_milliseconds = capacity * journalingIntervalMillis

    To improve the throughput of add(), the live digest is a MultiQuantileDigest, which maintains multiple (liveDigestCount) live digests simultaneously and distributes add() requests among them.
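    The following is a minimal, hypothetical sketch of how add() requests might be spread over liveDigestCount live digests. It assumes each live digest has its own ReentrantLock and uses a plain list of samples as a stand-in for a real digest; the class name StripedLiveDigests and the try-then-block routing policy are illustrative assumptions, not the MultiQuantileDigest implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: routes add() across several live "digests".
 * Each digest is a stand-in list of raw samples guarded by its own lock,
 * so concurrent writers rarely contend on the same lock.
 */
class StripedLiveDigests {
    private final List<List<Double>> digests = new ArrayList<>();
    private final ReentrantLock[] locks;

    StripedLiveDigests(int liveDigestCount) {
        locks = new ReentrantLock[liveDigestCount];
        for (int i = 0; i < liveDigestCount; i++) {
            digests.add(new ArrayList<>());
            locks[i] = new ReentrantLock();
        }
    }

    void add(double x) {
        int n = locks.length;
        int start = ThreadLocalRandom.current().nextInt(n);
        // First pass: use the first digest (starting at a random index) whose lock is free.
        for (int i = 0; i < n; i++) {
            int idx = (start + i) % n;
            if (locks[idx].tryLock()) {
                try {
                    digests.get(idx).add(x);
                } finally {
                    locks[idx].unlock();
                }
                return;
            }
        }
        // Every digest was busy: block on the starting digest.
        locks[start].lock();
        try {
            digests.get(start).add(x);
        } finally {
            locks[start].unlock();
        }
    }

    /** Total number of samples across all live digests. */
    long size() {
        long total = 0;
        for (int i = 0; i < locks.length; i++) {
            locks[i].lock();
            try {
                total += digests.get(i).size();
            } finally {
                locks[i].unlock();
            }
        }
        return total;
    }
}
```

    Because a writer moves on to the next digest when tryLock() fails, concurrent add() calls mostly land on different digests instead of queueing on a single lock.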

    Example:

    • journalingIntervalMillis: 60000
    • capacity: 120
    • liveDigestCount: 10

    With these settings, a new journal entry is written every minute (60000 milliseconds) and the 120 most recent entries are retained, so only data from the last 2 hours is kept. Calling the quantile function returns estimates over the complete 2-hour period. To obtain quantiles for only the last 30 minutes, call summarize(30) and query the returned QuantileDigest.

    A liveDigestCount of 10 means that the MultiQuantileDigest maintains 10 live digests simultaneously. Upon a query (quantile, cdf, ...), the MultiQuantileDigest and the journal are summarized into a single QuantileDigest, which then answers all quantile and cdf queries.

    At journaling time, the 10 live digests are merged and appended as 1 entry to the journal.
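    The window arithmetic in this example can be checked with a small sketch. WindowMath and lookbackFor are hypothetical helpers; lookbackFor assumes one journal entry per journalingIntervalMillis and rounds up to whole entries.

```java
import java.time.Duration;

/** Hypothetical helpers for the sliding-window arithmetic described above. */
class WindowMath {
    /** Window length = capacity * journalingIntervalMillis. */
    static long windowLengthMillis(int journalingIntervalMillis, int capacity) {
        // Widen to long before multiplying so large windows cannot overflow int.
        return (long) journalingIntervalMillis * capacity;
    }

    /** Number of journal entries needed to cover the duration, rounded up. */
    static int lookbackFor(Duration duration, int journalingIntervalMillis) {
        if (journalingIntervalMillis <= 0) {
            throw new IllegalArgumentException("automated journaling interval must be positive");
        }
        long millis = duration.toMillis();
        return (int) ((millis + journalingIntervalMillis - 1) / journalingIntervalMillis);
    }
}
```

    For the example settings, windowLengthMillis(60000, 120) is 7,200,000 ms (2 hours), and lookbackFor(Duration.ofMinutes(30), 60000) is 30, the argument passed to summarize(30).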

    Author:
    Mojtaba Kohram
    • Constructor Detail

      • SlidingWindowQuantileDigest

        public SlidingWindowQuantileDigest​(int liveDigestCount,
                                           int journalingIntervalMillis,
                                           int capacity,
                                           ScheduledExecutorService executorService)
        Creates an instance with the default compression and compressionInflationMultiplier values.
        Parameters:
        liveDigestCount - the number of live digests to maintain
        journalingIntervalMillis - interval between two journaling operations in milliseconds; this is the resolution of the sliding window. Set to -1 to disable automated journaling.
        capacity - capacity of the journal
        executorService - the executor on which to schedule the journaling task
      • SlidingWindowQuantileDigest

        public SlidingWindowQuantileDigest​(int liveDigestCount,
                                           double compression,
                                           int journalingIntervalMillis,
                                           int capacity,
                                           ScheduledExecutorService executorService)
        Creates an instance with the default compressionInflationMultiplier value.
        Parameters:
        liveDigestCount - the number of live digests to maintain
        compression - the compression factor of the final digest
        journalingIntervalMillis - interval between two journaling operations in milliseconds; this is the resolution of the sliding window. Set to -1 to disable automated journaling.
        capacity - capacity of the journal
        executorService - the executor on which to schedule the journaling task
      • SlidingWindowQuantileDigest

        public SlidingWindowQuantileDigest​(int liveDigestCount,
                                           double compression,
                                           double compressionInflationMultiplier,
                                           int journalingIntervalMillis,
                                           int capacity,
                                           ScheduledExecutorService executorService)
        Fully specified constructor
        Parameters:
        liveDigestCount - the number of live digests to maintain
        compression - the compression factor of the final digest
        compressionInflationMultiplier - the compression inflation multiplier, see: MultiQuantileDigest.getCompressionInflation()
        journalingIntervalMillis - interval between two journaling operations in milliseconds; this is the resolution of the sliding window. Set to -1 to disable automated journaling.
        capacity - capacity of the journal
        executorService - the executor on which to schedule the journaling task
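    All three constructors accept an executorService for automated journaling. The sketch below shows one plausible way to honor journalingIntervalMillis, assuming the journaling task is scheduled at a fixed rate and that -1 skips scheduling entirely. JournalScheduling is a hypothetical name, and the real constructors may schedule the task differently.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of wiring a publish task to a ScheduledExecutorService. */
class JournalScheduling {
    /**
     * Schedules {@code publish} every journalingIntervalMillis, or returns null
     * when the interval is -1 (automated journaling disabled).
     */
    static ScheduledFuture<?> schedule(Runnable publish, int journalingIntervalMillis,
                                       ScheduledExecutorService executor) {
        if (journalingIntervalMillis == -1) {
            return null; // caller is expected to invoke publishToJournal() manually
        }
        // scheduleAtFixedRate rejects non-positive periods with IllegalArgumentException.
        return executor.scheduleAtFixedRate(publish, journalingIntervalMillis,
                journalingIntervalMillis, TimeUnit.MILLISECONDS);
    }
}
```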
    • Method Detail

      • getCompression

        public double getCompression()
        Gets the compression factor.
        Returns:
        the compression factor
      • getJournalingIntervalMillis

        public int getJournalingIntervalMillis()
        Gets the journaling interval. Returns -1 if automated journaling is disabled.
        Returns:
        the journaling interval in milliseconds
      • getCapacity

        public int getCapacity()
        Gets the journal capacity.
        Returns:
        the capacity of the journal
      • summarize

        public QuantileDigest summarize()
        Get an aggregate view of the current state of this object as a QuantileDigest. The returned QuantileDigest is independent of this object and the caller is free to modify the returned digest.
        Returns:
        the digest representing all the data currently consumed by this object
      • summarize

        public QuantileDigest summarize​(int lookback)
        Get an aggregate view of the current state of the lookback most recent journal entries as a QuantileDigest. Journal entries are added to the summary in reverse chronological order, starting with the most recent. The returned QuantileDigest is independent of this object, and the caller is free to modify it.
        Parameters:
        lookback - the number of journal entries to add to the summary, must be less than the journal capacity
        Returns:
        the digest representing lookback journal entries
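    A hypothetical sketch of the newest-first traversal described for summarize(int). Each journal entry is modeled as a list of raw samples (a stand-in for a QuantileDigest), with the newest entry at the tail of an ArrayDeque. Merging into a fresh list mirrors the guarantee that the result is independent of this object; bounding lookback by the number of entries currently in the journal is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of summarizing the most recent journal entries. */
class LookbackSummary {
    static List<Double> summarize(ArrayDeque<List<Double>> journal, int lookback) {
        if (lookback < 0 || lookback > journal.size()) {
            throw new IllegalArgumentException("lookback out of range: " + lookback);
        }
        List<Double> merged = new ArrayList<>();
        // descendingIterator() walks from the tail (newest) toward the head (oldest).
        Iterator<List<Double>> newestFirst = journal.descendingIterator();
        for (int i = 0; i < lookback; i++) {
            merged.addAll(newestFirst.next());
        }
        Collections.sort(merged); // a fresh copy, independent of the journal
        return merged;
    }
}
```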
      • publishToJournal

        public void publishToJournal()
        Merges the live digest(s) and appends the result to the journal as the latest entry. If the journal is at capacity, the oldest entry is removed.
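    The publish step can be sketched as follows, assuming a bounded ArrayDeque for the journal and plain sample lists for the live digests. Journal is a hypothetical class, and the sketch omits any coordination with concurrent add() calls, which a real implementation would need.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-capacity journal of stand-in digests (raw sample lists). */
class Journal {
    private final int capacity;
    private final ArrayDeque<List<Double>> entries = new ArrayDeque<>();

    Journal(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Merges the live digests into one entry, resets them, and evicts the oldest entry if full. */
    synchronized void publish(List<List<Double>> liveDigests) {
        List<Double> squashed = new ArrayList<>();
        for (List<Double> live : liveDigests) {
            squashed.addAll(live);
            live.clear(); // reset each live digest for the next interval
        }
        if (entries.size() == capacity) {
            entries.removeFirst(); // drop the oldest entry
        }
        entries.addLast(squashed);
    }

    synchronized int size() {
        return entries.size();
    }

    synchronized List<Double> newest() {
        return entries.peekLast();
    }
}
```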
      • quantile

        public double quantile​(double q)
        Get an estimate of the quantile at q. This function could be expensive depending on the implementation. Consider using ConcurrentQuantileEstimator.quantile(List) when querying more than one quantile value.
        Specified by:
        quantile in interface ConcurrentQuantileEstimator
        Parameters:
        q - quantile to query, must be between 0 and 1
        Returns:
        the estimated quantile value at q
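    To illustrate why the list overload is preferable for several queries, the sketch below pays the summarization cost (here, a single sort) once and then answers every q. It uses exact nearest-rank quantiles over raw samples as a stand-in for digest-based estimation; QuantileQueries is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical exact stand-in for quantile queries over summarized samples. */
class QuantileQueries {
    /** Nearest-rank quantile of an already sorted array; q must be in [0, 1]. */
    static double quantileSorted(double[] sorted, double q) {
        if (q < 0 || q > 1) throw new IllegalArgumentException("q must be in [0, 1]");
        if (sorted.length == 0) return Double.NaN;
        int rank = (int) Math.ceil(q * sorted.length); // 1-based nearest rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Sorts once, then answers every query in order. */
    static List<Double> quantiles(double[] samples, List<Double> qs) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        List<Double> out = new ArrayList<>(qs.size());
        for (double q : qs) {
            out.add(quantileSorted(sorted, q));
        }
        return out;
    }
}
```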
      • quantile

        public List<Double> quantile​(List<Double> qs)
        Get an estimated quantile for every value in the input list.
        Specified by:
        quantile in interface ConcurrentQuantileEstimator
        Parameters:
        qs - list of quantiles to query, must all be between 0 and 1
        Returns:
        the estimated quantile value for every element in qs, in order
      • size

        public long size()
        Returns the number of samples added to this estimator.
        Specified by:
        size in interface ConcurrentQuantileEstimator
        Returns:
        the number of samples currently added
      • cdf

        public double cdf​(double x)
        Get an estimate for the cdf of the distribution at x. This function could be expensive depending on the implementation. Consider using ConcurrentQuantileEstimator.cdf(List) when querying more than one cdf value.
        Specified by:
        cdf in interface ConcurrentQuantileEstimator
        Parameters:
        x - the value to get cdf at
        Returns:
        the estimated cumulative distribution function value at x, always between 0 and 1
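    A comparable stand-in for cdf queries is the empirical CDF: the fraction of samples less than or equal to x, found by binary search over a sorted array. The list variant sorts once, mirroring the advice to use cdf(List) for multiple coordinates. EmpiricalCdf is hypothetical, returning NaN for an empty input is an assumption, and a digest-based estimate would generally be approximate rather than exact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical exact stand-in for cdf queries over summarized samples. */
class EmpiricalCdf {
    /** Fraction of samples <= x in a sorted array; in [0, 1] for non-empty input, NaN if empty. */
    static double cdf(double[] sorted, double x) {
        if (sorted.length == 0) return Double.NaN;
        int lo = 0;
        int hi = sorted.length;
        // Binary search for the first index whose value is greater than x.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sorted.length;
    }

    /** Sorts once, then evaluates the cdf at every coordinate in order. */
    static List<Double> cdfs(double[] samples, List<Double> coords) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        List<Double> out = new ArrayList<>(coords.size());
        for (double x : coords) {
            out.add(cdf(sorted, x));
        }
        return out;
    }
}
```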
      • cdf

        public List<Double> cdf​(List<Double> coords)
        Get an estimate for the cdf of the distribution at every coordinate of the input list.
        Specified by:
        cdf in interface ConcurrentQuantileEstimator
        Parameters:
        coords - the list of coordinates to compute the cdf at
        Returns:
        the estimated cumulative distribution function value for every element in coords, in order; results are always between 0 and 1
      • tryAdd

        public boolean tryAdd​(double x,
                              int w)
        Attempts to add a weighted sample to this estimator. Returns false if a lock is held by another thread.
        Specified by:
        tryAdd in interface ConcurrentQuantileEstimator
        Parameters:
        x - data to add
        w - weight
        Returns:
        false if this object's lock is held by another thread, true otherwise
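    The contract of tryAdd can be sketched with ReentrantLock.tryLock(), which returns immediately instead of waiting for the lock. TryAddSketch and its storage of (x, w) pairs are hypothetical stand-ins for the real digest update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical non-blocking weighted add guarded by a single lock. */
class TryAddSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<double[]> samples = new ArrayList<>(); // stand-in storage: {x, w}
    private long totalWeight;

    /** Returns false without waiting if another thread holds the lock. */
    boolean tryAdd(double x, int w) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            samples.add(new double[] {x, w});
            totalWeight += w;
            return true;
        } finally {
            lock.unlock();
        }
    }

    long totalWeight() {
        lock.lock();
        try {
            return totalWeight;
        } finally {
            lock.unlock();
        }
    }
}
```

    A caller that receives false can retry later, fall back to a blocking add, or drop the sample, depending on how much loss it can tolerate.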