Class MultiQuantileDigest

  • All Implemented Interfaces:
    ConcurrentQuantileEstimator

    public class MultiQuantileDigest
    extends Object
    implements ConcurrentQuantileEstimator
    Thread-safe, high-throughput quantile and CDF estimation.

    Maintains digestCount internal QuantileDigests and answers quantile queries by merging them into a single (summarized) digest and querying the merged digest. Routing add operations across the different digests gives higher throughput than a single QuantileDigest. A rule of thumb for setting digestCount:

    no_of_write_threads < digestCount < 2 * no_of_write_threads

    Most applications work well with much smaller values than that. Set digestCount = 1 for single-threaded applications. With digestCount = 1, this class should be equivalent to a single QuantileDigest with some added overhead.
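The rule of thumb above can be turned into a small helper. This is a minimal sketch only: `DigestCountRule` and `suggest` are hypothetical names that are not part of this library, and picking roughly 1.5 times the writer count is just one value inside the stated range.

```java
// Hypothetical helper, not part of the library: picks a digestCount that
// satisfies writeThreads < digestCount < 2 * writeThreads when that is possible.
final class DigestCountRule {
    private DigestCountRule() {}

    static int suggest(int writeThreads) {
        if (writeThreads < 1) {
            throw new IllegalArgumentException("writeThreads must be >= 1");
        }
        if (writeThreads == 1) {
            // No integer lies strictly between 1 and 2; the class description
            // recommends digestCount = 1 for single-threaded applications.
            return 1;
        }
        // Roughly 1.5x the writer count, strictly between n and 2n for n >= 2.
        return writeThreads + writeThreads / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 8, 16}) {
            System.out.println(n + " write threads -> digestCount " + suggest(n));
        }
    }
}
```

Since the description notes that much smaller values are usually enough, a result like this is better treated as a ceiling to measure against than as a required setting.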

    When merging multiple digests, it is recommended that the individual digests have a higher compression factor than the final merged digest. This leads to a more accurate merged digest. The compression factor of each of the internal digests is:

    compressionInflation * compression

    The merged digest that responds to queries has a compression factor equal to compression. Empirical results show that a compression value of 100 performs well for most use cases. The average serialized size of each QuantileDigest with compression 100 is less than 1 KB.
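As a worked example of the relationship between the two compression factors, the sketch below computes the internal compression for a few settings. `CompressionMath` is a hypothetical name, and the inflation values shown are illustrative assumptions, not the library's defaults.

```java
// Hypothetical illustration of: internal compression = compressionInflation * compression.
final class CompressionMath {
    private CompressionMath() {}

    // Compression factor used by each internal digest.
    static double internalCompression(double compression, double compressionInflation) {
        return compressionInflation * compression;
    }

    public static void main(String[] args) {
        double compression = 100.0; // the value the description reports as working well
        // Inflation values below are made up for illustration.
        for (double inflation : new double[] {1.0, 1.5, 2.0}) {
            System.out.println("compression=" + compression
                + ", inflation=" + inflation
                + " -> internal digests use " + internalCompression(compression, inflation));
        }
    }
}
```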
    Author:
    Mojtaba Kohram
    • Constructor Detail

      • MultiQuantileDigest

        public MultiQuantileDigest(int digestCount)
        Constructor with default values
        Parameters:
        digestCount - number of digests
      • MultiQuantileDigest

        public MultiQuantileDigest(int digestCount,
                                   double compression)
        Constructor with a default compression inflation multiplier
        Parameters:
        digestCount - number of digests to keep
        compression - compression factor of the final merged digest
      • MultiQuantileDigest

        public MultiQuantileDigest(int digestCount,
                                   double compression,
                                   double compressionInflationMultiplier)
        Fully specified constructor
        Parameters:
        digestCount - number of digests to keep
        compression - compression factor of final digest
        compressionInflationMultiplier - multiplier applied to compression to obtain the compression factor of each internal digest
    • Method Detail

      • getCompressionInflation

        public double getCompressionInflation()
        The compression factor of each internal digest is equal to compressionInflation * compression
        Returns:
        the compression inflation factor
      • getCompression

        public double getCompression()
        The compression factor of the merged digest that responds to queries
        Returns:
        the compression factor
      • reset

        public QuantileDigest reset()
        Reset the digest. Returns a QuantileDigest representing the final state of this object prior to reset.
        Returns:
        a digest representing the final state of this object prior to reset
      • summarize

        public QuantileDigest summarize()
        Merge internal digests into a single QuantileDigest. The returned QuantileDigest is independent of this object and the caller is free to modify it.
        Returns:
        the merged digest
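The contracts of reset() and summarize() can be illustrated with a small stand-in. This is a sketch under stated assumptions: `SnapshotSketch` is hypothetical, it stores raw samples in a list instead of a compressed digest, and it uses a single monitor rather than whatever locking the real class uses. It only demonstrates the two documented properties: the summary is an independent copy, and reset hands back the state from just before clearing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: a plain list plays the role of the digest state.
final class SnapshotSketch {
    private final List<Double> samples = new ArrayList<>();

    synchronized void add(double x) {
        samples.add(x);
    }

    // Like summarize(): returns a copy the caller may modify freely.
    synchronized List<Double> summarize() {
        return new ArrayList<>(samples);
    }

    // Like reset(): returns the state prior to reset, then clears it.
    synchronized List<Double> reset() {
        List<Double> last = new ArrayList<>(samples);
        samples.clear();
        return last;
    }

    synchronized int size() {
        return samples.size();
    }

    public static void main(String[] args) {
        SnapshotSketch s = new SnapshotSketch();
        s.add(1.0);
        s.add(2.0);
        List<Double> summary = s.summarize();
        summary.add(99.0); // does not affect s
        System.out.println("after modifying summary, size = " + s.size());
        System.out.println("reset returned " + s.reset() + ", size now = " + s.size());
    }
}
```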
      • quantile

        public double quantile(double q)
        Get an estimate of the quantile at q. This function could be expensive depending on implementation. Consider using ConcurrentQuantileEstimator.quantile(List) when querying more than one quantile value.
        Specified by:
        quantile in interface ConcurrentQuantileEstimator
        Parameters:
        q - quantile to query, must be between 0 and 1
        Returns:
        the estimated quantile value at q
      • quantile

        public List<Double> quantile(List<Double> qs)
        Get an estimated quantile for every value in the input list.
        Specified by:
        quantile in interface ConcurrentQuantileEstimator
        Parameters:
        qs - list of quantiles to query, must all be between 0 and 1
        Returns:
        the estimated quantile value for every element in qs, in order
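The advice to prefer the list overload fits the merge-on-query design described for this class: if every query pays for a merge, a batch call can pay for it once. The sketch below illustrates that idea only. `BatchQuantileSketch` is hypothetical, exact sorted arrays stand in for digests, and the nearest-rank rule and the NaN result for empty input are assumptions, not the behaviour of QuantileDigest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: sorted arrays instead of compressed digests.
final class BatchQuantileSketch {
    private BatchQuantileSketch() {}

    // Stand-in for merging the internal digests: concatenate and sort.
    static double[] merge(List<double[]> stripes) {
        int total = 0;
        for (double[] s : stripes) {
            total += s.length;
        }
        double[] merged = new double[total];
        int pos = 0;
        for (double[] s : stripes) {
            System.arraycopy(s, 0, merged, pos, s.length);
            pos += s.length;
        }
        Arrays.sort(merged);
        return merged;
    }

    // Nearest-rank quantile on sorted data; q must be between 0 and 1.
    static double quantile(double[] sorted, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be between 0 and 1");
        }
        if (sorted.length == 0) {
            return Double.NaN; // assumption: undefined for an empty estimator
        }
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Batch query: one merge, then one lookup per quantile, in input order.
    static List<Double> quantiles(List<double[]> stripes, List<Double> qs) {
        double[] merged = merge(stripes);
        List<Double> out = new ArrayList<>(qs.size());
        for (double q : qs) {
            out.add(quantile(merged, q));
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> stripes = Arrays.asList(
            new double[] {5, 1, 9}, new double[] {3, 7}, new double[] {2, 4, 6, 8, 10});
        System.out.println(quantiles(stripes, Arrays.asList(0.0, 0.5, 1.0)));
    }
}
```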
      • size

        public long size()
        Returns the number of samples added to this estimator.
        Specified by:
        size in interface ConcurrentQuantileEstimator
        Returns:
        the number of samples currently added
      • cdf

        public double cdf(double x)
        Get an estimate for the cdf of the distribution at x. This function could be expensive depending on implementation. Consider using ConcurrentQuantileEstimator.cdf(List) when querying more than one cdf value.
        Specified by:
        cdf in interface ConcurrentQuantileEstimator
        Parameters:
        x - the value to get cdf at
        Returns:
        the estimated cumulative distribution function value at x, always between 0 and 1
      • cdf

        public List<Double> cdf(List<Double> coords)
        Get an estimate for the cdf of the distribution at every coordinate of input list.
        Specified by:
        cdf in interface ConcurrentQuantileEstimator
        Parameters:
        coords - the list of coordinates to compute the cdf at
        Returns:
        the estimated cumulative distribution function value for every element in coords, in order; results are always between 0 and 1
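To make the meaning of the returned values concrete, here is a minimal sketch of an exact empirical CDF over sorted samples. `EmpiricalCdfSketch` is hypothetical and computes exact fractions, whereas the real class returns estimates from a merged digest. The NaN result for empty input is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: exact empirical CDF over an already sorted array.
final class EmpiricalCdfSketch {
    private EmpiricalCdfSketch() {}

    // Fraction of samples <= x, found with an upper-bound binary search.
    static double cdf(double[] sorted, double x) {
        if (sorted.length == 0) {
            return Double.NaN; // assumption: undefined for an empty estimator
        }
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sorted.length;
    }

    // Batch form: one result per coordinate, in input order.
    static List<Double> cdf(double[] sorted, List<Double> coords) {
        List<Double> out = new ArrayList<>(coords.size());
        for (double x : coords) {
            out.add(cdf(sorted, x));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sorted = {1.0, 2.0, 3.0, 4.0};
        System.out.println(cdf(sorted, java.util.Arrays.asList(0.0, 2.0, 2.5, 10.0)));
    }
}
```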
      • tryAdd

        public boolean tryAdd(double x,
                              int w)
        Attempts to add a weighted sample to this estimator. Returns false if a lock is held by another thread.
        Specified by:
        tryAdd in interface ConcurrentQuantileEstimator
        Parameters:
        x - data to add
        w - weight
        Returns:
        false if this object's lock is held by another thread, true otherwise
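A non-blocking add of this kind is commonly built on lock striping with tryLock, and callers then decide whether to retry or drop the sample. The sketch below shows one way this could look. `StripedAccumulator` is hypothetical, random stripe selection is an assumption about routing, and a running total weight and weighted sum stand in for the per-stripe digest.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock striping with a non-blocking add.
final class StripedAccumulator {
    private final ReentrantLock[] locks;
    private final long[] weights; // total weight per stripe
    private final double[] sums;  // weighted sum per stripe (stand-in for a digest)

    StripedAccumulator(int stripes) {
        locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
        weights = new long[stripes];
        sums = new double[stripes];
    }

    // Picks a random stripe (an assumption) and tries to add without blocking.
    boolean tryAdd(double x, int w) {
        return tryAdd(ThreadLocalRandom.current().nextInt(locks.length), x, w);
    }

    // Returns false if the chosen stripe's lock is held by another thread.
    boolean tryAdd(int stripe, double x, int w) {
        ReentrantLock lock = locks[stripe];
        if (!lock.tryLock()) {
            return false;
        }
        try {
            weights[stripe] += w;
            sums[stripe] += x * w;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Total weight across stripes; takes each stripe lock in turn.
    long size() {
        long total = 0;
        for (int i = 0; i < locks.length; i++) {
            locks[i].lock();
            try {
                total += weights[i];
            } finally {
                locks[i].unlock();
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedAccumulator acc = new StripedAccumulator(6);
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    // Retry until the sample is accepted; dropping it is the alternative.
                    while (!acc.tryAdd(i, 1)) {
                        Thread.yield();
                    }
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        System.out.println("total weight = " + acc.size());
    }
}
```

Whether to spin, back off, or discard on a false return depends on how much sample loss the application can tolerate; the retry loop here is only one option.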