Class QuantileDigest

  • All Implemented Interfaces:
    ConcurrentQuantileEstimator

    public class QuantileDigest
    extends Object
    implements ConcurrentQuantileEstimator
    A synchronized wrapper for com.tdunning.math.stats.MergingDigest, an implementation of the t-digest. The t-digest data structure supports accurate quantile and cdf estimation.

    All read operations on the wrapped MergingDigest (size(), quantile(), byteSize(), ...) change its internal state and are therefore also synchronized.

    The digest compression factor is the size/accuracy tradeoff parameter of t-digests. The memory requirements are Θ(compression). Empirically, a value of 100 works well. The average serialized size of a digest with compression 100 is less than 1 KB.
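
    The note that read operations change internal state is why reads are synchronized as well as writes. The following is a minimal sketch of that pattern, not the actual implementation: BufferedSamples and SynchronizedSamples are hypothetical stand-ins, and the buffering behaviour is an assumption chosen to give reads a side effect. It computes exact quantiles over a sorted list rather than a t-digest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a non-thread-safe estimator whose reads mutate state. */
class BufferedSamples {
    private final List<Double> sorted = new ArrayList<>();
    private final List<Double> buffer = new ArrayList<>();

    void add(double x) {
        buffer.add(x); // cheap append; sorting is deferred until the next read
    }

    /** A "read" that first folds the buffer into the sorted list. */
    double quantile(double q) {
        flush();
        if (sorted.isEmpty()) {
            return Double.NaN; // assumption: empty input yields NaN
        }
        int index = (int) Math.min(sorted.size() - 1, Math.floor(q * sorted.size()));
        return sorted.get(index);
    }

    long size() {
        flush();
        return sorted.size();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            sorted.addAll(buffer);
            buffer.clear();
            Collections.sort(sorted);
        }
    }
}

/** Every method, reads included, takes the same monitor because reads also write. */
class SynchronizedSamples {
    private final BufferedSamples delegate = new BufferedSamples();

    synchronized void add(double x) { delegate.add(x); }
    synchronized double quantile(double q) { return delegate.quantile(q); }
    synchronized long size() { return delegate.size(); }
}
```

    If only the write path were synchronized, two concurrent quantile() calls in this sketch could both run flush() and corrupt the lists, so a read-write lock that admits several readers at once would not be enough here.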

    Author:
    Mojtaba Kohram
    See Also:
    https://github.com/tdunning/t-digest, https://arxiv.org/abs/1902.04023
    • Constructor Detail

      • QuantileDigest

        public QuantileDigest​(double compression)
        Constructs a digest with the given compression factor.
        Parameters:
        compression - the compression factor
    • Method Detail

      • wrap

        public static QuantileDigest wrap​(com.tdunning.math.stats.MergingDigest digest)
        Wraps a MergingDigest to make it thread safe. The resulting object is independent of the original MergingDigest and can be thought of as a copy of it. Behaviour may be erratic if the MergingDigest is modified while wrapping is in progress.
        Parameters:
        digest - the digest to wrap
        Returns:
        the synchronized merging digest
      • fromBytes

        public static QuantileDigest fromBytes​(ByteBuffer buff)
        Initialize from a serialized buffer.
        Parameters:
        buff - the buffer to construct from
        Returns:
        the synchronized merging digest
      • getCompression

        public double getCompression()
      • add

        public void add​(List<QuantileDigest> others)
        Merge a list of QuantileDigest into this digest.
        Parameters:
        others - the digests to merge into this digest
      • add

        public void add​(QuantileDigest other)
        Merge another QuantileDigest into this digest.
        Parameters:
        other - the other digest to merge
      • asReadOnlyBuffer

        public ByteBuffer asReadOnlyBuffer()
        Serialize this object to a read-only ByteBuffer.
        Returns:
        this object serialized to a read-only ByteBuffer
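
        The pairing of asReadOnlyBuffer() and fromBytes(ByteBuffer) is a serialize/deserialize round trip. The sketch below illustrates the general shape with java.nio.ByteBuffer only. SerializableSketch and its byte layout (compression, count, values) are hypothetical and are not the format used by MergingDigest.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in: stores raw values instead of t-digest centroids. */
class SerializableSketch {
    final double compression;
    final double[] values;

    SerializableSketch(double compression, double[] values) {
        this.compression = compression;
        this.values = values.clone();
    }

    /** Writes the state into a fresh buffer and hands out a read-only view of it. */
    ByteBuffer asReadOnlyBuffer() {
        ByteBuffer buf = ByteBuffer.allocate(
                Double.BYTES + Integer.BYTES + values.length * Double.BYTES);
        buf.putDouble(compression);
        buf.putInt(values.length);
        for (double v : values) {
            buf.putDouble(v);
        }
        buf.flip(); // switch from writing to reading
        return buf.asReadOnlyBuffer();
    }

    /** Rebuilds an instance; reads through a duplicate so the caller's position is untouched. */
    static SerializableSketch fromBytes(ByteBuffer buff) {
        ByteBuffer in = buff.duplicate();
        double compression = in.getDouble();
        int n = in.getInt();
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = in.getDouble();
        }
        return new SerializableSketch(compression, values);
    }
}
```

        Returning a read-only view lets the caller share the buffer without any risk of the serialized bytes being overwritten.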
      • reset

        public QuantileDigest reset()
        Resets the digest to size zero and returns the final state prior to resetting.
        Returns:
        final state of this object prior to resetting
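
        Returning the prior state from reset() makes it possible to drain a digest without losing samples that arrive between a read and a clear. A minimal sketch of that swap-and-return idea follows. ResettableSamples is a hypothetical stand-in that keeps a plain list, and whether the real class swaps or copies its state is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that keeps raw samples in a list. */
class ResettableSamples {
    private List<Double> samples = new ArrayList<>();

    synchronized void add(double x) {
        samples.add(x);
    }

    synchronized long size() {
        return samples.size();
    }

    /** Atomically swaps in an empty state and returns the previous state as a separate object. */
    synchronized ResettableSamples reset() {
        ResettableSamples previous = new ResettableSamples();
        previous.samples = samples;   // hand the old list to the returned object
        samples = new ArrayList<>();  // this object starts again from size zero
        return previous;
    }
}
```

        Because the swap happens under one lock acquisition, every sample lands either in the returned object or in the freshly reset one, never in both and never in neither.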
      • compression

        public double compression()
        Compression factor of this digest.
        Returns:
        the compression factor
      • byteSize

        public int byteSize()
        Returns the byte size of the wrapped MergingDigest.
        Returns:
        the byte size of the wrapped MergingDigest
      • size

        public long size()
        Returns the number of samples added to the current Estimator.
        Specified by:
        size in interface ConcurrentQuantileEstimator
        Returns:
        the number of samples currently added
      • cdf

        public double cdf​(double x)
        Get an estimate for the cdf of the distribution at x. This function could be expensive depending on implementation. Consider using ConcurrentQuantileEstimator.cdf(List) when querying more than one cdf value.
        Specified by:
        cdf in interface ConcurrentQuantileEstimator
        Parameters:
        x - the value to get cdf at
        Returns:
        the estimated cumulative distribution function value at x, always between 0 and 1
      • cdf

        public List<Double> cdf​(List<Double> coords)
        Get an estimate for the cdf of the distribution at every coordinate of the input list.
        Specified by:
        cdf in interface ConcurrentQuantileEstimator
        Parameters:
        coords - the list of coordinates to compute the cdf at
        Returns:
        the estimated cumulative distribution function value for every element of coords, in order; results are always between 0 and 1
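
        The advice to prefer the list variant for several queries can be illustrated with a sketch in which the per-call overhead (here one lock acquisition and one sort) is paid once for the whole batch. BatchCdfSketch is a hypothetical stand-in that computes an exact empirical cdf. It is not a t-digest, and returning NaN for an empty estimator is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in computing an exact empirical cdf over stored samples. */
class BatchCdfSketch {
    private final List<Double> samples = new ArrayList<>();

    synchronized void add(double x) {
        samples.add(x);
    }

    /** Empirical cdf at every coordinate, in input order, with one lock acquisition and one sort. */
    synchronized List<Double> cdf(List<Double> coords) {
        double[] sorted = new double[samples.size()];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = samples.get(i);
        }
        Arrays.sort(sorted);
        List<Double> result = new ArrayList<>(coords.size());
        for (double x : coords) {
            if (sorted.length == 0) {
                result.add(Double.NaN); // assumption: empty input yields NaN
            } else {
                result.add(countAtMost(sorted, x) / (double) sorted.length);
            }
        }
        return result;
    }

    /** Number of elements less than or equal to x, found by binary search. */
    private static int countAtMost(double[] sorted, double x) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

        Calling a single-value cdf in a loop over the same coordinates would repeat the lock acquisition and the sort for every element in this sketch.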
      • quantile

        public double quantile​(double q)
        Get an estimate of the quantile at q. This function could be expensive depending on implementation. Consider using ConcurrentQuantileEstimator.quantile(List) when querying more than one quantile value.
        Specified by:
        quantile in interface ConcurrentQuantileEstimator
        Parameters:
        q - quantile to query, must be between 0 and 1
        Returns:
        the estimated quantile value at q
      • quantile

        public List<Double> quantile​(List<Double> qs)
        Get an estimated quantile for every value in the input list.
        Specified by:
        quantile in interface ConcurrentQuantileEstimator
        Parameters:
        qs - list of quantiles to query, must all be between 0 and 1
        Returns:
        the estimated quantile value for every element of qs, in order
      • tryAdd

        public boolean tryAdd​(double x,
                              int w)
        Attempts to add a weighted sample to this estimator. Returns false if a lock is held by another thread.
        Specified by:
        tryAdd in interface ConcurrentQuantileEstimator
        Parameters:
        x - data to add
        w - weight
        Returns:
        false if this object's lock is held by another thread, true otherwise
      • add

        public void add​(double x,
                        int w)
        Adds a weighted sample to the digest. Locks the object until data is added. For non-blocking adds, see tryAdd(double, int).
        Specified by:
        add in interface ConcurrentQuantileEstimator
        Parameters:
        x - data to add
        w - weight
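
        The difference between the blocking add(double, int) and the non-blocking tryAdd(double, int) can be sketched with java.util.concurrent.locks.ReentrantLock. This is an illustration under assumptions, not the class's actual locking code: TryAddSketch is a hypothetical stand-in that only tracks total weight, and tryAddWhileLockedElsewhere is a demo helper that does not exist in the documented API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in; it only tracks total weight, not a real t-digest. */
class TryAddSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private long totalWeight = 0;

    /** Non-blocking: gives up immediately if another thread holds the lock. */
    boolean tryAdd(double x, int w) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            totalWeight += w; // a real digest would also record x
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Blocking: waits for the lock, then adds. */
    void add(double x, int w) {
        lock.lock();
        try {
            totalWeight += w;
        } finally {
            lock.unlock();
        }
    }

    long size() {
        lock.lock();
        try {
            return totalWeight;
        } finally {
            lock.unlock();
        }
    }

    /** Demo helper: a second thread holds the lock while this thread calls tryAdd. */
    boolean tryAddWhileLockedElsewhere(double x, int w) {
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        try {
            locked.await();      // wait until the other thread owns the lock
            return tryAdd(x, w); // cannot acquire the lock while it is held elsewhere
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            try {
                holder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

        A caller on a latency-sensitive path can use tryAdd and drop or queue the sample when it returns false, while a caller that must not lose data uses add and accepts the wait.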