Stable Genius

@hrbrmstr hrbrmstr released this 01 Aug 11:18
· 10 commits to master since this release
2143fd4

tdigest

Wicked Fast, Accurate Quantiles Using ‘t-Digests’

Description

The t-Digest construction algorithm uses a variant of 1-dimensional
k-means clustering to produce a very compact data structure that allows
accurate estimation of quantiles. This t-Digest data structure can be
used to estimate quantiles, compute other rank statistics or even to
estimate related measures like trimmed means. The advantage of the
t-Digest over previous digests for this purpose is that the t-Digest
handles data with full floating point resolution. Quantile estimates
produced by t-Digests can be orders of magnitude more accurate than
those produced by previous digest algorithms. Methods are provided to
create and update t-Digests and retrieve quantiles from the accumulated
distributions.

See the original paper by Ted Dunning & Otmar Ertl for more details on
t-Digests.

What’s Inside The Tin

The following functions are implemented:

  • td_add: Add a value to the t-Digest with the specified count
  • td_create: Allocate a new histogram
  • td_merge: Merge one t-Digest into another
  • td_quantile_of: Return the quantile of the value
  • td_total_count: Total items contained in the t-Digest
  • td_value_at: Return the value at the specified quantile
  • tquantile: Calculate sample quantiles from a t-Digest
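
To make the operations above concrete, here is a minimal sketch of the idea in Python. It is not this package's implementation or API: the class name `TinyDigest`, its method names, the buffer size, and the simple size limit proportional to q(1 - q) are all assumptions chosen for illustration. It keeps a sorted list of weighted centroids, merges neighbours greedily, and interpolates between centroid means to answer quantile queries.

```python
class TinyDigest:
    """Illustrative, simplified digest (hypothetical; not the package's code)."""

    def __init__(self, compression=100):
        self.compression = compression
        self.centroids = []   # [mean, weight] pairs, sorted after _compress()
        self.count = 0        # total weight added so far
        self._pending = 0     # adds since the last compression

    def add(self, value, count=1):
        """Add a value with the given count (compare td_add)."""
        self.centroids.append([float(value), float(count)])
        self.count += count
        self._pending += 1
        if self._pending >= 5 * self.compression:  # assumed buffer size
            self._compress()

    def merge(self, other):
        """Fold another digest's centroids into this one (compare td_merge)."""
        self.centroids.extend([m, w] for m, w in other.centroids)
        self.count += other.count
        self._compress()

    def total_count(self):
        return self.count

    def _compress(self):
        """Greedily merge adjacent centroids; small near the tails, larger mid-range."""
        self._pending = 0
        if not self.centroids:
            return
        self.centroids.sort()
        merged = [self.centroids[0][:]]
        left = 0.0  # weight of all centroids before merged[-1]
        for mean, weight in self.centroids[1:]:
            last = merged[-1]
            w = last[1] + weight
            q = (left + w / 2) / self.count
            # Assumed size limit: proportional to q * (1 - q).
            if w <= 4 * self.count * q * (1 - q) / self.compression:
                last[0] += (mean - last[0]) * weight / w  # weighted mean update
                last[1] = w
            else:
                left += last[1]
                merged.append([mean, weight])
        self.centroids = merged

    def value_at(self, q):
        """Estimate the value at quantile q (compare td_value_at)."""
        self._compress()
        if not self.centroids:
            return float("nan")
        target = q * self.count
        cum, prev_mid, prev_mean = 0.0, None, None
        for mean, weight in self.centroids:
            mid = cum + weight / 2  # rank at the centre of this centroid
            if target <= mid:
                if prev_mid is None:
                    return mean
                frac = (target - prev_mid) / (mid - prev_mid)
                return prev_mean + frac * (mean - prev_mean)
            prev_mid, prev_mean = mid, mean
            cum += weight
        return self.centroids[-1][0]

    def quantile_of(self, x):
        """Estimate the quantile of value x (compare td_quantile_of)."""
        self._compress()
        cum, prev_mid, prev_mean = 0.0, None, None
        for mean, weight in self.centroids:
            mid = cum + weight / 2
            if x < mean:
                if prev_mid is None:
                    return 0.0
                frac = (x - prev_mean) / (mean - prev_mean)
                return (prev_mid + frac * (mid - prev_mid)) / self.count
            prev_mid, prev_mean = mid, mean
            cum += weight
        return 1.0


td = TinyDigest()
td.add(0)
td.add(10)
print(td.total_count(), td.value_at(0.5))  # prints: 2 5.0
```

Because the size limit shrinks toward zero at q = 0 and q = 1, centroids near the extremes stay tiny while those near the median absorb many points. This is why a digest of this kind can stay compact without losing much accuracy in the tails.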