This is an implementation of Ted Dunning's Merging T-Digest.
This implementation batches all inserts, including merges. This is an improvement over the original implementation where merges require sorting of all points on each merge.
This implementation does not support storing the incoming data with each centroid, (since obviously that is for testing).
The T-Digest is a data structure for accurate online computation of quantiles and cumulative distribution functions (CDFs) on streaming data.
#include "tdigest2/TDigest.h"
using namespace tdigest;
int main() {
// Create a T-Digest with compression factor (default: 1000)
// Higher compression = more accuracy but more memory
TDigest digest(1000);
// Add data points
digest.add(1.0);
digest.add(2.0);
digest.add(3.0);
// Or add with weights
digest.add(4.0, 2.5); // value, weight
// Compute quantiles (0.0 to 1.0)
double median = digest.quantile(0.5); // 50th percentile
double p95 = digest.quantile(0.95); // 95th percentile
double p99 = digest.quantile(0.99); // 99th percentile
// Compute cumulative distribution function
double cdf_at_2 = digest.cdf(2.0); // P(X <= 2.0)
return 0;
}TDigest digest1(1000);
TDigest digest2(1000);
// Add data to both digests
digest1.add(1.0);
digest2.add(2.0);
// Merge digest1 into digest2
digest2.merge(&digest1);
// Now digest2 contains data from both
double combined_median = digest2.quantile(0.5);TDigest digest(1000);
// Add many values
for (int i = 0; i < 1000000; i++) {
digest.add(some_value);
}
// Optional: compress to ensure processing
digest.compress();
// Compute quantiles
double q25 = digest.quantile(0.25);
double q50 = digest.quantile(0.50);
double q75 = digest.quantile(0.75);This project uses Bazel for building and testing:
bazel test //...TDigest(compression)- Constructor with compression factor (default: 1000)add(value)- Add a single data pointadd(value, weight)- Add a weighted data pointquantile(q)- Get the q-th quantile (0.0 to 1.0)cdf(x)- Get the cumulative probability P(X <= x)merge(other)- Merge another T-Digest into this onecompress()- Force compression of unprocessed data