TDigest-rs

Simple Python package to compute TDigests, implemented in Rust.

Introduction

TDigest-rs is a Python library with a Rust backend that implements the T-Digest algorithm, a compact data structure for estimating quantiles of large or streaming datasets. For an in-depth exploration of the T-Digest algorithm, refer to Ted Dunning and Otmar Ertl's paper and the G-Research blog post.

Usage

pip install tdigest-rs

The library contains a single TDigest class.

Creating a TDigest object

import numpy as np
from tdigest_rs import TDigest

# Fit a TDigest from a numpy array (float32 or float64)
arr = np.random.randn(1000)
tdigest = TDigest.from_array(arr=arr, delta=100.0)  # delta is optional and defaults to 300.0
print(tdigest.means, tdigest.weights)

# Create directly from means and weights arrays
vals = np.random.randn(1000).astype(np.float32)
weights = np.ones(1000).astype(np.uint32)
tdigest = TDigest.from_means_weights(arr=vals, weights=weights)
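
To give some intuition for what delta controls, here is a minimal pure-Python sketch of the k1 scale function from Dunning and Ertl's paper. It is an illustration only: whether tdigest-rs uses this exact scale function, and whether its delta maps one-to-one onto the paper's compression parameter, are assumptions. The helper names (k1, k1_inv, max_centroid_fraction) are hypothetical and not part of the library.

```python
import math


def k1(q: float, delta: float) -> float:
    """Scale function k1 from the paper: maps a quantile q in [0, 1] to k-space."""
    x = min(1.0, max(-1.0, 2.0 * q - 1.0))  # clamp against rounding error
    return delta / (2.0 * math.pi) * math.asin(x)


def k1_inv(k: float, delta: float) -> float:
    """Inverse of k1: maps a k-space position back to a quantile."""
    return (math.sin(2.0 * math.pi * k / delta) + 1.0) / 2.0


def max_centroid_fraction(q_left: float, delta: float) -> float:
    """Largest share of the total weight that a centroid starting at quantile
    q_left may hold if it is allowed to span at most one unit of k-space."""
    k_hi = min(k1(q_left, delta) + 1.0, delta / 4.0)  # k1 tops out at delta / 4
    return k1_inv(k_hi, delta) - q_left


if __name__ == "__main__":
    for delta in (100.0, 300.0):
        for q in (0.001, 0.01, 0.5):
            frac = max_centroid_fraction(q, delta)
            print(f"delta={delta:5.0f}  q={q:5.3f}  max weight share={frac:.5f}")
```

Under these assumptions, a larger delta allows smaller centroids everywhere (more centroids, higher accuracy), and centroids near the tails are always smaller than those near the median, which is why tail quantiles tend to be estimated more precisely.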

Computing quantiles

# Compute a quantile
tdigest.quantile(0.1)

# Compute median
tdigest.median()

# Compute trimmed mean
tdigest.trimmed_mean(lower=0.05, upper=0.95)
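
As a rough picture of how a quantile can be read off a digest's means and weights, the sketch below interpolates linearly between centroid centres. This is a simplified stand-in written in plain Python, not the library's Rust implementation, which may interpolate or handle the tails differently. The function name quantile_from_centroids is hypothetical.

```python
from typing import Sequence


def quantile_from_centroids(
    means: Sequence[float], weights: Sequence[float], q: float
) -> float:
    """Estimate the q-th quantile from centroids sorted by mean.

    Each centroid is treated as sitting at the middle of its own weight, and
    the estimate is a linear interpolation between neighbouring centres.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if len(means) == 0 or len(means) != len(weights):
        raise ValueError("means and weights must be non-empty and equally long")

    total = float(sum(weights))
    target = q * total

    # Cumulative weight at the centre of each centroid.
    centres = []
    cumulative = 0.0
    for w in weights:
        centres.append(cumulative + w / 2.0)
        cumulative += w

    # Outside the first/last centre there is nothing to interpolate with.
    if target <= centres[0]:
        return float(means[0])
    if target >= centres[-1]:
        return float(means[-1])

    for i in range(1, len(means)):
        if target <= centres[i]:
            t = (target - centres[i - 1]) / (centres[i] - centres[i - 1])
            return means[i - 1] + t * (means[i] - means[i - 1])
    return float(means[-1])  # not reached; kept for safety


if __name__ == "__main__":
    means = [1.0, 2.0, 3.0]
    weights = [1, 1, 1]
    print(quantile_from_centroids(means, weights, 0.25))
    print(quantile_from_centroids(means, weights, 0.5))
```

The sketch clamps to the first and last centroid means at the extremes; a production implementation would typically also track the true minimum and maximum.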

Merging TDigests

arr1 = np.random.randn(1000)
arr2 = np.ones(1000)
digest1 = TDigest.from_array(arr=arr1)
digest2 = TDigest.from_array(arr=arr2)

merged_digest = digest1.merge(digest2, delta=100.0)  # delta again defaults to 300.0
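
Conceptually, merging two digests amounts to pooling their centroids and compressing the result again. The sketch below shows that idea in plain Python using a simplified version of the merging procedure from the paper with the k1 scale function. It is not the library's implementation; the function names (k1, compress, merge) are hypothetical, and the assumption that tdigest-rs interprets delta the same way is untested.

```python
import math
import random
from typing import List, Tuple

Centroid = Tuple[float, float]  # (mean, weight)


def k1(q: float, delta: float) -> float:
    """Scale function k1 from the paper; the argument is clamped for safety."""
    x = min(1.0, max(-1.0, 2.0 * q - 1.0))
    return delta / (2.0 * math.pi) * math.asin(x)


def compress(centroids: List[Centroid], delta: float) -> List[Centroid]:
    """Greedily merge neighbouring centroids while each merged centroid
    spans at most one unit of k-space."""
    if not centroids:
        return []
    ordered = sorted(centroids)
    total = sum(w for _, w in ordered)

    out: List[Centroid] = []
    cur_mean, cur_w = ordered[0]
    emitted = 0.0  # weight already written to `out`
    k_left = k1(0.0, delta)

    for mean, w in ordered[1:]:
        q_right = (emitted + cur_w + w) / total
        if k1(q_right, delta) - k_left <= 1.0:
            # Absorb the next centroid: update the weight and weighted mean.
            cur_w += w
            cur_mean += (mean - cur_mean) * w / cur_w
        else:
            # Current centroid is full: emit it and start a new one.
            out.append((cur_mean, cur_w))
            emitted += cur_w
            k_left = k1(emitted / total, delta)
            cur_mean, cur_w = mean, w
    out.append((cur_mean, cur_w))
    return out


def merge(a: List[Centroid], b: List[Centroid], delta: float) -> List[Centroid]:
    """Merge two digests by pooling their centroids and re-compressing."""
    return compress(a + b, delta)


rng = random.Random(0)
data1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
data2 = [1.0] * 1000

d1 = compress([(x, 1.0) for x in data1], delta=100.0)
d2 = compress([(x, 1.0) for x in data2], delta=100.0)
merged = merge(d1, d2, delta=100.0)

print(len(d1), len(d2), len(merged))
print(sum(w for _, w in merged))  # total weight is preserved by merging
```

Because only centroids are pooled, the raw data is never needed again, which is what makes digests convenient to build per chunk or per worker and combine afterwards.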

Updating TDigests

arr = np.random.randn(1000)
digest = TDigest.from_array(arr=arr)

# Buffer data points before updating
buffer = np.random.randn(1)
digest = digest.update(buffer, delta=300.0, merge_delta=100.0)

Serialising TDigests

A TDigest object can be converted to a dictionary, serialised to JSON, and pickled.

# Convert and load to/from a python dict
d = tdigest.to_dict()
loaded_digest = TDigest.from_dict(d)

# Pickle a digest
import pickle

pickle.dumps(tdigest)

Development workflow

pip install hatch

cd bindings/python

# Run linters
hatch run dev:lint

# Run tests
hatch run dev:test

# Run benchmark
hatch run dev:benchmark

# Format code
hatch run dev:format

Contributing

Please read our contributing guide and code of conduct if you'd like to contribute to the project.

Community Guidelines

Please read our code of conduct before participating in or contributing to this project.

Security

Please see our security policy for details on reporting security vulnerabilities.

License

TDigest-rs is licensed under the Apache Software License 2.0 (Apache-2.0).