Skip to content

High-performance time series downsampling algorithms for visualization

License

Notifications You must be signed in to change notification settings

my1e5/tsdownsample

This branch is 4 commits behind predict-idlab/tsdownsample:main.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

cd68c99 Β· Jan 2, 2025

History

58 Commits
Jan 2, 2025
Feb 2, 2024
Jun 28, 2023
Jan 2, 2025
Feb 2, 2024
Feb 2, 2024
Jul 21, 2023
Apr 12, 2023
Jan 2, 2025
Nov 22, 2022
Jan 2, 2025
Aug 29, 2024
Jan 2, 2025
Jul 21, 2023

Repository files navigation

tsdownsample

PyPI Latest Release support-version Downloads CodeQL Testing Testing Discord

Extremely fast time series downsampling πŸ“ˆ for visualization, written in Rust.

Features ✨

  • Fast: written in rust with PyO3 bindings
    • leverages optimized argminmax - which is SIMD accelerated with runtime feature detection
    • scales linearly with the number of data points
    • multithreaded with Rayon (in Rust)
      Why we do not use Python multiprocessing Citing the PyO3 docs on parallelism:
      CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.
      In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.
  • Efficient: memory efficient
    • works on views of the data (no copies)
    • no intermediate data structures are created
  • Flexible: works on any type of data
    • supported datatypes are
      • for x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
      • for y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool
      !! πŸš€ f16 argminmax is 200-300x faster than numpy In contrast with all other data types above, f16 is *not* hardware supported (i.e., no instructions for f16) by most modern CPUs!!
      🐌 Programming languages facilitate support for this datatype by either (i) upcasting to f32 or (ii) using a software implementation.
      πŸ’‘ As for argminmax, only comparisons are needed - and thus no arithmetic operations - creating a symmetrical ordinal mapping from f16 to i16 is sufficient. This mapping allows to use the hardware supported scalar and SIMD i16 instructions - while not producing any memory overhead πŸŽ‰
      More details are described in argminmax PR #1.
  • Easy to use: simple & flexible API

Install

pip install tsdownsample

Usage

from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possible irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]

Downsampling algorithms & API

Downsampling API πŸ“‘

Each downsampling algorithm is implemented as a class that implements a downsample method. The signature of the downsample method:

downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

Arguments:

  • x is optional
  • x and y are both positional arguments
  • n_out is a mandatory keyword argument that defines the number of output values*
  • **kwargs are optional keyword arguments (see table below):
    • parallel: whether to use multi-threading (default: False)
      ❗ The max number of threads can be configured with the TSDOWNSAMPLE_MAX_THREADS ENV var (e.g. os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4")
    • ...

Returns: a ndarray[uint64] of indices that can be used to index the original data.

*When there are gaps in the time series, fewer than n_out indices may be returned.

Downsampling algorithms πŸ“ˆ

The following downsampling algorithms (classes) are implemented:

Downsampler Description **kwargs
MinMaxDownsampler selects the min and max value in each bin parallel
M4Downsampler selects the min, max, first and last value in each bin parallel
LTTBDownsampler performs the Largest Triangle Three Buckets algorithm parallel
MinMaxLTTBDownsampler (new two-step algorithm πŸŽ‰) first selects n_out * minmax_ratio min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm parallel, minmax_ratio*

*Default value for minmax_ratio is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332

Handling NaNs

This library supports two NaN-policies:

  1. Omit NaNs (NaNs are ignored during downsampling).
  2. Return index of first NaN once there is at least one present in the bin of the considered data.
Omit NaNs Return NaNs
MinMaxDownsampler NaNMinMaxDownsampler
M4Downsampler NaNM4Downsampler
MinMaxLTTBDownsampler NaNMinMaxLTTBDownsampler
LTTBDownsampler

Note that NaNs are not supported for x-data.

Limitations & assumptions 🚨

Assumes;

  1. x-data is (non-strictly) monotonic increasing (i.e., sorted)
  2. no NaNs in x-data

πŸ‘€ Jeroen Van Der Donckt

About

High-performance time series downsampling algorithms for visualization

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 82.7%
  • Rust 10.3%
  • Python 6.8%
  • Makefile 0.2%