TDAAD – Topological Data Analysis for Anomaly Detection

Overview

TDAAD is a Python package for unsupervised anomaly detection in multivariate time series using Topological Data Analysis (TDA). Website and documentation: https://irt-systemx.github.io/tdaad/

It builds upon two powerful open-source libraries:

  • GUDHI for efficient and scalable computation of persistent homology and topological features,
  • scikit-learn for core machine learning utilities such as Pipeline and estimators such as EllipticEnvelope.

TDAAD is inspired by the methodology introduced in:

Chazal, F., Levrard, C., & Royer, M. (2024). Topological Analysis for Detecting Anomalies (TADA) in dependent sequences: application to Time Series. Journal of Machine Learning Research, 25(365), 1–49. https://www.jmlr.org/papers/v25/24-0853.html

🔍 Features

  • Unsupervised anomaly detection in multivariate time series
  • Topological embedding using persistent homology
  • Scikit-learn–style API (fit, transform, score_samples)
  • Configurable embedding dimension, window size, and topological parameters
  • Works with NumPy arrays or pandas DataFrames

🛠 Installation

Install from PyPI (recommended):

pip install tdaad

Or install from source:

git clone https://github.com/IRT-SystemX/tdaad.git
cd tdaad
pip install .

Requirements:

  • Python ≥ 3.7
  • See requirements.txt for full dependency list
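
As a quick sanity check after installation, you can confirm the package imports correctly:

python -c "import tdaad"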

🚀 Quickstart

Here’s a minimal example using TopologicalAnomalyDetector:

import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Example multivariate time series with shape (n_samples, n_features)
X = np.random.randn(1000, 3)

# Initialize and fit the detector
detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(X)

# Compute anomaly scores
scores = detector.score_samples(X)

You can also use pandas.DataFrame instead of a NumPy array — column names will be preserved in the output.
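
For instance, the same workflow with a DataFrame might look like this (the column names are illustrative):

import numpy as np
import pandas as pd
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Wrap the same kind of data in a DataFrame with named sensor columns
df = pd.DataFrame(np.random.randn(1000, 3), columns=["sensor_a", "sensor_b", "sensor_c"])

detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(df)
scores = detector.score_samples(df)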

For more advanced usage (e.g. custom embeddings, parameter tuning), see the examples folder or the API documentation.

📌 Usage Notes

  • TDAAD is designed for multivariate time series (2D inputs) — univariate data is not supported.
  • The core detection method relies on sliding-window embeddings and persistent homology to identify structural changes in the signal.
  • The key parameters that impact results and runtime are listed below (a short configuration sketch follows this list):
    • window_size controls the time resolution — larger windows capture slower anomalies, smaller ones detect more localized changes.
    • n_centers_by_dim controls the number of reference shapes used per homology dimension (e.g. connected components in H0, loops in H1, ...). Increasing this improves sensitivity but adds computation time.
    • tda_max_dim sets the maximum topological feature dimension computed (0 = connected components, 1 = loops, 2 = voids, ...). Higher values increase runtime and memory usage.
  • Internally, computations are parallelized using joblib to scale to larger datasets. Use n_jobs to control parallelism.
  • Inputs can be numpy.ndarray or pandas.DataFrame. Column names are preserved in the output when using DataFrames.
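
As a rough illustration of how these parameters fit together, here is a configuration sketch. The parameter names follow the notes above; passing n_jobs directly to the constructor is an assumption based on the parallelism note.

import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

X = np.random.randn(5000, 8)  # 5000 time steps, 8 sensors

detector = TopologicalAnomalyDetector(
    window_size=200,       # larger windows target slower anomalies
    n_centers_by_dim=5,    # more reference shapes per homology dimension
    tda_max_dim=1,         # compute H0 (components) and H1 (loops) only
    n_jobs=-1,             # assumed joblib-style parallelism control
)
detector.fit(X)
scores = detector.score_samples(X)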

⚙️ You can typically handle ~100 sensors and a few hundred time steps per window on a modern machine.

🧮 Basic Complexity of Persistent Homology in TDAAD

  • Total complexity scales as $O(N \times (w \times p)^{d+2})$, where $w$ is the time resolution (window_size, the number of time steps per window), $p$ is the number of variables (features/sensors), $d$ is the maximum homology dimension (tda_max_dim), and $N$ is the total number of sliding windows.
  • Note that increasing the maximum homology dimension $d$ raises the exponent, so each increment of tda_max_dim multiplies the cost by roughly $w \times p$. The number of centers n_centers_by_dim, which is used after the persistent homology computation, does not significantly affect the overall complexity; see the sketch below.
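
To make the scaling concrete, here is a back-of-the-envelope sketch of the dominant term. It assumes the sliding windows advance one step at a time, so that $N \approx T - w + 1$ for a series of length $T$; this stride is an assumption, not something fixed by the formula above.

# Rough cost model: cost ~ N * (w * p) ** (d + 2)
def relative_cost(T, w, p, d):
    N = T - w + 1                     # number of windows, assuming stride 1
    return N * (w * p) ** (d + 2)

T, w, p = 10_000, 100, 10             # series length, window size, sensors
print(relative_cost(T, w, p, 1) / relative_cost(T, w, p, 0))  # ~1000x (= w*p)
print(relative_cost(T, w, p, 2) / relative_cost(T, w, p, 1))  # another ~1000x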

📚 Documentation & Resources

The full documentation is available at https://irt-systemx.github.io/tdaad/.

Document generation

To regenerate the documentation, run the following commands from the project root, adapting paths as needed:

pip install -r docs/docs_requirements.txt -r requirements.txt
sphinx-apidoc -o docs/source/generated tdaad
sphinx-build -M html docs/source docs/build -W --keep-going
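
After the build completes, the generated HTML is written under docs/build/html and can be opened locally, for example:

open docs/build/html/index.html  # macOS; use xdg-open on Linux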

Contributors and Support

This work has been supported by the French government under the "France 2030" program, as part of the SystemX Technological Research Institute within the Confiance.ai project.

TDAAD is developed and supported by the IRT SystemX team within the Confiance.ai program.