TDAAD is a Python package for unsupervised anomaly detection in multivariate time series using Topological Data Analysis (TDA). Website and documentation: https://irt-systemx.github.io/tdaad/
It builds upon two powerful open-source libraries:
- GUDHI for efficient and scalable computation of persistent homology and topological features,
- scikit-learn for core machine learning utilities such as `Pipeline` and estimators like `EllipticEnvelope`.
TDAAD is inspired by the methodology introduced in:
Chazal, F., Levrard, C., & Royer, M. (2024). Topological Analysis for Detecting Anomalies (TADA) in dependent sequences: application to Time Series. Journal of Machine Learning Research, 25(365), 1–49. https://www.jmlr.org/papers/v25/24-0853.html
- Unsupervised anomaly detection in multivariate time series
- Topological embedding using persistent homology
- Scikit-learn-style API (`fit`, `transform`, `score_samples`)
- Configurable embedding dimension, window size, and topological parameters
- Works with NumPy arrays or pandas DataFrames
Install from PyPI (recommended):

```bash
pip install tdaad
```

Or install from source:

```bash
git clone https://github.com/IRT-SystemX/tdaad.git
cd tdaad
pip install .
```

Requirements:
- Python ≥ 3.7
- See `requirements.txt` for the full dependency list
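To sanity-check the installation, a quick import should succeed without errors:

```bash
python -c "import tdaad"
```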
Here’s a minimal example using `TopologicalAnomalyDetector`:

```python
import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Example multivariate time series with shape (n_samples, n_features)
X = np.random.randn(1000, 3)

# Initialize and fit the detector
detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(X)

# Compute anomaly scores
scores = detector.score_samples(X)
```

You can also use `pandas.DataFrame` instead of a NumPy array; column names will be preserved in the output.
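For instance, here is a minimal sketch of the DataFrame variant that also pulls out the topological embedding via `transform`. The sensor column names are made up for illustration, and the shape of the embedding (assumed here to be one feature vector per sliding window) should be checked against the API documentation:

```python
import numpy as np
import pandas as pd
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Hypothetical sensor names; they carry through to the output.
df = pd.DataFrame(np.random.randn(1000, 3),
                  columns=["temp", "pressure", "flow"])

detector = TopologicalAnomalyDetector(window_size=100, n_centers_by_dim=3)
detector.fit(df)

scores = detector.score_samples(df)  # anomaly score per sample
embedding = detector.transform(df)   # topological embedding (assumed: one
                                     # feature vector per sliding window)
```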
For more advanced usage (e.g. custom embeddings, parameter tuning), see the examples folder or the API documentation.
- TDAAD is designed for multivariate time series (2D inputs) — univariate data is not supported.
- The core detection method relies on sliding-window embeddings and persistent homology to identify structural changes in the signal.
- The key parameters that impact results and runtime are:
  - `window_size` controls the time resolution: larger windows capture slower anomalies, smaller ones detect more localized changes.
  - `n_centers_by_dim` controls the number of reference shapes used per homology dimension (e.g. connected components in H0, loops in H1, ...). Increasing this improves sensitivity but adds computation time.
  - `tda_max_dim` sets the maximum topological feature dimension computed (0 = connected components, 1 = loops, 2 = voids, ...). Higher values increase runtime and memory usage.
- Internally, computations are parallelized using `joblib` to scale to larger datasets. Use `n_jobs` to control parallelism (see the sketch after this list).
- Inputs can be `numpy.ndarray` or `pandas.DataFrame`. Column names are preserved in the output when using DataFrames.
⚙️ You can typically handle ~100 sensors and a few hundred time steps per window on a modern machine.
- Total complexity scales with $O(N \times (w \times p)^{d+2})$, where $w$ is the time resolution (`window_size`, the number of time steps per window), $p$ is the number of variables (features/sensors), $d$ is the maximum homology dimension `tda_max_dim`, and $N$ is the total number of sliding windows.
- Note that increasing the maximum homology dimension $d$ raises the exponent, causing exponential growth. The number of centers `n_centers_by_dim` used after the persistent homology computation does not significantly affect the overall complexity.
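As a sketch of how these knobs combine, the snippet below uses illustrative values, not tuned recommendations; passing `tda_max_dim` and `n_jobs` as constructor arguments is assumed from the notes above, so verify against the API documentation:

```python
import numpy as np
from tdaad.anomaly_detectors import TopologicalAnomalyDetector

# Illustrative data: 10 sensors over 5000 time steps.
X = np.random.randn(5000, 10)

# Example settings; tda_max_dim and n_jobs as constructor
# arguments are assumed here, not verified against the API.
detector = TopologicalAnomalyDetector(
    window_size=200,      # time resolution: each window covers 200 steps
    n_centers_by_dim=5,   # 5 reference shapes per homology dimension
    tda_max_dim=1,        # compute H0 (components) and H1 (loops)
    n_jobs=-1,            # let joblib use all available cores
)
detector.fit(X)
scores = detector.score_samples(X)
```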
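A back-of-envelope reading of the formula (made-up sizes, stride-1 windows assumed, constant factors ignored) shows why $d$ dominates:

```python
# Relative cost of O(N * (w * p)**(d + 2)) for illustrative sizes.
n_samples, p, w = 10_000, 3, 100   # series length, features, window_size
N = n_samples - w + 1              # sliding windows, assuming stride 1
for d in (0, 1, 2):                # candidate tda_max_dim values
    print(f"d={d}: relative cost ~ {N * (w * p) ** (d + 2):.1e}")
```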
To regenerate the documentation, rerun the following commands from the project root, adapting if necessary:

```bash
pip install -r docs/docs_requirements.txt -r requirements.txt
sphinx-apidoc -o docs/source/generated tdaad
sphinx-build -M html docs/source docs/build -W --keep-going
```
This work has been supported by the French government under the “France 2030” program, as part of the SystemX Technological Research Institute within the Confiance.ai project.


