Releases: cvxgrp/pymde
0.3.0
The release dramatically decreases the time it takes PyMDE to construct embeddings by shipping a custom Rust implementations of approximate and exact k-nearest neighbor algorithms. This release also introduces a new API to choose between approximate and exact neighbor construction.
Extremely fast k-nearest neighbor preprocessing, written in Rust.
Computing local-structure preserving embeddings (such as t-SNE, UMAP, and LargeVis) requires the construction of a k-nearest neighbor graph given an original data matrix. Like the UMAP package, previous versions of PyME used pynndescent to construct this graph. This release drops the dependency on pynndescent in favor of a custom Rust implementation.
Extremely fast approximate k-nearest neighbors, written in Rust. This release includes a custom implementation the nn-descent algorithm for the Euclidean metric, written in Rust. This has a few benefits:
-
Dramatically smaller "time-to-first embedding". Computing
$k=15$ nearest neighbors for the MNIST dataset (70,000 vectors, 784 dimension takes less than 1 second on a modern MacBook Pro, while pynndescent takes approximately one minute due to Numba's JIT. - Competitive with warmed pynndescent. Our rust implementation is competitive with burned-in or warm pynndescent (post JIT), with comparable execution time and recall on simple benchmarks.
- Easy to install. The elimination of Numba as a transitive dependency means that PyMDE is now easy to install on all major platforms, Python 3.10+, with no restrictions on NumPy version.
Performance is obtained by custom SIMD kernels, an auto-vectorization-friendly fallback, and rayon for parallelism.
Extremely fast exact k-nearest neighbor calculation, written in Rust. This release also includes a custom Rust implementation for exact k-nearest neighbor calculations.
- Extremely fast. On modern machines, the exact algorithm is even faster than nn-descent (both our own and pynndescent). Performance is obtained by heavily exploiting parallelism and via BLAS (Accelerate on macOS, OpenBLAS on Linux and Windows).
- Competitive with
faiss. Our implementation is competitive withfaiss, but without the dependency on OpenMP (which makes it difficult to installfaissalongsidePyTorch).
Choosing the k-nearest neighbor algorithm
pymde.preserve_neighbors now takes an optional argument, knn_method, which controls whether an approximate or exact kNN algorithm is used. In practice the performance of the embedding is typically the same. When omitted we choose the method based on simple heuristic.
New interactive examples
PyMDE now ships with interactive marimo notebook examples that bring embeddings to life, letting you select into matplotlib plots and automatically retrieve the original data in Python for downstream analysis. See our example notebooks at https://pymde.org/examples.
Full Changelog: v0.2.3...v0.3.0
v0.2.0
v0.1.18
v0.1.17
v0.1.16
v0.1.13
Adds a function pymde.seed() for controlling randomness.
PyMDE's internal random state can be set by passing an integer seed
to this function (e.g., pymde.seed(0)). This is useful when
exact reproducibility is required.
See https://pymde.org/getting_started/index.html#reproducibility