ENH: stats.multivariate: introduce Covariance class and subclasses #88

Open · wants to merge 13 commits into main
Conversation

mdhaber (Owner) commented Aug 21, 2022

Explore the use of a Covariance class to accept alternative representations of a multivariate distribution's shape matrix and to avoid re-processing that matrix each time a frozen distribution method is called.

"""
return self._whiten(x)

@cached_property
mdhaber (Owner Author) commented:

@cached_property is probably not appropriate everywhere it appears here. The important thing with this Covariance class is to define an interface and provide default documentation (which would not need to be rewritten by subclasses).
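A minimal sketch of that interface idea (the diagonal subclass below is hypothetical, just to illustrate the pattern of a base class owning the public method and its documentation while subclasses implement only _whiten):

import numpy as np

class Covariance:
    """Representation of a covariance matrix as needed by multivariate_normal."""

    def whiten(self, x):
        # Public method: documented once on the base class;
        # subclasses only need to implement `_whiten`.
        return self._whiten(x)

class CovViaDiagonal(Covariance):
    # Hypothetical subclass: covariance specified by its diagonal.
    def __init__(self, diagonal):
        self._d = np.asarray(diagonal, dtype=float)

    def _whiten(self, x):
        return x / np.sqrt(self._d)

cov = CovViaDiagonal([4.0, 9.0])
print(cov.whiten(np.array([2.0, 3.0])))  # [1. 1.]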

Representation of a covariance matrix as needed by multivariate_normal
"""

def whiten(self, x):
mdhaber (Owner Author) commented:

This name is taken from KDE. Really, I think I'd just replace this method with xTPx, which would do the equivalent of np.sum(np.square(np.dot(dev, prec_U)), axis=-1).
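A minimal sketch of that equivalence (the construction of prec_U below is illustrative, not the PR's code; any factor satisfying P = prec_U @ prec_U.T works):

import numpy as np

# Build an SPD covariance A and a factor prec_U of its inverse P,
# i.e. P = prec_U @ prec_U.T (here via the lower Cholesky factor of A).
rng = np.random.default_rng(5)
n = 4
A = rng.random((n, n))
A = A @ A.T + n * np.eye(n)
L = np.linalg.cholesky(A)       # A = L @ L.T
prec_U = np.linalg.inv(L).T     # P = A^{-1} = prec_U @ prec_U.T
dev = rng.random((3, n))        # deviations x - mean, one per row

# x^T P x computed as a row-wise squared norm
xTPx = np.sum(np.square(np.dot(dev, prec_U)), axis=-1)
ref = np.einsum('ij,jk,ik->i', dev, np.linalg.inv(A), dev)
np.testing.assert_allclose(xTPx, ref)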

def __init__(self, cov):
cov = self._validate_matrix(cov, 'cov')

self._factor = _J(np.linalg.cholesky(_J(_T(cov))))
mdhaber (Owner Author) commented:

It turns out we can get $x^T \Sigma^{-1} x$ without the _J flipping. See scipy/scipy#16987.
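A sketch of the flip-free identity (this assumes the ordinary lower Cholesky factor and shows the idea, not the literal gh-16987 diff):

import numpy as np
from scipy import linalg

# With Sigma = L @ L.T, x^T Sigma^{-1} x = ||L^{-1} x||^2,
# so one triangular solve suffices and no factor-flipping is needed.
rng = np.random.default_rng(8)
n = 6
S = rng.random((n, n))
S = S @ S.T + n * np.eye(n)     # symmetric positive definite
x = rng.random(n)
L = np.linalg.cholesky(S)
z = linalg.solve_triangular(L, x, lower=True)
np.testing.assert_allclose(z @ z, x @ np.linalg.solve(S, x))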

else self._covariance)


class CovViaEigendecomposition(Covariance):
mdhaber (Owner Author) commented Sep 9, 2022

One advantage of the eigendecomposition is that it supports singular covariance matrices. But we can compute the key $x^T \Sigma^* x$ product using LDL factors, too.

For nonsingular matrices, this is what it looks like:

import numpy as np
from scipy import linalg

n = 40
rng = np.random.default_rng(14)
x = rng.random(size=(n, 1))
A = rng.random(size=(n, n))
A = A @ A.T

# Eigendecomposition route
w, V = linalg.eigh(A)
z = x.T @ (V * w**(-0.5))
xSxT = (z**2).sum()

# LDL route
L, D, perm = linalg.ldl(A)
y = linalg.solve_triangular(L[perm], x[perm], lower=True).T * np.diag(D)**(-0.5)
xSxT2 = (y**2).sum()

print(xSxT, xSxT2)

For singular matrices, we need to mask out the zero eigenvalues, so it looks a little more complicated. When the vector x lies within the column space of the covariance matrix, we can perform the calculation as:

import numpy as np
from scipy import linalg

n = 10
rng = np.random.default_rng(1329348965454623456)
x = rng.random(size=(n, 1))
A = rng.random(size=(n, n))

A[0] = A[1] + A[2]  # make it singular
x[0] = x[1] + x[2]  # ensure x is in the subspace
A = A @ A.T

eps = 1e-10

# Eigendecomposition route
w, V = linalg.eigh(A)
mask = np.abs(w) < eps
w_ = w.copy()
w_[mask] = np.inf
z = x.T @ (V * w_**(-0.5))
xSxT = (z**2).sum()

# LDL route
L, D, perm = linalg.ldl(A)
d = np.diag(D).copy()[:, np.newaxis]
mask = np.abs(d) < eps
d_ = d.copy()
d[mask] = 0
d_[mask] = np.inf
y = linalg.solve_triangular(L[perm], x[perm], lower=True) * d_**(-0.5)
xSxT2 = (y**2).sum()

np.testing.assert_allclose(xSxT, xSxT2)

And we can tell it's in the subspace if np.allclose(L@(d**0.5*y), x). Otherwise, the PDF is zero.
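To illustrate the check, here is the same setup with x deliberately moved out of the column space (the seed and eps threshold are carried over from above):

import numpy as np
from scipy import linalg

n = 10
rng = np.random.default_rng(1329348965454623456)
x = rng.random(size=(n, 1))
A = rng.random(size=(n, n))
A[0] = A[1] + A[2]          # make it singular
A = A @ A.T
x[0] = x[1] + x[2] + 0.5    # deliberately leave the column space

eps = 1e-10
L, D, perm = linalg.ldl(A)
d = np.diag(D).copy()[:, np.newaxis]
mask = np.abs(d) < eps
d_ = d.copy()
d[mask] = 0
d_[mask] = np.inf
y = linalg.solve_triangular(L[perm], x[perm], lower=True) * d_**(-0.5)
print(np.allclose(L @ (d**0.5 * y), x))  # False, so the PDF is zero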

mdhaber (Owner Author) commented Oct 7, 2022

This needs some work. The pseudodeterminant calculation isn't just the pseudodeterminant of the D matrix (product of positive diagonal elements).
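A two-by-two example makes the gap concrete (a sketch of the caveat, not of the fix): for the singular matrix below, the product of the positive pivots of D is 1, while the pseudodeterminant, the product of the nonzero eigenvalues, is 2.

import numpy as np
from scipy import linalg

A = np.array([[1., 1.],
              [1., 1.]])            # singular, eigenvalues 0 and 2
L, D, perm = linalg.ldl(A)          # D = diag(1, 0)
d = np.diag(D)
print(np.prod(d[d > 0]))                   # 1.0
print(linalg.eigh(A, eigvals_only=True))   # [0., 2.] -> pseudodet is 2.0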

class CovViaLDL(Covariance):

    def __init__(self, ldl):
        L, d, perm = ldl
        d = self._validate_vector(d, 'd')
        perm = self._validate_vector(perm, 'perm')
        L = self._validate_matrix(L, 'L')

        i_zero = d <= 0
        positive_d = np.array(d, dtype=np.float64)
        positive_d[i_zero] = 1  # ones don't affect determinant
        self._log_pdet = np.sum(np.log(positive_d), axis=-1)

        pseudo_reciprocals = 1 / np.sqrt(positive_d)
        pseudo_reciprocals[i_zero] = 0

        self._pseudo_reciprocals = pseudo_reciprocals
        self._d = d
        self._perm = perm
        self._L = L
        self._rank = d.shape[-1] - i_zero.sum(axis=-1)
        self._dimensionality = d.shape[-1]
        self._shape = L.shape
        # This is only used for `_support_mask`, not to decide whether
        # the covariance is singular or not.
        self._eps = 1e-8
        self._allow_singular = True

    def _whiten(self, x):
        L = self._L[self._perm]
        x = x[self._perm]
        return linalg.solve_triangular(L, x, lower=True) * self._pseudo_reciprocals

    @cached_property
    def _covariance(self):
        return (self._L * self._d) @ self._L.T

    @staticmethod
    def from_ldl(ldl):
        r"""
        Representation of covariance provided via LDL (aka LDLT) decomposition

        Parameters
        ----------
        ldl : sequence
            A sequence (nominally a tuple) containing the lower factor ``L``,
            the diagonal multipliers ``d`` (i.e. the diagonal of the ``D``
            matrix returned by `scipy.linalg.ldl`), and the row-permutation
            indices ``perm``, also as returned by `scipy.linalg.ldl`.

        Notes
        -----
        Let the covariance matrix be :math:`A`, and let :math:`L` be a lower
        triangular matrix and :math:`D` be a diagonal matrix such that
        :math:`L D L^T = A`.

        When all of the diagonal elements of :math:`D` are strictly
        positive, whitening of a data point :math:`x` is performed by
        computing :math:`D^{-1/2} L^{-1} x`, where the inverse square root
        can be taken element-wise and a triangular solve replaces explicit
        inversion of :math:`L`.
        :math:`\log\det{A}` is calculated as :math:`tr(\log{D})`,
        where the :math:`\log` operation is performed element-wise on the
        diagonal.

        This `Covariance` class supports singular covariance matrices. When
        computing ``_log_pdet``, non-positive diagonal elements of :math:`D`
        are ignored. Whitening is not well defined when the point to be
        whitened does not lie in the span of the columns of the covariance
        matrix. The convention taken here is to treat the inverse square
        root of non-positive diagonal elements as zeros.

        Examples
        --------
        Prepare a symmetric positive definite covariance matrix ``A`` and a
        data point ``x``.

        >>> import numpy as np
        >>> from scipy import linalg, stats
        >>> rng = np.random.default_rng()
        >>> n = 5
        >>> A = rng.random(size=(n, n))
        >>> A = A @ A.T  # make the covariance symmetric positive definite
        >>> x = rng.random(size=n)

        Perform the LDL decomposition of ``A`` and create the `Covariance`
        object.

        >>> L, D, perm = linalg.ldl(A)
        >>> d = np.diag(D)
        >>> cov = stats.Covariance.from_ldl((L, d, perm))

        Compare the functionality of the `Covariance` object against
        reference implementations.

        >>> res = cov.whiten(x)
        >>> ref = linalg.solve_triangular(L[perm], x[perm], lower=True) / np.sqrt(d)
        >>> np.allclose(res, ref)
        True
        >>> res = cov.log_pdet
        >>> ref = np.linalg.slogdet(A)[-1]
        >>> np.allclose(res, ref)
        True

        """
        return CovViaLDL(ldl)
