We want to show that the squared Frobenius norm $\frac{1}{2} \|A - B \|_F^2$ is a Bregman divergence. Let $\psi(A) = \frac{1}{2}\|A\|_F^2$, so that $\nabla \psi(A) = A$. Using norm properties, we can then write the Bregman divergence associated with $\psi$ as

$$
\begin{aligned}
B_\psi(A \| B) &= \psi(A) - \psi(B) - \langle \nabla \psi(B), A - B \rangle \\
&= \frac{1}{2}\|A\|_F^2 - \frac{1}{2}\|B\|_F^2 - \langle B, A \rangle + \langle B, B \rangle \\
&= \frac{1}{2}\|A\|_F^2 - \langle B, A \rangle + \frac{1}{2}\|B\|_F^2 \\
&= \frac{1}{2} \big[ \langle A, A \rangle - 2\langle B, A \rangle + \langle B, B \rangle \big] \\
&= \frac{1}{2} \langle A - B, A - B \rangle \\
&= \frac{1}{2} \| A - B \|_F^2.
\end{aligned}
$$
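
As a quick numerical sanity check of this identity (a standalone sketch, not code from the package):

```julia
using LinearAlgebra

# Bregman divergence generated by ψ(A) = ½‖A‖²_F, checked against ½‖A - B‖²_F.
ψ(A) = 0.5 * norm(A)^2                      # for matrices, norm(A) is the Frobenius norm
∇ψ(A) = A
bregman(A, B) = ψ(A) - ψ(B) - dot(∇ψ(B), A - B)

A, B = randn(5, 3), randn(5, 3)
@assert bregman(A, B) ≈ 0.5 * norm(A - B)^2
```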
Similarly, the Bregman divergence induced from the log-partition of the Gaussian $G(\theta) = \theta^2/2$ is the squared Euclidean distance.
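
In the scalar case, this follows in one line from the definition:

$$
B_G(\theta \| \theta') = \frac{\theta^2}{2} - \frac{(\theta')^2}{2} - \theta'(\theta - \theta') = \frac{1}{2}(\theta - \theta')^2.
$$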

`docs/src/math/intro.md`

This can be formulated as an optimization problem where we find the rank-$k$ approximation:

```math
\begin{aligned}
& \underset{\Theta}{\text{minimize}}
& & \|X - \Theta\|_F^2 \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}
```

where $\| \cdot \|_F$ denotes the Frobenius norm. The squared Frobenius norm is the sum of the squared differences between corresponding elements of the two matrices:

```math
\| X - \Theta \|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{d}(X_{ij}-\Theta_{ij})^2.
```

Intuitively, it can be seen as an extension of the Euclidean distance for vectors, applied to matrices by flattening them into large vectors. This makes the Frobenius norm a natural way to measure how well the lower-dimensional representation approximates the original data.
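
The equivalence between the entrywise sum and the flattened Euclidean view is easy to confirm numerically (a standalone check, not code from the package):

```julia
using LinearAlgebra

X, Θ = randn(4, 3), randn(4, 3)

# Squared Frobenius norm as a sum of squared entrywise differences ...
sq_frob = sum((X .- Θ) .^ 2)

# ... and as the squared Euclidean norm of the flattened difference.
@assert sq_frob ≈ norm(vec(X - Θ))^2
@assert sq_frob ≈ norm(X - Θ)^2   # for matrices, Julia's norm defaults to the Frobenius norm
```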

which is equivalent to minimizing the squared Frobenius norm in the geometric interpretation.

## Exponential Family PCA (EPCA)

EPCA is similar to generalized linear models (GLMs) [GLM](@cite). Just as GLMs extend linear regression to handle a variety of response distributions, EPCA generalizes PCA to accommodate data with noise drawn from any exponential family distribution, rather than just Gaussian noise. This allows EPCA to address a broader range of real-world data scenarios where the Gaussian assumption may not hold (e.g., binary, count, discrete distribution data).

At its core, EPCA replaces the geometric PCA objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family—rather than the squared Frobenius norm, which PCA uses. This makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions:
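
As a concrete non-Gaussian example (a standalone sketch, not code from the package): for the Poisson family, the log-partition is $G(\theta) = e^\theta$, its convex conjugate is $F(x) = x \log x - x$, and the Bregman divergence induced by $F$ is the generalized KL divergence:

```julia
# Bregman divergence induced by F(x) = x*log(x) - x, the convex conjugate
# of the Poisson log-partition G(θ) = exp(θ).
F(x)  = x * log(x) - x
dF(x) = log(x)                                    # F′
bregman(p, q) = F(p) - F(q) - dF(q) * (p - q)

p, q = 3.0, 1.5
@assert bregman(p, q) ≈ p * log(p / q) - p + q    # generalized KL divergence
```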

`paper.bib`

@article{optim, doi = {10.21105/joss.00615}, url = {https://doi.org/10.21105/joss.00615}, year = {2018}, publisher = {The Open Journal}, volume = {3}, number = {24}, pages = {615}, author = {Patrick K. Mogensen and Asbjørn N. Riseth}, title = {Optim: A mathematical optimization package for Julia}, journal = {Journal of Open Source Software} }

@article{Bregman,
  title = {The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming},
  journal = {USSR Computational Mathematics and Mathematical Physics},
  abstract = {In this paper we consider an iterative method of finding the common point of convex sets. This method can be regarded as a generalization of the methods discussed in [1–4]. Apart from problems which can be reduced to finding some point of the intersection of convex sets, the method considered can be applied to the approximate solution of problems in linear and convex programming.}
}

@article{symbolics,
  author = {Gowda, Shashi and Ma, Yingbo and Cheli, Alessandro and Gw\'{o}\'{z}zd\'{z}, Maja and Shah, Viral B. and Edelman, Alan and Rackauckas, Christopher},
  title = {High-Performance Symbolic-Numerics via Multiple Dispatch},

`paper.md`

# Summary

Principal component analysis (PCA) [@PCA] is a fundamental tool in data science and machine learning for dimensionality reduction and denoising. While PCA is effective for continuous, real-valued data, it may not perform well for binary, count, or discrete distribution data. Exponential family PCA (EPCA) [@EPCA] generalizes PCA to accommodate these data types, making it more suitable for tasks such as belief compression in reinforcement learning [@Roy]. `ExpFamilyPCA.jl` is the first Julia [@Julia] package for EPCA, offering fast implementations for common distributions and a flexible interface for custom distributions.

# Statement of Need

<!-- REDO -->

To our knowledge, there are no open-source implementations of EPCA, and the sole proprietary package [@epca-MATLAB] is limited to a single distribution. Modern applications of EPCA in reinforcement learning [@Roy] and mass spectrometry [@spectrum] require multiple distributions, numerical stability, and the ability to handle large datasets. `ExpFamilyPCA.jl` addresses this gap by providing fast implementations for several exponential family distributions and multiple constructors for custom distributions. More implementation and mathematical details are in the [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/).

# Problem Formulation

- PCA has a specific geometric objective in terms of projections
- This can also be interpreted as a denoising process using Gaussian MLE
- EPCA generalizes the geometric objective using Bregman divergences, which are related to exponential families

TODO: read the original GLM paper

PCA has many interpretations (e.g., a variance-maximizing compression, a distance-minimizing projection). The interpretation that is most useful for understanding EPCA is the denoising interpretation. Suppose we have $n$ noisy observations $x_1, \dots, x_n \in \mathbb{R}^{d}$.
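
To make the denoising interpretation concrete (a standard fact, included here for completeness): if each observation is modeled as $x_i \sim \mathcal{N}(\theta_i, I)$, then, because $\log \mathcal{N}(x; \theta, I) = -\tfrac{1}{2}\|x - \theta\|_2^2 + \text{const}$, maximizing the Gaussian log-likelihood over the parameters is the same as minimizing the squared Frobenius objective:

$$
\underset{\Theta}{\arg\max} \sum_{i=1}^{n} \log \mathcal{N}(x_i; \theta_i, I)
= \underset{\Theta}{\arg\min} \frac{1}{2} \sum_{i=1}^{n} \|x_i - \theta_i\|_2^2
= \underset{\Theta}{\arg\min} \frac{1}{2} \|X - \Theta\|_F^2.
$$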

## Principal Component Analysis

Traditional PCA is a low-rank matrix approximation problem. For a data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ observations, we want to find the low-rank matrix approximation $\Theta \in \mathbb{R}^{n \times d}$ such that $\mathrm{rank}(\Theta) = k \leq d$. Formally,

$$\begin{aligned}
& \underset{\Theta}{\text{minimize}}
& & \|X - \Theta\|_F^2 \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}$$

where $\| \cdot \|_F$ denotes the Frobenius norm[^1] and $\Theta = AV$ with $A \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{k \times d}$.

[^1]: The squared Frobenius norm is a generalization of the squared Euclidean distance and thus a special case of the Bregman divergence (induced from the log-partition of the normal distribution).
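
By the Eckart–Young theorem, this optimization problem is solved by the truncated singular value decomposition, which also yields a factorization of the form $\Theta = AV$. A standalone Julia sketch (not code from the package):

```julia
using LinearAlgebra

# Best rank-k approximation of X in squared Frobenius norm via the truncated SVD,
# returned as factors A (n × k) and V (k × d) with Θ = A * V.
function rank_k_factors(X::AbstractMatrix, k::Integer)
    S = svd(X)
    A = S.U[:, 1:k] * Diagonal(S.S[1:k])
    V = S.Vt[1:k, :]
    return A, V
end

X = randn(100, 10)
A, V = rank_k_factors(X, 3)
Θ = A * V
@assert rank(Θ) == 3
```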

## Exponential Family PCA

EPCA is a generalization of PCA that replaces PCA's geometric objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family (see [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/bregman/))—rather than the squared Frobenius norm. The Bregman divergence $B_F$ associated with $F$ is defined [@Bregman]:

$$
B_F(p \| q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle.
$$

The Bregman-based objective makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions.

EPCA is similar to generalized linear models (GLMs) [@GLM]. Just as GLMs extend linear regression to handle a variety of response distributions, EPCA generalizes PCA to accommodate data with noise drawn from any exponential family distribution, rather than just Gaussian noise. This allows EPCA to address a broader range of real-world data scenarios where the Gaussian assumption may not hold (e.g., binary, count, discrete distribution data).

# API

## Usage

Each `EPCA` object supports a three-method interface: `fit!`, `compress`, and `decompress`.
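
A minimal usage sketch of that interface is shown below; the `PoissonEPCA` constructor name, its argument order, and the data shapes are illustrative assumptions rather than a definitive description of the package API:

```julia
using ExpFamilyPCA

# Assumed constructor (illustrative): a Poisson-family EPCA model for
# d-dimensional count data with a k-dimensional compressed representation.
d, k = 10, 3
model = PoissonEPCA(d, k)

X = rand(0:20, 100, d)        # toy count data: 100 observations, d features
fit!(model, X)                # fit the model to the data
A = compress(model, X)        # compress the data
X̂ = decompress(model, A)      # reconstruct an approximation of the data
```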