
Commit f25046b

changed docs, added scripts, etc.
1 parent e608df4 commit f25046b

24 files changed: +1386 −650 lines

docs/src/math/appendix/gaussian.md (new file, +16)

@@ -0,0 +1,16 @@
# Gaussian EPCA and the Squared Frobenius Norm

We want to show that the squared Frobenius norm $\frac{1}{2} \|A - B \|_F^2$ is a Bregman divergence. Let $\psi(A) = \frac{1}{2}\|A\|_F^2$, so that $\nabla \psi(A) = A$. Using norm properties, we can then write the Bregman divergence associated with $\psi$ as

$$
\begin{aligned}
B_\psi(A \| B) &= \psi(A) - \psi(B) - \langle \nabla \psi(B), A - B \rangle \\
&= \frac{1}{2}\|A\|_F^2 - \frac{1}{2}\|B\|_F^2 - \langle B, A \rangle + \langle B, B \rangle \\
&= \frac{1}{2}\|A\|_F^2 - \langle B, A \rangle + \frac{1}{2}\|B\|_F^2 \\
&= \frac{1}{2} \big[ \langle A, A \rangle - 2\langle B, A \rangle + \langle B, B \rangle \big] \\
&= \frac{1}{2} \langle A - B, A - B \rangle \\
&= \frac{1}{2} \| A - B \|_F^2.
\end{aligned}
$$

Similarly, the Bregman divergence induced from the log-partition of the Gaussian $G(\theta) = \theta^2/2$ is the squared Euclidean distance.
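
As a quick check of that last claim, the scalar case works out the same way (a short sketch using the definition above with $G$ in place of $\psi$):

$$
B_G(p \| q) = \frac{p^2}{2} - \frac{q^2}{2} - q\,(p - q) = \frac{1}{2}(p - q)^2,
$$

which is the squared Euclidean distance up to the constant factor $\frac{1}{2}$.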

docs/src/math/bregman.md (+1 −1)

@@ -9,7 +9,7 @@ Understanding Bregman divergences is essential for EPCA because they link the ex
Formally, the Bregman divergence [Bregman](@cite) $B_F$ associated with a function $F(\theta)$ is defined as

```math
- B_F(p, q) = F(p) - F(q) - \langle f(p), p - q \rangle
+ B_F(p \| q) = F(p) - F(q) - \langle f(q), p - q \rangle
```

where
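
As a quick numerical illustration of this definition (a minimal sketch, not part of ExpFamilyPCA.jl; the helper `bregman` and the scalar functions below are illustrative only):

```julia
# Sketch: a scalar Bregman divergence B_F(p ‖ q) = F(p) - F(q) - f(q) * (p - q),
# where f is the derivative of the strictly convex function F.
bregman(F, f, p, q) = F(p) - F(q) - f(q) * (p - q)

# With the Gaussian log-partition G(θ) = θ^2 / 2, this recovers half the squared distance.
G(θ) = θ^2 / 2
g(θ) = θ
bregman(G, g, 3.0, 1.0)  # 0.5 * (3.0 - 1.0)^2 == 2.0
```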

docs/src/math/intro.md (+4 −4)

@@ -32,7 +32,7 @@ This can be formulated as an optimization problem where we find the rank-$k$ app
```math
\begin{aligned}
& \underset{\Theta}{\text{minimize}}
- & & \|X - \Theta\|_F \\
+ & & \|X - \Theta\|_F^2 \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}
@@ -41,7 +41,7 @@ This can be formulated as an optimization problem where we find the rank-$k$ app
where $\| \cdot \|_F$ denotes the Frobenius norm. The Frobenius norm is calculated as the square root of the sum of the squared differences between corresponding elements of the two matrices:

```math
- \| X - \Theta \|_F = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{d}(X_{ij}-\Theta_{ij})^2}.
+ \| X - \Theta \|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{d}(X_{ij}-\Theta_{ij})^2.
```

Intuitively, it can be seen as an extension of the Euclidean distance for vectors, applied to matrices by flattening them into large vectors. This makes the Frobenius norm a natural way to measure how well the lower-dimensional representation approximates the original data.
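
The rank-$k$ minimizer of this squared Frobenius error has a closed form via the truncated SVD (Eckart–Young). The following is a minimal Julia sketch; the helper `rank_k_approx` is illustrative and not part of the package:

```julia
using LinearAlgebra

# Truncated SVD gives the rank-k minimizer of the squared Frobenius error ‖X − Θ‖_F².
function rank_k_approx(X::AbstractMatrix, k::Integer)
    F = svd(X)
    return F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.V[:, 1:k]'
end

X = randn(100, 10)
Θ = rank_k_approx(X, 2)
sum(abs2, X - Θ)  # squared Frobenius reconstruction error
```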
@@ -64,13 +64,13 @@ The goal of PCA here is to find the parameters $\Theta = [\theta_1, \dots, \thet
\ell(\Theta; X) = \frac{1}{2} \sum_{i=1}^{n} (x_i-\theta_i)^2
```

- which is equivalent to minimizing the Frobenius norm in the geometric interpretation.
+ which is equivalent to minimizing the squared Frobenius norm in the geometric interpretation.

## Exponential Family PCA (EPCA)

EPCA is similar to generalized linear models (GLMs) [GLM](@cite). Just as GLMs extend linear regression to handle a variety of response distributions, EPCA generalizes PCA to accommodate data with noise drawn from any exponential family distribution, rather than just Gaussian noise. This allows EPCA to address a broader range of real-world data scenarios where the Gaussian assumption may not hold (e.g., binary, count, discrete distribution data).

- At its core, EPCA replaces the geometric PCA objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family—rather than the Frobenius norm, which PCA uses. This makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions:
+ At its core, EPCA replaces the geometric PCA objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family—rather than the squared Frobenius norm, which PCA uses. This makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions:

```math
\begin{aligned}

paper.bib (+14)

@@ -56,6 +56,20 @@ @BOOK{GLM

@article{optim, doi = {10.21105/joss.00615}, url = {https://doi.org/10.21105/joss.00615}, year = {2018}, publisher = {The Open Journal}, volume = {3}, number = {24}, pages = {615}, author = {Patrick K. Mogensen and Asbjørn N. Riseth}, title = {Optim: A mathematical optimization package for Julia}, journal = {Journal of Open Source Software} }

+ @article{Bregman,
+   title = {The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming},
+   journal = {USSR Computational Mathematics and Mathematical Physics},
+   volume = {7},
+   number = {3},
+   pages = {200-217},
+   year = {1967},
+   issn = {0041-5553},
+   doi = {10.1016/0041-5553(67)90040-7},
+   url = {https://www.sciencedirect.com/science/article/pii/0041555367900407},
+   author = {L.M. Bregman},
+   abstract = {IN this paper we consider an iterative method of finding the common point of convex sets. This method can be regarded as a generalization of the methods discussed in [1–4]. Apart from problems which can be reduced to finding some point of the intersection of convex sets, the method considered can be applied to the approximate solution of problems in linear and convex programming.}
+ }

@article{symbolics,
  author = {Gowda, Shashi and Ma, Yingbo and Cheli, Alessandro and Gw\'{o}\'{z}zd\'{z}, Maja and Shah, Viral B. and Edelman, Alan and Rackauckas, Christopher},
  title = {High-Performance Symbolic-Numerics via Multiple Dispatch},
paper.md (+29 −13)

@@ -31,32 +31,52 @@ bibliography: paper.bib
# Summary

- Dimensionality reduction techniques like principal component analysis (PCA) [@PCA] are fundamental tools in machine learning and data science for managing high-dimensional data. While PCA is effective for continuous, real-valued data, it may not perform well for binary, count, or discrete distribution data. Exponential family PCA (EPCA) [@EPCA] generalizes PCA to accommodate these data types, making it a more suitable choice for tasks like belief compression in reinforcement learning [@Roy]. `ExpFamilyPCA.jl` is the first Julia [@Julia] package for EPCA, offering fast implementations for common distributions and a flexible interface for custom objectives.
+ Principal component analysis (PCA) [@PCA] is a fundamental tool in data science and machine learning for dimensionality reduction and denoising. While PCA is effective for continuous, real-valued data, it may not perform well for binary, count, or discrete distribution data. Exponential family PCA (EPCA) [@EPCA] generalizes PCA to accommodate these data types, making it more suitable for tasks such as belief compression in reinforcement learning [@Roy]. `ExpFamilyPCA.jl` is the first Julia [@Julia] package for EPCA, offering fast implementations for common distributions and a flexible interface for custom distributions.

# Statement of Need

- To our knowledge, there are no open-source implementations of EPCA and the sole proprietary package [@epca-MATLAB] is limited to a single distribution. Modern applications of EPCA in reinforcement learning [@Roy] and mass spectrometry [@spectrum] require multiple distributions, numerical stability, and the ability to handle large datasets. `ExpFamilyPCA.jl` addresses this gap by providing fast implementations for several exponential family distributions and multiple constructors for custom distributions. More implementation and mathematical details are in the [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/).
+ <!-- REDO -->
+
+ To our knowledge, there are no open-source implementations of EPCA, and the sole proprietary package [@epca-MATLAB] is limited to a single distribution. Modern applications of EPCA in reinforcement learning [@Roy] and mass spectrometry [@spectrum] require multiple distributions, numerical stability, and the ability to handle large datasets. `ExpFamilyPCA.jl` addresses this gap by providing fast implementations for several exponential family distributions and multiple constructors for custom distributions. More implementation and mathematical details are in the [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/).

4042
# Problem Formulation
4143

44+
- PCA has a specific geometric objective in terms of projections
45+
- This can also be interpreted as a denoising process using Gaussian MLE
46+
- EPCA generalizes geometric objective using Bregman divergences which are related to exponential families
47+
48+
TODO: read the original GLM paper
49+
50+
PCA has many interpretations (e.g., a variance-maximizing compression, a distance-minimizing projection). The interpretation that is most useful for understanding EPCA is the denoising interpretration. Suppose we have $n$ noisy observations $x_1, \dots, x_n \in \mathbb{R}^{n \times d$
51+
4252
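
One way to make the denoising interpretation precise (a sketch that mirrors the maximum-likelihood objective already stated in docs/src/math/intro.md): if each observation is a unit-variance Gaussian corruption of a parameter vector, $x_i \sim \mathcal{N}(\theta_i, I)$, then maximizing the log-likelihood of $\Theta = [\theta_1, \dots, \theta_n]$ is equivalent to minimizing

$$
\ell(\Theta; X) = \frac{1}{2} \sum_{i=1}^{n} \| x_i - \theta_i \|^2,
$$

which is exactly the squared Frobenius objective from the geometric view.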
## Principal Component Analysis

- Traditional PCA is a low-rank matrix approximation problem. For a data matrix $X \in \mathbb{R}^{n \times d}$, we want to find the low-rank matrix approximation $\Theta \in \mathbb{R}^{n \times d}$ such that $\mathrm{rank}(\Theta) = k \leq d$. Formally,
+ Traditional PCA is a low-rank matrix approximation problem. For a data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ observations, we want to find the low-rank matrix approximation $\Theta \in \mathbb{R}^{n \times d}$ such that $\mathrm{rank}(\Theta) = k \leq d$. Formally,

$$\begin{aligned}
& \underset{\Theta}{\text{minimize}}
- & & \|X - \Theta\|_F \\
+ & & \|X - \Theta\|_F^2 \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}$$

- where $\| \cdot \|_F$ denotes the Frobenius norm.[^1]
+ where $\| \cdot \|_F$ denotes the Frobenius norm[^1] and $\Theta = AV$, where $A \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{k \times d}$.

[^1]: The Frobenius norm is a generalization of the Euclidean distance and thus a special case of the Bregman divergence (induced from the log-partition of the normal distribution).

## Exponential Family PCA

- EPCA is a generalization of PCA that replaces PCA's geometric objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family (see [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/bregman/))—rather than the Frobenius norm. This makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions:
+ EPCA is a generalization of PCA that replaces PCA's geometric objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family (see [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/bregman/))—rather than the squared Frobenius norm. The Bregman divergence $B_F$ associated with $F$ is defined [@Bregman]:
+
+ $$
+ B_F(p, q) = F(p) - F(q) - \nabla F(q) \cdot (p - q).
+ $$
+
+ The Bregman-based objective makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions:

$$\begin{aligned}
& \underset{\Theta}{\text{minimize}}
@@ -74,10 +94,6 @@ In this formulation,
EPCA is similar to generalized linear models (GLMs) [@GLM]. Just as GLMs extend linear regression to handle a variety of response distributions, EPCA generalizes PCA to accommodate data with noise drawn from any exponential family distribution, rather than just Gaussian noise. This allows EPCA to address a broader range of real-world data scenarios where the Gaussian assumption may not hold (e.g., binary, count, discrete distribution data).

- ## Related Work
-
- Exponential family PCA was introduced by @EPCA, and several papers have extended the technique [@LitReview]. While there have been advances, EPCA remains the most well-studied variation of PCA in reinforcement learning and sequential decision-making [@Roy].
-
# API

## Usage
@@ -88,9 +104,9 @@ Each `EPCA` object supports a three-method interface: `fit!`, `compress`, and `d
X = rand(n1, indim) * 100
Y = rand(n2, indim) * 100

- _ = fit!(gamma_epca, X)
- A = compress(gamma_epca, Y)
- Y_recon = decompress(gamma_epca, A)
+ X_compressed = fit!(gamma_epca, X)
+ Y_compressed = compress(gamma_epca, Y)
+ Y_reconstructed = decompress(gamma_epca, Y_compressed)
```
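
For context, a self-contained version of the snippet above might look like the following; the `GammaEPCA(indim, outdim)` constructor and the specific dimensions are assumptions for illustration rather than text from the paper:

```julia
using ExpFamilyPCA

# Illustrative sizes; not taken from the paper.
indim, outdim = 10, 3      # ambient dimension and compressed dimension
n1, n2 = 100, 50           # rows in the fitting and held-out matrices

# Assumed constructor for a gamma-distribution EPCA model (check the package docs for the exact API).
gamma_epca = GammaEPCA(indim, outdim)

X = rand(n1, indim) * 100  # positive-valued data, as appropriate for a gamma model
Y = rand(n2, indim) * 100

X_compressed = fit!(gamma_epca, X)                        # fit the basis and compress X
Y_compressed = compress(gamma_epca, Y)                    # compress new data with the fitted model
Y_reconstructed = decompress(gamma_epca, Y_compressed)    # map back to the original space
```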

## Supported Distributions

proposal.md (new file, +28)

@@ -0,0 +1,28 @@
# Summary

# Statement of Need

# Problem Formulation

## Principal Component Analysis

## Exponential Family Principal Component Analysis

### Poisson

- math

- Example: Belief Compression

### Bernoulli

- math

- example for survey data w/ binary noise (e.g., yes/no question set) w/ API usage

### Gamma

- math

- Example: Ultrasound Denoising

TODO: follow the PCA denoising guide in the Princeton tutorial

File renamed without changes.

scripts/iris.ipynb (+1,266)

Large diffs are not rendered by default.
