We want to show that the squared Frobenius norm $\frac{1}{2} \|A - B \|_F^2$ is a Bregman divergence. Let $\psi(A) = \frac{1}{2}\|A\|_F^2$, so that $\nabla \psi(A) = A$. Using norm properties, we can then write the Bregman divergence associated with $\psi$ as

$$
\begin{aligned}
B_\psi(A \| B) &= \psi(A) - \psi(B) - \langle \nabla \psi(B), A - B \rangle \\
&= \frac{1}{2}\|A\|_F^2 - \frac{1}{2}\|B\|_F^2 - \langle B, A \rangle + \langle B, B \rangle \\
&= \frac{1}{2}\|A\|_F^2 - \langle B, A \rangle + \frac{1}{2}\|B\|_F^2 \\
&= \frac{1}{2} \big[ \langle A, A \rangle - 2\langle B, A \rangle + \langle B, B \rangle \big] \\
&= \frac{1}{2} \langle A - B, A - B \rangle \\
&= \frac{1}{2} \| A - B \|_F^2.
\end{aligned}
$$
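
As a quick numerical sanity check of this identity (a standalone sketch, not code from the package):

```julia
using LinearAlgebra

# Bregman divergence generated by ψ(A) = ½‖A‖²_F, checked against ½‖A - B‖²_F.
ψ(A) = 0.5 * norm(A)^2                      # for matrices, norm(A) is the Frobenius norm
∇ψ(A) = A
bregman(A, B) = ψ(A) - ψ(B) - dot(∇ψ(B), A - B)

A, B = randn(5, 3), randn(5, 3)
@assert bregman(A, B) ≈ 0.5 * norm(A - B)^2
```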
Similarly, the Bregman divergence induced from the log-partition of the Gaussian $G(\theta) = \theta^2/2$ is the squared Euclidean distance.
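
In the scalar case, this follows in one line from the definition:

$$
B_G(\theta \| \theta') = \frac{\theta^2}{2} - \frac{(\theta')^2}{2} - \theta'(\theta - \theta') = \frac{1}{2}(\theta - \theta')^2.
$$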

`docs/src/math/intro.md`

This can be formulated as an optimization problem where we find the rank-$k$ approximation:

```math
\begin{aligned}
& \underset{\Theta}{\text{minimize}}
& & \|X - \Theta\|_F^2 \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}
```

where $\| \cdot \|_F$ denotes the Frobenius norm. The squared Frobenius norm is the sum of the squared differences between corresponding elements of the two matrices:

```math
\| X - \Theta \|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{d}(X_{ij}-\Theta_{ij})^2.
```

Intuitively, it can be seen as an extension of the Euclidean distance for vectors, applied to matrices by flattening them into large vectors. This makes the Frobenius norm a natural way to measure how well the lower-dimensional representation approximates the original data.
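
The equivalence between the entrywise sum and the flattened Euclidean view is easy to confirm numerically (a standalone check, not code from the package):

```julia
using LinearAlgebra

X, Θ = randn(4, 3), randn(4, 3)

# Squared Frobenius norm as a sum of squared entrywise differences ...
sq_frob = sum((X .- Θ) .^ 2)

# ... and as the squared Euclidean norm of the flattened difference.
@assert sq_frob ≈ norm(vec(X - Θ))^2
@assert sq_frob ≈ norm(X - Θ)^2   # for matrices, Julia's norm defaults to the Frobenius norm
```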

which is equivalent to minimizing the squared Frobenius norm in the geometric interpretation.

## Exponential Family PCA (EPCA)

EPCA is similar to generalized linear models (GLMs) [GLM](@cite). Just as GLMs extend linear regression to handle a variety of response distributions, EPCA generalizes PCA to accommodate data with noise drawn from any exponential family distribution, rather than just Gaussian noise. This allows EPCA to address a broader range of real-world data scenarios where the Gaussian assumption may not hold (e.g., binary, count, discrete distribution data).

At its core, EPCA replaces the geometric PCA objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family—rather than the squared Frobenius norm, which PCA uses. This makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions:
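
As a concrete non-Gaussian example (a standalone sketch, not code from the package): for the Poisson family, the log-partition is $G(\theta) = e^\theta$, its convex conjugate is $F(x) = x \log x - x$, and the Bregman divergence induced by $F$ is the generalized KL divergence:

```julia
# Bregman divergence induced by F(x) = x*log(x) - x, the convex conjugate
# of the Poisson log-partition G(θ) = exp(θ).
F(x)  = x * log(x) - x
dF(x) = log(x)                                    # F′
bregman(p, q) = F(p) - F(q) - dF(q) * (p - q)

p, q = 3.0, 1.5
@assert bregman(p, q) ≈ p * log(p / q) - p + q    # generalized KL divergence
```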

`paper.bib`

@article{optim, doi = {10.21105/joss.00615}, url = {https://doi.org/10.21105/joss.00615}, year = {2018}, publisher = {The Open Journal}, volume = {3}, number = {24}, pages = {615}, author = {Patrick K. Mogensen and Asbjørn N. Riseth}, title = {Optim: A mathematical optimization package for Julia}, journal = {Journal of Open Source Software} }

@article{Bregman,
  title = {The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming},
  journal = {USSR Computational Mathematics and Mathematical Physics},
  abstract = {In this paper we consider an iterative method of finding the common point of convex sets. This method can be regarded as a generalization of the methods discussed in [1–4]. Apart from problems which can be reduced to finding some point of the intersection of convex sets, the method considered can be applied to the approximate solution of problems in linear and convex programming.}
}

@article{symbolics,
  author = {Gowda, Shashi and Ma, Yingbo and Cheli, Alessandro and Gw\'{o}\'{z}zd\'{z}, Maja and Shah, Viral B. and Edelman, Alan and Rackauckas, Christopher},
  title = {High-Performance Symbolic-Numerics via Multiple Dispatch},

`paper.md`

# Summary

Principal component analysis (PCA) [@PCA] is a fundamental tool in data science and machine learning for dimensionality reduction and denoising. While PCA is effective for continuous, real-valued data, it may not perform well for binary, count, or discrete distribution data. Exponential family PCA (EPCA) [@EPCA] generalizes PCA to accommodate these data types, making it more suitable for tasks such as belief compression in reinforcement learning [@Roy]. `ExpFamilyPCA.jl` is the first Julia [@Julia] package for EPCA, offering fast implementations for common distributions and a flexible interface for custom distributions.

# Statement of Need

<!-- REDO -->

To our knowledge, there are no open-source implementations of EPCA, and the sole proprietary package [@epca-MATLAB] is limited to a single distribution. Modern applications of EPCA in reinforcement learning [@Roy] and mass spectrometry [@spectrum] require multiple distributions, numerical stability, and the ability to handle large datasets. `ExpFamilyPCA.jl` addresses this gap by providing fast implementations for several exponential family distributions and multiple constructors for custom distributions. More implementation and mathematical details are in the [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/).

# Problem Formulation

- PCA has a specific geometric objective in terms of projections
- This can also be interpreted as a denoising process using Gaussian MLE
- EPCA generalizes the geometric objective using Bregman divergences, which are related to exponential families

TODO: read the original GLM paper

PCA has many interpretations (e.g., a variance-maximizing compression, a distance-minimizing projection). The interpretation that is most useful for understanding EPCA is the denoising interpretation. Suppose we have $n$ noisy observations $x_1, \dots, x_n \in \mathbb{R}^{d}$.
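
To make the denoising interpretation concrete (a standard fact, included here for completeness): if each observation is modeled as $x_i \sim \mathcal{N}(\theta_i, I)$, then, because $\log \mathcal{N}(x; \theta, I) = -\tfrac{1}{2}\|x - \theta\|_2^2 + \text{const}$, maximizing the Gaussian log-likelihood over the parameters is the same as minimizing the squared Frobenius objective:

$$
\underset{\Theta}{\arg\max} \sum_{i=1}^{n} \log \mathcal{N}(x_i; \theta_i, I)
= \underset{\Theta}{\arg\min} \frac{1}{2} \sum_{i=1}^{n} \|x_i - \theta_i\|_2^2
= \underset{\Theta}{\arg\min} \frac{1}{2} \|X - \Theta\|_F^2.
$$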

## Principal Component Analysis

Traditional PCA is a low-rank matrix approximation problem. For a data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ observations, we want to find the low-rank matrix approximation $\Theta \in \mathbb{R}^{n \times d}$ such that $\mathrm{rank}(\Theta) = k \leq d$. Formally,

$$\begin{aligned}
& \underset{\Theta}{\text{minimize}}
& & \|X - \Theta\|_F^2 \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}$$

where $\| \cdot \|_F$ denotes the Frobenius norm[^1] and $\Theta = AV$ with $A \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{k \times d}$.

[^1]: The squared Frobenius norm is a generalization of the squared Euclidean distance and thus a special case of the Bregman divergence (induced from the log-partition of the normal distribution).
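
By the Eckart–Young theorem, this optimization problem is solved by the truncated singular value decomposition, which also yields a factorization of the form $\Theta = AV$. A standalone Julia sketch (not code from the package):

```julia
using LinearAlgebra

# Best rank-k approximation of X in squared Frobenius norm via the truncated SVD,
# returned as factors A (n × k) and V (k × d) with Θ = A * V.
function rank_k_factors(X::AbstractMatrix, k::Integer)
    S = svd(X)
    A = S.U[:, 1:k] * Diagonal(S.S[1:k])
    V = S.Vt[1:k, :]
    return A, V
end

X = randn(100, 10)
A, V = rank_k_factors(X, 3)
Θ = A * V
@assert rank(Θ) == 3
```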

## Exponential Family PCA

EPCA is a generalization of PCA that replaces PCA's geometric objective with a more general probabilistic objective that minimizes the generalized Bregman divergence—a measure closely related to the exponential family (see [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/bregman/))—rather than the squared Frobenius norm. The Bregman divergence $B_F$ associated with $F$ is defined [@Bregman]:

$$
B_F(p \| q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle.
$$

The Bregman-based objective makes EPCA particularly versatile for dimensionality reduction when working with non-Gaussian data distributions.

EPCA is similar to generalized linear models (GLMs) [@GLM]. Just as GLMs extend linear regression to handle a variety of response distributions, EPCA generalizes PCA to accommodate data with noise drawn from any exponential family distribution, rather than just Gaussian noise. This allows EPCA to address a broader range of real-world data scenarios where the Gaussian assumption may not hold (e.g., binary, count, discrete distribution data).

# API

## Usage

Each `EPCA` object supports a three-method interface: `fit!`, `compress`, and `decompress`.
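
A minimal usage sketch of that interface is shown below; the `PoissonEPCA` constructor name, its argument order, and the data shapes are illustrative assumptions rather than a definitive description of the package API:

```julia
using ExpFamilyPCA

# Assumed constructor (illustrative): a Poisson-family EPCA model for
# d-dimensional count data with a k-dimensional compressed representation.
d, k = 10, 3
model = PoissonEPCA(d, k)

X = rand(0:20, 100, d)        # toy count data: 100 observations, d features
fit!(model, X)                # fit the model to the data
A = compress(model, X)        # compress the data
X̂ = decompress(model, A)      # reconstruct an approximation of the data
```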