* and $F(\mu)$ is the **convex conjugate** of $G$.
PCA is a special case of EPCA when the data is Gaussian (see [appendix](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/appendix/gaussian/)). By selecting the appropriate function $G$, EPCA can handle a wider range of data types, offering more versatility than PCA. Then

$$
x_i \approx \theta_i = g(a_i V).
$$

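To see the Gaussian reduction concretely (a sketch using the definitions above, writing $f = F'$): with log-partition $G(\theta) = \theta^2/2$, the link is the identity $g(\theta) = \theta$, the conjugate is $F(\mu) = \mu^2/2$, and the Bregman divergence collapses to squared error,

$$\begin{aligned}
B_F(x \| g(\theta)) &= F(x) - F(\theta) - f(\theta)(x - \theta) \\
&= \tfrac{1}{2}x^2 - \tfrac{1}{2}\theta^2 - \theta(x - \theta) \\
&= \tfrac{1}{2}(x - \theta)^2,
\end{aligned}$$

so minimizing the objective over a rank-$k$ parameter matrix is exactly least-squares PCA.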
### Regularization

The optimum may diverge, so we introduce a regularization term

$$\begin{aligned}
& \underset{\Theta}{\text{minimize}}
& & B_F(X \| g(\Theta)) + \epsilon B_F(\mu_0 \| g(\Theta)) \\
& \text{subject to}
& & \mathrm{rank}\left(\Theta\right) = k
\end{aligned}$$

where $\epsilon > 0$ and $\mu_0 \in \mathrm{range}(g)$ to ensure the solution is stationary.
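As an illustrative sketch (not the package's internal implementation; `bregman_is`, `regularized_loss`, and the values of `μ0` and `ϵ` are hypothetical), the regularized per-entry objective for the gamma family can be written with the Itakura-Saito divergence, the Bregman divergence induced by the conjugate $F(\mu) = -\log\mu - 1$:

```julia
# Itakura-Saito divergence: the Bregman divergence B_F for the gamma family,
# where F(μ) = -log(μ) - 1 is the convex conjugate of G(θ) = -log(-θ).
bregman_is(p, q) = p / q - log(p / q) - 1

g(θ) = -1 / θ  # link function for the gamma family

# Regularized per-entry loss: the data term plus the ϵ-weighted anchor term
# pulling g(θ) toward μ0 ∈ range(g); μ0 and ϵ are illustrative choices.
regularized_loss(x, θ; μ0 = 1.0, ϵ = 0.01) =
    bregman_is(x, g(θ)) + ϵ * bregman_is(μ0, g(θ))
```

Setting $\epsilon = 0$ recovers the unregularized objective, and the anchor term penalizes parameters that drift away from $\mu_0$.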

### Example: Gamma EPCA

We demonstrate gamma EPCA on the Old Faithful geyser eruption data, whose positive, continuous durations suit the gamma distribution.

![](./scripts/faithful_graphs/eruptions_plot.png)

### Example: Poisson EPCA

The Poisson EPCA objective is the generalized Kullback-Leibler (KL) divergence (see [appendix](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/appendix/poisson/)), making Poisson EPCA ideal for compressing discrete distribution data.
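Concretely (a short derivation from the definitions above): for the Poisson family the log-partition is $G(\theta) = e^\theta$, its conjugate is $F(\mu) = \mu\log\mu - \mu$, and the induced Bregman divergence is

$$
B_F(p \| q) = p \log\frac{p}{q} - p + q,
$$

the generalized KL divergence.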

Poisson EPCA can thus serve as an alternative to correspondence analysis for discrete data.

This is useful in applications like belief compression in reinforcement learning [@Roy], where high-dimensional belief states can be effectively reduced with minimal information loss. Below we recreate a figure from @shortRoy and observe that Poisson EPCA achieves a nearly perfect reconstruction of a $41$-dimensional belief profile using just $5$ basis components.

![](./scripts/kl_divergence_plot.png)

For a larger environment with $200$ states, PCA struggles even with $10$ basis components.

![](./scripts/reconstructions.png)

# API

## Supported Distributions

`ExpFamilyPCA.jl` includes efficient EPCA implementations for several exponential family distributions.

| Julia | Description |
|---------------------------|--------------------------------------------------------|
| `BernoulliEPCA` | For binary data |
| `BinomialEPCA` | For count data with a fixed number of trials |
| `ContinuousBernoulliEPCA` | For modeling probabilities between $0$ and $1$ |
| `GammaEPCA` | For positive continuous data |
| `GaussianEPCA` | Standard PCA for real-valued data |
| `NegativeBinomialEPCA` | For over-dispersed count data |
| `ParetoEPCA` | For modeling heavy-tailed distributions |
| `PoissonEPCA` | For count and discrete distribution data |
| `WeibullEPCA` | For modeling life data and survival analysis |

## Custom Distributions

When working with custom distributions, certain specifications are often more convenient and computationally efficient than others. For example, inducing the gamma EPCA objective from the log-partition $G(\theta) = -\log(-\theta)$ and its derivative $g(\theta) = -1/\theta$ is much simpler than implementing the full Itakura-Saito distance [@ItakuraSaito] (see [appendix](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/appendix/gamma/)):

$$
D(P(\omega), \hat{P}(\omega)) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Bigg[ \frac{P(\omega)}{\hat{P}(\omega)} - \log \frac{P(\omega)}{\hat{P}(\omega)} - 1\Bigg] \, d\omega.
$$

In `ExpFamilyPCA.jl`, we would write:

```julia
G(θ) = -log(-θ)  # log-partition of the gamma distribution
g(θ) = -1 / θ    # its derivative, the link function
gamma_epca = EPCA(indim, outdim, G, g, Val((:G, :g)); options = NegativeDomain())
```
A lengthier discussion of the `EPCA` constructors and math is provided in the [documentation](https://sisl.github.io/ExpFamilyPCA.jl/dev/math/objectives/).

## Usage

Each `EPCA` object supports a three-method interface: `fit!`, `compress`, and `decompress`. `fit!` trains the model and returns the compressed training data; `compress` returns compressed input; and `decompress` reconstructs the original data from the compressed representation.

```julia
X = sample_from_gamma(n1, indim)
Y = sample_from_gamma(n2, indim)

X_compressed = fit!(gamma_epca, X)
Y_compressed = compress(gamma_epca, Y)
Y_reconstructed = decompress(gamma_epca, Y_compressed)
```

# Acknowledgments

We thank Ryan Tibshirani, Arec Jamgochian, Robert Moss, and Dylan Asmar for their help and guidance.
# References
