* Transition to ImplicitDiff v0.8
* Add projection correctness test
* Clean up
* Fixes
* Allow specifying x0
* Different sizes
* Fix
* Show for debugging
* Show
* Make tests pass
# Fields

- `f`: function `f(x, θ)` to minimize with respect to `x`
- `f_grad1`: gradient `∇ₓf(x, θ)` of `f` with respect to `x`
- `lmo`: linear minimization oracle `θ -> argmin_{x ∈ C} θᵀx` from [FrankWolfe.jl](https://github.com/ZIB-IOL/FrankWolfe.jl), implicitly defines the convex set `C`
- `alg`: optimization algorithm from [FrankWolfe.jl](https://github.com/ZIB-IOL/FrankWolfe.jl), must return an `active_set`
- `implicit_kwargs`: keyword arguments passed to the `ImplicitFunction` object from [ImplicitDifferentiation.jl](https://github.com/gdalle/ImplicitDifferentiation.jl)

# References

> [Efficient and Modular Implicit Differentiation](https://proceedings.neurips.cc/paper_files/paper/2022/hash/228b9279ecf9bbafe582406850c57115-Abstract-Conference.html), Blondel et al. (2022); see section 2 and the end of appendix A of the arXiv version, <https://arxiv.org/abs/2105.15183>.
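To illustrate the roles of `f_grad1` and `lmo`, here is a minimal sketch of vanilla Frank-Wolfe (in Python rather than the package's Julia, with illustrative names that are not part of the package's API), using the probability simplex, whose linear minimization oracle just returns the vertex `e_i` with `i = argmin(θ)`, and the classic `2/(t+2)` step size:

```python
import numpy as np

def lmo_simplex(theta):
    """LMO for the probability simplex: argmin_{x in simplex} θᵀx
    is the vertex e_i with i = argmin(θ)."""
    v = np.zeros_like(theta)
    v[np.argmin(theta)] = 1.0
    return v

def frank_wolfe(f_grad1, theta, x0, iters=200):
    """Vanilla Frank-Wolfe: v_t = lmo(∇ₓf(x_t, θ)),
    x_{t+1} = (1 - γ_t) x_t + γ_t v_t with γ_t = 2 / (t + 2)."""
    x = x0.copy()
    for t in range(iters):
        v = lmo_simplex(f_grad1(x, theta))
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * v
    return x

# Example: f(x, θ) = ||x - θ||² / 2, i.e. Euclidean projection of θ
# onto the simplex; since θ is already on the simplex, x ≈ θ.
f_grad1 = lambda x, theta: x - theta
theta = np.array([0.1, 0.6, 0.3])
x = frank_wolfe(f_grad1, theta, np.ones(3) / 3)
```

Every iterate stays a convex combination of simplex vertices, which is why the algorithm can report an `active_set` of vertices and weights, the representation the differentiation machinery relies on.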
Apply the differentiable Frank-Wolfe algorithm defined by `dfw` to parameter `θ` with starting point `x0`.

Keyword arguments are passed on to the Frank-Wolfe algorithm inside `dfw`.

Return a pair `(x, stats)` where `x` is the solution and `stats` is a named tuple containing additional information (its contents are not covered by the public API, and are mostly useful for debugging).
"""
function (dfw::DiffFW)(θ::AbstractArray, frank_wolfe_kwargs=NamedTuple())
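The differentiability of this call comes from the implicit function theorem: instead of unrolling the solver, one differentiates the optimality conditions at the solution. The following Python sketch shows the idea on a toy smooth unconstrained quadratic (this is not the package's actual mechanism for Frank-Wolfe, which works through the active set; all names are illustrative):

```python
import numpy as np

# Toy problem: x*(θ) = argmin_x (1/2 xᵀAx - θᵀx), so the optimality
# conditions are c(x, θ) = Ax - θ = 0 and the solution is x* = A⁻¹θ.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def forward(theta):
    """Any solver can produce the solution; only conditions matter for AD."""
    return np.linalg.solve(A, theta)

def jacobian_via_ift(theta):
    """Implicit function theorem: ∂x/∂θ = -(∂c/∂x)⁻¹ (∂c/∂θ)."""
    dc_dx = A                  # ∂c/∂x at the solution
    dc_dtheta = -np.eye(2)     # ∂c/∂θ
    return -np.linalg.solve(dc_dx, dc_dtheta)

theta = np.array([1.0, -0.3])
J = jacobian_via_ift(theta)

# Finite-difference check of the implicit Jacobian.
eps = 1e-6
J_fd = np.stack(
    [(forward(theta + eps * e) - forward(theta - eps * e)) / (2 * eps)
     for e in np.eye(2)],
    axis=1,
)
```

Here the implicit Jacobian equals `A⁻¹`, matching the finite-difference estimate; the `implicit_kwargs` field controls how ImplicitDifferentiation.jl performs the analogous linear solve.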
src/simplex_projection.jl (+9 −12)
Compute the Euclidean projection of the vector `z` onto the probability simplex.

This function is differentiable thanks to a custom chain rule.

# References

> [From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification](https://proceedings.mlr.press/v48/martins16.html), Martins and Astudillo (2016); arXiv: <https://arxiv.org/abs/1602.02068>.

Compute the Euclidean projection `p` of `z` on the probability simplex as well as the indicators `s` of its support, which are useful for differentiation.
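The projection and its support can be computed with the sort-and-threshold algorithm from the Martins and Astudillo paper. A Python sketch (the function name is illustrative, not the package's API): sort `z` in decreasing order, find the largest support size `ρ` for which the threshold stays below the included entries, then clip.

```python
import numpy as np

def simplex_projection_and_support(z):
    """Euclidean projection p of z onto the probability simplex, plus the
    0/1 indicators s of its support (sparsemax, Martins & Astudillo 2016)."""
    z = np.asarray(z, dtype=float)
    u = np.sort(z)[::-1]                    # entries in decreasing order
    cssv = np.cumsum(u) - 1.0               # cumulative sums, shifted by 1
    k = np.arange(1, len(z) + 1)
    rho = np.max(k[u - cssv / k > 0])       # size of the support
    tau = cssv[rho - 1] / rho               # threshold
    p = np.maximum(z - tau, 0.0)
    s = (p > 0).astype(float)
    return p, s

p, s = simplex_projection_and_support([0.5, 1.2, -0.3])
# p ≈ [0.15, 0.85, 0.0], s = [1.0, 1.0, 0.0]
```

The indicators `s` matter for the chain rule because the projection is piecewise linear: its Jacobian acts as centering restricted to the support, which is exactly what the custom rule needs.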