Improve Linesearch and Quasi Newton allocations #335

Merged: Dec 28, 2023 (23 commits)

Commits
* 2810598: Resolving #333 point 6, linesearch_backtrack! can now be used and is … (kellertuer, Dec 16, 2023)
* d3812fb: Fix #333 point 1, 2, 5, and one further memory to use. (kellertuer, Dec 16, 2023)
* 55c5456: documentation formatting. (kellertuer, Dec 16, 2023)
* 5745c6d: IN theory resolves #333 points 4 and 7, in practice there seems to be… (kellertuer, Dec 16, 2023)
* 1b30f2f: first back to copying eta – then tests from before work again. (kellertuer, Dec 16, 2023)
* ca7b1ef: Start changelog entry. (kellertuer, Dec 16, 2023)
* d0a6e39: partial fix for Circle (mateuszbaran, Dec 16, 2023)
* f79d857: Merge branch 'kellertuer/fix-linesearch-allocations' of github.com:Ju… (kellertuer, Dec 16, 2023)
* bc795ae: rand -> allocate_result (mateuszbaran, Dec 16, 2023)
* 5175979: Fix a positional argument bug. (kellertuer, Dec 16, 2023)
* 590303d: Merge branch 'kellertuer/fix-linesearch-allocations' of github.com:Ju… (kellertuer, Dec 16, 2023)
* 070b2b4: fix typo (mateuszbaran, Dec 16, 2023)
* 72118ae: Merge remote-tracking branch 'origin/kellertuer/fix-linesearch-alloca… (mateuszbaran, Dec 16, 2023)
* 802ef16: Merge branch 'master' into kellertuer/fix-linesearch-allocations (mateuszbaran, Dec 16, 2023)
* 36cb062: copy -> allocate (mateuszbaran, Dec 16, 2023)
* a6015be: fix qN direction update (mateuszbaran, Dec 16, 2023)
* 4340a51: save one allocation (mateuszbaran, Dec 16, 2023)
* 417f956: Adapt Documentation. (kellertuer, Dec 17, 2023)
* a92e8a7: Work on Test Coverage. (kellertuer, Dec 18, 2023)
* 42d22b4: Fix deps. (kellertuer, Dec 25, 2023)
* 34dcad8: Merge branch 'master' into kellertuer/fix-linesearch-allocations (kellertuer, Dec 26, 2023)
* c683eba: Bump dependencies. (kellertuer, Dec 27, 2023)
* 525d4f7: bump deps. (kellertuer, Dec 28, 2023)
5 changes: 4 additions & 1 deletion Changelog.md

@@ -9,10 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### Added

* Introduce `sub_kwargs` and `sub_stopping_criterion` for `trust_regions` as noticed in [#336](https://github.com/JuliaManifolds/Manopt.jl/discussions/336)

### Changed

* `WolfePowellLineSearch`, `ArmijoLineSearch` step sizes now allocate less
* `linesearch_backtrack!` is now available
* Quasi-Newton updates can now work in place on a direction vector as well.
* Faster `safe_indices` in L-BFGS.

## [0.4.44] December 12, 2023
2 changes: 1 addition & 1 deletion Project.toml

@@ -45,7 +45,7 @@ JuMP = "1.15"
LinearAlgebra = "1.6"
LRUCache = "1.4"
ManifoldDiff = "0.3.8"
Manifolds = "0.9.11"
ManifoldsBase = "0.15"
ManoptExamples = "0.1.4"
Markdown = "1.6"
125 changes: 94 additions & 31 deletions src/plans/quasi_newton_plan.jl

@@ -287,33 +287,63 @@ InverseBroyden(φ::Float64) = InverseBroyden(φ, :constant)
@doc raw"""
QuasiNewtonMatrixDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate

This [`AbstractQuasiNewtonDirectionUpdate`](@ref) represents a quasi-Newton update rule,
where the operator is stored as a matrix. A distinction is made between the update of the
approximation of the Hessian, ``H_k \mapsto H_{k+1}``, and the update of the approximation
of the Hessian inverse, ``B_k \mapsto B_{k+1}``.
For the first case, the coordinates of the search direction ``η_k`` with respect to
a basis ``\{b_i\}^{n}_{i=1}`` are determined by solving the linear system of equations

```math
\text{Solve} \quad \hat{η_k} = - H_k \widehat{\operatorname{grad}f(x_k)},
```

where ``H_k`` is the matrix representing the operator with respect to the basis ``\{b_i\}^{n}_{i=1}``
and ``\widehat{\operatorname{grad}f(x_k)}`` represents the coordinates of the gradient of
the objective function ``f`` in ``x_k`` with respect to the basis ``\{b_i\}^{n}_{i=1}``.
If a method is chosen where the Hessian inverse is approximated, the coordinates of the search
direction ``η_k`` with respect to a basis ``\{b_i\}^{n}_{i=1}`` are obtained simply by
matrix-vector multiplication

```math
\hat{η_k} = - B_k \widehat{\operatorname{grad}f(x_k)},
```

where ``B_k`` is the matrix representing the operator with respect to the basis ``\{b_i\}^{n}_{i=1}``
and ``\widehat{\operatorname{grad}f(x_k)}`` as above. In the end, the search direction ``η_k``
is generated from the coordinates ``\hat{η_k}`` and the vectors of the basis ``\{b_i\}^{n}_{i=1}``
in both variants.
The [`AbstractQuasiNewtonUpdateRule`](@ref) indicates which quasi-Newton update rule is used.
In all of them, the Euclidean update formula is used to generate the matrix ``H_{k+1}``
and ``B_{k+1}``, and the basis ``\{b_i\}^{n}_{i=1}`` is transported into the upcoming tangent
space ``T_{x_{k+1}} \mathcal{M}``, preferably with an isometric vector transport, or generated there.
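
As a quick illustration of those two coordinate computations, here is a minimal sketch; the manifold, point, tangent vector, and identity matrix below are made-up stand-ins, while `get_coordinates` and `get_vector` are the interface functions referred to above:

```julia
using LinearAlgebra, Manifolds

M = Sphere(2)
p = [0.0, 0.0, 1.0]
X = [1.0, 0.5, 0.0]                      # tangent vector at p, standing in for grad f(x_k)
B = DefaultOrthonormalBasis()
H = Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))

c = get_coordinates(M, p, X, B)          # coordinates of the gradient in the basis
η_solve = get_vector(M, p, -(H \ c), B)  # Hessian approximation H_k: solve a linear system
η_mul = get_vector(M, p, -(H * c), B)    # inverse approximation B_k: matrix-vector product
```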

# Provided functors

* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in place of `η`

# Fields

* `basis` an `AbstractBasis` to use in the tangent spaces
* `matrix` (`Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M))`)
  the matrix which represents the approximating operator.
* `scale` (`true`) indicates whether the initial matrix (= identity matrix) should be scaled before the first update.
* `update` an [`AbstractQuasiNewtonUpdateRule`](@ref).
* `vector_transport_method` (`default_vector_transport_method(M)`) an `AbstractVectorTransportMethod` to use.

# Constructor

    QuasiNewtonMatrixDirectionUpdate(
        M::AbstractManifold,
        update,
        basis::B=DefaultOrthonormalBasis(),
        m=Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M));
        kwargs...
    )

## Keyword Arguments

* `scale`, `vector_transport_method`: values for the two fields above

Generate the update rule with defaults from a manifold and the names corresponding to the fields above.
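
A short usage sketch of this constructor; the manifold and update rule are arbitrary example choices:

```julia
using LinearAlgebra, Manifolds, Manopt

M = Sphere(2)
# an inverse BFGS update with the default orthonormal basis and identity initial matrix
d = QuasiNewtonMatrixDirectionUpdate(M, InverseBFGS())
# or with explicit basis, initial matrix, and keywords
d2 = QuasiNewtonMatrixDirectionUpdate(
    M, InverseBFGS(), DefaultOrthonormalBasis(),
    Matrix{Float64}(I, manifold_dimension(M), manifold_dimension(M));
    scale=true,
)
```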

@@ -362,43 +392,63 @@ function QuasiNewtonMatrixDirectionUpdate(
        basis, m, scale, update, vector_transport_method
    )
end
function (d::QuasiNewtonMatrixDirectionUpdate)(mp, st)
    r = zero_vector(get_manifold(mp), get_iterate(st))
    return d(r, mp, st)
end
function (d::QuasiNewtonMatrixDirectionUpdate{T})(
    r, mp, st
) where {T<:Union{InverseBFGS,InverseDFP,InverseSR1,InverseBroyden}}
    M = get_manifold(mp)
    p = get_iterate(st)
    X = get_gradient(st)
    # inverse Hessian approximation: the direction is a matrix-vector product
    get_vector!(M, r, p, -d.matrix * get_coordinates(M, p, X, d.basis), d.basis)
    return r
end
function (d::QuasiNewtonMatrixDirectionUpdate{T})(
    r, mp, st
) where {T<:Union{BFGS,DFP,SR1,Broyden}}
    M = get_manifold(mp)
    p = get_iterate(st)
    X = get_gradient(st)
    # Hessian approximation: solve a linear system in coordinates
    get_vector!(M, r, p, -d.matrix \ get_coordinates(M, p, X, d.basis), d.basis)
    return r
end
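
For illustration, a hedged sketch of calling these functors, allocating and in place; the problem and objective below follow the usual Manopt building blocks, but the exact `QuasiNewtonState` constructor form is an assumption:

```julia
using Manifolds, Manopt

M = Sphere(2)
f(M, p) = p[3]^2
grad_f(M, p) = project(M, p, [0.0, 0.0, 2 * p[3]])
p0 = (1 / sqrt(2)) .* [1.0, 0.0, 1.0]

mp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f))
st = QuasiNewtonState(M, p0)   # assumed constructor form
d = QuasiNewtonMatrixDirectionUpdate(M, InverseBFGS())

η = d(mp, st)                  # allocating variant
d(η, mp, st)                   # in-place variant writes into η
```
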
@doc raw"""
QuasiNewtonLimitedMemoryDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate

This [`AbstractQuasiNewtonDirectionUpdate`](@ref) represents the limited-memory Riemannian BFGS update,
where the approximating operator is represented by ``m`` stored pairs of tangent
vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` in the ``k``-th iteration.
For the calculation of the search direction ``η_k``, the generalisation of the two-loop recursion
is used (see [Huang, Gallivan, Absil, SIAM J. Optim., 2015](@cite HuangGallivanAbsil:2015)),
since it only requires inner products and linear combinations of tangent vectors in ``T_{x_k} \mathcal{M}``.
For that the stored pairs of tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``,
the gradient ``\operatorname{grad}f(x_k)`` of the objective function ``f`` in ``x_k``,
and the positive definite self-adjoint operator

```math
\mathcal{B}^{(0)}_k[⋅] = \frac{g_{x_k}(s_{k-1}, y_{k-1})}{g_{x_k}(y_{k-1}, y_{k-1})} \; \mathrm{id}_{T_{x_k} \mathcal{M}}[⋅]
```

are used. The two-loop recursion can be understood as follows: the [`InverseBFGS`](@ref) update
is executed ``m`` times in a row on ``\mathcal{B}^{(0)}_k[⋅]`` using the tangent vectors ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}``,
and at the same time the resulting operator ``\mathcal{B}^{LRBFGS}_k [⋅]`` is applied directly to ``\operatorname{grad}f(x_k)``;
a sketch of this recursion follows below.
When updating there are two cases: if there is still free memory, i.e. ``k < m``, the previously
stored vector pairs ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m}^{k-1}`` have to be transported into the upcoming
tangent space ``T_{x_{k+1}} \mathcal{M}``; if there is no free memory, the oldest pair ``\{ \widetilde{s}_{k−m}, \widetilde{y}_{k−m}\}``
has to be discarded and then all the remaining vector pairs ``\{ \widetilde{s}_i, \widetilde{y}_i\}_{i=k-m+1}^{k-1}``
are transported into the tangent space ``T_{x_{k+1}} \mathcal{M}``. After that we calculate and store
``s_k = \widetilde{s}_k = T^{S}_{x_k, α_k η_k}(α_k η_k)`` and ``y_k = \widetilde{y}_k``.
This process ensures that new information about the objective function is always included
and the old, probably no longer relevant, information is discarded.
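
As referenced above, here is a minimal Euclidean sketch of the two-loop recursion; `two_loop_recursion` is a hypothetical stand-alone helper, not part of the package, and on a manifold `dot` becomes the Riemannian metric ``g_{x_k}`` with all vectors living in ``T_{x_k} \mathcal{M}``:

```julia
using LinearAlgebra

# hypothetical helper illustrating the scheme above; memory is ordered oldest (1) to newest (m)
function two_loop_recursion(grad_f, memory_s, memory_y)
    m = length(memory_s)
    ρ = [1 / dot(memory_y[i], memory_s[i]) for i in 1:m]
    ξ = zeros(m)
    r = copy(grad_f)
    for i in m:-1:1                 # first loop: newest to oldest pair
        ξ[i] = ρ[i] * dot(memory_s[i], r)
        r .-= ξ[i] .* memory_y[i]
    end
    if m > 0                        # apply the initial operator B⁰ₖ, a scaled identity
        r .*= dot(memory_s[m], memory_y[m]) / dot(memory_y[m], memory_y[m])
    end
    for i in 1:m                    # second loop: oldest to newest pair
        β = ρ[i] * dot(memory_y[i], r)
        r .+= (ξ[i] - β) .* memory_s[i]
    end
    return -r                       # search direction η_k
end
```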

# Provided functors

* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in place of `η`

# Fields

* `memory_s` the set of the stored (and transported) search directions times step size ``\{ \widetilde{s}_i\}_{i=k-m}^{k-1}``.
* `memory_y` set of the stored gradient differences ``\{ \widetilde{y}_i\}_{i=k-m}^{k-1}``.
* `ξ` a variable used in the two-loop recursion.
* `ρ` a variable used in the two-loop recursion.
* `scale` initial scaling of the Hessian
* `vector_transport_method` an `AbstractVectorTransportMethod` to use.
* `message` a string containing a potential warning that might have appeared

# Constructor

    QuasiNewtonLimitedMemoryDirectionUpdate(
        …
        initial_vector=zero_vector(M, x),
        scale=1.0,
        project=true,
    )
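
A hedged usage sketch; since the middle of the signature is collapsed above, the positional arguments (manifold, point, update rule, memory size) are an assumption:

```julia
using Manifolds, Manopt

M = Sphere(2)
p = (1 / sqrt(2)) .* [1.0, 0.0, 1.0]
# assumed positional arguments: manifold, point, update rule, memory size
d = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 4)
```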

# See also

@@ -468,12 +518,19 @@ function status_summary(d::QuasiNewtonLimitedMemoryDirectionUpdate{T}) where {T}
    return s
end
function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(mp, st)
    r = zero_vector(get_manifold(mp), get_iterate(st))
    return d(r, mp, st)
end
function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(r, mp, st)
    isempty(d.message) || (d.message = "") # reset message
    M = get_manifold(mp)
    p = get_iterate(st)
    copyto!(M, r, p, get_gradient(st))
    m = length(d.memory_s)
    if m == 0
        r .*= -1
        return r
    end
    for i in m:-1:1
        # what if we divide by zero here? Setting to zero ignores this in the next step
        # precompute in case inner is expensive
@@ -540,6 +597,11 @@

If [`InverseBFGS`](@ref) is chosen as update, then the
method follows [Huang, Absil, Gallivan, SIAM J. Optim., 2018](@cite HuangAbsilGallivan:2018), taking into account that
the corresponding step size is chosen.

# Provided functors

* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in place of `η`

# Fields

* `update` – an [`AbstractQuasiNewtonDirectionUpdate`](@ref)
@@ -572,6 +634,7 @@ function QuasiNewtonCautiousDirectionUpdate(
    return QuasiNewtonCautiousDirectionUpdate{U}(update, θ)
end
# both the allocating and the in-place functor delegate to the wrapped update
(d::QuasiNewtonCautiousDirectionUpdate)(mp, st) = d.update(mp, st)
(d::QuasiNewtonCautiousDirectionUpdate)(r, mp, st) = d.update(r, mp, st)
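
For context, a small end-to-end sketch showing where such direction updates are consumed; the cost and manifold are illustrative, and `memory_size` is assumed to be the keyword with which `quasi_Newton` selects the limited-memory update:

```julia
using Manifolds, Manopt

M = Sphere(2)
f(M, p) = p[3]^2                                   # illustrative cost
grad_f(M, p) = project(M, p, [0.0, 0.0, 2 * p[3]]) # its Riemannian gradient
p0 = (1 / sqrt(2)) .* [1.0, 0.0, 1.0]

# limited-memory Riemannian BFGS with 4 stored vector pairs (assumed keyword)
p_opt = quasi_Newton(M, f, grad_f, p0; memory_size=4)
```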

# access the inner vector transport method
function get_update_vector_transport(u::AbstractQuasiNewtonDirectionUpdate)