diff --git a/.github/workflows/CompatHelper.yml b/.github/workflows/CompatHelper.yml index 6f039105ee..9e00a1f0bd 100644 --- a/.github/workflows/CompatHelper.yml +++ b/.github/workflows/CompatHelper.yml @@ -17,8 +17,8 @@ jobs: shell: julia --color=yes {0} - name: "Run CompatHelper" run: | - import CompatHelper - CompatHelper.main() + using CompatHelper + CompatHelper.main(; subdirs = ["", "docs", "tutorials"]) shell: julia --color=yes {0} env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/documenter.yml b/.github/workflows/documenter.yml index 9a60d03b47..2c1795093c 100644 --- a/.github/workflows/documenter.yml +++ b/.github/workflows/documenter.yml @@ -50,7 +50,14 @@ jobs: restore-keys: | ${{ runner.os }}-${{ env.cache-name }}- - name: "Documenter rendering (including Quarto)" - run: "docs/make.jl --quarto" + run: "docs/make.jl --quarto --prettyurls" + - name: "vale.sh spell check" + uses: errata-ai/vale-action@reviewdog + with: + files: docs/src + fail_on_error: true + filter_mode: nofilter + vale_flags: "--config=docs/.vale.ini" env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} diff --git a/.gitignore b/.gitignore index af0a1746a2..c1ce6ebd72 100644 --- a/.gitignore +++ b/.gitignore @@ -21,3 +21,4 @@ docs/.CondaPkg docs/src/tutorials/Optimize!_files docs/src/tutorials/*.html docs/src/changelog.md +docs/styles/Google diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 81126b7757..ec16d18771 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -5,7 +5,7 @@ Any contribution is appreciated and welcome. The following is a set of guidelines to [`Manopt.jl`](https://juliamanifolds.github.io/Manopt.jl/). 
-#### Table of Contents +#### Table of contents - [Contributing to `Manopt.jl`](#Contributing-to-manoptjl) - [Table of Contents](#Table-of-Contents) @@ -32,7 +32,7 @@ If you found a bug or want to propose a feature, we track our issues within the ### Add a missing method There are still a lot of methods missing within the optimization framework of `Manopt.jl`, be it functions, gradients, differentials, proximal maps, step size rules or stopping criteria. -If you notice a method missing and can contribute an implementation, please do so! +If you notice a method missing and can contribute an implementation, please do so; we help with the necessary details. Even providing a single new method is a good contribution. ### Provide a new algorithm @@ -55,7 +55,7 @@ where also their reproducible Quarto-Markdown files are stored. ### Code style We try to follow the [documentation guidelines](https://docs.julialang.org/en/v1/manual/documentation/) from the Julia documentation as well as [Blue Style](https://github.com/invenia/BlueStyle). -We run [`JuliaFormatter.jl`](https://github.com/domluna/JuliaFormatter.jl) on the repo in the way set in the `.JuliaFormatter.toml` file, which enforces a number of conventions consistent with the Blue Style. +We run [`JuliaFormatter.jl`](https://github.com/domluna/JuliaFormatter.jl) on the repository in the way set in the `.JuliaFormatter.toml` file, which enforces a number of conventions consistent with the Blue Style. We also follow a few internal conventions: @@ -68,5 +68,5 @@ We also follow a few internal conventions: - There should be no dangling `=` signs. - Always add a newline between things of different types (struct/method/const). - Always add a newline between methods for different functions (including mutating/nonmutating variants). -- Prefer to have no newline between methods for the same function; when reasonable, merge the docstrings.
+- Prefer to have no newline between methods for the same function; when reasonable, merge the documentation strings. - All `import`/`using`/`include` should be in the main module file. diff --git a/Changelog.md b/Changelog.md index 762bdb9656..aa4a5f32b7 100644 --- a/Changelog.md +++ b/Changelog.md @@ -5,27 +5,27 @@ All notable Changes to the Julia package `Manopt.jl` will be documented in this The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unreleased] +## [0.4.42] - November 6, 2023 ### Added * add `Manopt.JuMP_Optimizer` implementing JuMP's solver interface -## [0.4.41] - 02/11/2023 +## [0.4.41] - November 2, 2023 ### Changed -– `trust_regions` is now more flexible and the sub solver (Steinhaug-Toint tCG by default) +* `trust_regions` is now more flexible and the sub solver (Steihaug-Toint tCG by default) can now be exchanged. -- `adaptive_regularization_with_cubics` is now more flexible as well, where it previously was a bit too +* `adaptive_regularization_with_cubics` is now more flexible as well, where it previously was a bit too tightly coupled to the Lanczos solver. -- Unified documentation notation and bumped dependencies to use DocumenterCitations 1.3 +* Unified documentation notation and bumped dependencies to use DocumenterCitations 1.3 -## [0.4.40] – 24/10/2023 +## [0.4.40] - October 24, 2023 ### Added -* add a `--help` argument to `docs/make.jl` to document all availabel command line arguments +* add a `--help` argument to `docs/make.jl` to document all available command line arguments * add a `--exclude-tutorials` argument to `docs/make.jl`. This way, when quarto is not available on a computer, the docs can still be built with the tutorials not being added to the menu such that documenter does not expect them to exist.
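The flag handling these changelog entries describe is plain inspection of `ARGS` in `docs/make.jl`; the following is only a minimal sketch of that pattern, with an abridged help text, and is not the actual script:

```julia
# Sketch of simple command line flag handling as used in docs/make.jl.
if "--help" ∈ ARGS
    println("Usage: docs/make.jl [--exclude-tutorials] [--help] [--quarto]")
    exit(0)
end
# A flag counts as set exactly when its literal string appears among the arguments.
run_quarto = "--quarto" ∈ ARGS
exclude_tutorials = "--exclude-tutorials" ∈ ARGS
```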
@@ -36,14 +36,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * move the ARC CG subsolver to the main package, since `TangentSpace` is now already available from `ManifoldsBase`. -## [0.4.39] – 09/10/2023 +## [0.4.39] - October 9, 2023 ### Changes * also use the pair of a retraction and the inverse retraction (see last update) to perform the relaxation within the Douglas-Rachford algorithm. -## [0.4.38] – 08/10/2023 +## [0.4.38] - October 8, 2023 ### Changes @@ -53,7 +53,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * Fix a lot of typos in the documentation -## [0.4.37] – 28/09/2023 +## [0.4.37] - September 28, 2023 ### Changes @@ -62,67 +62,66 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * generalize the internal reflection of Douglas-Rachford, such that it also works with an arbitrary pair of a reflection and an inverse reflection. -## [0.4.36] – 20/09/2023 +## [0.4.36] - September 20, 2023 ### Fixed -* Fixed a bug that caused non-matrix points and vectors to fail when working with approcimate +* Fixed a bug that caused non-matrix points and vectors to fail when working with approximate -## [0.4.35] – 14/09/2023 +## [0.4.35] - September 14, 2023 ### Added -* The access to functions of the objective is now unified and encapsulated in proper `get_` - functions. +* The access to functions of the objective is now unified and encapsulated in proper `get_` functions. -## [0.4.34] – 02/09/2023 +## [0.4.34] - September 2, 2023 ### Added -* an `ManifoldEuclideanGradientObjetive` to allow the cost, gradient, and Hessian and other +* a `ManifoldEuclideanGradientObjective` to allow the cost, gradient, and Hessian and other first or second derivative based elements to be Euclidean and converted when needed.
* a keyword `objective_type=:Euclidean` for all solvers, that specifies that an Objective shall be created of the above type -## [0.4.33] - 24/08/2023 +## [0.4.33] - August 24, 2023 ### Added * `ConstantStepsize` and `DecreasingStepsize` now have an additional field `type::Symbol` to assess whether the step-size should be relatively (to the gradient norm) or absolutely constant. -## [0.4.32] - 23/08/2023 +## [0.4.32] - August 23, 2023 ### Added * The adaptive regularization with cubics (ARC) solver. -## [0.4.31] - 14/08/2023 +## [0.4.31] - August 14, 2023 ### Added * A `:Subsolver` keyword in the `debug=` keyword argument, that activates the new `DebugWhenActive` to de/activate subsolver debug from the main solvers' `DebugEvery`. -## [0.4.30] - 03/08/2023 +## [0.4.30] - August 3, 2023 ### Changed * References in the documentation are now rendered using [DocumenterCitations.jl](https://github.com/JuliaDocs/DocumenterCitations.jl) * Asymptote export now also accepts a size in pixels instead of its default `4cm` size and `render` can be deactivated by setting it to `nothing`. -## [0.4.29] - 12/07/2023 +## [0.4.29] - July 12, 2023 ### Fixed * fixed a bug, where `cyclic_proximal_point` did not work with decorated objectives. -## [0.4.28] - 24/06/2023 +## [0.4.28] - June 24, 2023 ### Changed * `max_stepsize` was specialized for `FixedRankManifold` to follow Matlab Manopt. -## [0.4.27] - 15/06/2023 +## [0.4.27] - June 15, 2023 ### Added @@ -134,7 +133,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 `initial_jacobian_f` also as keyword arguments, such that their default initialisations can be adapted, if necessary -## [0.4.26] - 11/06/2023 +## [0.4.26] - June 11, 2023 ### Added @@ -142,13 +141,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * add a `get_state` function * document `indicates_convergence`.
-## [0.4.25] - 05/06/2023 +## [0.4.25] - June 5, 2023 ### Fixed * Fixes an allocation bug in the difference of convex algorithm -## [0.4.24] - 04/06/2023 +## [0.4.24] - June 4, 2023 ### Added @@ -158,7 +157,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * bump dependencies since the extension between Manifolds.jl and ManifoldsDiff.jl has been moved to Manifolds.jl -## [0.4.23] - 04/06/2023 +## [0.4.23] - June 4, 2023 ### Added @@ -168,13 +167,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * loosen constraints slightly -## [0.4.22] - 31/05/2023 +## [0.4.22] - May 31, 2023 ### Added * A tutorial on how to implement a solver -## [0.4.21] - 22/05/2023 +## [0.4.21] - May 22, 2023 ### Added @@ -187,55 +186,56 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * change solvers on the mid level (of the form `solver(M, objective, p)`) to also accept decorated objectives ### Changed + * Switch all Requires weak dependencies to actual weak dependencies starting in Julia 1.9 -## [0.4.20] - 11/05/2023 +## [0.4.20] - May 11, 2023 ### Changed * the default tolerances for the numerical `check_` functions were loosened a bit, such that `check_vector` can also be changed in its tolerances. -## [0.4.19] - 07/05/2023 +## [0.4.19] - May 7, 2023 ### Added -* the sub solver for `trust_regions` is now customizable, i.e. can be exchanged. +* the sub solver for `trust_regions` is now customizable and can be exchanged.
### Changed * slightly changed the definitions of the solver states for ALM and EPM to be type stable -## [0.4.18] - 04/05/2023 +## [0.4.18] - May 4, 2023 ### Added * A function `check_Hessian(M, f, grad_f, Hess_f)` to numerically check the (Riemannian) Hessian of a function `f` -## [0.4.17] - 28/04/2023 +## [0.4.17] - April 28, 2023 ### Added * A new interface of the form `alg(M, objective, p0)` to allow to reuse - objectives without creating `AbstractManoptSolverState`s and calling `solve!`. This especially still allows for any decoration of the objective and/or the state using e.g. `debug=`, or `record=`. + objectives without creating `AbstractManoptSolverState`s and calling `solve!`. This especially still allows for any decoration of the objective and/or the state using `debug=` or `record=`. ### Changed -* All solvers now have the initial point `p` as an optional parameter making it more accessible to first time users, e.g. `gradient_descent(M, f, grad_f)` +* All solvers now have the initial point `p` as an optional parameter, making it more accessible to first-time users: `gradient_descent(M, f, grad_f)` is equivalent to `gradient_descent(M, f, grad_f, rand(M))` ### Fixed * Unified the framework to work on manifolds where points are represented by numbers for several solvers -## [0.4.16] - 18/04/2023 +## [0.4.16] - April 18, 2023 ### Fixed * the inner products used in `truncated_gradient_descent` now also work thoroughly on complex matrix manifolds -## [0.4.15] - 13/04/2023 +## [0.4.15] - April 13, 2023 ### Changed @@ -249,7 +249,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * support for `ManifoldsBase.jl` 0.13.x, since with the definition of `copy(M,p::Number)`, in 0.14.4, we now use that instead of defining it ourselves.
-## [0.4.14] - 06/04/2023 +## [0.4.14] - April 6, 2023 ### Changed * `particle_swarm` now uses much more in-place operations @@ -257,15 +257,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed * `particle_swarm` used quite a few `deepcopy(p)` commands still, which were replaced by `copy(M, p)` -## [0.4.13] - 09/04/2023 +## [0.4.13] - April 9, 2023 ### Added * `get_message` to obtain messages from sub steps of a solver * `DebugMessages` to display the new messages in debug -* safeguards in Armijo linesearch and L-BFGS against numerical over- and underflow that report in messages +* safeguards in Armijo line search and L-BFGS against numerical over- and underflow that report in messages -## [0.4.12] - 04/04/2023 +## [0.4.12] - April 4, 2023 ### Added @@ -275,19 +275,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 `difference_of_convex_proximal_point(M, prox_g, grad_h, p0)` * Introduce a `StopWhenGradientChangeLess` stopping criterion -## [0.4.11] - 27/04/2023 +## [0.4.11] - March 27, 2023 ### Changed * adapt tolerances in tests to the speed/accuracy optimized distance on the sphere in `Manifolds.jl` (part II) -## [0.4.10] - 26/04/2023 +## [0.4.10] - March 26, 2023 ### Changed * adapt tolerances in tests to the speed/accuracy optimized distance on the sphere in `Manifolds.jl` -## [0.4.9] – 03/03/2023 +## [0.4.9] - March 3, 2023 ### Added @@ -295,7 +295,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 to be used within Manopt.jl, introduce the [manoptjl.org/stable/extensions/](https://manoptjl.org/stable/extensions/) page to explain the details.
-## [0.4.8] - 21/02/2023 +## [0.4.8] - February 21, 2023 ### Added @@ -308,26 +308,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * changed the `show` methods of `AbstractManoptSolverState`s to display their `state_summary` * Move tutorials to be rendered with Quarto into the documentation. -## [0.4.7] - 14/02/2023 +## [0.4.7] - February 14, 2023 ### Changed -* Bump [compat] entry of ManifoldDiff to also include 0.3 +* Bump `[compat]` entry of ManifoldDiff to also include 0.3 -## [0.4.6] - 03/02/2023 +## [0.4.6] - February 3, 2023 ### Fixed * Fixed that a few stopping criteria even indicated to stop before the algorithm started. -## [0.4.5] - 24/01/2023 +## [0.4.5] - January 24, 2023 ### Changed * the new default functions that include `p` are used where possible * a first step towards faster storage handling -## [0.4.4] - 20/01/2023 +## [0.4.4] - January 20, 2023 ### Added @@ -338,29 +338,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 * fix a type in `HestenesStiefelCoefficient` -## [0.4.3] - 17/01/2023 +## [0.4.3] - January 17, 2023 ### Fixed * the CG coefficient `β` can now be complex * fix a bug in `grad_distance` -## [0.4.2] - 16/01/2023 +## [0.4.2] - January 16, 2023 ### Changed -* the usage of `inner` in linesearch methods, such that they work well with +* the usage of `inner` in line search methods, such that they work well with complex manifolds as well -## [0.4.1] - 15/01/2023 +## [0.4.1] - January 15, 2023 ### Fixed * a `max_stepsize` per manifold to avoid leaving the injectivity radius, which it also defaults to -## [0.4.0] - 10/01/2023 +## [0.4.0] - January 10, 2023 ### Added diff --git a/Project.toml b/Project.toml index 23aa3674b4..758df65858 100644 --- a/Project.toml +++ b/Project.toml @@ -1,7 +1,7 @@ name = "Manopt" uuid = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5" authors = ["Ronny Bergmann "] -version = "0.4.42" +version = "0.4.43" [deps] ColorSchemes =
"35d6a980-a343-548e-a6ea-1d62b119f2f4" diff --git a/Readme.md b/Readme.md index 0a781366d8..472ed33285 100644 --- a/Readme.md +++ b/Readme.md @@ -2,7 +2,7 @@ Optimization Algorithm on Riemannian Manifolds. -[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://manoptjl.org/stable) +[![](https://img.shields.io/badge/docs-stable-blue?logo=Julia&logoColor=white)](https://manoptjl.org/stable) [![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle) [![CI](https://github.com/JuliaManifolds/Manopt.jl/workflows/CI/badge.svg)](https://github.com/JuliaManifolds/Manopt.jl/actions?query=workflow%3ACI+branch%3Amaster) [![codecov](https://codecov.io/gh/JuliaManifolds/Manopt.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/JuliaManifolds/Manopt.jl) diff --git a/docs/.vale.ini b/docs/.vale.ini new file mode 100644 index 0000000000..1a8fdd41a9 --- /dev/null +++ b/docs/.vale.ini @@ -0,0 +1,15 @@ +StylesPath = styles +MinAlertLevel = warning +Vocab = Manopt + +Packages = Google + +[formats] +qmd = md + +[*.md] +BasedOnStyles = Vale, Google +TokenIgnores = \ + \$.+?\$, \ + \]\(@(ref|id|cite).+?\), \ + ``.+``, diff --git a/docs/make.jl b/docs/make.jl index 922bb75995..1bc8163c57 100755 --- a/docs/make.jl +++ b/docs/make.jl @@ -14,6 +14,7 @@ Arguments this can be used if you do not have Quarto installed to still be able to render the docs locally on this machine. This option should not be set on CI. * `--help` - print this help and exit without rendering the documentation +* `--prettyurls` – toggle the prettyurls part to true (which is otherwise only true on CI) * `--quarto` – run the Quarto notebooks from the `tutorials/` folder before generating the documentation this has to be run locally at least once for the `tutorials/*.md` files to exist that are included in the documentation (see `--exclude-tutorials`) for the alternative. @@ -93,22 +94,24 @@ end ## Build tutorials menu tutorials_menu = "How to..."
=> [ - "Get started: Optimize!" => "tutorials/Optimize!.md", - "Speedup using Inplace computations" => "tutorials/InplaceGradient.md", - "Use Automatic Differentiation" => "tutorials/AutomaticDifferentiation.md", - "Define Objectives in the Embedding" => "tutorials/EmbeddingObjectives.md", - "Count and use a Cache" => "tutorials/CountAndCache.md", - "Print Debug Output" => "tutorials/HowToDebug.md", + "🏔️ Get started: optimize." => "tutorials/Optimize.md", + "Speedup using in-place computations" => "tutorials/InplaceGradient.md", + "Use automatic differentiation" => "tutorials/AutomaticDifferentiation.md", + "Define objectives in the embedding" => "tutorials/EmbeddingObjectives.md", + "Count and use a cache" => "tutorials/CountAndCache.md", + "Print debug output" => "tutorials/HowToDebug.md", "Record values" => "tutorials/HowToRecord.md", - "Implement a Solver" => "tutorials/ImplementASolver.md", - "Do Constrained Optimization" => "tutorials/ConstrainedOptimization.md", - "Do Geodesic Regression" => "tutorials/GeodesicRegression.md", + "Implement a solver" => "tutorials/ImplementASolver.md", + "Optimize on your own manifold" => "tutorials/ImplementOwnManifold.md", + "Do constrained optimization" => "tutorials/ConstrainedOptimization.md", + "Do geodesic regression" => "tutorials/GeodesicRegression.md", ] # (e) ...finally! 
make docs bib = CitationBibliography(joinpath(@__DIR__, "src", "references.bib"); style=:alpha) makedocs(; format=Documenter.HTML(; - prettyurls=false, assets=["assets/favicon.ico", "assets/citations.css"] + prettyurls=(get(ENV, "CI", nothing) == "true") || ("--prettyurls" ∈ ARGS), + assets=["assets/favicon.ico", "assets/citations.css"], ), modules=[ Manopt, @@ -153,7 +156,7 @@ makedocs(; "Conjugate gradient descent" => "solvers/conjugate_gradient_descent.md", "Cyclic Proximal Point" => "solvers/cyclic_proximal_point.md", "Difference of Convex" => "solvers/difference_of_convex.md", - "Douglas–Rachford" => "solvers/DouglasRachford.md", + "Douglas-Rachford" => "solvers/DouglasRachford.md", "Exact Penalty Method" => "solvers/exact_penalty_method.md", "Frank-Wolfe" => "solvers/FrankWolfe.md", "Gradient Descent" => "solvers/gradient_descent.md", diff --git a/docs/src/about.md b/docs/src/about.md index e38f780134..312df00f22 100644 --- a/docs/src/about.md +++ b/docs/src/about.md @@ -9,7 +9,7 @@ The following people contributed * [Willem Diepeveen](https://www.maths.cam.ac.uk/person/wd292) implemented the [primal-dual Riemannian semismooth Newton](@ref PDRSSNSolver) solver. * Even Stephansen Kjemsås contributed to the implementation of the [Frank Wolfe Method](@ref FrankWolfe) solver * Mathias Ravn Munkvold contributed most of the implementation of the [Adaptive Regularization with Cubics](@ref ARSSection) solver -* [Tom-Christian Riemer](https://www.tu-chemnitz.de/mathematik/wire/mitarbeiter.php) Riemer implemented the [trust regions](@ref trust_regions) and [quasi Newton](solvers/quasi_Newton.md) solvers. +* [Tom-Christian Riemer](https://www.tu-chemnitz.de/mathematik/wire/mitarbeiter.php) implemented the [trust regions](@ref trust_regions) and [quasi Newton](solvers/quasi_Newton.md) solvers.
* [Manuel Weiss](https://scoop.iwr.uni-heidelberg.de/author/manuel-weiß/) implemented most of the [conjugate gradient update rules](@ref cg-coeffs) ...as well as various [contributors](https://github.com/JuliaManifolds/Manopt.jl/graphs/contributors) providing small extensions, finding small bugs and mistakes and fixing them by opening [PR](https://github.com/JuliaManifolds/Manopt.jl/pulls)s. @@ -23,14 +23,14 @@ to clone/fork the repository or open an issue. `Manopt.jl` belongs to the Manopt family: -* [manopt.org](https://www.manopt.org) – The Matlab version of Manopt, see also their :octocat: [GitHub repository](https://github.com/NicolasBoumal/manopt) -* [pymanopt.org](https://www.pymanopt.org/) – The Python version of Manopt – providing also several AD backends, see also their :octocat: [GitHub repository](https://github.com/pymanopt/pymanopt) +* [manopt.org](https://www.manopt.org): The Matlab version of Manopt, see also their :octocat: [GitHub repository](https://github.com/NicolasBoumal/manopt) +* [pymanopt.org](https://www.pymanopt.org/): The Python version of Manopt, also providing several AD backends, see also their :octocat: [GitHub repository](https://github.com/pymanopt/pymanopt) but there are also more packages providing tools on manifolds: * [Jax Geometry](https://bitbucket.org/stefansommer/jaxgeometry/src/main/) (Python/Jax) for differential geometry and stochastic dynamics with deep learning * [Geomstats](https://geomstats.github.io) (Python with several backends) focusing on statistics and machine learning :octocat: [GitHub repository](https://github.com/geomstats/geomstats) -* [Geoopt](https://geoopt.readthedocs.io/en/latest/) (Python & PyTorch) – Riemannian ADAM & SGD. :octocat: [GitHub repository](https://github.com/geoopt/geoopt) +* [Geoopt](https://geoopt.readthedocs.io/en/latest/) (Python & PyTorch): Riemannian ADAM & SGD.
:octocat: [GitHub repository](https://github.com/geoopt/geoopt) +* [McTorch](https://github.com/mctorch/mctorch) (Python & PyTorch): Riemannian SGD, Adagrad, ASA & CG. * [ROPTLIB](https://www.math.fsu.edu/~whuang2/papers/ROPTLIB.htm) (C++) a Riemannian OPTimization LIBrary :octocat: [GitHub repository](https://github.com/whuang08/ROPTLIB) * [TF Riemopt](https://github.com/master/tensorflow-riemopt) (Python & TensorFlow) Riemannian optimization using TensorFlow diff --git a/docs/src/extensions.md b/docs/src/extensions.md index ad5755d396..9169d63766 100644 --- a/docs/src/extensions.md +++ b/docs/src/extensions.md @@ -45,10 +45,17 @@ x_opt = quasi_Newton( ) ``` -### Manifolds.jl +In general, this defines the following new [stepsize](@ref Stepsize): ```@docs Manopt.LineSearchesStepsize +``` + +## Manifolds.jl + +Loading `Manifolds.jl` introduces the following additional functions: + +```@docs mid_point Manopt.max_stepsize(::TangentBundle, ::Any) Manopt.max_stepsize(::FixedRankMatrices, ::Any) @@ -57,7 +64,8 @@ Manopt.max_stepsize(::FixedRankMatrices, ::Any) ## JuMP.jl Manopt can be used through the [JuMP.jl](https://github.com/jump-dev/JuMP.jl) interface. -The manifold is provided in the `@variable` macro. Note that until now, only variables (points on manifolds) are supported, that are arrays, i.e. especially structs do not yet work. +The manifold is provided in the `@variable` macro. Note that currently +only variables (points on manifolds) that are arrays are supported; structs do not yet work. The algebraic expression of the objective function is specified in the `@objective` macro. The `descent_state_type` attribute specifies the solver.
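Putting the JuMP pieces described above together, usage looks roughly like the following sketch; the `in Sphere(2)` variable syntax and the concrete objective are illustrative assumptions based on this description, not copied from the released documentation:

```julia
using JuMP, Manopt, Manifolds

model = Model(Manopt.JuMP_Optimizer)
# the manifold is provided in the @variable macro; the point must be an array
@variable(model, p[1:3] in Sphere(2))
# the objective is an algebraic expression of the variable
@objective(model, Min, sum((p .- [1.0, 0.0, 0.0]) .^ 2))
optimize!(model)
solution_summary(model)
```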
@@ -72,6 +80,8 @@ optimize!(model) solution_summary(model) ``` +### Interface functions + ```@docs Manopt.JuMP_ArrayShape Manopt.JuMP_VectorizedManifold diff --git a/docs/src/functions/adjoint_differentials.md b/docs/src/functions/adjoint_differentials.md index 640d2e4457..19eee1e133 100644 --- a/docs/src/functions/adjoint_differentials.md +++ b/docs/src/functions/adjoint_differentials.md @@ -1,4 +1,4 @@ -# [Adjoint Differentials](@id adjointDifferentialFunctions) +# [Adjoint differentials](@id adjointDifferentialFunctions) ```@autodocs Modules = [Manopt] diff --git a/docs/src/functions/costs.md b/docs/src/functions/costs.md index ed02cffb57..ec12bb0ca0 100644 --- a/docs/src/functions/costs.md +++ b/docs/src/functions/costs.md @@ -1,4 +1,4 @@ -# [Cost Functions](@id CostFunctions) +# [Cost functions](@id CostFunctions) The following cost functions are available diff --git a/docs/src/functions/index.md b/docs/src/functions/index.md index a05a453087..4220b52aa9 100644 --- a/docs/src/functions/index.md +++ b/docs/src/functions/index.md @@ -1,7 +1,7 @@ # Functions There are several functions required within optimization, most prominently -[costFunctions](@ref CostFunctions) and [gradients](@ref GradientFunctions). This package includes +[cost functions](@ref CostFunctions) and [gradients](@ref GradientFunctions). This package includes several cost functions and corresponding gradients, but also corresponding [proximal maps](@ref proximalMapFunctions) for variational methods for manifold-valued data.
Most of these functions require the evaluation of diff --git a/docs/src/functions/proximal_maps.md b/docs/src/functions/proximal_maps.md index a3eb4912c1..58c51ee4ca 100644 --- a/docs/src/functions/proximal_maps.md +++ b/docs/src/functions/proximal_maps.md @@ -1,4 +1,4 @@ -# [Proximal Maps](@id proximalMapFunctions) +# [Proximal maps](@id proximalMapFunctions) For a function ``\varphi:\mathcal M →ℝ`` the proximal map is defined as @@ -14,9 +14,9 @@ the geodesic distance on ``\mathcal M``. While it might still be difficult to compute the minimizer, there are several proximal maps known (locally) in closed form. Furthermore if ``x^{\star} ∈ \mathcal M`` is a minimizer of ``\varphi``, then -``\displaystyle\operatorname{prox}_{λ\varphi}(x^\star) = x^\star,`` - -i.e. a minimizer is a fixed point of the proximal map. +```math +\operatorname{prox}_{λ\varphi}(x^\star) = x^\star. +``` This page lists all proximal maps available within Manopt. To add your own, just extend the `functions/proximal_maps.jl` file. diff --git a/docs/src/index.md b/docs/src/index.md index 631b1ca6c0..cd40716fa8 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -8,7 +8,7 @@ CurrentModule = Manopt Manopt.Manopt ``` -For a function ``f:\mathcal M → ℝ`` defined on a [Riemannian manifold](https://en.wikipedia.org/wiki/Riemannian_manifold) ``\mathcal M`` we aim to solve +For a function ``f:\mathcal M → ℝ`` defined on a [Riemannian manifold](https://en.wikipedia.org/wiki/Riemannian_manifold) ``\mathcal M``, algorithms in this package aim to solve ```math \operatorname*{argmin}_{p ∈ \mathcal M} f(p), ``` @@ -19,8 +19,8 @@ or in other words: find the point ``p`` on the manifold, where ``f`` reaches its `Manopt.jl` provides a framework for optimization on manifolds as well as a Library of optimization algorithms in [Julia](https://julialang.org). It belongs to the “Manopt family”, which includes [Manopt](https://manopt.org) (Matlab) and [pymanopt.org](https://www.pymanopt.org/) (Python).
-If you want to delve right into `Manopt.jl` check out the -[Get started: Optimize!](tutorials/Optimize!.md) tutorial. +If you want to delve right into `Manopt.jl`, read the +[🏔️ Get started: optimize.](tutorials/Optimize.md) tutorial. `Manopt.jl` makes it easy to use an algorithm for your favourite manifold as well as a manifold for your favourite algorithm. It already provides @@ -44,7 +44,7 @@ If you use `Manopt.jl`in your work, please cite the following } ``` -To refer to a certain version or the source code in general we recommend to cite for example +To refer to a certain version or the source code in general, cite for example ```biblatex @software{manoptjl-zenodo-mostrecent, @@ -60,12 +60,12 @@ To refer to a certain version or the source code in general we recommend to cite for the most recent version or a corresponding version specific DOI, see [the list of all versions](https://zenodo.org/search?page=1&size=20&q=conceptrecid:%224290905%22&sort=-version&all_versions=True). Note that both citations are in [BibLaTeX](https://ctan.org/pkg/biblatex) format. -## Main Features +## Main features -### Optimization Algorithms (Solvers) +### Optimization algorithms (solvers) -For every optimization algorithm, a [solver](@ref SolversSection) is implemented based on a [`AbstractManoptProblem`](@ref) that describes the problem to solve and its [`AbstractManoptSolverState`](@ref) that set up the solver, store interims values. Together they -form a [plan](@ref planSection). +For every optimization algorithm, a [solver](@ref SolversSection) is implemented based on an [`AbstractManoptProblem`](@ref) that describes the problem to solve and its [`AbstractManoptSolverState`](@ref) that sets up the solver and stores values that are required between iterations or for the next iteration. +Together they form a [plan](@ref planSection).
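As the changelog entries in this diff note, the initial point is now optional in the high-level solver calls; a small sketch using `gradient_descent`, with manifold and cost chosen purely for illustration:

```julia
using Manopt, Manifolds

M = Sphere(2)
q = [0.0, 0.0, 1.0]
f(M, p) = distance(M, p, q)^2 / 2
grad_f(M, p) = -log(M, p, q)  # Riemannian gradient of ½ d(p, q)²

# the initial point is optional: this is equivalent to
# gradient_descent(M, f, grad_f, rand(M))
p_opt = gradient_descent(M, f, grad_f)
```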
## Manifolds @@ -73,14 +73,14 @@ This project is build upon [ManifoldsBase.jl](https://juliamanifolds.github.io/M The notation in the documentation aims to follow the same [notation](https://juliamanifolds.github.io/Manifolds.jl/stable/misc/notation.html) from these packages. -### Functions on Manifolds +### Functions on manifolds Several functions are available, implemented on an arbitrary manifold, [cost functions](@ref CostFunctions), [differentials](@ref DifferentialFunctions) and their [adjoints](@ref adjointDifferentialFunctions), and [gradients](@ref GradientFunctions) as well as [proximal maps](@ref proximalMapFunctions). ### Visualization -To visualize and interpret results, `Manopt.jl` aims to provide both easy plot functions as well as [exports](@ref Exports). Furthermore a system to get [debug](@ref DebugSection) during the iterations of an algorithms as well as [record](@ref RecordSection) capabilities, i.e. to record a specified tuple of values per iteration, most prominently [`RecordCost`](@ref) and -[`RecordIterate`](@ref). Take a look at the [Get Started: Optimize!](tutorials/Optimize!.md) tutorial on how to easily activate this. +To visualize and interpret results, `Manopt.jl` aims to provide both easy plot functions as well as [exports](@ref Exports). Furthermore there is a system to get [debug](@ref DebugSection) output during the iterations of an algorithm as well as [record](@ref RecordSection) capabilities, for example to record a specified tuple of values per iteration, most prominently [`RecordCost`](@ref) and +[`RecordIterate`](@ref). Take a look at the [🏔️ Get started: optimize.](tutorials/Optimize.md) tutorial on how to easily activate this.
## Literature
diff --git a/docs/src/notation.md b/docs/src/notation.md
index 9d74aa127e..f95e5d2d81 100644
--- a/docs/src/notation.md
+++ b/docs/src/notation.md
@@ -1,8 +1,7 @@
# Notation

-In this package, we follow the notation introduced in [Manifolds.jl – Notation](https://juliamanifolds.github.io/Manifolds.jl/latest/misc/notation.html)
-
-with the following additional notation
+In this package, the notation introduced in [Manifolds.jl Notation](https://juliamanifolds.github.io/Manifolds.jl/latest/misc/notation.html) is used,
+with the following additional parts.

| Symbol | Description | Also used | Comment |
|:--:|:--------------- |:--:|:-- |
diff --git a/docs/src/plans/debug.md b/docs/src/plans/debug.md
index e248bca0de..e3c46eb2ad 100644
--- a/docs/src/plans/debug.md
+++ b/docs/src/plans/debug.md
@@ -1,4 +1,4 @@
-# [Debug Output](@id DebugSection)
+# [Debug output](@id DebugSection)

```@meta
CurrentModule = Manopt
@@ -14,7 +14,7 @@ Order = [:type, :function]
Private = true
```

-## Technical Details: The Debug Solver
+## Technical details

The decorator to print debug during the iterations can be activated by
decorating the state of a solver and implementing
diff --git a/docs/src/plans/index.md b/docs/src/plans/index.md
index 195f1fc905..857f324b94 100644
--- a/docs/src/plans/index.md
+++ b/docs/src/plans/index.md
@@ -5,13 +5,13 @@ CurrentModule = Manopt
```

For any optimisation performed in `Manopt.jl`
-we need information about both the optimisation task or “problem” at hand as well as the solver and all its parameters.
+information is required about both the optimisation task or “problem” at hand as well as the solver and all its parameters.
This together is called a __plan__ in `Manopt.jl` and it consists of two data structures:

-* The [Manopt Problem](@ref ProblemSection) describes all _static_ data of our task, most prominently the manifold and the objective.
-* The [Solver State](@ref SolverStateSection) describes all _varying_ data and parameters for the solver we aim to use. This also means that each solver has its own data structure for the state.
+* The [Manopt Problem](@ref ProblemSection) describes all _static_ data of a task, most prominently the manifold and the objective.
+* The [Solver State](@ref sec-solver-state) describes all _varying_ data and parameters for the solver that is used. This also means that each solver has its own data structure for the state.

-By splitting these two parts, we can use one problem and solve it using different solvers.
+By splitting these two parts, one problem can be defined and then be solved using different solvers.

Still there might be the need to set certain parameters within
any of these structures. For that there is

@@ -24,23 +24,23 @@ Manopt.status_summary
Where the following Symbols are used
The following symbols are used.
-The column “generic” refers to a short hand that might be used – for readability if clear from context.
+The column “generic” refers to a short hand that might be used for readability if clear from context.

| Symbol | Used in | Description | generic |
| :----------- | :------: | ;-------------------------------------------------------- | :------ |
| `:active` | [`DebugWhenActive`](@ref) | activity of the debug action stored within | |
| `:Basepoint` | [`TangentSpace`]() | the point the tangent space is at | `:p` |
-| `:Cost` | generic |the cost function (e.g. within an objective, as pass down) | |
+| `:Cost` | generic | the cost function (within an objective, passed down) | |
| `:Debug` | [`DebugSolverState`](@ref) | the stored `debugDictionary` | |
-| `:Gradient` | generic |the gradient function (e.g. within an objective, as pass down) | |
-| `:Iterate` | generic | the (current) iterate – similar to [`set_iterate!`](@ref) – within a state | |
-| `:Manifold` | generic |the manifold (e.g.
within a problem, as pass down) | |
-| `:Objective` | generic | the objective (e.g. within a problem, as pass down) | |
-| `:SubProblem` | generic | the sub problem (e.g. within a state, as pass down) | |
-| `:SubState` | generic | the sub state (e.g. within a state, as pass down) | |
+| `:Gradient` | generic | the gradient function (within an objective, passed down) | |
+| `:Iterate` | generic | the (current) iterate, similar to [`set_iterate!`](@ref), within a state | |
+| `:Manifold` | generic | the manifold (within a problem, passed down) | |
+| `:Objective` | generic | the objective (within a problem, passed down) | |
+| `:SubProblem` | generic | the sub problem (within a state, passed down) | |
+| `:SubState` | generic | the sub state (within a state, passed down) | |
| `:λ` | [`ProximalDCCost`](@ref), [`ProximalDCGrad`](@ref) | set the proximal parameter within the proximal sub objective elements | |
| `:p` | generic | a certain point | |
| `:X` | generic | a certain tangent vector | |
| `:TrustRegionRadius` | [`TrustRegionsState`](@ref) | the trust region radius | `:σ` |
-| `:ρ`, `:u` | [`ExactPenaltyCost`](@ref), [`ExactPenaltyGrad`](@ref) | Parameters within the exact penalty objetive | |
+| `:ρ`, `:u` | [`ExactPenaltyCost`](@ref), [`ExactPenaltyGrad`](@ref) | Parameters within the exact penalty objective | |
| `:ρ`, `:μ`, `:λ` | [`AugmentedLagrangianCost`](@ref) and [`AugmentedLagrangianGrad`](@ref) | Parameters of the Lagrangian function | |
diff --git a/docs/src/plans/objective.md b/docs/src/plans/objective.md
index fdaa7aa39c..1fe163dd5e 100644
--- a/docs/src/plans/objective.md
+++ b/docs/src/plans/objective.md
@@ -1,4 +1,4 @@
-# [A Manifold Objective](@id ObjectiveSection)
+# [A manifold objective](@id ObjectiveSection)

```@meta
CurrentModule = Manopt
@@ -11,7 +11,7 @@ AbstractManifoldObjective
AbstractDecoratedManifoldObjective
```

-Which has two main different possibilities for its containing functions concerning the evaluation mode – not
necessarily the cost, but for example gradient in an [`AbstractManifoldGradientObjective`](@ref).
+It has two different possibilities for the evaluation mode of the functions it contains, not necessarily the cost, but for example the gradient in an [`AbstractManifoldGradientObjective`](@ref).

```@docs
AbstractEvaluationType
@@ -20,8 +20,7 @@ InplaceEvaluation
evaluation_type
```

-
-## Decorators for Objectives
+## Decorators for objectives

An objective can be decorated using the following trait and function to initialize

@@ -31,15 +30,15 @@ is_objective_decorator
decorate_objective!
```

-### [Embedded Objectives](@id ManifoldEmbeddedObjective)
+### [Embedded objectives](@id ManifoldEmbeddedObjective)

```@docs
EmbeddedManifoldObjective
```

-### [Cache Objective](@id CacheSection)
+### [Cache objective](@id CacheSection)

-Since single function calls, e.g. to the cost or the gradient, might be expensive,
+Since single function calls, for example to the cost or the gradient, might be expensive,
a simple cache objective exists as a decorator, that caches one cost value or gradient.
It can be activated/used with the `cache=` keyword argument available for every solver.

@@ -57,7 +56,7 @@ A first generic cache is always available, but it only caches one gradient and o
SimpleManifoldCachedObjective
```

-#### A Generic Cache
+#### A generic cache

For the more advanced cache, you need to implement some type of cache yourself, that provides a `get!` and implement [`init_caches`](@ref).
@@ -68,13 +67,13 @@ ManifoldCachedObjective init_caches ``` -### [Count Objective](@id ManifoldCountObjective) +### [Count objective](@id ManifoldCountObjective) ```@docs ManifoldCountObjective ``` -### Internal Decorators +### Internal decorators ```@docs ReturnManifoldObjective @@ -82,7 +81,7 @@ ReturnManifoldObjective ## Specific Objective typed and their access functions -### Cost Objective +### Cost objective ```@docs AbstractManifoldCostObjective @@ -101,7 +100,7 @@ and internally get_cost_function ``` -### Gradient Objectives +### Gradient objectives ```@docs AbstractManifoldGradientObjective @@ -130,37 +129,37 @@ and internally get_gradient_function ``` -#### Internal Helpers +#### Internal helpers ```@docs get_gradient_from_Jacobian! ``` -### Subgradient Objective +### Subgradient objective ```@docs ManifoldSubgradientObjective ``` -#### Access Functions +#### Access functions ```@docs get_subgradient ``` -### Proximal Map Objective +### Proximal map objective ```@docs ManifoldProximalMapObjective ``` -#### Access Functions +#### Access functions ```@docs get_proximal_map ``` -### Hessian Objective +### Hessian objective ```@docs AbstractManifoldHessianObjective @@ -180,7 +179,7 @@ and internally get_hessian_function ``` -### Primal-Dual based Objectives +### Primal-dual based objectives ```@docs AbstractPrimalDualManifoldObjective @@ -200,7 +199,7 @@ get_primal_prox linearized_forward_operator ``` -### Constrained Objective +### Constrained objective Besides the [`AbstractEvaluationType`](@ref) there is one further property to distinguish among constraint functions, especially the gradients of the constraints. @@ -235,7 +234,7 @@ get_grad_inequality_constraints get_grad_inequality_constraints! ``` -### Subproblem Objective +### Subproblem objective This objective can be use when the objective of a sub problem solver still needs access to the (outer/main) objective. 
diff --git a/docs/src/plans/problem.md b/docs/src/plans/problem.md
index 5a4678e394..21ac9bf28b 100644
--- a/docs/src/plans/problem.md
+++ b/docs/src/plans/problem.md
@@ -1,4 +1,4 @@
-# [A Manopt Problem](@id ProblemSection)
+# [A Manopt problem](@id ProblemSection)

```@meta
CurrentModule = Manopt
@@ -12,13 +12,13 @@ get_objective
get_manifold
```

-Usually, such a problem is determined by the manifold or domain of the optimisation and the objective with all its properties used within an algorithm – see [The Objective](@ref ObjectiveSection). For that we can just use
+Usually, such a problem is determined by the manifold or domain of the optimisation and the objective with all its properties used within an algorithm, see [The Objective](@ref ObjectiveSection). For that one can just use

```@docs
DefaultManoptProblem
```

-The exception to these are the primal dual-based solvers ([Chambolle-Pock](@ref ChambollePockSolver) and the [PD Semismooth Newton](@ref PDRSSNSolver)]), which both need two manifolds as their domain(s), hence there also exists a
+The exceptions to this are the primal-dual based solvers ([Chambolle-Pock](@ref ChambollePockSolver) and the [PD Semi-smooth Newton](@ref PDRSSNSolver)), which both need two manifolds as their domains, hence there also exists a

```@docs
TwoManifoldProblem
diff --git a/docs/src/plans/record.md b/docs/src/plans/record.md
index 96c7605958..19630f0b01 100644
--- a/docs/src/plans/record.md
+++ b/docs/src/plans/record.md
@@ -10,7 +10,7 @@ On the one hand, the high-level interfaces provide a `record=` keyword, that acc
For example recording the gradient from the [`GradientDescentState`](@ref) is automatically available, as explained in the [`gradient_descent`](@ref) solver.

-## [Record Solver States](@id RecordSolverState)
+## [Record solver states](@id RecordSolverState)

```@autodocs
Modules = [Manopt]
@@ -23,7 +23,7 @@ see [recording values](@ref RecordSection) for details on the decorated solver.
Further specific [`RecordAction`](@ref)s can be found when specific types of [`AbstractManoptSolverState`](@ref) define them on their corresponding site.

-## Technical Details: The Record Solver
+## Technical details

```@docs
initialize_solver!(amp::AbstractManoptProblem, rss::RecordSolverState)
diff --git a/docs/src/plans/state.md b/docs/src/plans/state.md
index 29cb603e29..cc3914ab11 100644
--- a/docs/src/plans/state.md
+++ b/docs/src/plans/state.md
@@ -1,4 +1,4 @@
-# [The Solver State](@id SolverStateSection)
+# [Solver state](@id sec-solver-state)

```@meta
CurrentModule = Manopt
@@ -6,7 +6,7 @@ CurrentModule = Manopt
Given an [`AbstractManoptProblem`](@ref), that is a certain optimisation task,
the state specifies the solver to use. It contains the parameters of a solver and all
-fields necessary during the algorithm, e.g. the current iterate, a [`StoppingCriterion`](@ref)
+fields necessary during the algorithm, for example the current iterate, a [`StoppingCriterion`](@ref)
or a [`Stepsize`](@ref).

```@docs
@@ -17,9 +17,9 @@ Manopt.get_count
Since every subtype of an [`AbstractManoptSolverState`](@ref) directly relate to a solver,
the concrete states are documented together with the corresponding [solvers](@ref SolversSection).
-This page documents the general functionality available for every state.
+This page documents the general features available for every state.

-A first example is to access, i.e. obtain or set, the current iterate.
+A first example is to obtain or set the current iterate.
This might be useful to continue investigation at the current iterate, or to set up a solver for a next experiment, respectively.
```@docs @@ -42,7 +42,7 @@ Furthermore, to access the stopping criterion use get_stopping_criterion ``` -## Decorators for AbstractManoptSolverState +## Decorators for `AbstractManoptSolverState`s A solver state can be decorated using the following trait and function to initialize @@ -60,7 +60,7 @@ ReturnSolverState as well as [`DebugSolverState`](@ref) and [`RecordSolverState`](@ref). -## State Actions +## State actions A state action is a struct for callback functions that can be attached within for example the just mentioned debug decorator or the record decorator. @@ -88,7 +88,7 @@ _storage_copy_vector _storage_copy_point ``` -## Abstract States +## Abstract states In a few cases it is useful to have a hierarchy of types. These are diff --git a/docs/src/plans/stepsize.md b/docs/src/plans/stepsize.md index 7979393e2e..76bac209b8 100644 --- a/docs/src/plans/stepsize.md +++ b/docs/src/plans/stepsize.md @@ -1,13 +1,13 @@ -# [Stepsize and Linesearch](@id Stepsize) +# [Stepsize and line search](@id Stepsize) ```@meta CurrentModule = Manopt ``` -Most iterative algorithms determine a direction along which the algorithm will proceed and +Most iterative algorithms determine a direction along which the algorithm shall proceed and determine a step size to find the next iterate. How advanced the step size computation can be implemented depends (among others) on the properties the corresponding problem provides. 
-Within `Manopt.jl`, the step size determination is implemented as a `functor` which is a subtype of [`Stepsize`](@refbased on
+Within `Manopt.jl`, the step size determination is implemented as a `functor` which is a subtype of [`Stepsize`](@ref) based on

```@docs
Stepsize
diff --git a/docs/src/plans/stopping_criteria.md b/docs/src/plans/stopping_criteria.md
index 1e76a6cd1f..53d4173ebe 100644
--- a/docs/src/plans/stopping_criteria.md
+++ b/docs/src/plans/stopping_criteria.md
@@ -1,6 +1,6 @@
-# [Stopping Criteria](@id StoppingCriteria)
+# [Stopping criteria](@id StoppingCriteria)

-Stopping criteria are implemented as a `functor`, i.e. inherit from the base type
+Stopping criteria are implemented as a `functor` and inherit from the base type

```@docs
StoppingCriterion
@@ -12,16 +12,16 @@ They can also be grouped, which is summarized in the type of a set of criteria

StoppingCriterionSet
```

-Then the stopping criteria `s` might have certain internal values to check against,
-and this is done when calling them as a function `s(amp::AbstractManoptProblem, ams::AbstractManoptSolverState)`,
+A stopping criterion `s` might have certain internal values/fields it uses to check against.
+This is done when calling it as a function `s(amp::AbstractManoptProblem, ams::AbstractManoptSolverState)`,
where the [`AbstractManoptProblem`](@ref) and the [`AbstractManoptSolverState`](@ref) together represent the current state of the solver. The functor returns either `false` when the stopping criterion is not fulfilled or `true` otherwise.
One field all criteria should have is the `s.reason`, a string giving the reason to stop, see [`get_reason`](@ref).

-## Stopping Criteria
+## Generic stopping criteria

The following generic stopping criteria are available. Some require that, for example,
-the corresponding [`AbstractManoptSolverState`](@ref) have a field `gradient` when the criterion should check that.
+the corresponding [`AbstractManoptSolverState`](@ref) have a field `gradient` when the criterion should access that. Further stopping criteria might be available for individual solvers. @@ -32,9 +32,9 @@ Order = [:type] Filter = t -> t != StoppingCriterion && t != StoppingCriterionSet ``` -## Functions for Stopping Criteria +## Functions for stopping criteria -There are a few functions to update, combine and modify stopping criteria, especially to update internal values even for stopping criteria already being used within an [`AbstractManoptSolverState`](@ref) structure. +There are a few functions to update, combine, and modify stopping criteria, especially to update internal values even for stopping criteria already being used within an [`AbstractManoptSolverState`](@ref) structure. ```@autodocs Modules = [Manopt] diff --git a/docs/src/references.md b/docs/src/references.md index b3beda4a65..a71d032292 100644 --- a/docs/src/references.md +++ b/docs/src/references.md @@ -1,7 +1,8 @@ # Literature This is all literature mentioned / referenced in the `Manopt.jl` documentation. -Usually you will find a small reference section at the end of every documentation page that contains references. +Usually you find a small reference section at the end of every documentation page that contains +the corresponding references as well. 
```@bibliography
```
\ No newline at end of file
diff --git a/docs/src/solvers/ChambollePock.md b/docs/src/solvers/ChambollePock.md
index 0fc28d8b95..ed53333847 100644
--- a/docs/src/solvers/ChambollePock.md
+++ b/docs/src/solvers/ChambollePock.md
@@ -1,6 +1,6 @@
-# [The Riemannian Chambolle-Pock Algorithm](@id ChambollePockSolver)
+# [The Riemannian Chambolle-Pock algorithm](@id ChambollePockSolver)

-The Riemannian Chambolle–Pock is a generalization of the Chambolle–Pock algorithm [ChambollePock:2011](@citet*)
+The Riemannian Chambolle-Pock algorithm is a generalization of the Chambolle-Pock algorithm [ChambollePock:2011](@citet*).
It is also known as primal-dual hybrid gradient (PDHG) or primal-dual proximal splitting (PDPS) algorithm.

In order to minimize over ``p∈\mathcal M`` the cost function consisting of
@@ -14,14 +14,13 @@ F(p) + G(Λ(p)),
where ``F:\mathcal M → \overline{ℝ}``, ``G:\mathcal N → \overline{ℝ}``,
and ``Λ:\mathcal M →\mathcal N``.
-If the manifolds ``\mathcal M`` or ``\mathcal N`` are not Hadamard, it has to be considered locally,
-i.e. on geodesically convex sets ``\mathcal C \subset \mathcal M`` and ``\mathcal D \subset\mathcal N``
+If the manifolds ``\mathcal M`` or ``\mathcal N`` are not Hadamard, it has to be considered locally only, that is on geodesically convex sets ``\mathcal C \subset \mathcal M`` and ``\mathcal D \subset\mathcal N``
such that ``Λ(\mathcal C) \subset \mathcal D``.

The algorithm is available in four variants: exact versus linearized (see `variant`)
as well as with primal versus dual relaxation (see `relax`). For more details, see
[BergmannHerzogSilvaLouzeiroTenbrinckVidalNunez:2021](@citet*).

-In the following we note the case of the exact, primal relaxed Riemannian Chambolle–Pock algorithm.
+The following describes the case of the exact, primal relaxed Riemannian Chambolle-Pock algorithm.
Given base points ``m∈\mathcal C``, ``n=Λ(m)∈\mathcal D``,
initial primal and dual values ``p^{(0)} ∈\mathcal C``, ``ξ_n^{(0)} ∈T_n^*\mathcal N``,
@@ -68,7 +67,7 @@ ChambollePock!
ChambollePockState
```

-## Useful Terms
+## Useful terms

```@docs
primal_residual
@@ -110,8 +109,19 @@ RecordPrimalIterate
Manopt.update_prox_parameters!
```

+## [Technical details](@id sec-cp-technical-details)
+
+The [`ChambollePock`](@ref) solver requires the following functions of a manifold to be available for both manifolds ``\mathcal M`` and ``\mathcal N``
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` or `retraction_method_dual=` (for ``\mathcal N``) does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` or `inverse_retraction_method_dual=` (for ``\mathcal N``) does not have to be specified.
+* A [`vector_transport_to!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/)`(M, Y, p, X, q)`; it is recommended to set the [`default_vector_transport_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/#ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}) to a favourite vector transport. If this default is set, a `vector_transport_method=` or `vector_transport_method_dual=` (for ``\mathcal N``) does not have to be specified.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+
## Literature

+
+
```@bibliography
Pages = ["ChambollePock.md"]
Canonical=false
diff --git a/docs/src/solvers/DouglasRachford.md b/docs/src/solvers/DouglasRachford.md
index 3a4dacf0f5..c503d1c2e3 100644
--- a/docs/src/solvers/DouglasRachford.md
+++ b/docs/src/solvers/DouglasRachford.md
@@ -1,6 +1,6 @@
-# [Douglas–Rachford Algorithm](@id DRSolver)
+# [Douglas-Rachford algorithm](@id DRSolver)

-The (Parallel) Douglas–Rachford ((P)DR) Algorithm was generalized to Hadamard
+The (Parallel) Douglas-Rachford ((P)DR) algorithm was generalized to Hadamard
manifolds in [BergmannPerschSteidl:2016](@cite).

The aim is to minimize the sum

@@ -62,6 +62,19 @@ DouglasRachfordState

For specific [`DebugAction`](@ref)s and [`RecordAction`](@ref)s see also
[Cyclic Proximal Point](@ref CPPSolver).

+## [Technical details](@id sec-dr-technical-details)
+
+The [`DouglasRachford`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction.
If this default is set, an `inverse_retraction_method=` does not have to be specified.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+
+By default, one of the stopping criteria is [`StopWhenChangeLess`](@ref),
+which requires
+
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified. Alternatively, the [`distance`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.distance-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, q)` can be provided for said default inverse retraction.
+
## Literature

```@bibliography
diff --git a/docs/src/solvers/FrankWolfe.md b/docs/src/solvers/FrankWolfe.md
index abfe2b6b50..9f64fff56d 100644
--- a/docs/src/solvers/FrankWolfe.md
+++ b/docs/src/solvers/FrankWolfe.md
@@ -1,4 +1,4 @@
-# [Frank Wolfe Method](@id FrankWolfe)
+# [Frank-Wolfe method](@id FrankWolfe)

```@meta
CurrentModule = Manopt
diff --git a/docs/src/solvers/LevenbergMarquardt.md b/docs/src/solvers/LevenbergMarquardt.md
index 82f743be57..a72040bd12 100644
--- a/docs/src/solvers/LevenbergMarquardt.md
+++ b/docs/src/solvers/LevenbergMarquardt.md
@@ -15,6 +15,15 @@ LevenbergMarquardt!
LevenbergMarquardtState
```

+## [Technical details](@id sec-lm-technical-details)
+
+The [`LevenbergMarquardt`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* The [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}), to stop when the norm of the gradient is small; if you implemented `inner`, the norm is provided already.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+
+
## Literature

```@bibliography
diff --git a/docs/src/solvers/NelderMead.md b/docs/src/solvers/NelderMead.md
index 9a0985c087..d176fa839d 100644
--- a/docs/src/solvers/NelderMead.md
+++ b/docs/src/solvers/NelderMead.md
@@ -1,4 +1,4 @@
-# [Nelder Mead Method](@id NelderMeadSolver)
+# [Nelder-Mead method](@id NelderMeadSolver)

```@meta
CurrentModule = Manopt
@@ -21,8 +21,18 @@ CurrentModule = Manopt
NelderMeadSimplex
```

-## Additional Stopping Criteria
+## Additional stopping criteria

```@docs
StopWhenPopulationConcentrated
-```
\ No newline at end of file
+```
+
+## [Technical details](@id sec-NelderMead-technical-details)
+
+The [`NelderMead`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified.
+* The [`distance`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.distance-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, q)` when using the default stopping criterion, which includes [`StopWhenPopulationConcentrated`](@ref).
+* Within the default initialization, [`rand`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.rand-Tuple{AbstractManifold})`(M)` is used to generate the initial population.
+* A [`mean`](https://juliamanifolds.github.io/Manifolds.jl/stable/features/statistics.html#Statistics.mean-Tuple{AbstractManifold,%20AbstractVector,%20AbstractVector,%20ExtrinsicEstimation})`(M, population)` has to be available, for example by loading [`Manifolds.jl`](https://juliamanifolds.github.io/Manifolds.jl/stable/) and its [statistics](https://juliamanifolds.github.io/Manifolds.jl/stable/features/statistics.html) tools.
\ No newline at end of file
diff --git a/docs/src/solvers/adaptive-regularization-with-cubics.md b/docs/src/solvers/adaptive-regularization-with-cubics.md
index ad6ab2cc9c..a21c3ecc01 100644
--- a/docs/src/solvers/adaptive-regularization-with-cubics.md
+++ b/docs/src/solvers/adaptive-regularization-with-cubics.md
@@ -1,4 +1,4 @@
-# [Adaptive regularization with Cubics](@id ARSSection)
+# [Adaptive regularization with cubics](@id ARSSection)



@@ -21,13 +21,13 @@ AdaptiveRegularizationState

There are several ways to approach the subsolver. The default is the first one.

-## Lanczos Iteration
+## Lanczos iteration

```@docs
Manopt.LanczosState
```

-## (Conjugate) Gradient Descent
+## (Conjugate) gradient descent

There is a generic objective, that implements the sub problem

@@ -42,18 +42,30 @@ arc_obj = AdaptiveRagularizationWithCubicsModelObjective(mho, σ)
sub_problem = DefaultProblem(TangentSpaceAt(M,p), arc_obj)
```

-where `mho` is the hessian objective of `f` to solve.
+where `mho` is the Hessian objective of `f` to solve.
Then use this for the `sub_problem` keyword
and use your favourite gradient based solver for the `sub_state` keyword, for example a
[`ConjugateGradientDescentState`](@ref)

-## Additional Stopping Criteria
+## Additional stopping criteria

```@docs
StopWhenAllLanczosVectorsUsed
StopWhenFirstOrderProgress
```

+## [Technical details](@id sec-arc-technical-details)
+
+The [`adaptive_regularization_with_cubics`](@ref) solver requires the following functions
+of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* If you do not provide an initial regularization parameter `σ`, a [`manifold_dimension`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.manifold_dimension-Tuple{AbstractManifold}) is required.
+* By default the tangent vector storing the gradient is initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+* [`inner`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.inner-Tuple{AbstractManifold,%20Any,%20Any,%20Any})`(M, p, X, Y)` is used within the algorithm step.
+
+Furthermore, within the Lanczos subsolver, generating a random vector (at `p`) using [`rand!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.rand-Tuple{AbstractManifold})`(M, X; vector_at=p)` in place of `X` is required.
+
## Literature

```@bibliography
diff --git a/docs/src/solvers/alternating_gradient_descent.md b/docs/src/solvers/alternating_gradient_descent.md
index 49beaf9a85..3e60b23682 100644
--- a/docs/src/solvers/alternating_gradient_descent.md
+++ b/docs/src/solvers/alternating_gradient_descent.md
@@ -1,4 +1,4 @@
-# [Alternating Gradient Descent](@id AlternatingGradientDescentSolver)
+# [Alternating gradient descent](@id title-agds)

```@meta
CurrentModule = Manopt
@@ -22,3 +22,14 @@ The most inner one should always be the following one though.
```@docs
AlternatingGradient
```
+
+
+## [Technical details](@id sec-agd-technical-details)
+
+The [`alternating_gradient_descent`](@ref) solver requires the following functions of a manifold to be available
+
+* The problem has to be phrased on a [`ProductManifold`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/metamanifolds/#ProductManifold), to be able to
+alternate between parts of the input.
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* By default alternating gradient descent uses [`ArmijoLinesearch`](@ref) which requires [`max_stepsize`](@ref)`(M)` to be set and an implementation of [`inner`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.inner-Tuple%7BAbstractManifold,%20Any,%20Any,%20Any%7D)`(M, p, X)`.
+* By default the tangent vector storing the gradient is initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
diff --git a/docs/src/solvers/augmented_Lagrangian_method.md b/docs/src/solvers/augmented_Lagrangian_method.md
index ada799cf92..f84ac86de1 100644
--- a/docs/src/solvers/augmented_Lagrangian_method.md
+++ b/docs/src/solvers/augmented_Lagrangian_method.md
@@ -1,4 +1,4 @@
-# [Augmented Lagrangian Method](@id AugmentedLagrangianSolver)
+# [Augmented Lagrangian method](@id AugmentedLagrangianSolver)
```@meta
CurrentModule = Manopt
@@ -15,13 +15,22 @@
AugmentedLagrangianMethodState
```
-## Helping Functions
+## Helping functions
```@docs
AugmentedLagrangianCost
AugmentedLagrangianGrad
```
+## [Technical details](@id sec-alm-technical-details)
+
+The [`augmented_Lagrangian_method`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+* Everything the subsolver requires, which by default is the [`quasi_Newton`](@ref) method.
+* A [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
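As a hedged illustration of this list (the cost, constraint, and starting point below are made up, and the call assumes the `g=`/`grad_g=` keyword interface of [`augmented_Lagrangian_method`](@ref) together with a manifold from `Manifolds.jl`, which provides `copy`, `copyto!`, and `zero_vector`):

```julia
# Sketch only: minimize a quadratic on the sphere subject to p[3] ≥ 0,
# phrased as the inequality constraint g(p) ≤ 0; all data is illustrative.
using Manopt, Manifolds

M = Sphere(2)
A = [2.0 0.0 0.0; 0.0 1.0 0.0; 0.0 0.0 0.5]
f(M, p) = p' * A * p
grad_f(M, p) = project(M, p, 2A * p)         # Riemannian gradient via tangent projection
g(M, p) = [-p[3]]                            # g(p) ≤ 0 encodes p[3] ≥ 0
grad_g(M, p) = [project(M, p, [0.0, 0.0, -1.0])]

p0 = 1 / sqrt(3) .* [1.0, 1.0, 1.0]
q = augmented_Lagrangian_method(M, f, grad_f, p0; g=g, grad_g=grad_g)
```

The `quasi_Newton` subsolver runs with its defaults here, which the sphere also supports.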
+
+
## Literature
```@bibliography
diff --git a/docs/src/solvers/conjugate_gradient_descent.md b/docs/src/solvers/conjugate_gradient_descent.md
index 289907d363..871bb9b65b 100644
--- a/docs/src/solvers/conjugate_gradient_descent.md
+++ b/docs/src/solvers/conjugate_gradient_descent.md
@@ -1,5 +1,5 @@
-# [Conjugate Gradient Descent](@id CGSolver)
+# [Conjugate gradient descent](@id CGSolver)
```@meta
CurrentModule = Manopt
@@ -16,7 +16,7 @@
conjugate_gradient_descent!
ConjugateGradientDescentState
```
-## [Available Coefficients](@id cg-coeffs)
+## [Available coefficients](@id cg-coeffs)
The update rules act as [`DirectionUpdateRule`](@ref), which internally always first evaluate the gradient itself.
@@ -32,6 +32,16 @@
PolakRibiereCoefficient
SteepestDirectionUpdateRule
```
+## [Technical details](@id sec-cgd-technical-details)
+
+The [`conjugate_gradient_descent`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* A [`vector_transport_to!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/)`(M, Y, p, X, q)`; it is recommended to set the [`default_vector_transport_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/#ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}) to a favourite vector transport. If this default is set, a `vector_transport_method=` does not have to be specified.
+* By default conjugate gradient descent uses [`ArmijoLinesearch`](@ref) which requires [`max_stepsize`](@ref)`(M)` to be set and an implementation of [`inner`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.inner-Tuple%7BAbstractManifold,%20Any,%20Any,%20Any%7D)`(M, p, X)`.
+* By default the stopping criterion uses the [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to stop when the norm of the gradient is small, but if you implemented `inner`, the norm is provided already.
+* By default the tangent vector storing the gradient is initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+
# Literature
```@bibliography
diff --git a/docs/src/solvers/cyclic_proximal_point.md b/docs/src/solvers/cyclic_proximal_point.md
index 018eee9a92..73fcd55356 100644
--- a/docs/src/solvers/cyclic_proximal_point.md
+++ b/docs/src/solvers/cyclic_proximal_point.md
@@ -1,4 +1,4 @@
-# [Cyclic Proximal Point](@id CPPSolver)
+# [Cyclic proximal point](@id CPPSolver)
The Cyclic Proximal Point (CPP) algorithm aims to minimize
@@ -21,19 +21,28 @@
cyclic_proximal_point
cyclic_proximal_point!
```
+## [Technical details](@id sec-cppa-technical-details)
+
+The [`cyclic_proximal_point`](@ref) solver requires no additional functions to be available for your manifold, besides the ones you use in the proximal maps.
+
+By default, one of the stopping criteria is [`StopWhenChangeLess`](@ref),
+which either requires
+
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction.
If this default is set, an `inverse_retraction_method=` does not have to be specified, or the [`distance`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.distance-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, q)` for said default inverse retraction.
+
## State
```@docs
CyclicProximalPointState
```
-## Debug Functions
+## Debug functions
```@docs
DebugProximalParameter
```
-## Record Functions
+## Record functions
```@docs
RecordProximalParameter
diff --git a/docs/src/solvers/difference_of_convex.md b/docs/src/solvers/difference_of_convex.md
index 89a314cdd1..4cbc1b5805 100644
--- a/docs/src/solvers/difference_of_convex.md
+++ b/docs/src/solvers/difference_of_convex.md
@@ -1,24 +1,24 @@
-# [Difference of Convex](@id DifferenceOfConvexSolvers)
+# [Difference of convex](@id DifferenceOfConvexSolvers)
```@meta
CurrentModule = Manopt
```
-## [Difference of Convex Algorithm](@id DCASolver)
+## [Difference of convex algorithm](@id DCASolver)
```@docs
difference_of_convex_algorithm
difference_of_convex_algorithm!
```
-## [Difference of Convex Proximal Point](@id DCPPASolver)
+## [Difference of convex proximal point](@id DCPPASolver)
```@docs
difference_of_convex_proximal_point
difference_of_convex_proximal_point!
```
-## Manopt Solver States
+## Solver states
```@docs
DifferenceOfConvexState
@@ -49,12 +49,28 @@
ProximalDCCost
ProximalDCGrad
```
-## Further helper functions
+## Helper functions
```@docs
get_subtrahend_gradient
```
+## [Technical details](@id sec-cp-technical-details)
+
+The [`difference_of_convex_algorithm`](@ref) and [`difference_of_convex_proximal_point`](@ref) solvers require the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified.
+
+By default, one of the stopping criteria is [`StopWhenChangeLess`](@ref),
+which either requires
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified, or the [`distance`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.distance-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, q)` for said default inverse retraction.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+* By default the tangent vector storing the gradient is initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+* Everything the subsolver requires, which by default is [`trust_regions`](@ref) or, if you do not provide a Hessian, [`gradient_descent`](@ref).
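A hedged sketch of such a setup (the split ``f = g - h`` below is purely illustrative, and the call assumes the positional form `(M, f, g, ∂h, p)` with a `grad_g=` keyword, so the subsolver defaults to [`gradient_descent`](@ref) since no Hessian is provided):

```julia
# Sketch only: a difference of convex problem on the sphere from Manifolds.jl,
# which already provides retract!, inverse_retract!, copy/copyto!, and zero_vector.
using Manopt, Manifolds

M = Sphere(2)
g(M, p) = p[1]^2 + 1.0                       # first part (illustrative)
h(M, p) = 0.5 * p[3]^2                       # part that is subtracted
f(M, p) = g(M, p) - h(M, p)
grad_g(M, p) = project(M, p, [2p[1], 0.0, 0.0])
∂h(M, p) = project(M, p, [0.0, 0.0, p[3]])   # a (sub)gradient of h

p0 = 1 / sqrt(3) .* [1.0, 1.0, 1.0]
q = difference_of_convex_algorithm(M, f, g, ∂h, p0; grad_g=grad_g)
```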
+
## Literature
```@bibliography
diff --git a/docs/src/solvers/exact_penalty_method.md b/docs/src/solvers/exact_penalty_method.md
index 9f17a3f32c..98056cd253 100644
--- a/docs/src/solvers/exact_penalty_method.md
+++ b/docs/src/solvers/exact_penalty_method.md
@@ -1,4 +1,4 @@
-# [Exact Penalty Method](@id ExactPenaltySolver)
+# [Exact penalty method](@id ExactPenaltySolver)
```@meta
CurrentModule = Manopt
@@ -15,7 +15,7 @@
ExactPenaltyMethodState
```
-## Helping Functions
+## Helping functions
```@docs
ExactPenaltyCost
@@ -25,6 +25,21 @@ LinearQuadraticHuber
LogarithmicSumOfExponentials
```
+## [Technical details](@id sec-epm-technical-details)
+
+The [`exact_penalty_method`](@ref) solver requires the following functions of a manifold to be available
+
+
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+* Everything the subsolver requires, which by default is the [`quasi_Newton`](@ref) method.
+* A [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+
+
+The stopping criteria involve [`StopWhenChangeLess`](@ref) and [`StopWhenGradientNormLess`](@ref),
+which require
+
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction.
If this default is set, an `inverse_retraction_method=` does not have to be specified, or the [`distance`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.distance-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, q)` for said default inverse retraction.
+* The [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to stop when the norm of the gradient is small, but if you implemented `inner`, the norm is provided already.
## Literature
diff --git a/docs/src/solvers/gradient_descent.md b/docs/src/solvers/gradient_descent.md
index fa5516f2c8..9dbbec5516 100644
--- a/docs/src/solvers/gradient_descent.md
+++ b/docs/src/solvers/gradient_descent.md
@@ -1,4 +1,4 @@
-# [Gradient Descent](@id GradientDescentSolver)
+# [Gradient descent](@id GradientDescentSolver)
```@meta
CurrentModule = Manopt
@@ -15,7 +15,7 @@
GradientDescentState
```
-## Direction Update Rules
+## Direction update rules
A field of the options is the `direction`, a [`DirectionUpdateRule`](@ref), which by default [`IdentityUpdateRule`](@ref) just evaluates the gradient but can be enhanced for example to
@@ -27,7 +27,7 @@
AverageGradient
Nesterov
```
-## Debug Actions
+## Debug actions
```@docs
DebugGradient
@@ -35,7 +35,7 @@ DebugGradientNorm
DebugStepsize
```
-## Record Actions
+## Record actions
```@docs
RecordGradient
@@ -43,13 +43,14 @@ RecordGradientNorm
RecordStepsize
```
-## Technical Details
+## [Technical details](@id sec-gradient-descent-technical-details)
-The [`gradient_descent`](@ref) solver requires the following functions of your manifold to be available
+The [`gradient_descent`](@ref) solver requires the following functions of a manifold to be available
-* A retraction; if you do not want to specify them directly,
[`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) should be implemented as well. -* By default gradient descent uses [`ArmijoLinesearch`](@ref) which requires [`max_stepsize`](@ref)`(M)` to be set and an implementation of [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, X)`. -* By default the stopping criterion uses the [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to check for a small gradient +* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified. +* By default gradient descent uses [`ArmijoLinesearch`](@ref) which requires [`max_stepsize`](@ref)`(M)` to be set and an implementation of [`inner`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.inner-Tuple%7BAbstractManifold,%20Any,%20Any,%20Any%7D)`(M, p, X)`. +* By default the stopping criterion uses the [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to stop when the norm of the gradient is small, but if you implemented `inner`, the norm is provided already. +* By default the tangent vector storing the gradient is initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`. 
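For a manifold from `Manifolds.jl` such as the sphere, all of the above is already available, so a plain call works without any of these keywords; a hedged sketch (the Rayleigh-quotient style cost is illustrative only):

```julia
# Sketch only: the sphere sets a default retraction and implements inner,
# norm, and zero_vector, so no retraction_method= has to be specified.
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)
A = Diagonal([2.0, 1.0, 0.5])
f(M, p) = p' * A * p
grad_f(M, p) = project(M, p, 2A * p)  # project the Euclidean gradient onto the tangent space

p0 = 1 / sqrt(3) .* [1.0, 1.0, 1.0]
q = gradient_descent(M, f, grad_f, p0)
```

Running this should drive the iterate towards an eigenvector of the smallest eigenvalue of `A`.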
## Literature diff --git a/docs/src/solvers/index.md b/docs/src/solvers/index.md index 808bf7425f..673e50d290 100644 --- a/docs/src/solvers/index.md +++ b/docs/src/solvers/index.md @@ -8,19 +8,19 @@ CurrentModule = Manopt Solvers can be applied to [`AbstractManoptProblem`](@ref)s with solver specific [`AbstractManoptSolverState`](@ref). -# List of Algorithms +# List of algorithms The following algorithms are currently available | Solver | Function & State | Objective | |:---------|:----------------|:---------| -[Alternating Gradient Descent](@ref AlternatingGradientDescentSolver) | [`alternating_gradient_descent`](@ref) [`AlternatingGradientDescentState`](@ref) | ``f=(f_1,\ldots,f_n)``, ``\operatorname{grad} f_i`` | +[Alternating Gradient Descent](@ref title-agds) | [`alternating_gradient_descent`](@ref) [`AlternatingGradientDescentState`](@ref) | ``f=(f_1,\ldots,f_n)``, ``\operatorname{grad} f_i`` | [Chambolle-Pock](@ref ChambollePockSolver) | [`ChambollePock`](@ref), [`ChambollePockState`](@ref) (using [`TwoManifoldProblem`](@ref)) | ``f=F+G(Λ\cdot)``, ``\operatorname{prox}_{σ F}``, ``\operatorname{prox}_{τ G^*}``, ``Λ`` | [Conjugate Gradient Descent](@ref CGSolver) | [`conjugate_gradient_descent`](@ref), [`ConjugateGradientDescentState`](@ref) | ``f``, ``\operatorname{grad} f`` [Cyclic Proximal Point](@ref CPPSolver) | [`cyclic_proximal_point`](@ref), [`CyclicProximalPointState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` | -[Difference of Convex Algorithm](@ref DCASolver) | [`difference_of_convex_algorithm`](@ref), [`DifferenceOfConvexState`](@ref) | ``f=g-h``, ``∂h``, and e.g. ``g``, ``\operatorname{grad} g`` | -[Difference of Convex Proximal Point](@ref DCPPASolver) | [`difference_of_convex_proximal_point`](@ref), [`DifferenceOfConvexProximalState`](@ref) | ``f=g-h``, ``∂h``, and e.g. 
``g``, ``\operatorname{grad} g`` |
-[Douglas–Rachford](@ref DRSolver) | [`DouglasRachford`](@ref), [`DouglasRachfordState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` |
+[Difference of Convex Algorithm](@ref DCASolver) | [`difference_of_convex_algorithm`](@ref), [`DifferenceOfConvexState`](@ref) | ``f=g-h``, ``∂h``, and for example ``g``, ``\operatorname{grad} g`` |
+[Difference of Convex Proximal Point](@ref DCPPASolver) | [`difference_of_convex_proximal_point`](@ref), [`DifferenceOfConvexProximalState`](@ref) | ``f=g-h``, ``∂h``, and for example ``g``, ``\operatorname{grad} g`` |
+[Douglas–Rachford](@ref DRSolver) | [`DouglasRachford`](@ref), [`DouglasRachfordState`](@ref) | ``f=\sum f_i``, ``\operatorname{prox}_{\lambda f_i}`` |
[Exact Penalty Method](@ref ExactPenaltySolver) | [`exact_penalty_method`](@ref), [`ExactPenaltyMethodState`](@ref) | ``f``, ``\operatorname{grad} f``, ``g``, ``\operatorname{grad} g_i``, ``h``, ``\operatorname{grad} h_j`` |
[Frank-Wolfe algorithm](@ref FrankWolfe) | [`Frank_Wolfe_method`](@ref), [`FrankWolfeState`](@ref) | sub-problem solver |
[Gradient Descent](@ref GradientDescentSolver) | [`gradient_descent`](@ref), [`GradientDescentState`](@ref) | ``f``, ``\operatorname{grad} f`` |
@@ -31,13 +31,13 @@
[Primal-dual Riemannian semismooth Newton Algorithm](@ref PDRSSNSolver) | [`primal_dual_semismooth_Newton`](@ref), [`PrimalDualSemismoothNewtonState`](@ref) (using [`TwoManifoldProblem`](@ref)) | ``f=F+G(Λ\cdot)``, ``\operatorname{prox}_{σ F}`` & diff., ``\operatorname{prox}_{τ G^*}`` & diff., ``Λ``
[Quasi-Newton Method](@ref quasiNewton) | [`quasi_Newton`](@ref), [`QuasiNewtonState`](@ref) | ``f``, ``\operatorname{grad} f`` |
[Steihaug-Toint Truncated Conjugate-Gradient Method](@ref tCG) | [`truncated_conjugate_gradient_descent`](@ref), [`TruncatedConjugateGradientState`](@ref) | ``f``, ``\operatorname{grad} f``, ``\operatorname{Hess} f`` |
-[Subgradient Method](@ref
SubgradientSolver) | [`subgradient_method`](@ref), [`SubGradientMethodState`](@ref) | ``f``, ``∂ f`` |
+[Subgradient Method](@ref sec-subgradient-method) | [`subgradient_method`](@ref), [`SubGradientMethodState`](@ref) | ``f``, ``∂ f`` |
[Stochastic Gradient Descent](@ref StochasticGradientDescentSolver) | [`stochastic_gradient_descent`](@ref), [`StochasticGradientDescentState`](@ref) | ``f = \sum_i f_i``, ``\operatorname{grad} f_i`` |
[The Riemannian Trust-Regions Solver](@ref trust_regions) | [`trust_regions`](@ref), [`TrustRegionsState`](@ref) | ``f``, ``\operatorname{grad} f``, ``\operatorname{Hess} f`` |
-Note that the solvers (their [`AbstractManoptSolverState`](@ref), to be precise) can also be decorated to enhance your algorithm by general additional properties, see [debug output](@ref DebugSection) and [recording values](@ref RecordSection). This is done using the `debug=` and `record=` keywords in the function calls. Similarly, since 0.4 we provide a (simple) [caching of the objective function](@ref CacheSection) using the `cache=` keyword in any of the function calls..
+Note that the solvers (their [`AbstractManoptSolverState`](@ref), to be precise) can also be decorated to enhance your algorithm by general additional properties, see [debug output](@ref DebugSection) and [recording values](@ref RecordSection). This is done using the `debug=` and `record=` keywords in the function calls. Similarly, since Manopt.jl 0.4 a (simple) [caching of the objective function](@ref CacheSection) using the `cache=` keyword is available in any of the function calls.
-## Technical Details
+## Technical details
The main function a solver calls is
@@ -60,7 +60,7 @@
stop_solver!(p::AbstractManoptProblem, s::AbstractManoptSolverState, Any)
```
## API for solvers
this is a short overview of the different types of high-level functions are usually
-available for a solver. Let's assume the solver is called `new_solver` and requires
Assume the solver is called `new_solver` and requires
a cost `f` and some first order information `df` as well as a starting point `p` on `M`.
`f` and `df` form the objective together called `obj`.
@@ -80,26 +80,26 @@
If you provide an immutable point `p` or the `rand(M)` point is immutable, like
The third variant works in place of `p`, so it is mandatory.
-This first interface would set up the objective and pass all keywords on the the
+This first interface would set up the objective and pass all keywords on to the
objective based call.
-### The objective-based call
+### Objective based calls to solvers
```
new_solver(M, obj, p=rand(M); kwargs...)
new_solver!(M, obj, p; kwargs...)
```
-Here the objective would be created beforehand, e.g. to compare different solvers on the
+Here the objective would be created beforehand, for example to compare different solvers on the
same objective, and for the first variant the start point is optional.
Keyword arguments include decorators like `debug=` or `record=`
as well as algorithm specific ones.
-this variant would generate the `problem` and the `state` and check validity of all provided
+This variant would generate the `problem` and the `state` and verify validity of all provided
keyword arguments that affect the state.
Then it would call the iterate process.
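A hedged sketch of the call variants above, with [`gradient_descent`](@ref) standing in for `new_solver` and an illustrative cost:

```julia
using Manopt, Manifolds, LinearAlgebra

M = Sphere(2)
f(M, p) = p[1]^2
grad_f(M, p) = project(M, p, [2p[1], 0.0, 0.0])

q1 = gradient_descent(M, f, grad_f)      # start point defaults to rand(M)
p = 1 / sqrt(3) .* [1.0, 1.0, 1.0]
q2 = gradient_descent(M, f, grad_f, p)   # returns the result, p itself is not modified
gradient_descent!(M, f, grad_f, p)       # works in place of p

# objective based variant: build the objective once and reuse it
obj = ManifoldGradientObjective(f, grad_f)
q3 = gradient_descent(M, obj, q2)
```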
-### The manual call
+### Manual calls
If you generate the corresponding `problem` and `state` as the previous step does, you can also use the third (lowest level) and just call
diff --git a/docs/src/solvers/particle_swarm.md b/docs/src/solvers/particle_swarm.md
index a219a817ba..dd31362b9f 100644
--- a/docs/src/solvers/particle_swarm.md
+++ b/docs/src/solvers/particle_swarm.md
@@ -1,4 +1,4 @@
-# [Particle Swarm Optimization](@id ParticleSwarmSolver)
+# [Particle swarm optimization](@id ParticleSwarmSolver)
```@meta
CurrentModule = Manopt
@@ -15,6 +15,18 @@
ParticleSwarmState
```
+## [Technical details](@id sec-pso-technical-details)
+
+The [`particle_swarm`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified.
+* A [`vector_transport_to!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/)`(M, Y, p, X, q)`; it is recommended to set the [`default_vector_transport_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/#ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}) to a favourite vector transport.
If this default is set, a `vector_transport_method=` does not have to be specified.
+* Tangent vectors storing the social and cognitive vectors are initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+* The [`distance`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.distance-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, q)` when using the default stopping criterion, which uses [`StopWhenChangeLess`](@ref).
+
## Literature
```@bibliography
diff --git a/docs/src/solvers/primal_dual_semismooth_Newton.md b/docs/src/solvers/primal_dual_semismooth_Newton.md
index e007783d05..10ac5aec4d 100644
--- a/docs/src/solvers/primal_dual_semismooth_Newton.md
+++ b/docs/src/solvers/primal_dual_semismooth_Newton.md
@@ -1,4 +1,4 @@
-# [The Primal-dual Riemannian semismooth Newton Algorithm](@id PDRSSNSolver)
+# [Primal-dual Riemannian semismooth Newton algorithm](@id PDRSSNSolver)
The Primal-dual Riemannian semismooth Newton Algorithm is a second-order method derived from the [`ChambollePock`](@ref).
@@ -10,11 +10,10 @@
F(p) + G(Λ(p)),
where ``F:\mathcal M → \overline{ℝ}``, ``G:\mathcal N → \overline{ℝ}``, and ``Λ:\mathcal M →\mathcal N``.
-If the manifolds ``\mathcal M`` or ``\mathcal N`` are not Hadamard, it has to be considered locally,
-i.e.
on geodesically convex sets ``\mathcal C \subset \mathcal M`` and ``\mathcal D \subset\mathcal N``
+If the manifolds ``\mathcal M`` or ``\mathcal N`` are not Hadamard, it has to be considered locally only, that is on geodesically convex sets ``\mathcal C \subset \mathcal M`` and ``\mathcal D \subset\mathcal N``
such that ``Λ(\mathcal C) \subset \mathcal D``.
-The algorithm comes down to applying the Riemannian semismooth Newton method to the rewritten primal-dual optimality conditions, i.e., we define the vector field ``X: \mathcal{M} \times \mathcal{T}_{n}^{*} \mathcal{N} \rightarrow \mathcal{T} \mathcal{M} \times \mathcal{T}_{n}^{*} \mathcal{N}`` as
+The algorithm comes down to applying the Riemannian semismooth Newton method to the rewritten primal-dual optimality conditions. Define the vector field ``X: \mathcal{M} \times \mathcal{T}_{n}^{*} \mathcal{N} \rightarrow \mathcal{T} \mathcal{M} \times \mathcal{T}_{n}^{*} \mathcal{N}`` as
```math
X\left(p, \xi_{n}\right):=\left(\begin{array}{c}
@@ -72,6 +71,19 @@
primal_dual_semismooth_Newton!
PrimalDualSemismoothNewtonState
```
+## [Technical details](@id sec-ssn-technical-details)
+
+The [`primal_dual_semismooth_Newton`](@ref) solver requires the following functions of a manifold to be available for both the manifold ``\mathcal M`` and ``\mathcal N``
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* An [`inverse_retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, X, p, q)`; it is recommended to set the [`default_inverse_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_inverse_retraction_method-Tuple{AbstractManifold}) to a favourite inverse retraction. If this default is set, an `inverse_retraction_method=` does not have to be specified.
+* A [`vector_transport_to!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/)`(M, Y, p, X, q)`; it is recommended to set the [`default_vector_transport_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/#ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}) to a favourite vector transport. If this default is set, a `vector_transport_method=` does not have to be specified.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+* A [`get_basis`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/bases/#ManifoldsBase.get_basis-Tuple{AbstractManifold,%20Any,%20ManifoldsBase.AbstractBasis}) for the [`DefaultOrthonormalBasis`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/bases/#ManifoldsBase.DefaultOrthonormalBasis) on ``\mathcal M``.
+* [`exp`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.exp-Tuple{AbstractManifold,%20Any,%20Any}) and [`log`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.log-Tuple{AbstractManifold,%20Any,%20Any}) (on ``\mathcal M``).
+* A [`DiagonalizingOrthonormalBasis`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/bases/#ManifoldsBase.DiagonalizingOrthonormalBasis) to compute the differentials of the exponential and logarithmic maps.
+* By default the tangent vectors are initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+
## Literature
```@bibliography
diff --git a/docs/src/solvers/quasi_Newton.md b/docs/src/solvers/quasi_Newton.md
index a14e992344..26e8595ff0 100644
--- a/docs/src/solvers/quasi_Newton.md
+++ b/docs/src/solvers/quasi_Newton.md
@@ -11,7 +11,7 @@
## Background
-The aim is to minimize a real-valued function on a Riemannian manifold, i.e.
+The aim is to minimize a real-valued function on a Riemannian manifold, that is
```math
\min f(x), \quad x ∈ \mathcal{M}.
@@ -30,7 +30,11 @@ In quasi-Newton methods, the search direction is given by
```
where ``\mathcal{H}_k : T_{x_k} \mathcal{M} →T_{x_k} \mathcal{M}`` is a positive definite self-adjoint operator, which approximates the action of the Hessian ``\operatorname{Hess} f (x_k)[⋅]`` and ``\mathcal{B}_k = {\mathcal{H}_k}^{-1}``.
The idea of quasi-Newton methods is instead of creating a complete new approximation of the Hessian operator ``\operatorname{Hess} f(x_{k+1})`` or its inverse at every iteration, the previous operator ``\mathcal{H}_k`` or ``\mathcal{B}_k`` is updated by a convenient formula using the obtained information about the curvature of the objective function during the iteration. The resulting operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` acts on the tangent space ``T_{x_{k+1}} \mathcal{M}`` of the freshly computed iterate ``x_{k+1}``.
-In order to get a well-defined method, the following requirements are placed on the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` that is created by an update. Since the Hessian ``\operatorname{Hess} f(x_{k+1})`` is a self-adjoint operator on the tangent space ``T_{x_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, we require that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{x_{k+1}} \mathcal{M}``. In order to achieve a steady descent, we want ``η_k`` to be a descent direction in each iteration. Therefore we require, that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{x_{k+1}} \mathcal{M}``. In order to get information about the curvature of the objective function into the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``, we require that it satisfies a form of a Riemannian quasi-Newton equation:
+In order to get a well-defined method, the following requirements are placed on the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` that is created by an update.
+Since the Hessian ``\operatorname{Hess} f(x_{k+1})`` is a self-adjoint operator on the tangent space ``T_{x_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, one requirement is that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{x_{k+1}} \mathcal{M}``.
+In order to achieve a steady descent, the next requirement is that ``η_k`` is a descent direction in each iteration.
+Hence a further requirement is that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{x_{k+1}} \mathcal{M}``.
+In order to get information about the curvature of the objective function into the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``, the last requirement is a form of a Riemannian quasi-Newton equation:

```math
\mathcal{H}_{k+1} [T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))] = \operatorname{grad}f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(\operatorname{grad}f(x_k))
@@ -42,14 +46,16 @@ or

\mathcal{B}_{k+1} [\operatorname{grad}f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(\operatorname{grad}f(x_k))] = T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))
```

-where ``T_{x_k \rightarrow x_{k+1}} : T_{x_k} \mathcal{M} →T_{x_{k+1}} \mathcal{M}`` and the chosen retraction ``R`` is the associated retraction of ``T``. We note that, of course, not all updates in all situations will meet these conditions in every iteration.
-For specific quasi-Newton updates, the fulfillment of the Riemannian curvature condition, which requires that
+where ``T_{x_k \rightarrow x_{k+1}} : T_{x_k} \mathcal{M} →T_{x_{k+1}} \mathcal{M}`` and
+the chosen retraction ``R`` is the associated retraction of ``T``.
+Note that, of course, not all updates in all situations meet these conditions in every iteration.
+For specific quasi-Newton updates, the fulfilment of the Riemannian curvature condition, which requires that

```math
g_{x_{k+1}}(s_k, y_k) > 0
```

-holds, is a requirement for the inheritance of the self-adjointness and positive definiteness of the ``\mathcal{H}_k`` or ``\mathcal{B}_k`` to the operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``. Unfortunately, the fulfillment of the Riemannian curvature condition is not given by a step size ``\alpha_k > 0`` that satisfies the generalized Wolfe conditions.
However, in order to create a positive definite operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` in each iteration, the so-called locking condition was introduced in [Huang, Gallican, Absil, SIAM J. Optim., 2015](@cite HuangGallivanAbsil:2015), which requires that the isometric vector transport ``T^S``, which is used in the update formula, and its associate retraction ``R`` fulfill
+holds, is a requirement for the inheritance of the self-adjointness and positive definiteness of the ``\mathcal{H}_k`` or ``\mathcal{B}_k`` to the operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``. Unfortunately, the fulfilment of the Riemannian curvature condition is not given by a step size ``\alpha_k > 0`` that satisfies the generalized Wolfe conditions. However, to create a positive definite operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` in each iteration, the so-called locking condition was introduced in [HuangGallivanAbsil:2015](@cite), which requires that the isometric vector transport ``T^S``, which is used in the update formula, and its associate retraction ``R`` fulfil

```math
T^{S}_{x, ξ_x}(ξ_x) = β T^{R}_{x, ξ_x}(ξ_x), \quad β = \frac{\lVert ξ_x \rVert_x}{\lVert T^{R}_{x, ξ_x}(ξ_x) \rVert_{R_{x}(ξ_x)}},
@@ -67,10 +73,10 @@ where

β_k = \frac{\lVert α_k η_k \rVert_{x_k}}{\lVert T^{R}_{x_k, α_k η_k}(α_k η_k) \rVert_{x_{k+1}}},
```

-in the update, it can be shown that choosing a stepsize ``α_k > 0`` that satisfies the Riemannian Wolfe conditions leads to the fulfillment of the Riemannian curvature condition, which in turn implies that the operator generated by the updates is positive definite.
-In the following we denote the specific operators in matrix notation and hence use ``H_k`` and ``B_k``, respectively.
+in the update, it can be shown that choosing a stepsize ``α_k > 0`` that satisfies the Riemannian Wolfe conditions leads to the fulfilment of the Riemannian curvature condition, which in turn implies that the operator generated by the updates is positive definite.
+In the following, the specific operators are denoted in matrix notation as ``H_k`` and ``B_k``, respectively.

-## Direction Updates
+## Direction updates

In general there are different ways to compute a fixed [`AbstractQuasiNewtonUpdateRule`](@ref).
In general these are represented by

@@ -82,7 +88,7 @@ QuasiNewtonLimitedMemoryDirectionUpdate
QuasiNewtonCautiousDirectionUpdate
```

-## Hessian Update Rules
+## Hessian update rules

Using

@@ -111,6 +117,20 @@ The quasi Newton algorithm is based on a [`DefaultManoptProblem`](@ref).

```@docs
QuasiNewtonState
```
+## [Technical details](@id sec-qn-technical-details)
+
+The [`quasi_Newton`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* A [`vector_transport_to!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/)`(M, Y, p, X, q)`; it is recommended to set the [`default_vector_transport_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/vector_transports/#ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}) to a favourite vector transport. If this default is set, a `vector_transport_method=` does not have to be specified.
+* By default quasi Newton uses [`ArmijoLinesearch`](@ref), which requires [`max_stepsize`](@ref)`(M)` to be set and an implementation of [`inner`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.inner-Tuple%7BAbstractManifold,%20Any,%20Any,%20Any%7D)`(M, p, X, Y)`.
+* the [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to stop when the norm of the gradient is small, but if you implemented `inner`, the norm is provided already.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points and similarly `copy(M, p, X)` for tangent vectors.
+* By default the tangent vector storing the gradient is initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
+
+Most Hessian approximations further require [`get_coordinates`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/bases/#ManifoldsBase.get_coordinates-Tuple{AbstractManifold,%20Any,%20Any,%20ManifoldsBase.AbstractBasis})`(M, p, X, b)` with respect to the [`AbstractBasis`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/bases/#ManifoldsBase.AbstractBasis) `b` provided, which is [`DefaultOrthonormalBasis`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/bases/#ManifoldsBase.DefaultOrthonormalBasis) by default from the `basis=` keyword.
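A hedged sketch of how these defaults come together in a call — assuming `Manopt.jl` and `Manifolds.jl` are installed; `Sphere(2)` already ships a default retraction and vector transport, so none of the keywords above appears, and the toy cost and gradient are purely illustrative.

``` julia
using Manopt, Manifolds

M = Sphere(2)
f(M, p) = p[1]^2                                   # toy cost on the sphere
grad_f(M, p) = project(M, p, [2p[1], 0.0, 0.0])    # project the Euclidean gradient onto T_pM

p0 = [1.0, 1.0, 0.0] / sqrt(2)
# retraction_method=, vector_transport_method= and basis= all fall back
# to the defaults of M described in the list above
p_res = quasi_Newton(M, f, grad_f, p0)
```

Only when a manifold lacks these defaults do the keywords have to be supplied explicitly.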
+ + ## Literature diff --git a/docs/src/solvers/stochastic_gradient_descent.md b/docs/src/solvers/stochastic_gradient_descent.md index db9b30798a..d9168caa1f 100644 --- a/docs/src/solvers/stochastic_gradient_descent.md +++ b/docs/src/solvers/stochastic_gradient_descent.md @@ -1,4 +1,4 @@ -# [Stochastic Gradient Descent](@id StochasticGradientDescentSolver) +# [Stochastic gradient descent](@id StochasticGradientDescentSolver) ```@meta CurrentModule = Manopt @@ -23,3 +23,9 @@ The most inner one should always be. AbstractGradientGroupProcessor StochasticGradient ``` + +## [Technical details](@id sec-sgd-technical-details) + +The [`stochastic_gradient_descent`](@ref) solver requires the following functions of a manifold to be available + +* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified. diff --git a/docs/src/solvers/subgradient.md b/docs/src/solvers/subgradient.md index ca2215285a..23a6a77328 100644 --- a/docs/src/solvers/subgradient.md +++ b/docs/src/solvers/subgradient.md @@ -1,4 +1,4 @@ -# [Subgradient Method](@id SubgradientSolver) +# [Subgradient method](@id sec-subgradient-method) ```@docs subgradient_method @@ -14,3 +14,10 @@ SubGradientMethodState For [`DebugAction`](@ref)s and [`RecordAction`](@ref)s to record (sub)gradient, its norm and the step sizes, see the [steepest Descent](@ref GradientDescentSolver) actions. 
+
+
+## [Technical details](@id sec-sgm-technical-details)
+
+The [`subgradient_method`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
diff --git a/docs/src/solvers/truncated_conjugate_gradient_descent.md b/docs/src/solvers/truncated_conjugate_gradient_descent.md
index 31da535568..9a3ed63050 100644
--- a/docs/src/solvers/truncated_conjugate_gradient_descent.md
+++ b/docs/src/solvers/truncated_conjugate_gradient_descent.md
@@ -1,4 +1,4 @@
-# [Steihaug-Toint Truncated Conjugate-Gradient Method](@id tCG)
+# [Steihaug-Toint truncated conjugate gradient method](@id tCG)

Solve the constrained optimization problem on the tangent space

@@ -27,7 +27,7 @@ Here ``\mathcal H_p`` is either the Hessian ``\operatorname{Hess} f(p)`` or a li

TruncatedConjugateGradientState
```

-## Stopping Criteria
+## Stopping criteria

```@docs
StopWhenResidualIsReducedByFactorOrPower
@@ -38,12 +38,21 @@ update_stopping_criterion!(::StopWhenResidualIsReducedByFactorOrPower, ::Val{:Re
update_stopping_criterion!(::StopWhenResidualIsReducedByFactorOrPower, ::Val{:ResidualFactor}, ::Any)
```

-## Trust Region Model
+## Trust region model

```@docs
TrustRegionModelObjective
```

+## [Technical details](@id sec-tcg-technical-details)
+
+The [`truncated_conjugate_gradient_descent`](@ref) solver requires the following functions of a manifold to be available
+
+* if you do not provide a `trust_region_radius=`, then [`injectivity_radius`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.injectivity_radius-Tuple{AbstractManifold}) on the manifold `M` is required.
+* the [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to stop when the norm of the gradient is small, but if you implemented `inner`, the norm is provided already.
+* A [`zero_vector!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,X,p)`.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+
## Literature

```@bibliography
diff --git a/docs/src/solvers/trust_regions.md b/docs/src/solvers/trust_regions.md
index 8cf54a4c56..4b5d3f35b5 100644
--- a/docs/src/solvers/trust_regions.md
+++ b/docs/src/solvers/trust_regions.md
@@ -1,4 +1,4 @@
-# [The Riemannian Trust-Regions Solver](@id trust_regions)
+# [The Riemannian trust-regions solver](@id trust_regions)

Minimize a function

@@ -6,11 +6,11 @@ Minimize a function
\operatorname*{\arg\,min}_{p ∈ \mathcal{M}}\ f(p)
```

-by using the Riemannian trust-regions solver following [AbsilBakerGallivan:2006](@cite),
-i.e. by building a lifted model at the ``k``th iterate ``p_k`` by locally mapping the
+by using the Riemannian trust-regions solver following [AbsilBakerGallivan:2006](@cite), a model is built by
+lifting the objective at the ``k``th iterate ``p_k`` by locally mapping the
cost function ``f`` to the tangent space as ``f_k: T_{p_k}\mathcal M → \mathbb R`` as ``f_k(X) = f(\operatorname{retr}_{p_k}(X))``.
-We then define the trust region subproblem as
+The trust region subproblem is then defined as

```math
\operatorname*{arg\,min}_{X ∈ T_{p_k}\mathcal M}\ m_k(X),
@@ -45,7 +45,7 @@ TrustRegionsState

## Approximation of the Hessian

-We currently provide a few different methods to approximate the Hessian.
+Several different methods to approximate the Hessian are available.

```@docs
ApproxHessianFiniteDifference
@@ -59,6 +59,17 @@ as well as their (non-exported) common supertype

Manopt.AbstractApproxHessian
```

+## [Technical details](@id sec-tr-technical-details)
+
+The [`trust_regions`](@ref) solver requires the following functions of a manifold to be available
+
+* A [`retract!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)`(M, q, p, X)`; it is recommended to set the [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a `retraction_method=` does not have to be specified.
+* By default the stopping criterion uses the [`norm`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#LinearAlgebra.norm-Tuple{AbstractManifold,%20Any,%20Any}) as well, to stop when the norm of the gradient is small, but if you implemented `inner`, the norm is provided already.
+* if you do not provide an initial `max_trust_region_radius`, a [`manifold_dimension`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.manifold_dimension-Tuple{AbstractManifold}) is required.
+* A [`copyto!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copyto!-Tuple{AbstractManifold,%20Any,%20Any})`(M, q, p)` and [`copy`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#Base.copy-Tuple{AbstractManifold,%20Any})`(M,p)` for points.
+* By default the tangent vectors are initialized calling [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)`.
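As a hedged sketch of these requirements in action — assuming `Manopt.jl` and `Manifolds.jl` are available and using a purely illustrative cost and gradient; with an approximate Hessian, only cost and gradient have to come from the user:

``` julia
using Manopt, Manifolds

M = Sphere(2)
f(M, p) = p[1]^2
grad_f(M, p) = project(M, p, [2p[1], 0.0, 0.0])

p0 = [1.0, 1.0, 0.0] / sqrt(2)
# the initial radius falls back to a value based on manifold_dimension(M),
# and copy/copyto!/zero_vector are used internally for the iterates
q = trust_regions(
    M, f, grad_f,
    ApproxHessianFiniteDifference(M, copy(M, p0), grad_f),
    p0,
)
```

The finite-difference approximation additionally uses the retraction and vector transport of `M`, so their defaults matter here as well.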
+ + ## Literature ```@bibliography diff --git a/docs/src/tutorials/GeodesicRegression.md b/docs/src/tutorials/GeodesicRegression.md index 8681d2ce15..db294c9e8e 100644 --- a/docs/src/tutorials/GeodesicRegression.md +++ b/docs/src/tutorials/GeodesicRegression.md @@ -55,8 +55,8 @@ p^* = d^* - t^*X^* and hence the linear regression result is the line $γ_{p^*,X^*}(t) = p^* + tX^*$. -On a Riemannian manifold we can phrase this as an optimization problem on the [tangent bundle](https://en.wikipedia.org/wiki/Tangent_bundle), -i.e. the disjoint union of all tangent spaces, as +On a Riemannian manifold we can phrase this as an optimization problem on the [tangent bundle](https://en.wikipedia.org/wiki/Tangent_bundle), which is +the disjoint union of all tangent spaces, as ``` math \operatorname*{arg\,min}_{(p,X) \in \mathrm{T}\mathcal M} F(p,X) @@ -115,7 +115,7 @@ end ``` For the Euclidean case, the result is given by the first principal component of a principal component analysis, -see [PCR](https://en.wikipedia.org/wiki/Principal_component_regression), i.e. with $p^* = \frac{1}{n}\displaystyle\sum_{i=1}^n d_i$ +see [PCR](https://en.wikipedia.org/wiki/Principal_component_regression) which is given by $p^* = \frac{1}{n}\displaystyle\sum_{i=1}^n d_i$ and the direction $X^*$ is obtained by defining the zero mean data matrix ``` math @@ -177,11 +177,11 @@ y = gradient_descent( ) ``` - Initial | F(x): 0.142862 - # 50 | F(x): 0.141113 - # 100 | F(x): 0.141113 - # 150 | F(x): 0.141113 - # 200 | F(x): 0.141113 + Initial | f(x): 0.142862 + # 50 | f(x): 0.141113 + # 100 | f(x): 0.141113 + # 150 | f(x): 0.141113 + # 200 | f(x): 0.141113 The algorithm reached its maximal number of iterations (200). ([0.7119768725361988, 0.009463059143003981, 0.7021391482357537], [0.590008151835008, -0.5543272518659472, -0.5908038715512287]) @@ -218,7 +218,7 @@ inner( 0.002487393068917863 -But we also started with one of the best scenarios, i.e. equally spaced points on a geodesic obstructed by noise. 
+But we also started with one of the best scenarios of equally spaced points on a geodesic obstructed by noise. This gets worse if you start with less evenly distributed data @@ -266,73 +266,73 @@ y2 = gradient_descent( ); ``` - Initial | F(x): 0.089844 - # 3 | F(x): 0.085364 - # 6 | F(x): 0.085364 - # 9 | F(x): 0.085364 - # 12 | F(x): 0.085364 - # 15 | F(x): 0.085364 - # 18 | F(x): 0.085364 - # 21 | F(x): 0.085364 - # 24 | F(x): 0.085364 - # 27 | F(x): 0.085364 - # 30 | F(x): 0.085364 - # 33 | F(x): 0.085364 - # 36 | F(x): 0.085364 - # 39 | F(x): 0.085364 - # 42 | F(x): 0.085364 - # 45 | F(x): 0.085364 - # 48 | F(x): 0.085364 - # 51 | F(x): 0.085364 - # 54 | F(x): 0.085364 - # 57 | F(x): 0.085364 - # 60 | F(x): 0.085364 - # 63 | F(x): 0.085364 - # 66 | F(x): 0.085364 - # 69 | F(x): 0.085364 - # 72 | F(x): 0.085364 - # 75 | F(x): 0.085364 - # 78 | F(x): 0.085364 - # 81 | F(x): 0.085364 - # 84 | F(x): 0.085364 - # 87 | F(x): 0.085364 - # 90 | F(x): 0.085364 - # 93 | F(x): 0.085364 - # 96 | F(x): 0.085364 - # 99 | F(x): 0.085364 - # 102 | F(x): 0.085364 - # 105 | F(x): 0.085364 - # 108 | F(x): 0.085364 - # 111 | F(x): 0.085364 - # 114 | F(x): 0.085364 - # 117 | F(x): 0.085364 - # 120 | F(x): 0.085364 - # 123 | F(x): 0.085364 - # 126 | F(x): 0.085364 - # 129 | F(x): 0.085364 - # 132 | F(x): 0.085364 - # 135 | F(x): 0.085364 - # 138 | F(x): 0.085364 - # 141 | F(x): 0.085364 - # 144 | F(x): 0.085364 - # 147 | F(x): 0.085364 - # 150 | F(x): 0.085364 - # 153 | F(x): 0.085364 - # 156 | F(x): 0.085364 - # 159 | F(x): 0.085364 - # 162 | F(x): 0.085364 - # 165 | F(x): 0.085364 - # 168 | F(x): 0.085364 - # 171 | F(x): 0.085364 - # 174 | F(x): 0.085364 - # 177 | F(x): 0.085364 - # 180 | F(x): 0.085364 - # 183 | F(x): 0.085364 - # 186 | F(x): 0.085364 - # 189 | F(x): 0.085364 - # 192 | F(x): 0.085364 - # 195 | F(x): 0.085364 - # 198 | F(x): 0.085364 + Initial | f(x): 0.089844 + # 3 | f(x): 0.085364 + # 6 | f(x): 0.085364 + # 9 | f(x): 0.085364 + # 12 | f(x): 0.085364 + # 15 | 
f(x): 0.085364 + # 18 | f(x): 0.085364 + # 21 | f(x): 0.085364 + # 24 | f(x): 0.085364 + # 27 | f(x): 0.085364 + # 30 | f(x): 0.085364 + # 33 | f(x): 0.085364 + # 36 | f(x): 0.085364 + # 39 | f(x): 0.085364 + # 42 | f(x): 0.085364 + # 45 | f(x): 0.085364 + # 48 | f(x): 0.085364 + # 51 | f(x): 0.085364 + # 54 | f(x): 0.085364 + # 57 | f(x): 0.085364 + # 60 | f(x): 0.085364 + # 63 | f(x): 0.085364 + # 66 | f(x): 0.085364 + # 69 | f(x): 0.085364 + # 72 | f(x): 0.085364 + # 75 | f(x): 0.085364 + # 78 | f(x): 0.085364 + # 81 | f(x): 0.085364 + # 84 | f(x): 0.085364 + # 87 | f(x): 0.085364 + # 90 | f(x): 0.085364 + # 93 | f(x): 0.085364 + # 96 | f(x): 0.085364 + # 99 | f(x): 0.085364 + # 102 | f(x): 0.085364 + # 105 | f(x): 0.085364 + # 108 | f(x): 0.085364 + # 111 | f(x): 0.085364 + # 114 | f(x): 0.085364 + # 117 | f(x): 0.085364 + # 120 | f(x): 0.085364 + # 123 | f(x): 0.085364 + # 126 | f(x): 0.085364 + # 129 | f(x): 0.085364 + # 132 | f(x): 0.085364 + # 135 | f(x): 0.085364 + # 138 | f(x): 0.085364 + # 141 | f(x): 0.085364 + # 144 | f(x): 0.085364 + # 147 | f(x): 0.085364 + # 150 | f(x): 0.085364 + # 153 | f(x): 0.085364 + # 156 | f(x): 0.085364 + # 159 | f(x): 0.085364 + # 162 | f(x): 0.085364 + # 165 | f(x): 0.085364 + # 168 | f(x): 0.085364 + # 171 | f(x): 0.085364 + # 174 | f(x): 0.085364 + # 177 | f(x): 0.085364 + # 180 | f(x): 0.085364 + # 183 | f(x): 0.085364 + # 186 | f(x): 0.085364 + # 189 | f(x): 0.085364 + # 192 | f(x): 0.085364 + # 195 | f(x): 0.085364 + # 198 | f(x): 0.085364 The algorithm reached its maximal number of iterations (200). For plotting we again generate all data @@ -350,7 +350,7 @@ geo_conn_highlighted2 = shortest_geodesic( ## Unlabeled Data -If we are not given time points $t_i$, then the optimization problem extends – informally speaking – +If we are not given time points $t_i$, then the optimization problem extends, informally speaking, to also finding the “best fitting” (in the sense of smallest error). 
To formalize, the objective function here reads @@ -373,7 +373,7 @@ N = M × Euclidean(length(t2)) ProductManifold with 2 submanifolds: TangentBundle(Sphere(2, ℝ)) - Euclidean(7; field = ℝ) + Euclidean(7; field=ℝ) ``` math \operatorname*{arg\,min}_{\bigl((p,X),t\bigr)\in\mathcal N} F(p, X, t). @@ -431,7 +431,7 @@ end ``` Finally, we additionally look for a fixed point $x=(p,X) ∈ \mathrm{T}\mathcal M$ at -the gradient with respect to $t∈\mathbb R^n$, i.e. the second component, which is given by +the gradient with respect to $t∈\mathbb R^n$, the second component, which is given by ``` math (\operatorname{grad}F_2(t))_i @@ -485,9 +485,9 @@ y3 = alternating_gradient_descent( ) ``` - Initial | F(x): 0.089844 - # 50 | F(x): 0.091097 - # 100 | F(x): 0.091097 + Initial | f(x): 0.089844 + # 50 | f(x): 0.091097 + # 100 | f(x): 0.091097 The algorithm reached its maximal number of iterations (100). (ArrayPartition{Float64, Tuple{Vector{Float64}, Vector{Float64}}}(([0.750222090700214, 0.031464227399200885, 0.6604368380243274], [0.6636489079535082, -0.3497538263293046, -0.737208025444054])), [0.7965909273713889, 0.43402264218923514, 0.755822122896529, 0.001059348203453764, -0.6421135044471217, -0.8635572995105818, -0.5546338813212247]) @@ -517,3 +517,4 @@ Note that the geodesics from the data to the regression geodesic meet at a nearl Pages = ["GeodesicRegression.md"] Canonical=false ``` + diff --git a/docs/src/tutorials/HowToDebug.md b/docs/src/tutorials/HowToDebug.md index fb5c1e3f6b..b5248f40b8 100644 --- a/docs/src/tutorials/HowToDebug.md +++ b/docs/src/tutorials/HowToDebug.md @@ -1,4 +1,4 @@ -# How to Print Debug Output +# How to print debug output Ronny Bergmann This tutorial aims to illustrate how to perform debug output. For that we consider an @@ -44,12 +44,12 @@ Any solver accepts the keyword `debug=`, which in the simplest case can be set t - the last number in the array is used with [`DebugEvery`](@ref) to print the debug only every $i$th iteration. 
- Any Symbol is converted into certain debug prints -Certain symbols starting with a capital letter are mapped to certain prints, e.g. `:Cost` is mapped to [`DebugCost`](@ref)`()` to print the current cost function value. A full list is provided in the [`DebugActionFactory`](@ref). +Certain symbols starting with a capital letter are mapped to certain prints, for example `:Cost` is mapped to [`DebugCost`](@ref)`()` to print the current cost function value. A full list is provided in the [`DebugActionFactory`](@ref). A special keyword is `:Stop`, which is only added to the final debug hook to print the stopping criterion. Any symbol with a small letter is mapped to fields of the [`AbstractManoptSolverState`](@ref) which is used. This way you can easily print internal data, if you know their names. -Let’s look at an example first: If we want to print the current iteration number, the current cost function value as well as the value `ϵ` from the [`ExactPenaltyMethodState`](@ref). To keep the amount of print at a reasonable level, we want to only print the debug every 25th iteration. +Let’s look at an example first: If we want to print the current iteration number, the current cost function value as well as the value `ϵ` from the [`ExactPenaltyMethodState`](@ref). To keep the amount of print at a reasonable level, we want to only print the debug every twentyfifth iteration. Then we can write @@ -68,13 +68,13 @@ p1 = exact_penalty_method( The value of the variable (ϵ) is smaller than or equal to its threshold (1.0e-6). The algorithm performed a step with a change (6.5347623783315016e-9) less than 1.0e-6. -## Advanced Debug output +## Advanced debug output There is two more advanced variants that can be used. The first is a tuple of a symbol and a string, where the string is used as the format print, that most [`DebugAction`](@ref)s have. The second is, to directly provide a `DebugAction`. 
We can for example change the way the `:ϵ` is printed by adding a format string and use [`DebugCost`](@ref)`()` which is equivalent to using `:Cost`.
-Especially with the format change, the lines are more coniststent in length.
+Especially with the format change, the lines are more consistent in length.

``` julia
p2 = exact_penalty_method(
@@ -91,7 +91,7 @@ p2 = exact_penalty_method(

The value of the variable (ϵ) is smaller than or equal to its threshold (1.0e-6).
The algorithm performed a step with a change (6.5347623783315016e-9) less than 1.0e-6.

-You can also write your own [`DebugAction`](@ref) functor, where the function to implement has the same signature as the `step` function, that is an [`AbstractManoptProblem`](@ref), an [`AbstractManoptSolverState`](@ref), as well as the current iterate. For example the already mentioned \[`DebugDivider](@ref)`(s)\` is given as
+You can also write your own [`DebugAction`](@ref) functor, where the function to implement has the same signature as the `step` function, that is an [`AbstractManoptProblem`](@ref), an [`AbstractManoptSolverState`](@ref), as well as the current iterate. For example the already mentioned [`DebugDivider`](@ref)`(s)` is given as

``` julia
mutable struct DebugDivider{TIO<:IO} <: DebugAction
@@ -107,7 +107,7 @@ end

or you could implement that of course just for your specific problem or state.

-## Subsolver Debug
+## Subsolver debug

Most subsolvers have a `sub_kwargs` keyword, such that you can pass keywords to the sub solver as well. This works well if you do not plan to change the subsolver. If you do, you can wrap your own `solver_state=` argument in a [`decorate_state!`](@ref) and pass a `debug=` keyword to this function call. Keywords for such a nested call have to be passed as pairs (`:debug => [...]`).
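To make the last paragraph concrete, here is a hedged sketch of passing a debug setting down to the subsolver via `sub_kwargs`; the problem setup (`M`, `f`, `grad_f`, `g`, `p0`) is assumed from the examples above and the exact keyword shape may vary between versions.

``` julia
# keywords for the subsolver are passed as pairs inside sub_kwargs,
# here printing iteration number and cost every 25th subsolver step
p3 = exact_penalty_method(
    M, f, grad_f, p0;
    g = g,
    sub_kwargs = [:debug => [:Iteration, :Cost, 25, "\n"]],
)
```

For full control one would instead build the subsolver state by hand and wrap it in `decorate_state!(state; debug = [...])` before passing it via the corresponding `sub_state=` keyword.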
diff --git a/docs/src/tutorials/InplaceGradient.md b/docs/src/tutorials/InplaceGradient.md index e408fac188..eda739c598 100644 --- a/docs/src/tutorials/InplaceGradient.md +++ b/docs/src/tutorials/InplaceGradient.md @@ -1,8 +1,8 @@ -# Speedup using Inplace Evaluation +# Speedup using in-place evaluation Ronny Bergmann When it comes to time critical operations, a main ingredient in Julia is given by -mutating functions, i.e. those that compute in place without additional memory +mutating functions, that is those that compute in place without additional memory allocations. In the following, we illustrate how to do this with `Manopt.jl`. Let’s start with the same function as in [Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize!.html) @@ -62,7 +62,7 @@ We can also benchmark this as Time (median): 49.552 ms ┊ GC (median): 5.41% Time (mean ± σ): 50.151 ms ± 1.731 ms ┊ GC (mean ± σ): 5.56% ± 0.64% - ▂▃ █▃▃▆ ▂ + ▂▃ █▃▃▆ ▂ ▅████████▅█▇█▄▅▇▁▅█▅▇▄▇▅▁▅▄▄▄▁▄▁▁▁▄▄▁▁▁▁▁▁▄▁▁▁▁▁▁▄▁▄▁▁▁▁▁▁▄ ▄ 48.3 ms Histogram: frequency by time 56.6 ms < @@ -97,7 +97,7 @@ end For the actual call to the solver, we first have to generate an instance of `GradF!` and tell the solver, that the gradient is provided in an [`InplaceEvaluation`](https://manoptjl.org/stable/plans/objective/#Manopt.InplaceEvaluation). -We can further also use [`gradient_descent!`](https://manoptjl.org/stable/solvers/gradient_descent/#Manopt.gradient_descent!) to even work inplace of the initial point we pass. +We can further also use [`gradient_descent!`](https://manoptjl.org/stable/solvers/gradient_descent/#Manopt.gradient_descent!) to even work in-place of the initial point we pass. ``` julia grad_f2! 
= GradF!(data, similar(data[1]))
@@ -120,7 +120,7 @@ We can again benchmark this

Time  (median):     28.001 ms ┊ GC (median):    0.00%
Time  (mean ± σ):   28.412 ms ± 1.079 ms ┊ GC (mean ± σ):  0.73% ± 2.24%

-   ▁▅▇█▅▂▄  ▁
+   ▁▅▇█▅▂▄  ▁
▄▁███████▆█▇█▄▆▃▃▃▃▁▁▃▁▁▃▁▃▃▁▄▁▁▃▃▁▁▄▁▁▃▅▃▃▃▁▃▃▁▁▁▁▁▁▁▁▃▁▁▃ ▃
27.4 ms         Histogram: frequency by time         31.9 ms <

diff --git a/docs/styles/Vocab/Manopt/accept.txt b/docs/styles/Vocab/Manopt/accept.txt
new file mode 100644
index 0000000000..8c780bb7f0
--- /dev/null
+++ b/docs/styles/Vocab/Manopt/accept.txt
@@ -0,0 +1,67 @@
+Absil
+Adagrad
+[Aa]djoint
+API
+Armijo
+Bergmann
+Chambolle
+Constantin
+Diepeveen
+Dornig
+Douglas
+cubic
+Frank
+Frobenius
+functor
+geodesically
+Geomstats
+Geoopt
+Grassmann
+Hadamard
+Hessian
+injectivity
+Jax
+JuMP.jl
+Levenberg
+Lagrangian
+Lanczos
+LineSearches.jl
+Manifolds.jl
+ManifoldsBase.jl
+[Mm]anopt(?:\.org|\.jl)?
+Marquardt
+Munkvold
+Mead
+Nelder
+Newton
+[Pp]arametrising
+Parametrising
+Pock
+preconditioner
+[Pp]rox(?:imal)?
+pullback
+pushforward
+Rachford
+Ravn
+reimplement
+representer
+Riemannian
+Riemer
+Riemopt
+Riesz
+Rosenbrock
+Steihaug
+Stiefel
+semismooth
+Stephansen
+[Ss]tepsize
+[Ss]ubgradient
+[Ss]ubsolver
+summand
+supertype
+th
+Tom-Christian
+Toint
+Willem
+Wolfe
+vectorial
\ No newline at end of file
diff --git a/joss/paper.md b/joss/paper.md
index 861cbb4249..fed7814785 100644
--- a/joss/paper.md
+++ b/joss/paper.md
@@ -67,7 +67,7 @@ In the current version 0.3.17 of `Manopt.jl` the following algorithms are available:

* Conjugate Gradient Descent ([`conjugate_gradient_descent`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html)), which includes eight direction update rules using the `coefficient` keyword: [`SteepestDirectionUpdateRule`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.SteepestDirectionUpdateRule), [`ConjugateDescentCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.ConjugateDescentCoefficient),
[`DaiYuanCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.DaiYuanCoefficient), [`FletcherReevesCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.FletcherReevesCoefficient), [`HagerZhangCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.HagerZhangCoefficient), [`HeestenesStiefelCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.HeestenesStiefelCoefficient), [`LiuStoreyCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.LiuStoreyCoefficient), and [`PolakRibiereCoefficient`](https://manoptjl.org/v0.3/solvers/conjugate_gradient_descent.html#Manopt.PolakRibiereCoefficient)
* Cyclic Proximal Point ([`cyclic_proximal_point`](https://manoptjl.org/v0.3/solvers/cyclic_proximal_point.html)) [@Bacak:2014:1]
-* (parallel) Douglas–Rachford ([`DouglasRachford`](https://manoptjl.org/v0.3/solvers/DouglasRachford.html)) [@BergmannPerschSteidl:2016:1]
+* (parallel) Douglas-Rachford ([`DouglasRachford`](https://manoptjl.org/v0.3/solvers/DouglasRachford.html)) [@BergmannPerschSteidl:2016:1]
* Gradient Descent ([`gradient_descent`](https://manoptjl.org/v0.3/solvers/gradient_descent.html)), including direction update rules ([`IdentityUpdateRule`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.IdentityUpdateRule) for the classical gradient descent) to perform [`MomentumGradient`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.MomentumGradient), [`AverageGradient`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.AverageGradient), and [`Nesterov`](https://manoptjl.org/v0.3/solvers/gradient_descent.html#Manopt.Nesterov) types
* Nelder-Mead ([`NelderMead`](https://manoptjl.org/v0.3/solvers/NelderMead.html))
* Particle-Swarm Optimization ([`particle_swarm`](https://manoptjl.org/v0.3/solvers/particle_swarm.html)) [@BorckmansIshtevaAbsil2010]
diff --git
a/src/plans/primal_dual_plan.jl b/src/plans/primal_dual_plan.jl index c14571a3de..6d1177f257 100644 --- a/src/plans/primal_dual_plan.jl +++ b/src/plans/primal_dual_plan.jl @@ -679,7 +679,7 @@ function dual_residual( throw( DomainError( apds.variant, - "Unknown Chambolle–Pock variant, allowed are `:exact` or `:linearized`.", + "Unknown Chambolle—Pock variant, allowed are `:exact` or `:linearized`.", ), ) end diff --git a/src/plans/stepsize.jl b/src/plans/stepsize.jl index 800eb2aa48..27d51d675f 100644 --- a/src/plans/stepsize.jl +++ b/src/plans/stepsize.jl @@ -33,7 +33,7 @@ Get the maximum stepsize (at point `p`) on manifold `M`. It should be used to li distance an algorithm is trying to move in a single step. """ function max_stepsize(M::AbstractManifold, p) - return injectivity_radius(M, p) + return max_stepsize(M) end function max_stepsize(M::AbstractManifold) return injectivity_radius(M) diff --git a/src/solvers/ChambollePock.jl b/src/solvers/ChambollePock.jl index 6ae6fb740e..3ff8b9da55 100644 --- a/src/solvers/ChambollePock.jl +++ b/src/solvers/ChambollePock.jl @@ -173,7 +173,7 @@ end evaluation=AllocatingEvaluation() ) -Perform the Riemannian Chambolle–Pock algorithm. +Perform the Riemannian Chambolle—Pock algorithm. Given a `cost` function ``\mathcal E:\mathcal M → ℝ`` of the form ```math @@ -211,7 +211,7 @@ For more details on the algorithm, see [Bergmann et al., Found. Comput. Math., 2 * `relax` – (`:primal`) whether to relax the primal or dual * `variant` - (`:exact` if `Λ` is missing, otherwise `:linearized`) variant to use. Note that this changes the arguments the `forward_operator` will be called. 
-* `stopping_criterion` – (`stopAtIteration(100)`) a [`StoppingCriterion`](@ref)
+* `stopping_criterion` – ([`StopAfterIteration`](@ref)`(100)`) a [`StoppingCriterion`](@ref)
* `update_primal_base` – (`missing`) function to update `m` (identity by default/missing)
* `update_dual_base` – (`missing`) function to update `n` (identity by default/missing)
* `retraction_method` – (`default_retraction_method(M, typeof(p))`) the retraction to use
@@ -260,7 +260,7 @@ end
@doc raw"""
    ChambollePock(M, N, cost, x0, ξ0, m, n, prox_F, prox_G_dual, adjoint_linear_operator)
-Perform the Riemannian Chambolle–Pock algorithm in place of `x`, `ξ`, and potentially `m`,
+Perform the Riemannian Chambolle—Pock algorithm in place of `x`, `ξ`, and potentially `m`,
`n` if they are not fixed. See [`ChambollePock`](@ref) for details and optional parameters.
"""
function ChambollePock!(
diff --git a/src/solvers/DouglasRachford.jl b/src/solvers/DouglasRachford.jl
index d66bf0d10e..ab96485475 100644
--- a/src/solvers/DouglasRachford.jl
+++ b/src/solvers/DouglasRachford.jl
@@ -175,8 +175,8 @@ If you provide a [`ManifoldProximalMapObjective`](@ref) `mpo` instead, the proxi
  a [`StoppingCriterion`](@ref).
* `parallel` – (`false`) clarify that we are doing a parallel DR, i.e. on a
  `PowerManifold` manifold with two proxes. This can be used to trigger
-  parallel Douglas–Rachford if you enter with two proxes. Keep in mind, that a
-  parallel Douglas–Rachford implicitly works on a `PowerManifold` manifold and
+  parallel Douglas—Rachford if you enter with two proxes. Keep in mind, that a
+  parallel Douglas—Rachford implicitly works on a `PowerManifold` manifold and
  its first argument is the result then (assuming all are equal after the second prox.
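Reviewer note on the `max_stepsize` change in `src/plans/stepsize.jl` above: the point-dependent method now delegates to the point-independent one. A minimal, self-contained sketch of this dispatch pattern, using stand-in types rather than the actual `Manopt.jl`/`ManifoldsBase.jl` definitions:

```julia
# Stand-in types for illustration only; the real AbstractManifold and
# injectivity_radius come from ManifoldsBase.jl.
abstract type AbstractManifold end
struct Sphere <: AbstractManifold end

injectivity_radius(::Sphere) = π  # on the unit sphere, distances are at most π

# After the change, the point-dependent variant falls back to the global one,
# so overriding max_stepsize(M) for a manifold type affects both call forms.
max_stepsize(M::AbstractManifold) = injectivity_radius(M)
max_stepsize(M::AbstractManifold, p) = max_stepsize(M)

max_stepsize(Sphere(), [1.0, 0.0, 0.0])  # π, same as max_stepsize(Sphere())
```

This way a manifold only has to specialize the one-argument method to limit the step length everywhere.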
diff --git a/src/solvers/alternating_gradient_descent.jl b/src/solvers/alternating_gradient_descent.jl index 61235d5faa..fcc1b3d69f 100644 --- a/src/solvers/alternating_gradient_descent.jl +++ b/src/solvers/alternating_gradient_descent.jl @@ -181,10 +181,6 @@ perform an alternating gradient descent usually the obtained (approximate) minimizer, see [`get_solver_return`](@ref) for details -!!! note - - This Problem requires the `ProductManifold` from `Manifolds.jl`, so `Manifolds.jl` needs to be loaded. - !!! note The input of each of the (component) gradients is still the whole vector `X`, diff --git a/src/solvers/gradient_descent.jl b/src/solvers/gradient_descent.jl index 72861ad76d..5301b275e6 100644 --- a/src/solvers/gradient_descent.jl +++ b/src/solvers/gradient_descent.jl @@ -66,7 +66,7 @@ function GradientDescentState( p::P=rand(M); X::T=zero_vector(M, p), stopping_criterion::StoppingCriterion=StopAfterIteration(200) | - StopWhenGradientNormLess(1e-9), + StopWhenGradientNormLess(1e-8), retraction_method::AbstractRetractionMethod=default_retraction_method(M, typeof(p)), stepsize::Stepsize=default_stepsize( M, GradientDescentState; retraction_method=retraction_method @@ -126,23 +126,24 @@ with different choices of the stepsize ``s_k`` available (see `stepsize` option # Input -* `M` – a manifold ``\mathcal M`` -* `f` – a cost function ``f: \mathcal M→ℝ`` to find a minimizer ``p^*`` for +* `M` – a manifold ``\mathcal M`` +* `f` – a cost function ``f: \mathcal M→ℝ`` to find a minimizer ``p^*`` for * `grad_f` – the gradient ``\operatorname{grad}f: \mathcal M → T\mathcal M`` of f as a function `(M, p) -> X` or a function `(M, X, p) -> X` -* `p` – an initial value `p` ``= p_0 ∈ \mathcal M`` +* `p` – an initial value `p` ``= p_0 ∈ \mathcal M`` Alternatively to `f` and `grad_f` you can provide the [`AbstractManifoldGradientObjective`](@ref) `gradient_objective` directly. 
# Optional
-* `direction` – ([`IdentityUpdateRule`](@ref)) perform a processing of the direction, e.g.
-* `evaluation` – ([`AllocatingEvaluation`](@ref)) specify whether the gradient works by allocation (default) form `grad_f(M, p)`
+* `direction` – ([`IdentityUpdateRule`](@ref)) perform a processing of the direction, e.g.
+* `evaluation` – ([`AllocatingEvaluation`](@ref)) specify whether the gradient works by allocation (default) form `grad_f(M, p)`
  or [`InplaceEvaluation`](@ref) in place, i.e. is of the form `grad_f!(M, X, p)`.
-* `retraction_method` – ([`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold})(M, typeof(p))`) a retraction to use
-* `stepsize` – ([`default_stepsize`](@ref)`(M, GradientDescentState)`) a [`Stepsize`](@ref)
+* `retraction_method` – ([`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold})(M, typeof(p))`) a retraction to use
+* `stepsize` – ([`default_stepsize`](@ref)`(M, GradientDescentState)`) a [`Stepsize`](@ref)
* `stopping_criterion` – ([`StopAfterIteration`](@ref)`(200) | `[`StopWhenGradientNormLess`](@ref)`(1e-8)`)
  a functor inheriting from [`StoppingCriterion`](@ref) indicating when to stop.
+* `X` – (`zero_vector(M, p)`) provide memory and/or type of the gradient to use
If you provide the [`ManifoldGradientObjective`](@ref) directly, `evaluation` is ignored.
@@ -206,6 +207,7 @@ p_{k+1} = \operatorname{retr}_{p_k}\bigl( s_k\operatorname{grad}f(p_k) \bigr)
in place of `p` with different choices of ``s_k`` available.
# Input + * `M` – a manifold ``\mathcal M`` * `f` – a cost function ``F:\mathcal M→ℝ`` to minimize * `grad_f` – the gradient ``\operatorname{grad}F:\mathcal M→ T\mathcal M`` of F @@ -237,9 +239,10 @@ function gradient_descent!( M, GradientDescentState; retraction_method=retraction_method ), stopping_criterion::StoppingCriterion=StopAfterIteration(200) | - StopWhenGradientNormLess(1e-9), + StopWhenGradientNormLess(1e-8), debug=stepsize isa ConstantStepsize ? [DebugWarnIfCostIncreases()] : [], direction=IdentityUpdateRule(), + X=zero_vector(M, p), kwargs..., #collect rest ) where {O<:Union{AbstractManifoldGradientObjective,AbstractDecoratedManifoldObjective}} dmgo = decorate_objective!(M, mgo; kwargs...) @@ -251,6 +254,7 @@ function gradient_descent!( stepsize=stepsize, direction=direction, retraction_method=retraction_method, + X=X, ) ds = decorate_state!(s; debug=debug, kwargs...) solve!(dmp, ds) diff --git a/src/solvers/trust_regions.jl b/src/solvers/trust_regions.jl index b47a9252a8..fd1a4ff665 100644 --- a/src/solvers/trust_regions.jl +++ b/src/solvers/trust_regions.jl @@ -40,8 +40,6 @@ All the following fields (besides `p`) can be set by specifying them as keywords All the following constructors have the above fields as keyword arguents with the defaults given in brackets. If no initial point `p` is provided, `p=rand(M)` is used - - TrustRegionsState(M, mho; kwargs...) TrustRegionsState(M, p, mho; kwargs...) diff --git a/tutorials/AutomaticDifferentiation.qmd b/tutorials/AutomaticDifferentiation.qmd index 2721ae2fb2..1e233667f6 100644 --- a/tutorials/AutomaticDifferentiation.qmd +++ b/tutorials/AutomaticDifferentiation.qmd @@ -10,8 +10,8 @@ While by default we use [FiniteDifferences.jl](https://juliadiff.org/FiniteDiffe In this tutorial we will take a look at a few possibilities to approximate or derive the gradient of a function $f:\mathcal M \to ℝ$ on a Riemannian manifold, without computing it yourself. There are mainly two different philosophies: -1. 
Working _instrinsically_, i.e. staying on the manifold and in the tangent spaces. Here, we will consider approximating the gradient by forward differences. -2. Working in an embedding – there we can use all tools from functions on Euclidean spaces – finite differences or automatic differenciation – and then compute the corresponding Riemannian gradient from there. +1. Working _intrinsically_, that is staying on the manifold and in the tangent spaces. Here, we will consider approximating the gradient by forward differences. +2. Working in an embedding where all tools from functions on Euclidean spaces can be used, like finite differences or automatic differentiation, and then compute the corresponding Riemannian gradient from there. ```{julia} #| echo: false @@ -34,7 +34,7 @@ Random.seed!(42); A first idea is to generalize (multivariate) finite differences to Riemannian manifolds. Let $X_1,\ldots,X_d ∈ T_p\mathcal M$ denote an orthonormal basis of the tangent space $T_p\mathcal M$ at the point $p∈\mathcal M$ on the Riemannian manifold. -We can generalize the notion of a directional derivative, i.e. for the “direction” $Y∈T_p\mathcal M$. Let $c\colon [-ε,ε]$, $ε>0$, be a curve with $c(0) = p$, $\dot c(0) = Y$, e.g. $c(t)= \exp_p(tY)$. We obtain +We can generalize the notion of a directional derivative to a “direction” $Y∈T_p\mathcal M$. Let $c\colon [-ε,ε]$, $ε>0$, be a curve with $c(0) = p$, $\dot c(0) = Y$, for example $c(t)= \exp_p(tY)$. We obtain ```math Df(p)[Y] = \left. \frac{d}{dt} \right|_{t=0} f(c(t)) = \lim_{t \to 0} \frac{1}{t}(f(\exp_p(tY))-f(p)) @@ -46,7 +46,7 @@ We can approximate $Df(p)[X]$ by a finite difference scheme for an $h>0$ as DF(p)[Y] ≈ G_h(Y) := \frac{1}{h}(f(\exp_p(hY))-f(p)) ``` -Furthermore the gradient $\operatorname{grad}f$ is the Riesz representer of the differential, ie. 
+Furthermore the gradient $\operatorname{grad}f$ is the Riesz representer of the differential: ```math Df(p)[Y] = g_p(\operatorname{grad}f(p), Y),\qquad \text{ for all } Y ∈ T_p\mathcal M @@ -141,7 +141,7 @@ or in words: we have to change the Riesz representer of the (restricted/projecte ### A Continued Example -We continue with the Rayleigh Quotient from before, now just starting with the defintion of the Euclidean case in the embedding, the function $F$. +We continue with the Rayleigh Quotient from before, now just starting with the definition of the Euclidean case in the embedding, the function $F$. ```{julia} F(x) = x' * A * x / (x' * x); @@ -165,11 +165,11 @@ X3 = grad_f2_AD(M, p) norm(M, p, X1 - X3) ``` -### An Example for a Nonisometrically Embedded Manifold +### An Example for a Non-isometrically Embedded Manifold on the manifold $\mathcal P(3)$ of symmetric positive definite matrices. -The following function computes (half) the distance squared (with respect to the linear affine metric) on the manifold $\mathcal P(3)$ to the identity, i.e. $I_3$. Denoting the unit matrix we consider the function +The following function computes (half) the distance squared (with respect to the linear affine metric) on the manifold $\mathcal P(3)$ to the identity matrix $I_3$. Denoting the unit matrix we consider the function ```math G(q) @@ -214,7 +214,7 @@ end G1 = grad_G_FD(N, q) ``` -Now, we can again compare this to the (known) solution of the gradient, namely the gradient of (half of) the distance squared, i.e. $G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3)$ is given by $\operatorname{grad} G(q) = -\operatorname{log}_q I_3$, where $\operatorname{log}$ is the [logarithmic map](https://juliamanifolds.github.io/Manifolds.jl/latest/manifolds/symmetricpositivedefinite.html#Base.log-Tuple{SymmetricPositiveDefinite,%20Vararg{Any,%20N}%20where%20N}) on the manifold. 
+Now, we can again compare this to the (known) solution of the gradient, namely the gradient of (half of) the distance squared $G(q) = \frac{1}{2}d^2_{\mathcal P(3)}(q,I_3)$ is given by $\operatorname{grad} G(q) = -\operatorname{log}_q I_3$, where $\operatorname{log}$ is the [logarithmic map](https://juliamanifolds.github.io/Manifolds.jl/latest/manifolds/symmetricpositivedefinite.html#Base.log-Tuple{SymmetricPositiveDefinite,%20Vararg{Any,%20N}%20where%20N}) on the manifold. ```{julia} G2 = -log(N, q, Matrix{Float64}(I, 3, 3)) diff --git a/tutorials/ConstrainedOptimization.qmd b/tutorials/ConstrainedOptimization.qmd index 339b2934f6..4cd4659397 100644 --- a/tutorials/ConstrainedOptimization.qmd +++ b/tutorials/ConstrainedOptimization.qmd @@ -16,7 +16,7 @@ A constraint optimisation problem is given by &\quad h(p) = 0,\\ \end{align*} ``` -where $f\colon \mathcal M → ℝ$ is a cost function, and $g\colon \mathcal M → ℝ^m$ and $h\colon \mathcal M → ℝ^n$ are the inequality and equality constraints, respectively. The $\leq$ and $=$ in (P) are meant elementwise. +where $f\colon \mathcal M → ℝ$ is a cost function, and $g\colon \mathcal M → ℝ^m$ and $h\colon \mathcal M → ℝ^n$ are the inequality and equality constraints, respectively. The $\leq$ and $=$ in (P) are meant element-wise. This can be seen as a balance between moving constraints into the geometry of a manifold $\mathcal M$ and keeping some, since they can be handled well in algorithms, see [BergmannHerzog:2019](@cite), [LiuBoumal:2019](@cite) for details. @@ -34,7 +34,7 @@ using Distributions, LinearAlgebra, Manifolds, Manopt, Random Random.seed!(42); ``` -In this tutorial we want to look at different ways to specify the problem and its implications. We start with specifying an example problems to illustrayte the different available forms. +In this tutorial we want to look at different ways to specify the problem and its implications. 
We start with specifying an example problem to illustrate the different available forms.
We will consider the problem of a Nonnegative PCA, cf. Section 5.1.2 in [LiuBoumal:2019](@cite)
@@ -44,7 +44,7 @@ let $v_0 ∈ ℝ^d$, $\lVert v_0 \rVert=1$ be given spike signal, that is a sign
Z = \sqrt{σ} v_0v_0^{\mathrm{T}}+N,
```
-where $\sigma$ is a signal-to-noise ratio and $N$ is a matrix with random entries, where the diagonal entries are distributed with zero mean and standard deviation $1/d$ on the off-diagonals and $2/d$ on the daigonal
+where $\sigma$ is a signal-to-noise ratio and $N$ is a matrix with random entries, where the diagonal entries are distributed with zero mean and standard deviation $1/d$ on the off-diagonals and $2/d$ on the diagonal
```{julia}
d = 150; # dimension of v0
@@ -71,7 +71,7 @@ or in the previous notation $f(p) = -p^{\mathrm{T}}Zp^{\mathrm{T}}$ and $g(p) =
M = Sphere(d - 1)
```
-## A first Augmented Lagrangian Run
+## A first augmented Lagrangian run
We first defined $f$ and $g$ as usual functions
@@ -134,10 +134,9 @@ f(M, v1)
maximum( g(M, v1) )
```
-## A faster Augmented Lagrangian Run
+## A faster augmented Lagrangian run
-
-Now this is a little slow, so we can modify two things, that we will directly do both – but one could also just change one of these – :
+Now this is a little slow, so we can modify two things:
1. Gradients should be evaluated in place, so for example
@@ -183,9 +182,9 @@ maximum(g(M, v2))
```
These are very similar to the previous values, but the solver took much less time and fewer memory allocations.
-## Exact Penalty Method
+## Exact penalty method
-As a second solver, we have the [Exact Penalty Method](https://manoptjl.org/stable/solvers/exact_penalty_method/), which currenlty is available with two smoothing variants, which make an inner solver for smooth optimisationm, that is by default again [quasi Newton] possible:
+As a second solver, we have the [Exact Penalty Method](https://manoptjl.org/stable/solvers/exact_penalty_method/), which is currently available with two smoothing variants; these make the subproblem smooth, so that an inner solver for smooth optimization, by default a quasi-Newton method, can be used:
[`LogarithmicSumOfExponentials`](https://manoptjl.org/stable/solvers/exact_penalty_method/#Manopt.LogarithmicSumOfExponentials)
and [`LinearQuadraticHuber`](https://manoptjl.org/stable/solvers/exact_penalty_method/#Manopt.LinearQuadraticHuber). We compare both here as well. The first smoothing technique is the default, so we can just call
@@ -227,9 +226,9 @@ f(M, v4)
maximum(g(M, v4))
```
-## Comparing to the unconstraint solver
+## Comparing to the unconstrained solver
-We can compare this to the _global_ optimum on the sphere, which is the unconstraint optimisation problem; we can just use Quasi Newton.
+We can compare this to the _global_ optimum on the sphere, which is the unconstrained optimisation problem, where we can just use quasi-Newton.
Note that this is much faster, since every iteration of the algorithms above does a quasi-Newton call as well.
@@ -243,7 +242,7 @@
f(M, w1)
```
-But for sure here the constraints here are not fulfilled and we have veru positive entries in $g(w_1)$
+But of course the constraints are not fulfilled here, and we have clearly positive entries in $g(w_1)$
```{julia}
maximum(g(M, w1))
diff --git a/tutorials/CountAndCache.qmd b/tutorials/CountAndCache.qmd
index 60eb12ea41..8aa02dd3c9 100644
--- a/tutorials/CountAndCache.qmd
+++ b/tutorials/CountAndCache.qmd
@@ -1,9 +1,9 @@
---
-title: "How to Count and Cache Function Calls"
+title: "How to count and cache function calls"
author: Ronny Bergmann
---
-In this tutorial, we want to investigate the caching and counting (i.e. statistics) features
+In this tutorial, we want to investigate the caching and counting (statistics) features
of [Manopt.jl](https://manoptjl.org). We will reuse the optimization tasks from the
introductory tutorial [Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize!.html).
@@ -56,7 +56,6 @@ the _least recently used_ strategy for our caches.
using Pkg;
cd(@__DIR__)
Pkg.activate("."); # for reproducibility use the local tutorial environment.
-Pkg.develop(path="../") # a trick to work on the local dev version
```
@@ -165,7 +164,7 @@ But since both the cost and the gradient require the computation of the matrix-v
### The [`ManifoldCostGradientObjective`](@ref) approach
-The [`ManifoldCostGradientObjective`](@ref) uses a combined function to compute both the gradient and the cost at the same time. We define the inplace variant as
+The [`ManifoldCostGradientObjective`](@ref) uses a combined function to compute both the gradient and the cost at the same time. We define the in-place variant as
```{julia}
function g_grad_g!(M::AbstractManifold, X, p)
@@ -223,7 +222,7 @@ An alternative to the previous approach is the usage of a functor that introduce
of the result of computing `A*p`.
We additionally have to store `p` though, since we have to check that we are still evaluating the cost and/or gradient at the same point at which the cached `A*p` was computed. -We again consider the (more efficient) inplace variant. +We again consider the (more efficient) in-place variant. This can be done as follows ```{julia} @@ -339,5 +338,5 @@ it is about the same effort both time and allocation-wise. ## Summary While the second approach of [`ManifoldCostGradientObjective`](@ref) is very easy to implement, both the storage and the (local) cache approach are more efficient. -All three are an improvement over the first implementation without sharing interms results. -The results with storage or cache have further advantage of being more flexible, i.e. the stored information could also be reused in a third function, for example when also computing the Hessian. \ No newline at end of file +All three are an improvement over the first implementation without sharing interim results. +The results with storage or cache have further advantage of being more flexible, since the stored information could also be reused in a third function, for example when also computing the Hessian. \ No newline at end of file diff --git a/tutorials/EmbeddingObjectives.qmd b/tutorials/EmbeddingObjectives.qmd index dada11ba4a..e4a0a17469 100644 --- a/tutorials/EmbeddingObjectives.qmd +++ b/tutorials/EmbeddingObjectives.qmd @@ -7,7 +7,7 @@ Specifying a cost function $f\colon \mathcal M \to \mathbb R$ on a manifold is usually the model one starts with. Specifying its gradient $\operatorname{grad} f\colon\mathcal M \to T\mathcal M$, or more precisely $\operatorname{grad}f(p) \in T_p\mathcal M$, and eventually a Hessian $\operatorname{Hess} f\colon T_p\mathcal M \to T_p\mathcal M$ are then necessary to perform optimization. Since these might be challenging to compute, especially when manifolds and differential geometry are not -the main area of a user – easier to use methods might be welcome. 
+the main area of a user, easier-to-use methods might be welcome.
This tutorial discusses how to specify $f$ in the embedding as $\tilde f$, maybe only locally around the manifold,
and use the Euclidean gradient $∇ \tilde f$ and Hessian $∇^2 \tilde f$ within `Manopt.jl`.
@@ -86,12 +86,12 @@ and the [`check_Hessian`](@ref), which requires a bit more tolerance in its line
check_Hessian(M, f, grad_f, Hess_f; plot=true, throw_error=true, atol=1e-15)
```
-While they look reasonable here and were already derived – for the general case this derivation
+While they look reasonable here and were already derived, for the general case this derivation
might be more complicated.
Luckily there exist two functions in [`ManifoldDiff.jl`](https://juliamanifolds.github.io/ManifoldDiff.jl/stable/) that
are implemented for several manifolds from [`Manifolds.jl`](https://github.com/JuliaManifolds/Manifolds.jl),
namely [`riemannian_gradient`](https://juliamanifolds.github.io/ManifoldDiff.jl/stable/library/#ManifoldDiff.riemannian_gradient-Tuple{AbstractManifold,%20Any,%20Any})`(M, p, eG)`
that converts a Euclidean gradient
-`eG=`$\nabla \tilde f(p)$ into a the Riemannain one $\operatorname{grad} f(p)$
+`eG=`$\nabla \tilde f(p)$ into the Riemannian one $\operatorname{grad} f(p)$
and [`riemannian_Hessian`](https://juliamanifolds.github.io/ManifoldDiff.jl/stable/library/#ManifoldDiff.riemannian_Hessian-Tuple{AbstractManifold,%20Any,%20Any,%20Any,%20Any})`(M, p, eG, eH, X)`
which converts the Euclidean Hessian `eH=`$\nabla^2 \tilde f(p)[X]$ into $\operatorname{Hess} f(p)[X]$,
where we also require the Euclidean gradient `eG=`$\nabla \tilde f(p)$.
@@ -181,7 +181,7 @@ distance(M, q1, q2)
This conversion also works for the gradients of constraints, and is passed down to
-subsolvers by deault when these are created using the Euclidean objective $f$, $\nabla f$ and $\nabla^2 f$.
+subsolvers by default when these are created using the Euclidean objective $f$, $\nabla f$ and $\nabla^2 f$.
## Summary diff --git a/tutorials/GeodesicRegression.qmd b/tutorials/GeodesicRegression.qmd index 4f9906784e..f809c650d3 100644 --- a/tutorials/GeodesicRegression.qmd +++ b/tutorials/GeodesicRegression.qmd @@ -77,7 +77,7 @@ render_asymptote(img_folder * "/regression_data.asy"; render=render_size); ## Time Labeled Data If for each data item $d_i$ we are also given a time point $t_i\in\mathbb R$, which are pairwise different, -then we can use the least squares error to state the objetive function as [Fletcher:2013](@cite) +then we can use the least squares error to state the objective function as [Fletcher:2013](@cite) ```math F(p,X) = \frac{1}{2}\sum_{i=1}^n d_{\mathcal M}^2(γ_{p,X}(t_i), d_i), @@ -98,8 +98,8 @@ p^* = d^* - t^*X^* and hence the linear regression result is the line $γ_{p^*,X^*}(t) = p^* + tX^*$. -On a Riemannian manifold we can phrase this as an optimization problem on the [tangent bundle](https://en.wikipedia.org/wiki/Tangent_bundle), -i.e. the disjoint union of all tangent spaces, as +On a Riemannian manifold we can phrase this as an optimization problem on the [tangent bundle](https://en.wikipedia.org/wiki/Tangent_bundle), which is +the disjoint union of all tangent spaces, as ```math \operatorname*{arg\,min}_{(p,X) \in \mathrm{T}\mathcal M} F(p,X) @@ -158,7 +158,7 @@ end ``` For the Euclidean case, the result is given by the first principal component of a principal component analysis, -see [PCR](https://en.wikipedia.org/wiki/Principal_component_regression), i.e. with $p^* = \frac{1}{n}\displaystyle\sum_{i=1}^n d_i$ +see [PCR](https://en.wikipedia.org/wiki/Principal_component_regression) which is given by $p^* = \frac{1}{n}\displaystyle\sum_{i=1}^n d_i$ and the direction $X^*$ is obtained by defining the zero mean data matrix ```math @@ -261,7 +261,7 @@ inner( ) ``` -But we also started with one of the best scenarios, i.e. equally spaced points on a geodesic obstructed by noise. 
+But we also started with one of the best scenarios of equally spaced points on a geodesic obstructed by noise. This gets worse if you start with less evenly distributed data @@ -336,7 +336,7 @@ render_asymptote(img_folder * "/regression_result2.asy"; render=render_size); ## Unlabeled Data -If we are not given time points $t_i$, then the optimization problem extends – informally speaking – +If we are not given time points $t_i$, then the optimization problem extends, informally speaking, to also finding the “best fitting” (in the sense of smallest error). To formalize, the objective function here reads @@ -349,7 +349,7 @@ where $t = (t_1,\ldots,t_n) \in \mathbb R^n$ is now an additional parameter of t We write $F_1(p, X)$ to refer to the function on the tangent bundle for fixed values of $t$ (as the one in the last part) and $F_2(t)$ for the function $F(p, X, t)$ as a function in $t$ with fixed values $(p, X)$. -For the Euclidean case, there is no neccessity to optimize with respect to $t$, as we saw +For the Euclidean case, there is no necessity to optimize with respect to $t$, as we saw above for the initialization of the fixed time points. On a Riemannian manifold this can be stated as a problem on the product manifold $\mathcal N = \mathrm{T}\mathcal M \times \mathbb R^n$, i.e. @@ -363,7 +363,7 @@ N = M × Euclidean(length(t2)) ``` In this tutorial we present an approach to solve this using an alternating gradient descent scheme. -To be precise, we define the cost funcion now on the product manifold +To be precise, we define the cost function now on the product manifold ```{julia} struct RegressionCost2{T} @@ -414,8 +414,8 @@ function (a::RegressionGradient2a!)(N, Y, x) end ``` -Finally, we addionally look for a fixed point $x=(p,X) ∈ \mathrm{T}\mathcal M$ at -the gradient with respect to $t∈\mathbb R^n$, i.e. 
the second component, which is given by +Finally, we additionally look for a fixed point $x=(p,X) ∈ \mathrm{T}\mathcal M$ at +the gradient with respect to $t∈\mathbb R^n$, the second component, which is given by ```math (\operatorname{grad}F_2(t))_i diff --git a/tutorials/HowToDebug.qmd b/tutorials/HowToDebug.qmd index 4d703c6b66..fd460df2e2 100644 --- a/tutorials/HowToDebug.qmd +++ b/tutorials/HowToDebug.qmd @@ -1,5 +1,5 @@ --- -title: "How to Print Debug Output" +title: "How to print debug output" author: Ronny Bergmann --- @@ -56,12 +56,12 @@ Any solver accepts the keyword `debug=`, which in the simplest case can be set t * the last number in the array is used with [`DebugEvery`](@ref) to print the debug only every $i$th iteration. * Any Symbol is converted into certain debug prints -Certain symbols starting with a capital letter are mapped to certain prints, e.g. `:Cost` is mapped to [`DebugCost`](@ref)`()` to print the current cost function value. A full list is provided in the [`DebugActionFactory`](@ref). +Certain symbols starting with a capital letter are mapped to certain prints, for example `:Cost` is mapped to [`DebugCost`](@ref)`()` to print the current cost function value. A full list is provided in the [`DebugActionFactory`](@ref). A special keyword is `:Stop`, which is only added to the final debug hook to print the stopping criterion. Any symbol with a small letter is mapped to fields of the [`AbstractManoptSolverState`](@ref) which is used. This way you can easily print internal data, if you know their names. -Let's look at an example first: If we want to print the current iteration number, the current cost function value as well as the value `ϵ` from the [`ExactPenaltyMethodState`](@ref). To keep the amount of print at a reasonable level, we want to only print the debug every 25th iteration. 
+Let's look at an example first: If we want to print the current iteration number, the current cost function value as well as the value `ϵ` from the [`ExactPenaltyMethodState`](@ref). To keep the amount of print at a reasonable level, we want to only print the debug every twenty-fifth iteration.
Then we can write
@@ -72,13 +72,13 @@ p1 = exact_penalty_method(
);
```
-## Advanced Debug output
+## Advanced debug output
There are two more advanced variants that can be used. The first is a tuple of a symbol and a string, where the string is used as the format print, that most [`DebugAction`](@ref)s have. The second is, to directly provide a `DebugAction`.
We can for example change the way the `:ϵ` is printed by adding a format string and use [`DebugCost`](@ref)`()` which is equivalent to using `:Cost`.
-Especially with the format change, the lines are more coniststent in length.
+Especially with the format change, the lines are more consistent in length.
```{julia}
@@ -88,7 +88,7 @@ p2 = exact_penalty_method(
);
```
-You can also write your own [`DebugAction`](@ref) functor, where the function to implement has the same signature as the `step` function, that is an [`AbstractManoptProblem`](@ref), an [`AbstractManoptSolverState`](@ref), as well as the current iterate. For example the already mentioned [`DebugDivider](@ref)`(s)` is given as
+You can also write your own [`DebugAction`](@ref) functor, where the function to implement has the same signature as the `step` function, that is an [`AbstractManoptProblem`](@ref), an [`AbstractManoptSolverState`](@ref), as well as the current iterate. For example the already mentioned [`DebugDivider`](@ref)`(s)` is given as
```{julia}
#| eval: false
@@ -105,7 +105,7 @@ end
or you could implement that of course just for your specific problem or state.
-## Subsolver Debug
+## Subsolver debug
Most subsolvers have a `sub_kwargs` keyword, such that you can pass keywords to the sub solver as well.
This works well if you do not plan to change the subsolver. If you do you can wrap your own `solver_state=` argument in a [`decorate_state!`](@ref) and pass a `debug=` keyword to this function call. Keywords in a keyword have to be passed as pairs (`:debug => [...]`).
diff --git a/tutorials/HowToRecord.qmd b/tutorials/HowToRecord.qmd
index b940739dd7..02a291c367 100644
--- a/tutorials/HowToRecord.qmd
+++ b/tutorials/HowToRecord.qmd
@@ -1,5 +1,5 @@
---
-title: "How to Record Data During the Iterations"
+title: "How to record data during the iterations"
author: Ronny Bergmann
---
@@ -80,8 +80,8 @@ To record more than one value, you can pass an array of a mix of symbols and [`R
R2 = gradient_descent(M, f, grad_f, data[1]; record=[:Iteration, :Cost], return_state=true)
```
-Here, the symbol `:Cost` is mapped to using the [`RecordCost`](https://manoptjl.org/stable/plans/record/#Manopt.RecordCost) action. The same holds for `:Iteration` obiously records the current iteration number `i`.
-To access these you can first extract the group of records (that is where the `:Iteration`s are recorded – note the plural) and then access the `:Cost`
+Here, the symbol `:Cost` is mapped to using the [`RecordCost`](https://manoptjl.org/stable/plans/record/#Manopt.RecordCost) action. The same holds for `:Iteration`, which obviously records the current iteration number `i`.
+To access these you can first extract the group of records (that is where the `:Iteration`s are recorded; note the plural) and then access the `:Cost`
"""
```{julia}
@@ -109,13 +109,14 @@ We can also pass a tuple as second argument to have our own order within the tup
get_record(R2, :Iteration, (:Iteration, :Cost))
```
-## A more Complex Example
+## A more complex example
To illustrate a complicated example let's record:
+
* the iteration number, cost and gradient field, but only every sixth iteration;
* the iteration at which we stop.
-We first generate the problem and the state, to also illustrate the low-level works when not using the high-level iterface [`gradient_descent`](https://manoptjl.org/stable/solvers/gradient_descent.html).
+We first generate the problem and the state, to also illustrate how the low level works when not using the high-level interface [`gradient_descent`](https://manoptjl.org/stable/solvers/gradient_descent.html).
```{julia}
p = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f))
@@ -126,7 +127,7 @@ s = GradientDescentState(
)
```
-We now first build a [`RecordGroup`](https://manoptjl.org/stable/plans/record/#Manopt.RecordGroup) to group the three entries we want to record per iteration. We then put this into a [`RecordEvery`](https://manoptjl.org/stable/plans/record/#Manopt.RecordEvery) to only record this every 6th iteration
+We now first build a [`RecordGroup`](https://manoptjl.org/stable/plans/record/#Manopt.RecordGroup) to group the three entries we want to record per iteration. We then put this into a [`RecordEvery`](https://manoptjl.org/stable/plans/record/#Manopt.RecordEvery) to only record this every sixth iteration
```{julia}
rI = RecordEvery(
@@ -139,13 +140,13 @@ rI = RecordEvery(
)
```
-and for recodring the final iteration number
+and for recording the final iteration number
```{julia}
sI = RecordIteration()
```
-We now combine both into the [`RecordSolverState`](https://manoptjl.org/stable/plans/record/#Manopt.RecordSolverState) decorator. It acts completely the same as any [`AbstractManoptSolverState`](https://manoptjl.org/stable/plans/state/#Manopt.AbstractManoptSolverState) but records something in every iteration additionally. This is stored in a dictionary of [`RecordAction`](https://manoptjl.org/stable/plans/record/#Manopt.RecordAction)s, where `:Iteration` is the action (here the only every 6th iteration group) and the `sI` which is executed at stop.
+We now combine both into the [`RecordSolverState`](https://manoptjl.org/stable/plans/record/#Manopt.RecordSolverState) decorator. It acts completely the same as any [`AbstractManoptSolverState`](https://manoptjl.org/stable/plans/state/#Manopt.AbstractManoptSolverState) but additionally records something in every iteration. This is stored in a dictionary of [`RecordAction`](https://manoptjl.org/stable/plans/record/#Manopt.RecordAction)s, where `:Iteration` is the action (here the group recording only every sixth iteration) and `sI`, which is executed at stop.
Note that the keyword `record=` in the high level interface `gradient_descent` would only fill the `:Iteration` symbol of said dictionary.
@@ -189,7 +190,7 @@ function (c::MyCost)(M, x)
end
```
-and we define an own, new [`RecordAction`](https://manoptjl.org/stable/plans/record/#Manopt.RecordAction), which is a functor, i.e. a struct that is also a function. The function we have to implement is similar to a single solver step in signature, since it might get called every iteration:
+and we define our own new [`RecordAction`](https://manoptjl.org/stable/plans/record/#Manopt.RecordAction), which is a functor, that is, a struct that is also callable as a function. The function we have to implement is similar to a single solver step in signature, since it might get called every iteration:
```{julia}
mutable struct RecordCount <: RecordAction
@@ -206,7 +207,7 @@ end
```
Now we can initialize the new cost and call the gradient descent.
-Note that this illustrates also the last use case – you can pass symbol-action pairs into the `record=`array.
+Note that this also illustrates the last use case: you can pass symbol-action pairs into the `record=` array.
```{julia}
f2 = MyCost(data)
-If we use this counting cost and run the default gradient descent with Armijo linesearch, we can infer how many Armijo linesearch backtracks are preformed:
+If we use this counting cost and run the default gradient descent with Armijo line search, we can infer how many Armijo line search backtracks are performed:
```{julia}
f3 = MyCost(data)
@@ -267,4 +268,4 @@ R4 = gradient_descent(
get_record(R4)
```
-We can see that the number of cost function calls varies, depending on how many linesearch backtrack steps were required to obtain a good stepsize.
+We can see that the number of cost function calls varies, depending on how many line search backtrack steps were required to obtain a good stepsize.
diff --git a/tutorials/ImplementASolver.qmd b/tutorials/ImplementASolver.qmd
index 228e47eaba..12b5becc7b 100644
--- a/tutorials/ImplementASolver.qmd
+++ b/tutorials/ImplementASolver.qmd
@@ -8,7 +8,7 @@ tutorial [Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize
you might come to the idea of implementing a solver yourself.
After a short introduction of the algorithm we will implement,
-this tutorial first discusses the structural details, i.e. what a solver consists of and “works with”.
+this tutorial first discusses the structural details, that is, what a solver consists of and “works with”.
Afterwards, we will show how to implement the algorithm.
Finally, we will discuss how to make the algorithm both nice for the user as well as initialized in a way, that it can benefit from features already available in `Manopt.jl`.
@@ -55,16 +55,16 @@ We can run the following steps of the algorithm
2. set our best point $q = p^{(0)}$
2. Repeat until a stopping criterion is fulfilled
1. Choose a random tangent vector $X^{(k)} \in T_{p^{(k)}}\mathcal M$ of length $\lVert X^{(k)} \rVert = \sigma$
- 2. “Walk” along this direction, i.e. $p^{(k+1)} = \operatorname{retr}_{p^{(k)}}(X^{(k)})$
+ 2.
“Walk” along this direction, that is $p^{(k+1)} = \operatorname{retr}_{p^{(k)}}(X^{(k)})$
3. If $f(p^{(k+1)}) < f(q)$ set $q = p^{(k+1)}$ as our new best visited point
4. Return $q$ as the resulting best point we visited
-## Preliminaries – Elements a Solver works on
+## Preliminaries: elements a solver works on
There are two main ingredients a solver needs: a problem to work on and the state of a solver, which “identifies” the solver and stores intermediate results.
-### The “Task” – An `AbstractManoptProblem`
+### The “task”: an `AbstractManoptProblem`
A problem in `Manopt.jl` usually consists of a manifold (an [`AbstractManifold`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/types.html#The-AbstractManifold)) and an [`AbstractManifoldObjective`](@ref) describing the function we have and its features.
@@ -73,14 +73,14 @@ In our case the objective is (just) a [`ManifoldCostObjective`](@ref) that store
or any other information we have about our task.
This is something independent of the solver itself, since it only identifies the problem we
-want to solve independent of how we want to solve it – or in other words, this type contains
+want to solve independent of how we want to solve it, or in other words, this type contains
all information that is static and independent of the specific solver at hand.
Usually the problem's variable is called `mp`.
-### The Solver – An `AbstractManoptSolverState`
+### The solver: an `AbstractManoptSolverState`
-Everything that is needed by a solver during the iterations, all its parameters, interims
+Everything that is needed by a solver during the iterations, all its parameters, interim
values that are needed beyond just one iteration, is stored in a subtype of the [`AbstractManoptSolverState`](@ref).
This identifies the solver uniquely.
@@ -90,7 +90,7 @@ In our case we want to store five things
- the best visited point $q$
- the variable $\sigma > 0$
- the retraction $\operatorname{retr}$ to use (cf.
[retractions and inverse retractions](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions.html))
-- a criterion, when to stop, i.e. a [`StoppingCriterion`](@ref)
+- a criterion determining when to stop, that is a [`StoppingCriterion`](@ref)
We can define this as
@@ -111,7 +111,7 @@ end
The stopping criterion is usually stored in the state's `stop` field. If you have a reason to do otherwise, you have one more function to implement (see next section).
-For ease of use, we can provide a constructor, that for example chooses a good default for
+For ease of use, a constructor can be provided that, for example, chooses a good default for
the retraction based on a given manifold.
```{julia}
@@ -131,7 +131,7 @@ in `Manopt.jl` and provide an easy way to construct this state now.
States usually have a shortened name as their variable, we will use `rws` for our state here.
-## Implementing the Your solver
+## Implementing your solver
There are basically only two methods we need to implement for our solver
@@ -158,9 +158,9 @@ If your choice is different, you need to reimplement
- `get_iterate(rws)` to access the current iterate
We recommend to follow the general scheme with the `stop` field. If you have specific criteria
-when to stop, consider implementing your own [stoping criterion](https://manoptjl.org/stable/plans/stopping_criteria/) instead.
+when to stop, consider implementing your own [stopping criterion](https://manoptjl.org/stable/plans/stopping_criteria/) instead.
-### Initialization & Iterate Access
+### Initialization and iterate access
For our solver, there is not so much to initialize, just to be safe we should copy over the initial value in `p` we start with, to `q`. We do not have to care about remembering the iterate,
@@ -202,7 +202,7 @@ We could also store the cost of `q` in the state, but we will see how to easily
this solver to allow for [caching](https://manoptjl.org/stable/tutorials/CountAndCache/#How-to-Count-and-Cache-Function-Calls).
In practice, however, it is preferable to cache intermediate values like the cost of `q` in the state when it can be easily achieved. This way we do not have to deal with the overhead of an external cache.
+Now we can already run the solver. We take the same example as in the other tutorials.
We first define our task, the Riemannian Center of Mass from the [Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize!.html) tutorial.
@@ -240,7 +240,7 @@ solve!(mp, s2)
get_solver_result(s2)
```
-## Ease of Use I: The high level interface(s)
+## Ease of use I: a high level interface
`Manopt.jl` offers a few additional features for solvers in their high level interfaces, for example ``[`debug=` for debug](@ref DebugSection)``{=commonmark}, ``[`record=`](@ref RecordSection)``{=commonmark} keywords for debug and recording
@@ -255,12 +255,13 @@ using Manopt: get_solver_return, indicates_convergence, status_summary
### A high level interface using the objective
-This could be considered as an interims step to the high-level interface:
-If we already have the objective – in our case a [`ManifoldCostObjective`](@ref) at hand, the high level interface consists of the steps
+This could be considered as an interim step to the high-level interface:
+If the objective, in our case a [`ManifoldCostObjective`](@ref), is already initialized,
+the high level interface consists of the steps
1. possibly decorate the objective
2. generate the problem
-3. generate and possiblz generate the state
+3. generate and possibly decorate the state
4. call the solver
5. determine the return value
@@ -311,6 +312,7 @@ about the reason it stopped and whether this indicates convergence.
Here it would for example look like
```{julia}
+#| output: false
import Base: show
function show(io::IO, rws::RandomWalkState)
i = get_count(rws, :Iterations)
@@ -330,8 +332,8 @@ function show(io::IO, rws::RandomWalkState)
end
```
-Now the algorithm can be easily called and provides – if wanted – all features of a `Manopt.jl`
-algorithm. For example to see the summary, we could now just call
+Now the algorithm can be easily called and provides all features of a `Manopt.jl` algorithm.
+For example, to see the summary, we could now just call
```{julia}
q = random_walk_algorithm!(M, f; return_state=true)
@@ -341,4 +343,5 @@ q = random_walk_algorithm!(M, f; return_state=true)
We saw in this tutorial how to implement a simple cost-based algorithm, to illustrate how optimization algorithms are covered in `Manopt.jl`.
-One feature we did not cover is that most algorithms allow for inplace and allocation functions, as soon as they work on more than just the cost, e.g. gradients, proximal maps or Hessians. This is usually a keyword argument of the objective and hence also part of the high-level interfaces. \ No newline at end of file
+One feature we did not cover is that most algorithms allow for in-place and allocating functions, as soon as they work on more than just the cost, for example gradients, proximal maps, or Hessians.
+This is usually a keyword argument of the objective and hence also part of the high-level interfaces.
\ No newline at end of file
diff --git a/tutorials/ImplementOwnManifold.qmd b/tutorials/ImplementOwnManifold.qmd
new file mode 100644
index 0000000000..60671d5728
--- /dev/null
+++ b/tutorials/ImplementOwnManifold.qmd
@@ -0,0 +1,244 @@
+---
+title: "Optimize on your own manifold"
+author: Ronny Bergmann
+---
+
+````{=commonmark}
+```@meta
+CurrentModule = Manopt
+```
+````
+
+When you have used a few solvers from [`Manopt.jl`](https://manoptjl.org/), for example as in the opening
+tutorial [🏔️ Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize!.html),
+and also familiarized yourself with how to work with manifolds in general at
+[🚀 Get Started with `Manifolds.jl`](https://juliamanifolds.github.io/Manifolds.jl/stable/tutorials/getstarted.html),
+you might come across the point that you want to
+[implement a manifold](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/tutorials/implement-a-manifold/)
+yourself and use it within [`Manopt.jl`](https://manoptjl.org/).
+A challenge might be to determine which functions are necessary, since you might not need the complete interface of [`ManifoldsBase.jl`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/).
+
+This tutorial aims to help you through these steps to implement the necessary parts of a manifold
+to get started with the `[solver](@ref SolversSection)`{=commonmark} you have in mind.
+
+## An example problem
+
+We get started by loading the packages we need.
+
+```{julia}
+#| echo: false
+#| code-fold: true
+#| output: false
+using Pkg;
+cd(@__DIR__)
+Pkg.activate("."); # for reproducibility use the local tutorial environment.
+Pkg.develop(path="../") # a trick to work on the local dev version
+```
+
+```{julia}
+#| output: false
+using LinearAlgebra, Manifolds, ManifoldsBase, Random
+using Manopt
+Random.seed!(42)
+```
+
+We also define the same manifold as in
+the [implementing a manifold](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/tutorials/implement-a-manifold/)
+tutorial.
+
+```{julia}
+#| output: false
+"""
+    ScaledSphere <: AbstractManifold{ℝ}
+
+Define a sphere of fixed radius
+
+# Fields
+
+* `dimension` dimension of the sphere
+* `radius` the radius of the sphere
+
+# Constructor
+
+    ScaledSphere(dimension,radius)
+
+Initialize the manifold to a certain `dimension` and `radius`,
+which by default is set to `1.0`
+"""
+struct ScaledSphere <: AbstractManifold{ℝ}
+    dimension::Int
+    radius::Float64
+end
+```
+
+We would like to compute a mean and/or median similar to [🏔️ Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize!.html).
+For a given set of points $q_1,\ldots,q_n$ we want to compute [Karcher:1977](@cite)
+
+```math
+    \operatorname*{arg\,min}_{p\in\mathcal M}
+    \frac{1}{2n} \sum_{i=1}^n d_{\mathcal M}^2(p, q_i)
+```
+
+on the `ScaledSphere` we just defined above.
+We define a few parameters first
+
+```{julia}
+d = 5  # dimension of the sphere - embedded in R^{d+1}
+r = 2.0 # radius of the sphere
+N = 100 # data set size
+
+M = ScaledSphere(d,r)
+```
+
+We now generate a few points
+
+```{julia}
+#| output : false
+# generate 100 points around the north pole
+pts = [ [zeros(d)..., M.radius] .+ 0.5.*([rand(d)...,0.5] .- 0.5) for _=1:N]
+# project them onto the r-sphere
+pts = [ r/norm(p) .* p for p in pts]
+```
+
+Then, before starting with optimization, we need the distance on the manifold,
+to define the cost function, as well as the logarithmic map to define the gradient.
+For both, we here use the “lazy” approach of falling back to the [Sphere](https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/sphere.html).
+
+```{julia}
+#| output : false
+import ManifoldsBase: distance, log
+function distance(M::ScaledSphere, p, q)
+    return M.radius * distance(Sphere(M.dimension), p ./ M.radius, q ./ M.radius)
+end
+function log(M::ScaledSphere, p, q)
+    return M.radius * log(Sphere(M.dimension), p ./ M.radius, q ./ M.radius)
+end
+```
+
+## Define the cost and gradient
+
+```{julia}
+#| output : false
+f(M, q) = sum(distance(M, q, p)^2 for p in pts)
+grad_f(M,q) = sum( - log(M, q, p) for p in pts)
+```
+
+## Defining the necessary functions to run a solver
+
+The documentation usually lists the necessary functions in a
+section “Technical Details” close to the end of the documentation of a solver,
+for our case that is [The gradient descent's Technical Details](https://manoptjl.org/stable/solvers/gradient_descent.html#Technical-Details).
+
+They list all details, but we can also proceed step by step here if we are a bit careful.
+
+### A retraction
+
+We first implement a [retract](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/)ion. Informally, given a current point and a direction to “walk into”, we need a function that performs that walk.
+Since we take an easy one that just projects onto
+the sphere, we use the [`ProjectionRetraction`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.ProjectionRetraction) type.
+To be precise, we have to implement the [in-place variant](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/design/#inplace-and-noninplace) [`retract_project!`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.retract_project!-Tuple{AbstractManifold,%20Vararg{Any,%204}})
+
+```{julia}
+import ManifoldsBase: retract_project!
+function retract_project!(M::ScaledSphere, q, p, X, t::Number)
+    q .= p .+ t .* X
+    q .*= M.radius/norm(q)
+    return q
+end
+```
+
+The other two technical remarks refer to the step size and the stopping criterion,
+so if we set these to something simpler, we should already be able to do a first run.
+
+We have to specify
+
+* that we want to use the new retraction,
+* a simple step size and stopping criterion
+
+We start with a certain point and evaluate its cost
+```{julia}
+p0 = [zeros(d)...,1.0]
+f(M,p0)
+```
+
+Then we can run our first solver, where we have to overwrite a few
+defaults, which would use functions we do not (yet) have.
+We will discuss these in the next steps.
+
+```{julia}
+q1 = gradient_descent(M, f, grad_f, p0;
+    retraction_method = ProjectionRetraction(), # state that we use the retraction from above
+    stepsize = DecreasingStepsize(M; length=1.0), # a simple step size
+    stopping_criterion = StopAfterIteration(10), # a simple stopping criterion
+    X = zeros(d+1), # how we define a tangent vector
+)
+f(M,q1)
```
+
+We at least see that the function value decreased.
+
+### Norm and maximal step size
+
+To use more advanced stopping criteria and step sizes we first need an [`inner`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.inner-Tuple%7BAbstractManifold,%20Any,%20Any,%20Any%7D)`(M, p, X, Y)`.
+We also need a [`max_stepsize`](@ref)`(M)`, to avoid having too large steps
+on positively curved manifolds like our scaled sphere in this example.
+
+```{julia}
+import ManifoldsBase: inner
+import Manopt: max_stepsize
+inner(M::ScaledSphere, p, X, Y) = dot(X, Y) # inherited from the embedding
+ # set the maximal allowed stepsize to injectivity radius.
+Manopt.max_stepsize(M::ScaledSphere) = M.radius*π
+```
+
+Then we can use the default step size ([`ArmijoLinesearch`](@ref)) and
+the default stopping criterion, which checks for a small gradient norm.
+
+```{julia}
+q2 = gradient_descent(M, f, grad_f, p0;
+    retraction_method = ProjectionRetraction(), # as before
+    X = zeros(d+1), # as before
+)
+f(M, q2)
+```
+
+### Making life easier: default retraction and zero vector
+
+To initialize tangent vector memory, the function [`zero_vector`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/functions/#ManifoldsBase.zero_vector-Tuple{AbstractManifold,%20Any})`(M,p)` is called. Similarly,
+the most-used retraction is returned by [`default_retraction_method`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/retractions/#ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}).
+
+We can use both here, to make subsequent calls to the solver less verbose.
+We define
+
+```{julia}
+import ManifoldsBase: zero_vector, default_retraction_method
+zero_vector(M::ScaledSphere, p) = zeros(M.dimension+1)
+default_retraction_method(M::ScaledSphere) = ProjectionRetraction()
+```
+
+and now we can even just call
+
+```{julia}
+q3 = gradient_descent(M, f, grad_f, p0)
+f(M, q3)
+```
+
+But now, for example, we also automatically get the possibility to obtain debug information like
+
+```{julia}
+gradient_descent(M, f, grad_f, p0; debug = [:Iteration, :Cost, :Stepsize, 25, :GradientNorm, :Stop, "\n"]);
+```
+
+see [How to Print Debug Output](https://manoptjl.org/stable/tutorials/HowToDebug.html)
+for more details.
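As a quick sanity check of the pieces defined above (a sketch of our own, not part of the tutorial, assuming the `ScaledSphere` `M` with dimension `d` and radius `r` as well as the `retract_project!` and `zero_vector` methods from this tutorial are in scope), we can verify that the projection retraction keeps points on the sphere of radius `r`:

```julia
p = [zeros(d)..., r]                # the north pole of our scaled sphere
X = zero_vector(M, p)
X[1] = 1.0                          # an arbitrary tangent direction
q = retract(M, p, X, ProjectionRetraction())
isapprox(norm(q), M.radius)         # q lies on the sphere again, by construction
```

The generic `retract` from `ManifoldsBase.jl` dispatches down to our in-place `retract_project!`, so this exercises exactly the method the solver uses.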
+
+## Literature
+
+````{=commonmark}
+```@bibliography
+Pages = ["ImplementOwnManifold.md"]
+Canonical=false
+```
+```` \ No newline at end of file
diff --git a/tutorials/InplaceGradient.qmd b/tutorials/InplaceGradient.qmd
index 63a0118ed2..da856c24a6 100644
--- a/tutorials/InplaceGradient.qmd
+++ b/tutorials/InplaceGradient.qmd
@@ -1,10 +1,10 @@
---
-title: "Speedup using Inplace Evaluation"
+title: "Speedup using in-place evaluation"
author: Ronny Bergmann
---
-When it comes to time critital operations, a main ingredient in Julia is given by
-mutating functions, i.e. those that compute in place without additional memory
+When it comes to time-critical operations, a main ingredient in Julia is given by
+mutating functions, that is, those that compute in place without additional memory
allocations.
In the following, we illustrate how to do this with `Manopt.jl`.
Let's start with the same function as in [Get Started: Optimize!](https://manoptjl.org/stable/tutorials/Optimize!.html)
@@ -95,7 +95,7 @@ end
For the actual call to the solver, we first have to generate an instance of `GradF!`
and tell the solver, that the gradient is provided in an [`InplaceEvaluation`](https://manoptjl.org/stable/plans/objective/#Manopt.InplaceEvaluation).
-We can further also use [`gradient_descent!`](https://manoptjl.org/stable/solvers/gradient_descent/#Manopt.gradient_descent!) to even work inplace of the initial point we pass.
+We can further also use [`gradient_descent!`](https://manoptjl.org/stable/solvers/gradient_descent/#Manopt.gradient_descent!) to even work in place on the initial point we pass.
```{julia}
grad_f2! = GradF!(data, similar(data[1]))
m2 = deepcopy(p0)
diff --git a/tutorials/Optimize!.qmd b/tutorials/Optimize.qmd
similarity index 85%
rename from tutorials/Optimize!.qmd
rename to tutorials/Optimize.qmd
index 743aec7e7f..7e257752aa 100644
--- a/tutorials/Optimize!.qmd
+++ b/tutorials/Optimize.qmd
@@ -1,12 +1,12 @@
---
-title: "Get Started: Optimize!"
+title: "🏔️ Get started: optimize." author: Ronny Bergmann --- In this tutorial, we will both introduce the basics of optimisation on manifolds as well as how to use [`Manopt.jl`](https://manoptjl.org) to perform optimisation on manifolds in [Julia](https://julialang.org). -For more theoretical background, see e.g. [doCarmo:1992](@cite) for an introduction to Riemannian manifolds +For more theoretical background, see for example [doCarmo:1992](@cite) for an introduction to Riemannian manifolds and [AbsilMahonySepulchre:2008](@cite) or [Boumal:2023](@cite) to read more about optimisation thereon. Let $\mathcal M$ denote a [Riemannian manifold](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/#ManifoldsBase.Manifold) @@ -29,7 +29,7 @@ In the Euclidean case with$d\in\mathbb N$, that is for $n\in \mathbb N$ data poi can not be directly generalised to data $q_1,\ldots,q_n$, since on a manifold we do not have an addition. -But the mean can also be charcterised as +But the mean can also be characterised as ```math \operatorname*{arg\,min}_{x\in\mathbb R^d} \frac{1}{2n}\sum_{i=1}^n \lVert x - y_i\rVert^2 @@ -38,14 +38,14 @@ But the mean can also be charcterised as and using the Riemannian distance $d_\mathcal M$, this can be written on Riemannian manifolds. We obtain the _Riemannian Center of Mass_ [Karcher:1977](@cite) ```math - \operatorname*{arg\,min}_{p\in\mathbb R^d} + \operatorname*{arg\,min}_{p\in\mathcal M} \frac{1}{2n} \sum_{i=1}^n d_{\mathcal M}^2(p, q_i) ``` Fortunately the gradient can be computed and is ```math - \operatorname*{arg\,min}_{p\in\mathbb R^d} \frac{1}{n} \sum_{i=1}^n -\log_p q_i + \frac{1}{n} \sum_{i=1}^n -\log_p q_i ``` ## Loading the necessary packages @@ -59,8 +59,8 @@ cd(@__DIR__) Pkg.activate("."); # for reproducibility use the local tutorial environment. ``` -Let's assume you have already installed both Manotp and Manifolds in Julia (using e.g. `using Pkg; Pkg.add(["Manopt", "Manifolds"])`). 
-Then we can get started by loading both packages – and `Random` for persistency in this tutorial.
+Let's assume you have already installed both `Manopt.jl` and `Manifolds.jl` in Julia (using for example `using Pkg; Pkg.add(["Manopt", "Manifolds"])`).
+Then we can get started by loading both packages as well as `Random.jl` for reproducibility in this tutorial.
```{julia}
using Manopt, Manifolds, Random, LinearAlgebra
@@ -88,7 +88,7 @@ grad_f(M, p) = sum(1 / n * grad_distance.(Ref(M), data, Ref(p)));
and just call [`gradient_descent`](https://manoptjl.org/stable/solvers/gradient_descent/).
For a first start, we do not have to provide more than the manifold, the cost, the gradient,
-and a startig point, which we just set to the first data point
+and a starting point, which we just set to the first data point
```{julia}
m1 = gradient_descent(M, f, grad_f, data[1])
@@ -107,8 +107,8 @@ The goal is to get an output of the form
but where we also want to fix the display format for the change and the cost numbers (the `[...]`) to have a certain format. Furthermore, the reason why the solver stopped should be printed at the end
-These can easily be specified using either a Symbol – using the default format for numbers – or a tuple of a symbol and a format-string in the `debug=` keyword that is avaiable for every solver.
-We can also – for illustration reasons – just look at the first 6 steps by setting a [`stopping_criterion=`](https://manoptjl.org/stable/plans/stopping_criteria/)
+These can easily be specified using either a symbol, which uses the default format for numbers, or a tuple of a symbol and a format string in the `debug=` keyword that is available for every solver.
+We can also, for illustration reasons, just look at the first 6 steps by setting a [`stopping_criterion=`](https://manoptjl.org/stable/plans/stopping_criteria/)
```{julia}
m2 = gradient_descent(M, f, grad_f, data[1];
@@ -125,7 +125,7 @@ See [here](https://manoptjl.org/stable/plans/debug/#Manopt.DebugActionFactory-Tu
The `debug=` keyword is actually a list of [`DebugActions`](https://manoptjl.org/stable/plans/debug/#Manopt.DebugAction) added to every iteration, allowing you to even write your own ones. Additionally, `:Stop` is an action added to the end of the solver to display the reason why the solver stopped.
```
-The default stopping criterion for [`gradient_descent`](https://manoptjl.org/stable/solvers/gradient_descent/) is, to either stopwhen the gradient is small (`<1e-9`) or a max number of iterations is reached (as a fallback.
+The default stopping criterion for [`gradient_descent`](https://manoptjl.org/stable/solvers/gradient_descent/) is to either stop when the gradient is small (`<1e-9`) or a max number of iterations is reached (as a fallback).
Combining stopping-criteria can be done by `|` or `&`.
We further pass a number `25` to `debug=` to only print an output every `25`th iteration:
@@ -167,7 +167,7 @@ data2 = [exp(N, q, σ * rand(N; vector_at=q)) for i in 1:m];
```
Instead of the mean, let's consider a non-smooth optimisation task:
-The median can be generalized to Manifolds as the minimiser of the sum of distances, see e.g. [Bacak:2014](@cite). We define
+The median can be generalized to manifolds as the minimiser of the sum of distances, see [Bacak:2014](@cite). We define
```{julia}
g(N, q) = sum(1 / (2 * m) * distance.(Ref(N), Ref(q), data2))
@@ -209,8 +209,8 @@ at the recorded values at iteration 42
get_record(s)[42]
```
-But we can also access whole serieses and see that the cost does not decrease that fast; actually, the CPPA might converge relatively slow.
For that we can for
-example access the `:Cost` that was recorded every `:Iterate` as well as the (maybe a little boring) `:Iteration`-number in a semilogplot.
+But we can also access whole series and see that the cost does not decrease that fast; actually, the CPPA might converge relatively slowly. For that we can for
+example access the `:Cost` that was recorded every `:Iterate` as well as the (maybe a little boring) `:Iteration`-number in a semi-log plot.
```{julia}
x = get_record(s, :Iteration, :Iteration)
@@ -223,7 +223,7 @@ plot(x,y,xaxis=:log, label="CPPA Cost")
````{=commonmark}
```@bibliography
-Pages = ["Optimize!.md"]
+Pages = ["Optimize.md"]
Canonical=false
```
```` \ No newline at end of file
diff --git a/tutorials/StochasticGradientDescent.qmd b/tutorials/StochasticGradientDescent.qmd
index cbd8e49116..e42d5d6050 100644
--- a/tutorials/StochasticGradientDescent.qmd
+++ b/tutorials/StochasticGradientDescent.qmd
@@ -16,7 +16,7 @@ for given points $p_i ∈\mathcal M$, $i=1,…,N$ this optimization problem read
\operatorname{d}^2_{\mathcal M}(x,p_i),
```
-which of course can be (and is) solved by a gradient descent, see the introductionary
+which of course can be (and is) solved by a gradient descent, see the introductory
tutorial or [Statistics in Manifolds.jl](https://juliamanifolds.github.io/Manifolds.jl/stable/features/statistics.html).
If $N$ is very large, evaluating the complete gradient might be quite expensive.
A remedy is to evaluate only one of the terms at a time and choose a random order for these.
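The remedy just described can be sketched in a few lines of plain (Euclidean) Julia with hypothetical names, independent of how `Manopt.jl` actually implements it:

```julia
using Random

# One stochastic pass: visit the summand gradients in a random order and
# take a step along the negative gradient of a single term each time.
function sgd_pass!(p, grads, step)
    for i in shuffle(eachindex(grads))
        p .-= step .* grads[i](p)  # on a manifold, a retraction would replace this update
    end
    return p
end
```

Each iteration thus touches only one data point instead of all `N`, which is the source of the speedup discussed next.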
@@ -63,7 +63,7 @@ For the mean, the gradient is
```
which we define in `Manopt.jl` in two different ways:
-either as one function returning all gradients as a vector (see `gradF`), or – maybe more fitting for a large scale problem, as a vector of small gradient functions (see `gradf`)
+either as one function returning all gradients as a vector (see `gradF`), or, maybe more fitting for a large-scale problem, as a vector of small gradient functions (see `gradf`)
```{julia}
@@ -73,7 +73,7 @@ gradf = [(M, p) -> grad_distance(M, q, p) for q in data];
p0 = 1 / sqrt(3) * [1.0, 1.0, 1.0]
```
-The calls are only slightly different, but notice that accessing the 2nd gradient element
+The calls are only slightly different, but notice that accessing the second gradient element
requires evaluating all logs in the first function, while we only call _one_ of the functions in the second array of functions.
So while you can use both `gradF` and `gradf` in the following call, the second one is (much) faster: