
Conversation

@Nimrais
Contributor

@Nimrais Nimrais commented Sep 5, 2025

This PR adds a new stepsize schedule: Riemannian Distance over Gradients (RDoG). RDoG is a learning-rate-free stepsize that adapts automatically, without hyperparameter tuning, see https://arxiv.org/pdf/2406.02296.

You can use it like any other stepsize schedule.

using Manopt
using Manifolds
using LinearAlgebra
using Random

Random.seed!(42)

# Minimize negative Rayleigh quotient on the sphere S^1
M = Sphere(1)
A = randn(2, 2); A = A' * A  # symmetric positive definite

f(M, p) = -p' * A * p
function grad_f(M, p)
    g = -2 * A * p
    return g - dot(g, p) * p  # project to tangent space
end

p0 = rand(M)

x = gradient_descent(
    M, f, grad_f, p0;
    stepsize = DistanceOverGradients(M; initial_distance = 1e-2, use_curvature = false),
    stopping_criterion = StopAfterIteration(200) | StopWhenGradientNormLess(1e-8),
)

println("final cost = ", f(M, x))

@codecov

codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.78%. Comparing base (a364659) to head (9b6a8ec).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #505      +/-   ##
==========================================
- Coverage   99.80%   99.78%   -0.03%     
==========================================
  Files          85       85              
  Lines        9375     9418      +43     
==========================================
+ Hits         9357     9398      +41     
- Misses         18       20       +2     


@kellertuer
Member

Nice, thanks!

At first glance it seems the current constructor is “the old one” with M necessary – maybe we can have a second one employing the factory (I wrote that after our discussions at last year's JuliaCon)?

I will try to find time to review your code, but it would also be nice to write a test case and have test coverage. Maybe just one further run in the gradient descent tests where you have a cost and gradient already...

Member

@kellertuer kellertuer left a comment


Thanks for the PR!

I am (maybe I said that already) not so much a fan of that paper. Illustrating that a gradient descent with a constant stepsize does not converge is no wonder – one needs an Armijo linesearch, for example.
Sure, they could have compared to Armijo and argued why theirs is easier, ... that would have been the proper comparison. But well, that is not your fault of course.

Here are a few small comments, mainly on the documentation, which I rendered locally.
Test coverage would be nice.

@mateuszbaran
Member

Thanks for the contribution, it's good to have a wider choice of algorithms here.

I am (maybe I said that already) not so much a fan of that paper. Illustrating that a gradient descent with a constant stepsize does not converge is no wonder – one needs an Armijo linesearch, for example.

Yes, doing better than a constant stepsize isn't a great achievement. On the other hand, it's stochastic optimization, where AFAIK standard Armijo doesn't have any convergence guarantees. This paper proposes a modification that does have some guarantees: https://par.nsf.gov/servlets/purl/10208765. It can be trivially adapted to the Riemannian setting, so the authors of the RDoG paper could have included it in their comparison.

@Nimrais
Contributor Author

Nimrais commented Sep 8, 2025

I have read the feedback, thanks; it's not hard to address. However, I have one problem with this one.

At first glance it seems the current constructor is “the old one” with M necessary – maybe we can have a second one employing the factory (I wrote that after our discussions at last year's JuliaCon)?

DoG algorithms are inherently dependent on your starting point, so that is why I do need the manifold argument M. It's essentially a poor man's way to allocate a container for the starting position. Later, I overwrite whatever is written there from the optimizer state at iteration 0. The only way an alternative constructor could be implemented is with a point on the manifold provided. Is that better? Any other ideas?
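
To make that concrete, here is a minimal sketch of the container pattern; the struct, field, and function names are purely illustrative and not the ones used in this PR. The manifold is only needed to allocate a point up front, which the solver then overwrites with the actual start point.

using Manifolds

# Illustrative sketch only; names are hypothetical, not the PR's actual fields.
mutable struct DoGStateSketch{P,R<:Real}
    p0::P                # container for the starting point
    initial_distance::R  # the small initial offset r_ε
    max_distance::R      # running maximum of d(p_k, p_0)
    grad_sq_sum::R       # running sum of squared gradient norms
end

# The manifold M is only used to allocate the container ...
DoGStateSketch(M; initial_distance=1e-2) =
    DoGStateSketch(rand(M), initial_distance, initial_distance, 0.0)

# ... which is overwritten with the real iterate at iteration 0.
function set_start_point!(s::DoGStateSketch, p_start)
    copyto!(s.p0, p_start)
    return s
end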

@kellertuer
Member

I have read the feedback, thanks; it's not hard to address. However, I have one problem with this one.

At first glance it seems the current constructor is “the old one” with M necessary – maybe we can have a second one employing the factory (I wrote that after our discussions at last year's JuliaCon)?

DoG algorithms are inherently dependent on your starting point, so that is why I do need the manifold argument M. It's essentially a poor man's way to allocate a container for the starting position. Later, I overwrite whatever is written there from the optimizer state at iteration 0. The only way an alternative constructor could be implemented is with a point on the manifold provided. Is that better? Any other ideas?

Hm, but the one and only idea of the factory (from the constructor without a suffix to the one with the Stepsize suffix) is to “plug in” the manifold (and call the constructor then) at some later point. So that scheme has exactly that usage, namely to not write M in the constructor as you wrote above. You can, but you do not have to.

Is that the case here?
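
For reference, the factory amounts to something like the following rough sketch; this is not Manopt's actual factory machinery, and DistanceOverGradientsNoManifold is a made-up name. The user-facing call only records the keyword arguments, and the solver applies the factory to the manifold once it is known.

using Manopt

# Rough sketch of the factory idea, not Manopt's implementation.
struct StepsizeFactorySketch{K}
    kwargs::K
end

# Hypothetical manifold-free constructor: it only remembers the options.
DistanceOverGradientsNoManifold(; kwargs...) = StepsizeFactorySketch(kwargs)

# The solver supplies the manifold later, so the user never writes M here.
(f::StepsizeFactorySketch)(M) = DistanceOverGradients(M; f.kwargs...)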

@kellertuer
Member

Thanks for your work today. Coverage is already well on its way.

Could you add an entry to the Changelog as well? Just roughly follow the format in there, I can fix it before doing the release.
If you feel this is enough of a contribution, we can also check whether you get an entry on the about.md and the zenodo metadata. For me this is on the brink, but if you like, sure we can add you :)

@Nimrais
Contributor Author

Nimrais commented Sep 8, 2025

I am (maybe I said that already) not so much a fan of that paper. Illustrating that a gradient descent with a constant stepsize does not converge is no wonder – one needs an Armijo linesearch, for example.

Yes, doing better than a constant stepsize isn't a great achievement. On the other hand, it's stochastic optimization where AFAIK standard Armijo doesn't have any convergence guarantees.

I agree with you both. To my taste, the paper does not have the proper scholarly tone :). But that does not change the fact that the method is really cheap: you only need to store one additional number to outperform a constant stepsize. So once Armijo becomes computationally infeasible because it is just too slow to backtrack, at least you can try this as something cheap that is better than a constant stepsize, without playing with strange stepsize schedules.
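
For context, the cheapness comes from the DoG-style rule itself. The following is only a paraphrase of the update from the paper (curvature-dependent corrections and exact constants are left out), not the code in this PR: the stepsize is the maximal distance travelled from the start point divided by the accumulated gradient norms.

using Manifolds

# Hedged paraphrase of the (R)DoG stepsize rule.
function rdog_step(M, p0, p, X, max_dist, grad_sq_sum; r_eps=1e-2)
    max_dist = max(max_dist, distance(M, p0, p), r_eps)  # the one extra number to track
    grad_sq_sum += norm(M, p, X)^2                        # accumulated gradient energy
    return max_dist / sqrt(grad_sq_sum), max_dist, grad_sq_sum
end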

This paper proposes a modification that does have some guarantees: https://par.nsf.gov/servlets/purl/10208765. It can be trivially adapted to the Riemannian setting, so the authors of the RDoG paper could have included it in their comparison.

Interesting! I will take a look. I haven't seen this paper; the authors of the RDoG paper probably haven't seen it either, as I do not see them citing it.

@Nimrais
Contributor Author

Nimrais commented Sep 8, 2025

If you feel this is enough of a contribution, we can also check whether you get an entry on the about.md and the zenodo metadata. For me this is on the brink, but if you like, sure we can add you :)

Ah no, I am fine, you don't need to add me. I will return later with RDoWG; I just do not have time to add it to this PR.

@kellertuer
Member

Great!

And sure, shorter PRs that focus on one thing are better than super long PRs, so that decision is a very good one I think :)

Let me just check the rendered docs at some point during the day; the rest already looks good, I think.

Member

@kellertuer kellertuer left a comment


This overall looks very nice already!

I have two small comments on the docs and one for test coverage.

@kellertuer
Member

Since we are nearly finished here, I will wait with #503 for this one and release both together as a new version.

Mainly because, by waiting, I am the one who has to merge the changelog, so you do not have to worry about that.

@Nimrais
Contributor Author

Nimrais commented Sep 9, 2025

@kellertuer I think it's in good shape now, do you see something that still needs to be addressed?

Member

@kellertuer kellertuer left a comment


Looks good! Thanks!

@kellertuer kellertuer merged commit a91e67a into JuliaManifolds:master Sep 9, 2025
13 of 14 checks passed
@kellertuer kellertuer mentioned this pull request Sep 9, 2025
