Add Distance over Gradients Stepsize #505
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master     #505      +/-   ##
==========================================
- Coverage   99.80%   99.78%    -0.03%
==========================================
  Files          85       85
  Lines        9375     9418       +43
==========================================
+ Hits         9357     9398       +41
- Misses         18       20        +2
Nice, thanks! At first glance it seems the current constructor is “the old one”. I will try to find time to review your code, but it would also be nice to write a test case and have test coverage. Maybe just one further run in the gradient descent tests where you have a cost and gradient already...
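A hedged sketch of what such an additional run could look like. The factory name `DistanceOverGradients` is an assumption and may not match what the PR actually adds; the cost and gradient mimic the squared-distance problem used elsewhere in the gradient descent tests.

```julia
using Manopt, Manifolds, Test

@testset "gradient descent with the DoG stepsize" begin
    M = Sphere(2)
    q = [0.0, 0.0, 1.0]
    f(M, p) = distance(M, p, q)^2 / 2      # cost and gradient as in the existing tests
    grad_f(M, p) = -log(M, p, q)
    p0 = [1.0, 0.0, 0.0]
    # `DistanceOverGradients` is a placeholder for the factory this PR adds
    p1 = gradient_descent(M, f, grad_f, p0; stepsize=DistanceOverGradients())
    @test distance(M, p1, q) < distance(M, p0, q)  # tighten once the real behavior is known
end
```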
kellertuer
left a comment
Thanks for the PR!
I am (maybe I said that already) not so much a fan of that paper. Illustrating that a gradient descent with a constant stepsize does not converge is no surprise; one needs an Armijo linesearch, for example.
Sure, they could have compared to Armijo and argued why theirs is easier ... that would have been the proper comparison. But well, that is not your fault of course.
Here are a few small comments, mainly on the documentation, which I rendered locally.
Test coverage would be nice.
Thanks for the contribution, it's good to have a wider choice of algorithms here.
Yes, doing better than a constant stepsize isn't a great achievement. On the other hand, it's stochastic optimization where AFAIK standard Armijo doesn't have any convergence guarantees. This paper proposes a modification that does have some guarantees: https://par.nsf.gov/servlets/purl/10208765 . It can be trivially adapted to the Riemannian setting, so the authors of the RDoG paper could have included it in their comparison.
I have read the feedback, thanks; it's not hard to address. However, I have a problem with one point:
DoG algorithms are inherently dependent on your starting point, which is why I do need the manifold argument.
Hm, but the one and only idea of the factory (going from the constructor without a suffix to the one with the stepsize suffix) is to “plug in” the manifold (and only then call the constructor) at some later point. So that scheme has exactly one usage, namely to avoid having to write the manifold in the high-level call. Is that the case here?
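For readers of this thread, a minimal, self-contained sketch of the factory idea being discussed. This is not Manopt.jl's actual implementation, and all names here are hypothetical; it only illustrates deferring the manifold (and start point) to the solver.

```julia
# Hypothetical names throughout; this illustrates the pattern, not Manopt.jl internals.
struct MyStepsize{P,T}               # the "full" stepsize, built once manifold and start point are known
    p0::P                            # DoG-style stepsizes depend on the start point
    initial_distance::T
end
MyStepsize(M, p0; initial_distance=1e-4) = MyStepsize(p0, initial_distance)

struct MyStepsizeFactory{K}          # the user-facing factory, no manifold yet
    kwargs::K
end
MyStepsizeFactory(; kwargs...) = MyStepsizeFactory(kwargs)

# the solver "plugs in" the manifold and start point later and calls the constructor
(factory::MyStepsizeFactory)(M, p0) = MyStepsize(M, p0; factory.kwargs...)

using Manifolds
M, p0 = Sphere(2), [1.0, 0.0, 0.0]
s = MyStepsizeFactory(initial_distance=1e-2)  # what the user writes, no manifold needed
stepsize = s(M, p0)                           # what the solver does internally
```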
test: add Hyperbolic test for DoG
Thanks for your work today. Coverage is already well on its way. Could you add an entry to the Changelog as well? Just roughly follow the format in there; I can fix it up before doing the release.
I agree with you both. To my taste, the paper does not have the proper scholarly tone :). But that does not change the fact that the method is really cheap: you just need to store one additional number to outperform the constant stepsize (a rough sketch of this bookkeeping follows after this comment). So once Armijo becomes computationally infeasible because backtracking is just too slow, at least you can try this as something cheap that is better than a constant stepsize, without playing with strange stepsize schedules.
Interesting! I will take a look. I haven't seen this paper; the authors of the RDoG paper probably haven't seen it either, as I do not see them citing it.
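To make the “really cheap” point concrete, here is a hedged sketch of the DoG-style bookkeeping: the Euclidean DoG rule (step length = largest distance from the start point seen so far, divided by the square root of the accumulated squared gradient norms) transferred to a manifold via the Riemannian distance. The exact rule and constants in the RDoG paper may differ, and none of the names below come from the PR.

```julia
using Manifolds

# Illustrative only, not the PR's implementation. The running state is just two
# scalars: the max distance from the start and the sum of squared gradient norms.
function dog_gradient_descent(M, grad_f, p0; iterations=100, r0=1e-4)
    p = copy(p0)
    max_dist = r0                   # largest distance from p0 seen so far
    grad_sum = 0.0                  # accumulated squared gradient norms
    for _ in 1:iterations
        X = grad_f(M, p)
        grad_sum += norm(M, p, X)^2
        max_dist = max(max_dist, distance(M, p, p0))
        η = max_dist / sqrt(grad_sum + eps())   # DoG-style stepsize
        p = exp(M, p, -η * X)       # gradient step along the exponential map
    end
    return p
end

# e.g. minimize half the squared distance to q on the sphere
M, q = Sphere(2), [0.0, 0.0, 1.0]
p_res = dog_gradient_descent(M, (M, p) -> -log(M, p, q), [1.0, 0.0, 0.0])
```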
Ah no, I am fine, you don't need to add me. I will return later with RDoWG; I just do not have time to add it in this PR.
Great! And sure, shorter PRs that focus on one thing are better than super long PRs, so I think that decision is a very good one :) Let me just check the rendered docs sometime during the day; the rest already looks good I think.
kellertuer
left a comment
This overall looks very nice already!
I have two small comments on the docs and one for test coverage.
Since we are nearly finished here, I will hold #503 back until this one is done and release both together as a new version, mainly because by waiting I am the one who has to merge the changelog, so you do not have to worry about that.
@kellertuer I think it's in good shape now, do you see something that still needs to be addressed?
kellertuer
left a comment
Looks good! Thanks!
This PR adds a new stepsize schedule, Distance over Gradients (RDoG). RDoG is a learning-rate-free stepsize that adapts automatically without hyperparameter tuning; see https://arxiv.org/pdf/2406.02296.
You can use it like any other stepsize schedule.
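A hedged usage sketch: the factory name `DistanceOverGradients` is assumed here and may differ from what was actually merged, and the problem is just an example (minimizing half the squared distance to a point on the hyperbolic plane, matching the test mentioned in the thread).

```julia
using Manopt, Manifolds

M = Hyperbolic(2)
q = [0.0, 0.0, 1.0]                  # target point (hyperboloid model)
f(M, p) = distance(M, p, q)^2 / 2
grad_f(M, p) = -log(M, p, q)
p0 = [1.0, 0.0, sqrt(2.0)]

# `DistanceOverGradients` is a placeholder for the factory added in this PR;
# it is passed like any other stepsize, and the solver plugs in the manifold.
p_min = gradient_descent(M, f, grad_f, p0; stepsize=DistanceOverGradients())
```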