
try linear warm-up and higher peak learning rate #39

Closed
matsen opened this issue Jun 18, 2024 · 0 comments · Fixed by #41


matsen commented Jun 18, 2024

To stabilize and improve training, we used a linear warm-up over 1k steps, a peak learning rate of 0.0004, cosine learning-rate decay over 9k steps, and a weight decay of 0.01.
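The schedule described above can be sketched as a plain function of the optimizer step. This is a minimal illustration, not the project's implementation. It assumes steps are counted from 0, the warm-up ramps linearly from 0 to the peak, and the cosine phase decays to a floor of 0 over the following 9k steps (10k steps total). The floor value and the names `lr_at_step`, `WARMUP_STEPS`, `DECAY_STEPS`, `PEAK_LR` and `MIN_LR` are hypothetical.

```python
import math

# Values taken from the issue; MIN_LR (the end-of-decay floor) is an assumption.
WARMUP_STEPS = 1_000
DECAY_STEPS = 9_000
PEAK_LR = 4e-4
MIN_LR = 0.0  # assumed; the issue does not state a final learning rate


def lr_at_step(step: int) -> float:
    """Learning rate for a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR over the warm-up phase.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the cosine phase completed, clamped to [0, 1] so the
    # learning rate stays at MIN_LR after the decay phase ends.
    progress = min((step - WARMUP_STEPS) / DECAY_STEPS, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    for step in (0, 500, 1_000, 5_500, 10_000, 12_000):
        print(step, lr_at_step(step))
```

If training uses PyTorch, one option is to set the optimizer's base learning rate to `PEAK_LR` and pass `lambda s: lr_at_step(s) / PEAK_LR` as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, calling `scheduler.step()` once per optimizer step. Weight decay is an optimizer setting rather than part of the schedule. For example, `weight_decay=0.01` would go on `torch.optim.AdamW` if that optimizer is used.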

matsen linked a pull request on Jun 19, 2024 that will close this issue
willdumm added a commit that referenced this issue Nov 16, 2024
Adds output_dim to the `single` model to match the output dimensions expected by DASM. Supporting PR for matsengrp/dnsm-experiments-1#41