try linear warm-up and higher peak learning rate #39

matsen · 2024-06-18T10:45:40Z

For stabilizing and enhancing training, we used a linear warm-up for 1k steps, a peak learning rate of 0.0004, a cosine
learning rate decay over 9k steps, and a weight decay of 0.01.
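
For reference, here is a minimal sketch of the schedule described above, in plain Python. The quoted text does not say where the warm-up starts, what the final learning rate is, or whether the 9k decay steps begin right after the warm-up. This sketch assumes a warm-up from near zero, decay beginning at step 1,000, and a final learning rate of 0. The function name `lr_at_step` and the `min_lr` parameter are hypothetical and do not come from this repository.

```python
import math

# Values taken from the quoted description.
WARMUP_STEPS = 1_000   # linear warm-up length
DECAY_STEPS = 9_000    # cosine decay length
PEAK_LR = 4e-4         # peak learning rate


def lr_at_step(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS,
               decay_steps=DECAY_STEPS, min_lr=0.0):
    """Learning rate at a 0-indexed optimizer step.

    Assumptions (not stated in the quoted text): warm-up rises linearly
    from peak_lr / warmup_steps, decay starts immediately after warm-up,
    and the rate stays at min_lr once the decay is finished.
    """
    if step < warmup_steps:
        # Linear warm-up: reaches peak_lr on the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the cosine decay completed, clamped to [0, 1].
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    # Half-cosine from peak_lr down to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 500, 999, 1_000, 5_500, 10_000):
        print(s, lr_at_step(s))
```

In a PyTorch training loop, one option would be to divide this value by the peak learning rate and pass it as the multiplier to `torch.optim.lr_scheduler.LambdaLR`, with the weight decay of 0.01 set on the optimizer (for example `torch.optim.AdamW(..., weight_decay=0.01)`). The choice of optimizer is an assumption; the quoted text does not name one.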

Adds output_dim to `single` model for matching output dimensions expected by DASM. Supporting PR for matsengrp/dnsm-experiments-1#41

matsen linked a pull request Jun 19, 2024 that will close this issue

Try "warm up" phase #41

Merged

matsen closed this as completed in #41 Jun 19, 2024

willdumm added a commit that referenced this issue Nov 16, 2024

dnsm-experiments-1 #39 supporting PR (#87)

a6f02a2

