
fix the bug of constantlr and deepspeed #749

Open
wants to merge 1 commit into main
Conversation

@Shengqiang-Li commented on Dec 19, 2024

Fix a bug when training with deepspeed and the scheduler is set to constantlr:

  1. The original code passed in the optimizer from conf/ds_stage2.json, not the optimizer configured in the yaml file.
  2. constantlr raised a KeyError related to scheduler_conf and warmup_steps.

With lr set to 0.00001 in the yaml:

Log before the fix:
TypeError: __init__() got an unexpected keyword argument 'warmup_steps'
2024-12-19 11:25:00,526 DEBUG TRAIN Batch 0/100 loss 3.138866 acc 0.303122 lr 0.00100000 grad_norm 0.286094 rank 0

Log after the fix:
2024-12-19 06:20:32,664 DEBUG TRAIN Batch 0/100 loss 3.581383 acc 0.204409 lr 0.00001000 grad_norm 1.757465 rank 0
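
For reference, a minimal sketch of the guarded scheduler construction this fix implies, assuming a WeNet-style yaml layout with scheduler and scheduler_conf keys (the function name build_scheduler and the exact config keys are assumptions, not this repo's actual code):

```python
import torch
from torch.optim.lr_scheduler import ConstantLR, LambdaLR

def build_scheduler(optimizer: torch.optim.Optimizer, train_conf: dict):
    # 'scheduler' and 'scheduler_conf' mirror the assumed yaml layout;
    # the real keys in this repo may differ.
    name = train_conf['scheduler']
    conf = train_conf.get('scheduler_conf', {})
    if name == 'constantlr':
        # ConstantLR takes no warmup_steps, so scheduler_conf must not be
        # forwarded blindly; doing so raised the TypeError in the log above.
        # factor=1.0 keeps the lr constant at the yaml value.
        return ConstantLR(optimizer, factor=1.0)
    # Warmup-style schedulers do consume scheduler_conf (e.g. warmup_steps).
    warmup_steps = conf['warmup_steps']
    return LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
```

The point is only that scheduler_conf kwargs such as warmup_steps should be forwarded solely to schedulers that accept them.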

@aluminumbox (Collaborator) commented:

This is because the deepspeed optimizer is more efficient, so None is passed in here; but the ds optimizer config does indeed need to be adjusted to match the settings in the yaml.
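
A minimal sketch of what syncing the ds optimizer config with the yaml could look like (the optim_conf key and the ds_config layout are assumptions based on typical deepspeed json configs, not this repo's code):

```python
import deepspeed

def initialize_with_yaml_lr(model, ds_config: dict, train_conf: dict):
    # With optimizer=None, deepspeed builds its own (more efficient, fused)
    # optimizer from the "optimizer" section of ds_config, so the yaml lr
    # is mirrored into that section to keep the two configs consistent.
    ds_config['optimizer']['params']['lr'] = train_conf['optim_conf']['lr']
    engine, optimizer, _, scheduler = deepspeed.initialize(
        model=model,
        optimizer=None,  # let deepspeed own the optimizer
        model_parameters=model.parameters(),
        config=ds_config)
    return engine, optimizer, scheduler
```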

@Shengqiang-Li (Author) commented:

> This is because the deepspeed optimizer is more efficient, so None is passed in here; but the ds optimizer config does indeed need to be adjusted to match the settings in the yaml.

Understood; then I will leave the optimizer part unchanged.
