Gradients are None after booster.backward #5792
After calling booster.backward(loss=loss, optimizer=optimizer), all gradients of model.module are None. Is there a way to access the gradients?

Comments
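For context, a minimal sketch that reproduces the reported symptom. The plugin choice (LowLevelZeroPlugin) and the launch call are assumptions, since the issue does not state which plugin was used; under ZeRO-style plugins the wrapped optimizer, not the module, owns the gradients:

```python
# Hypothetical minimal repro -- run with: torchrun --nproc_per_node=1 repro.py
# The plugin choice (LowLevelZeroPlugin) is an assumption; the issue does not
# say which plugin was used.
import torch
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()  # older releases require launch_from_torch(config={})

model = nn.Linear(8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

booster = Booster(plugin=LowLevelZeroPlugin())
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)

x, y = torch.randn(4, 8).cuda(), torch.randn(4, 2).cuda()
loss = criterion(model(x), y)
booster.backward(loss=loss, optimizer=optimizer)

# The reported symptom: .grad on the wrapped module's parameters is None,
# because the ZeRO optimizer keeps the gradients internally.
for p in model.module.parameters():
    print(p.grad)  # prints None
```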
I am running into the same problem. Have you found a solution?
Hey @ArnaudFickinger @B-Soul, could you please share the settings of your scripts?
My code is part of my own ongoing research, so it is not convenient to share. But I switched the distributed framework to Hugging Face Accelerate, and the gradients are not None there. So I think there is a bug in the ColossalAI framework.
Hi @B-Soul, a snippet of your optimizer/plugin settings would help. Besides, the gradient-access API might be different due to optimization, depending on which plugin you are using.
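To illustrate the point about the gradient-access API: with the low-level ZeRO optimizer, gradients live in an internal gradient store rather than on the parameters. Continuing from the sketch above, one hedged way to read them; the `_grad_store._grads_of_params` layout is internal API (referenced later in this thread) and may change between versions:

```python
# Sketch: read gradients from the ZeRO optimizer's internal store instead of
# param.grad. _grad_store is internal, version-dependent API.
for group_id, group in enumerate(optimizer.param_groups):
    store = optimizer._grad_store._grads_of_params[group_id]  # keyed by id(param)
    for param in group['params']:
        print(group_id, store.get(id(param)))
```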
@ArnaudFickinger Glad to hear that! We might work on the API to make it more intuitive. Regarding performance: generally speaking, you should choose the plugin based on your intended training scale and use case (see the sketch below). Do let us know if you have further doubts :p
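A sketch of what that choice looks like in practice; the scenario comments are our reading of the general guidance above, not an official decision table:

```python
# Sketch: picking a Booster plugin by intended scale. The mapping in the
# comments is illustrative, not official guidance.
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin, LowLevelZeroPlugin, GeminiPlugin

plugin = TorchDDPPlugin()        # model fits on a single GPU: plain data parallelism
# plugin = LowLevelZeroPlugin()  # shard optimizer states / gradients (ZeRO-1/2)
# plugin = GeminiPlugin()        # also shard parameters (ZeRO-3 style) for very large models
booster = Booster(plugin=plugin)
```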
@botbw When I define two param_groups, the id()s of the parameters in the second group do not match any keys of optimizer._grad_store._grads_of_params[1].
@ArnaudFickinger I guess that is unexpected, since each group is handled separately in the same way (like a for loop). Would you mind sharing the version (or commit) you are using, and a minimal repro if possible?
@botbw I have written a minimal repro with a simple network, and in that case the keys actually match! I will take a closer look at my own code and get back to you if I believe the issue might still be ColossalAI-related.
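For readers hitting the same question, a sketch of what such a two-group repro might look like, assuming LowLevelZeroPlugin and the internal grad-store keys discussed above:

```python
# Hypothetical two-param-group repro -- run with: torchrun --nproc_per_node=1 repro2.py
# Assumes LowLevelZeroPlugin; _grad_store is internal, version-dependent API.
import torch
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()  # older releases require launch_from_torch(config={})

net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam([
    {'params': net[0].parameters(), 'lr': 1e-3},  # group 0
    {'params': net[2].parameters(), 'lr': 1e-4},  # group 1
])

booster = Booster(plugin=LowLevelZeroPlugin())
model, optimizer, criterion, _, _ = booster.boost(net, optimizer, nn.MSELoss())

loss = criterion(model(torch.randn(4, 8).cuda()), torch.randn(4, 2).cuda())
booster.backward(loss=loss, optimizer=optimizer)

# Check whether id(param) keys line up with the internal grad store per group.
for group_id, group in enumerate(optimizer.param_groups):
    stored = optimizer._grad_store._grads_of_params[group_id]
    for p in group['params']:
        print(group_id, id(p) in stored)
```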
@ArnaudFickinger Sure, feel free to ask here or to raise a new issue.