
FSDPPrecision should support 16-true with a loss scaler #19973

Open
zaptrem opened this issue Jun 13, 2024 · 1 comment
Labels: feature, needs triage

Comments


zaptrem commented Jun 13, 2024

Description & Motivation

if scaler is not None and self.precision != "16-mixed":

What if I want to use fp16-true, but with a loss scaler? This is closer to DeepSpeed's default settings. With FSDP and 16-true but no loss scaler, my model doesn't converge. However, with FSDP, 16-true, and a loss scaler (after commenting out the assertion and fixing the typo'ed line so it returns the scaler instead of None), my model converges. A sketch of the requested usage follows.
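
For concreteness, a minimal sketch of the requested usage from the user's side, assuming the plugin keeps its current FSDPPrecision(precision, scaler=...) constructor; today this call raises the ValueError from the check quoted above:

import lightning.pytorch as pl
from lightning.pytorch.plugins.precision import FSDPPrecision
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

# Requested behavior: full fp16 weights ("16-true") combined with a loss
# scaler, mirroring DeepSpeed's defaults. Currently rejected because the
# check only permits a scaler with "16-mixed".
precision = FSDPPrecision(precision="16-true", scaler=ShardedGradScaler())
trainer = pl.Trainer(strategy="fsdp", plugins=[precision])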

Pitch

No response

Alternatives

No response

Additional context

No response

cc @Borda

@zaptrem added the feature and needs triage labels on Jun 13, 2024
oabuhamdan commented

I came here to open this exact issue, and you already did. I second it.

I patched the package locally by changing the check to

# allow a user-supplied scaler for both "16-mixed" and "16-true"
if scaler is not None and self.precision not in ["16-mixed", "16-true"]:
    raise ValueError(f"`precision={precision!r}` does not use a scaler, found {scaler}.")

but it should be fixed properly upstream.
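
For anyone landing here: the scaler matters for 16-true because pure-fp16 gradients can underflow without loss scaling. A minimal raw-PyTorch sketch of the equivalent loop, where MyModel and loader are hypothetical stand-ins and the model is assumed to return a scalar loss:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

model = FSDP(MyModel().half().cuda())  # hypothetical model; params kept in fp16 ("16-true")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = ShardedGradScaler()           # FSDP-aware variant of the AMP GradScaler

for batch in loader:                   # hypothetical DataLoader
    loss = model(batch)                # assumed to return a scalar loss
    scaler.scale(loss).backward()      # scale the loss so fp16 gradients don't underflow
    scaler.step(optimizer)             # unscales gradients; skips the step on inf/NaN
    scaler.update()                    # adjusts the scale factor for the next iteration
    optimizer.zero_grad()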

@zaptrem zaptrem changed the title FSDPPrecision should support 16-true with a loss scalar FSDPPrecision should support 16-true with a loss scaler Jun 13, 2024