If a model hits a CUDA OOM error, we could try quantizing it for the user.
Users can already specify quantization themselves; this would just be automatic error handling.
CUDA OOM errors: if your model doesn't fit on our 4xA40 (48 GB) server, we return an error. Coming soon, we should fall back to Accelerate's ZeRO stage 3 (CPU/disk offload), and/or allow a quantization flag such as `load_in_8bit=True` or `load_in_4bit=True`.
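The auto-fallback idea could look something like the sketch below: try loading at full precision, and on an OOM error retry with 8-bit, then 4-bit quantization. `load_model` is a hypothetical loader (e.g. a wrapper around `transformers.AutoModelForCausalLM.from_pretrained`); in real use `oom_errors` would include `torch.cuda.OutOfMemoryError`. This is a minimal sketch of the retry ladder, not a definitive implementation.

```python
def load_with_fallback(load_model, model_name, oom_errors=(MemoryError,)):
    """Try full precision first, then 8-bit, then 4-bit quantization.

    `load_model` is a hypothetical callable taking (model_name, **kwargs);
    the kwargs mirror the load_in_8bit / load_in_4bit flags mentioned above.
    """
    attempts = [
        {},                       # full precision
        {"load_in_8bit": True},   # 8-bit quantization
        {"load_in_4bit": True},   # 4-bit quantization
    ]
    last_err = None
    for kwargs in attempts:
        try:
            # Return both the model and the kwargs that worked, so the
            # server can report to the user that quantization was applied.
            return load_model(model_name, **kwargs), kwargs
        except oom_errors as err:
            last_err = err  # didn't fit; retry with a smaller footprint
    raise RuntimeError(
        f"{model_name} does not fit on the available GPUs even in 4-bit"
    ) from last_err
```

Surfacing the `kwargs` that succeeded matters for the UX here: silently quantizing changes model quality, so the response should note when the fallback kicked in.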