If a model hits a CUDA OOM error, we could try quantizing it for the user.
Users can already specify quantization themselves; this would just be automatic error handling.
CUDA OOM errors: if your model doesn't fit on our 4xA40 (48 GB) server, we return an error. Coming soon, we should fall back to Accelerate's ZeRO stage 3 (CPU/disk offload), and/or allow a quantization flag such as `load_in_8bit=True` or `load_in_4bit=True`.
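The auto-fallback idea could look something like the sketch below: try loading at full precision, and on an OOM error retry with 8-bit, then 4-bit quantization. `load_model` is a hypothetical loader (e.g. a wrapper around `transformers.AutoModelForCausalLM.from_pretrained`); in real use `oom_errors` would include `torch.cuda.OutOfMemoryError`. This is a minimal sketch of the retry ladder, not a definitive implementation.

```python
def load_with_fallback(load_model, model_name, oom_errors=(MemoryError,)):
    """Try full precision first, then 8-bit, then 4-bit quantization.

    `load_model` is a hypothetical callable taking (model_name, **kwargs);
    the kwargs mirror the load_in_8bit / load_in_4bit flags mentioned above.
    """
    attempts = [
        {},                       # full precision
        {"load_in_8bit": True},   # 8-bit quantization
        {"load_in_4bit": True},   # 4-bit quantization
    ]
    last_err = None
    for kwargs in attempts:
        try:
            # Return both the model and the kwargs that worked, so the
            # server can report to the user that quantization was applied.
            return load_model(model_name, **kwargs), kwargs
        except oom_errors as err:
            last_err = err  # didn't fit; retry with a smaller footprint
    raise RuntimeError(
        f"{model_name} does not fit on the available GPUs even in 4-bit"
    ) from last_err
```

Surfacing the `kwargs` that succeeded matters for the UX here: silently quantizing changes model quality, so the response should note when the fallback kicked in.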