Inconsistent GPU memory usage of QLoRA (vs LoRA) with different numbers of GPUs #373
Comments
I think the first thing that is confusing me is that if you are keeping everything constant and only increasing the number of GPUs, then we do not expect memory consumption to increase. Can you post the arguments you are using to perform the experiments with the different numbers of GPUs?
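For intuition, a back-of-envelope sketch of that expectation (the parameter count and the FSDP-style full-sharding assumption are illustrative, not from the thread):

```python
# Rough per-GPU memory for the base weights under full sharding:
# each of N GPUs holds ~1/N of the parameters.
def per_gpu_weight_gib(n_params: float, bytes_per_param: float, world_size: int) -> float:
    return n_params * bytes_per_param / world_size / 2**30

for world_size in (1, 2, 4, 8):
    bf16 = per_gpu_weight_gib(7.25e9, 2.0, world_size)   # 16-bit weights
    gptq = per_gpu_weight_gib(7.25e9, 0.5, world_size)   # 4-bit GPTQ weights
    print(f"{world_size} GPUs: bf16 ~{bf16:.1f} GiB/GPU, 4-bit ~{gptq:.1f} GiB/GPU")
```

All else equal, the per-GPU share of the weights shrinks as GPUs are added, which is why a memory increase points at something other than the weights themselves.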
@fabianlim these are the configs that we used for all the tests; the only variable was the number of GPUs:
QLoRA config
LoRA config
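For reference, a minimal sketch of how such a LoRA vs. QLoRA pair is typically set up with transformers and peft (model IDs taken from the issue; the LoRA hyperparameters and the GPTQ checkpoint path are placeholders, not the reporter's actual values):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Placeholder adapter settings; the real values are in the configs above.
lora_cfg = LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")

# LoRA: full-precision (bf16) base model.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3",
                                            torch_dtype=torch.bfloat16)
lora_model = get_peft_model(base, lora_cfg)

# QLoRA: GPTQ-quantized base model; the quantization config is read from
# the checkpoint itself ("mistral-7b-v0.3-gptq" is a placeholder path).
quant = AutoModelForCausalLM.from_pretrained("mistral-7b-v0.3-gptq",
                                             device_map="auto")
quant = prepare_model_for_kbit_training(quant)
qlora_model = get_peft_model(quant, lora_cfg)
```

The two runs differ only in how the frozen base model is loaded; the trainable adapter is identical.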
@anhuong do you know how come ... @albertoperdomo2 if you look at our benches, our settings are quite similar to yours, but you can see that when ...
@fabianlim we have seen this behavior mainly with this particular model pair. We are planning to test different equivalent models, but I wonder if this particular model itself might be the issue. Do you have results for ...
@albertoperdomo2 no, I'm sorry.
Describe the bug
When validating the fms-hf-tuning v2.0.1 image, we ran our workloads across different GPU counts to review the improvements associated with it. One thing that we tried was fine-tuning using LoRA + a full-precision model (in this case mistralai/Mistral-7B-v0.3) and QLoRA + a quantized model (in this case mistral-7b-v0.3-gptq) with the same settings to compare the results, and we found out that with 8 GPUs the QLoRA GPU memory usage was greater than the LoRA equivalent.
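A consistent way to compare such runs is to record per-rank peak memory with the same probe in every experiment; a minimal sketch assuming PyTorch (the helper and tags are hypothetical, not part of fms-hf-tuning):

```python
import os
import torch

def report_peak_memory(tag: str) -> None:
    # max_memory_allocated tracks peak tensor allocations; max_memory_reserved
    # includes what the caching allocator holds, which is what nvidia-smi sees.
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    alloc = torch.cuda.max_memory_allocated(rank) / 2**30
    reserved = torch.cuda.max_memory_reserved(rank) / 2**30
    print(f"[{tag}] rank {rank}: peak allocated {alloc:.2f} GiB, "
          f"peak reserved {reserved:.2f} GiB")

# Usage: call torch.cuda.reset_peak_memory_stats() before training starts,
# then report_peak_memory("qlora-8gpu") once training ends.
```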
Platform
RHOAI 2.12
Expected behavior
When running LoRA (with a full-precision model) and QLoRA fine-tuning (with the same model but quantized) under identical settings, GPU memory usage is expected to always be lower for QLoRA, given that the model parameters are stored in a lower precision.
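As a rough quantification of that expectation (the parameter count is an assumption, not from the issue), only the frozen base weights shrink under quantization:

```python
# Back-of-envelope weight memory for a ~7.25e9-parameter model (assumed count).
n_params = 7.25e9
bf16_gib = n_params * 2.0 / 2**30   # 16-bit weights: ~13.5 GiB
gptq_gib = n_params * 0.5 / 2**30   # 4-bit GPTQ weights: ~3.4 GiB
print(f"bf16 weights ~{bf16_gib:.1f} GiB, 4-bit weights ~{gptq_gib:.1f} GiB")
# Activations, gradients, and the LoRA adapter are kept in higher precision in
# both setups, so those terms are common to LoRA and QLoRA; the expected gap
# comes entirely from the base-weight term above.
```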