Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention? #37
Thanks Phil for the great post "Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention". When I tried to change Falcon to Llama 2 (tried all 3 model sizes), I always got "CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)". Should there be more changes than just the model name to make it work? Or will you have a follow-up post about fine-tuning Llama 2 with DeepSpeed + LoRA?

Comments
Seems to be a hardware and environment issue unrelated to the code. I used CUDA 11.8.
I am also using CUDA 11.8, with PyTorch 2.0.1 built for CUDA 11.8. I also tried the PyTorch nightly and got the same error.
This error looks similar to lm-sys/FastChat#199; I tried their suggestions and none worked. One explanation on that thread is that the vocab causes an embedding-lookup out-of-bounds issue, though the vocab size seems to already be fixed in Llama 2.
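For context, the usual trigger for that out-of-bounds lookup is a pad token added to the tokenizer without resizing the model's embedding table. A minimal sketch of the check and fix, assuming a Llama 2 checkpoint from the Hub (the model id and pad token string are assumptions, not from this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint, not from this thread
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Llama 2 ships without a pad token; adding one grows the tokenizer vocab.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # Without this resize, the new pad token id indexes past the end of the
    # embedding table; on GPU that can surface as an opaque CUBLAS error
    # instead of a clean IndexError.
    model.resize_token_embeddings(len(tokenizer))

# Sanity check: every tokenizer id must fit inside the embedding table.
assert len(tokenizer) <= model.get_input_embeddings().num_embeddings
```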
Does the example work without code changes?
Yes, it worked well without any code changes.
What change did you make?
The only change I made is …
Did you make changes to the flash attention patch? The example only works with Falcon, since it has a custom patch to use flash attention.
Ah, I didn't. I saw your code at https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/peft_utils.py#L38-L41 and thought it was already taken care of. Also, even when I used …
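For readers following along: the patch linked above works by swapping out the attention module's forward method, which is why it is architecture-specific. A schematic sketch of the mechanism (not the actual patch; the delegating body is a placeholder, and it assumes a transformers version that ships the Falcon architecture):

```python
from transformers.models.falcon.modeling_falcon import FalconAttention

_original_forward = FalconAttention.forward

def flash_attn_forward(self, *args, **kwargs):
    # A real patch would re-implement attention with flash-attn kernels here;
    # this placeholder just delegates to the stock implementation.
    return _original_forward(self, *args, **kwargs)

# The patch is applied to FalconAttention specifically, so loading a Llama 2
# model leaves LlamaAttention untouched and the patch simply has no effect.
FalconAttention.forward = flash_attn_forward
```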
Excited to see flash-attn 2 natively supported in transformers! Do you plan to update this post to work with the new feature?
Yes! 👍🏻 I plan to update all my posts and remove those patches once there is an official release.
Great! Looking forward to the updates. |
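For readers landing here later: with native support in transformers, the manual patch reduces to a single from_pretrained argument. A minimal sketch, assuming a recent transformers release, the flash-attn package installed, and an assumed Llama 2 model id:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed model id; requires the flash-attn package and an Ampere-or-newer GPU.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16/bf16
    attn_implementation="flash_attention_2",  # replaces the manual patch
)
```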