Precision Issue #39
Hi Philipp!
Thanks for this great repo!
I was trying to run Llama 2 instruction tuning following the tutorial. The code ran fine without flash attention, but after I uncommented the flash attention patch I got this error, raised at line 87 in llama_patch.py:
"RuntimeError: FlashAttention only support fp16 and bf16 data type"
It seems to have something to do with data precision. Could you help me figure it out? Thanks a lot!

Comments

What Llama model size are you using?

NousResearch/Llama-2-7b-hf

Did you make any other changes to the code besides the model id? What GPU are you using?

Below is the code I was using. I don't think I made any changes from yours. And I was using an A40.
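For anyone hitting the same error: the FlashAttention kernels only accept fp16/bf16 tensors, so the usual cause is that the model (or part of it) is still in fp32. Below is a minimal sketch, not taken from this repo's training script, of loading the model in bfloat16 and keeping training in bf16 (the A40 supports bf16); the model id matches the one mentioned above, and the output directory and training hyperparameters are placeholders.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)

model_id = "NousResearch/Llama-2-7b-hf"  # model mentioned in this thread

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights directly in bf16 so the tensors reaching the flash
# attention kernel are already a supported dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Keep mixed-precision training in bf16 as well. If 4-bit quantization is
# in play, bnb_4bit_compute_dtype should likewise be torch.bfloat16.
training_args = TrainingArguments(
    output_dir="llama-7b-instruction-tuned",  # placeholder path
    bf16=True,
    per_device_train_batch_size=2,
    num_train_epochs=3,
)
```

This only addresses the dtype check itself; whether additional layers need upcasting afterwards depends on how the rest of the training setup is configured.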