About masked acceleration #127
Can I still use SageAttention to get a speed-up if I manually provide an `att_mask` (neither a causal mask nor a padding mask)? Thank you.

Comments

Hi, I think …

BUT. …

Hi, have you read the SpargeAttn code? Please note that it is different from SageAttention.

You are a genius! I love you! I confused the two things; they are so similar.

Also, does it support data of different lengths within the same batch?

@Harry-Miral Currently not. What kind of workload are you working on? If the batch size is quite small, you can use a for loop over each single item in the batch.
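The per-sample loop suggested above can be sketched as follows. This is a minimal NumPy illustration of the pattern, not SageAttention itself: the plain `attention` function here is a stand-in for whatever fused kernel you would actually call, and all names (`attention`, `varlen_batch_attention`) are hypothetical. The point is simply that sequences of different lengths are processed one at a time instead of being padded into a single batch tensor.

```python
import numpy as np

def attention(q, k, v):
    # Standard softmax attention for a single sequence.
    # q, k, v: arrays of shape (seq_len, head_dim).
    # Stand-in for the real SageAttention/SpargeAttn kernel call.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def varlen_batch_attention(qs, ks, vs):
    # qs, ks, vs: lists of per-sample arrays, each with its own length.
    # Loop over samples instead of padding to a common length.
    return [attention(q, k, v) for q, k, v in zip(qs, ks, vs)]

rng = np.random.default_rng(0)
lengths = [3, 5, 2]       # different sequence lengths in one "batch"
d = 4                     # head dimension
qs = [rng.standard_normal((L, d)) for L in lengths]
outs = varlen_batch_attention(qs, qs, qs)
print([o.shape for o in outs])  # → [(3, 4), (5, 4), (2, 4)]
```

As the maintainer notes, this is only reasonable when the batch size is small; the loop serializes the kernel launches, so for large batches a padded or variable-length (varlen) batched kernel would be preferable.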