
About masked acceleration #127

Open
Harry-Miral opened this issue Mar 4, 2025 · 6 comments

@Harry-Miral

Can I still use SageAttention to get a speedup if I manually provide an att_mask (neither a causal nor a padding mask)? Thank you.
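
For context, a mask of this kind can be written with stock PyTorch scaled_dot_product_attention; the block-diagonal pattern and shapes below are hypothetical, purely to illustrate what "neither causal nor padding" means here:

```python
# Illustration only: a hand-built mask that is neither causal nor padding,
# expressed with stock PyTorch SDPA. Shapes and the block-diagonal pattern
# are made up for this sketch.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Boolean mask, True = "may attend": here a block-diagonal structure.
attn_mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
block = 4
for start in range(0, seq_len, block):
    attn_mask[start:start + block, start:start + block] = True

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```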

@jt-zhang
Member

jt-zhang commented Mar 4, 2025

Hi, I think SpargeAttn may do what you mentioned: https://github.com/thu-ml/SpargeAttn.

@Harry-Miral
Author

Harry-Miral commented Mar 4, 2025

But after reading the code again, I only found the causal mask.

@jt-zhang
Member

jt-zhang commented Mar 5, 2025

Hi, have you read the SpargeAttn code? Please note that it is different from SageAttention.

@Harry-Miral
Author

You are a genius! I love you! I confused the two things; they are so similar.

@Harry-Miral
Author

Also, does it support different-length data within the same batch?

@jason-huang03
Member

@Harry-Miral Currently not. What kind of workload are you working on? If the batch size is quite small, you can use a for loop over each single sequence in the batch, as sketched below.
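
A minimal sketch of that per-sample workaround, assuming the sageattn(q, k, v, tensor_layout="HND", is_causal=False) entry point shown in the SageAttention README; the sequence lengths and shapes here are hypothetical:

```python
# Per-sample loop over variable-length sequences, assuming the sageattn
# entry point from the SageAttention README. Lengths/shapes are made up.
import torch
from sageattention import sageattn

heads, head_dim = 8, 128
seq_lens = [37, 512, 129]  # one entry per variable-length sample

outputs = []
for seq_len in seq_lens:
    # Process one sample at a time with batch size 1, layout (B, H, S, D) = "HND".
    q = torch.randn(1, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(1, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(1, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
    # Each call runs at the sample's own length, so no padding mask is needed.
    outputs.append(sageattn(q, k, v, tensor_layout="HND", is_causal=False))
# `outputs` holds one (1, H, S_i, D) tensor per sample.
```

The trade-off is that the loop forgoes cross-sample parallelism, which is why it only pays off when the batch size is small.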
