
About masked acceleration #127

Open
Harry-Miral opened this issue Mar 4, 2025 · 6 comments

@Harry-Miral

Can I still use SageAttention to get a speedup if I manually provide an att_mask (neither a causal nor a padding mask)? Thank you.
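
For context, a mask of this kind can be written with stock PyTorch scaled_dot_product_attention; the block-diagonal pattern and shapes below are hypothetical, purely to illustrate what "neither causal nor padding" means here:

```python
# Illustration only: a hand-built mask that is neither causal nor padding,
# expressed with stock PyTorch SDPA. Shapes and the block-diagonal pattern
# are made up for this sketch.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Boolean mask, True = "may attend": here a block-diagonal structure.
attn_mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
block = 4
for start in range(0, seq_len, block):
    attn_mask[start:start + block, start:start + block] = True

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```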

@jt-zhang
Member

jt-zhang commented Mar 4, 2025

Hi, I think SpargeAttn may do what you mentioned: https://github.com/thu-ml/SpargeAttn.

@Harry-Miral
Author

Harry-Miral commented Mar 4, 2025

But after reading the code again, I only found the causal mask.

@jt-zhang
Member

jt-zhang commented Mar 5, 2025

Hi, have you read the SpargeAttn code? Please note that it is different from SageAttention.

@Harry-Miral
Author

You are a genius! I love you! I confused the two things; they are so similar.

@Harry-Miral
Author

Also, does it support different-length data within the same batch?

@jason-huang03
Member

@Harry-Miral Currently not. What kind of workload are you working on? If the batch size is quite small, you can use a for loop over each single sequence in the batch, as sketched below.
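
A minimal sketch of that per-sample workaround, assuming the sageattn(q, k, v, tensor_layout="HND", is_causal=False) entry point shown in the SageAttention README; the sequence lengths and shapes here are hypothetical:

```python
# Per-sample loop over variable-length sequences, assuming the sageattn
# entry point from the SageAttention README. Lengths/shapes are made up.
import torch
from sageattention import sageattn

heads, head_dim = 8, 128
seq_lens = [37, 512, 129]  # one entry per variable-length sample

outputs = []
for seq_len in seq_lens:
    # Process one sample at a time with batch size 1, layout (B, H, S, D) = "HND".
    q = torch.randn(1, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(1, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(1, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
    # Each call runs at the sample's own length, so no padding mask is needed.
    outputs.append(sageattn(q, k, v, tensor_layout="HND", is_causal=False))
# `outputs` holds one (1, H, S_i, D) tensor per sample.
```

The trade-off is that the loop forgoes cross-sample parallelism, which is why it only pays off when the batch size is small.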
