I want to ask a general question. When analyzing attention scores, I find that they are quite sparse and their values are very low, so I cannot extract any useful information from them, such as which kinds of tokens receive more attention. Given that a model has n layers and m attention heads, how can I gain some valuable insights?
My task is to extract important information from the input I provide.
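For reference, a minimal sketch of the kind of per-layer/per-head inspection described above, assuming a HuggingFace-style causal LM loaded with output_attentions=True; the model name and the aggregation choices (entropy per head, attention mass received per token) are illustrative placeholders, not a prescribed recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any causal LM that can return attentions
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "Example input whose important tokens I want to identify."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of n_layers tensors, each (batch, n_heads, seq, seq)
attn = torch.stack(outputs.attentions)  # (n_layers, batch, n_heads, seq, seq)

# Per-head entropy over key positions, averaged over queries:
# low entropy means the head concentrates its mass on few tokens (sparse/focused).
probs = attn.clamp_min(1e-12)
entropy = -(probs * probs.log()).sum(-1).mean(-1)  # (n_layers, batch, n_heads)

# Attention mass each token receives, averaged over layers and heads and
# summed over query positions: a rough proxy for which tokens matter.
received = attn.mean(dim=(0, 2)).sum(dim=1)  # (batch, seq)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in sorted(zip(tokens, received[0].tolist()), key=lambda x: -x[1])[:10]:
    print(f"{tok:>15}  {score:.3f}")
```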
If I understand correctly, you're asking how to determine which parts of the attention weights are more important to preserve, especially in highly sparse scenarios.
In MInference, we don’t perform fine-grained per-head tuning; most heads use the same kernel sparsity rate. However, for certain heads we replace block sparsity with a higher-budget VS pattern, as we found that allocating more budget to these heads significantly improves performance.
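For illustration, here is a minimal sketch of one way to flag such heads offline: measure how much of each head's full attention mass a fixed top-k budget recovers, and give a larger budget to heads where that recall is low. This is an assumption-laden sketch, not MInference's actual head-selection code, and the random weights stand in for real attention maps:

```python
import torch

def topk_recall(attn, k):
    """attn: (n_layers, n_heads, seq, seq) full attention weights.
    Returns the per-head fraction of attention mass captured by the top-k
    keys of every query, averaged over queries: shape (n_layers, n_heads)."""
    topk_mass = attn.topk(k, dim=-1).values.sum(-1)          # (L, H, S)
    return (topk_mass / attn.sum(-1).clamp_min(1e-12)).mean(-1)

# Random weights as a stand-in for attention maps collected on calibration data.
attn = torch.softmax(torch.randn(4, 8, 128, 128), dim=-1)
recall = topk_recall(attn, k=16)

# Heads whose recall stays low under the shared budget are candidates for a
# higher-budget pattern instead of the default block-sparse one.
needs_more_budget = (recall < 0.9).nonzero(as_tuple=False)
print(needs_more_budget)  # (layer, head) indices to treat specially
```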
There are several related works exploring this direction, including: