Confusion about Optimal Search Pattern Configuration
First of all, thank you for your outstanding research.
I noticed that in Appendix E of the paper, it is mentioned that "according to the ablation study, using only the Vertical-Slash pattern significantly impacts performance in highly dynamic tasks like KV retrieval."
However, the model configuration provided in the repository still uses the Vertical-Slash pattern exclusively.
You mentioned in other comments that "the search_pattern function reroutes to vertical_and_slash because our tests have shown that this setting offers better generalization and efficiency across different context windows and tasks."
This seems to contradict the conclusion given in the paper, which leaves me somewhat confused.
Could you please clarify how we should set the optimal search pattern in practice?
As shown in Figure 11, the majority (>90%) of the patterns we found through our search are "vertical and slash" patterns.
As shown in Table 4, using only the "vertical and slash" pattern results in minimal performance differences across most tasks. The most crucial aspect here is the slash pattern.
Based on our tests, using only the "vertical and slash" pattern and tuning some compression ratios (e.g., increasing the number of slash lines in certain heads) can lead to better generalization across different context windows and tasks.
For practical use, I recommend following our instructions and using the "vertical and slash" pattern exclusively. Our tests have shown that this approach performs well across different models, sizes, and tasks.
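To make the recommendation above concrete, here is a minimal sketch of rewriting a per-head pattern configuration so that every head uses "vertical and slash", with a larger slash budget for selected heads. The config layout (a list of layers, each mapping a head id to `(pattern, vertical_size, slash_size)`), the function name `force_vertical_and_slash`, and all numeric budgets are assumptions made for illustration. They are not confirmed to match the repository's actual config format or API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-head entry: (pattern_name, vertical_size, slash_size).
# This layout is an assumption for illustration, not the repository's format.
HeadEntry = Tuple[str, int, int]
LayerConfig = Dict[str, HeadEntry]


def force_vertical_and_slash(
    layers: List[LayerConfig],
    default_budget: Tuple[int, int] = (1000, 4096),  # placeholder values
    slash_boost: Optional[Dict[Tuple[int, str], int]] = None,
) -> List[LayerConfig]:
    """Return a new config in which every head uses 'vertical_and_slash'.

    Heads that already use the pattern keep their budgets. Heads with any
    other pattern receive `default_budget`. `slash_boost` maps
    (layer_index, head_id) to extra slash lines for that head.
    """
    slash_boost = slash_boost or {}
    result: List[LayerConfig] = []
    for layer_idx, layer in enumerate(layers):
        new_layer: LayerConfig = {}
        for head_id, (pattern, vertical, slash) in layer.items():
            if pattern != "vertical_and_slash":
                # Budgets of other patterns are not comparable, so reset them.
                vertical, slash = default_budget
            slash += slash_boost.get((layer_idx, head_id), 0)
            new_layer[head_id] = ("vertical_and_slash", vertical, slash)
        result.append(new_layer)
    return result


if __name__ == "__main__":
    searched = [
        {
            "0": ("vertical_and_slash", 1000, 4096),
            "1": ("block_sparse", 100, 0),
        }
    ]
    # Give head "0" of layer 0 an extra 1024 slash lines.
    print(force_vertical_and_slash(searched, slash_boost={(0, "0"): 1024}))
```

The helper returns a new list rather than mutating the searched config, so the original search result stays available for comparison. Which heads benefit from a larger slash budget would have to be determined empirically.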