
[Question]: Confusion about Optimal Search Pattern Configuration #64

Open
Dianaia opened this issue Aug 6, 2024 · 2 comments
Labels: question (Further information is requested)

Dianaia commented Aug 6, 2024

Confusion about Optimal Search Pattern Configuration

First of all, thank you for your outstanding research.
I noticed that in Appendix E of the paper, it is mentioned that "according to the ablation study, using only the Vertical-Slash pattern significantly impacts performance in highly dynamic tasks like KV retrieval."
However, the model configurations provided in the repository still use only the Vertical-Slash pattern.
You mentioned in other comments that "the search_pattern function reroutes to vertical_and_slash because our tests have shown that this setting offers better generalization and efficiency across different context windows and tasks."
This seems to contradict the conclusion given in the paper, which leaves me somewhat confused.
Could you please clarify how we should set the optimal search pattern in practice?

Dianaia added the question (Further information is requested) label Aug 6, 2024
iofu728 self-assigned this Aug 6, 2024
iofu728 (Contributor) commented Aug 6, 2024

Hi @Dianaia,

Thanks for your feedback and great question.

Actually, there's no contradiction.

  1. As shown in Figure 11, the majority (>90%) of the patterns we found through our search are "vertical and slash" patterns.
  2. As shown in Table 4, using only the "vertical and slash" pattern results in minimal performance differences across most tasks. The most crucial aspect here is the slash pattern.
  3. Based on our tests, using only the "vertical and slash" pattern and fine-tuning some compression ratios (e.g., increasing the slash lines in certain heads) can lead to better generalization across different context windows and tasks.

In practice, I recommend following our instructions and using the "vertical and slash" pattern for all heads. Our tests have shown that this approach performs well across different models, model sizes, and tasks.
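To make the pattern concrete, here is a minimal, illustrative sketch of a vertical-and-slash mask for a single causal attention head, in plain Python. It assumes a small dense score matrix `attn` is already available; `vertical_slash_mask`, `vertical_size`, and `slash_size` are hypothetical names, not the repository's API. MInference itself estimates these lines cheaply from a small slice of the queries and runs dedicated sparse kernels rather than materializing a dense mask, so treat this only as a picture of what the pattern keeps: the highest-scoring columns (vertical lines) and the highest-scoring diagonals (slash lines). Raising `slash_size` for a head corresponds to "increasing the slash lines in certain heads" in point 3.

```python
def vertical_slash_mask(attn, vertical_size, slash_size):
    """Hypothetical helper: build a causal vertical-and-slash mask for one head.

    attn: n x n list of lists of attention scores (only the lower triangle,
          j <= i, is used, matching causal attention).
    vertical_size: number of key columns (vertical lines) to keep.
    slash_size: number of diagonals (slash lines, offset d = i - j) to keep.
    Returns an n x n list of lists of bools; True means "compute this entry".
    """
    n = len(attn)
    col_score = [0.0] * n   # total attention mass received by each key column
    diag_score = [0.0] * n  # total attention mass along each diagonal offset
    for i in range(n):
        for j in range(i + 1):  # causal: query i sees keys 0..i
            col_score[j] += attn[i][j]
            diag_score[i - j] += attn[i][j]

    # Keep the top-scoring columns and diagonals (ties broken by lower index).
    top_cols = set(sorted(range(n), key=lambda j: col_score[j], reverse=True)[:vertical_size])
    top_diags = set(sorted(range(n), key=lambda d: diag_score[d], reverse=True)[:slash_size])

    # An entry survives if it is causal and lies on a kept column or diagonal.
    return [
        [j <= i and (j in top_cols or (i - j) in top_diags) for j in range(n)]
        for i in range(n)
    ]


# Toy head: column 0 acts like an attention sink, the main diagonal dominates.
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.1, 0.5, 0.0],
    [0.3, 0.1, 0.1, 0.5],
]
narrow = vertical_slash_mask(attn, vertical_size=1, slash_size=1)
wider = vertical_slash_mask(attn, vertical_size=1, slash_size=2)
```

In this toy example, `vertical_size=1, slash_size=1` keeps 7 of the 10 causal entries (column 0 plus the main diagonal); raising `slash_size` to 2 also keeps the first off-diagonal, for 9 entries.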

Dianaia (Author) commented Aug 6, 2024

Got it, I understand now. Thank you again for your explanation and outstanding work.
