[Question]: Why is every head config saved with "vertical_and_slash"? #57

Open
fmmoret opened this issue Jul 29, 2024 · 3 comments
Labels
question Further information is requested


@fmmoret

fmmoret commented Jul 29, 2024

Describe the issue

Regardless of which pattern scores best, the search_pattern function saves it in the config as "vertical_and_slash".

```python
for ty, fc in [("stream_llm", stream_llm), ("vertical_and_slash", vertical_and_slash), ("block_sparse", block_sparse)]:
    if ty == "stream_llm":
        vs_list = [(100, 800)]
    elif ty == "vertical_and_slash":
        vs_list = [(30, 800), (100, 750), (500, 700), (3500, 100)]
    else:
        vs_list = [(8, 1)]
    for v_size, s_size in vs_list:
        score = fc(v_size, s_size)
        score = score.item()
        all_info.append([ty, v_size, s_size, score])
        if score > best_score:
            best_score = score
            best_s, best_v = s_size, v_size
            best_ty = ty
if best_ty == "stream_llm":
    best_ty = "vertical_and_slash"
if best_ty == "block_sparse":
    best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096
```

The configs saved in the repo appear to contain only this pattern type (vertical_and_slash).

Specifically, these lines:

```python
if best_ty == "stream_llm":
    best_ty = "vertical_and_slash"
if best_ty == "block_sparse":
    best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096
```

During the forward pass, I think this means we never route to anything other than the vertical_and_slash implementation and kernels.
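The consequence can be checked with a minimal, self-contained sketch of the selection loop and the two remapping statements. The scorers here are hypothetical stubs that return fixed numbers (the real `stream_llm`, `vertical_and_slash`, and `block_sparse` functions score attention patterns on Q/K), and `pick_pattern` and `make_scorers` are illustrative names, not functions from the repo.

```python
from typing import Callable, Dict, List, Tuple

# Candidate (v_size, s_size) budgets per pattern type, copied from the excerpt.
CANDIDATES: Dict[str, List[Tuple[int, int]]] = {
    "stream_llm": [(100, 800)],
    "vertical_and_slash": [(30, 800), (100, 750), (500, 700), (3500, 100)],
    "block_sparse": [(8, 1)],
}

Scorer = Callable[[int, int], float]


def pick_pattern(scorers: Dict[str, Scorer]) -> Tuple[str, int, int]:
    """Return (pattern_type, v_size, s_size) the way the excerpt would save it."""
    best_score = float("-inf")
    best_ty, best_v, best_s = "", 0, 0
    for ty, vs_list in CANDIDATES.items():
        for v_size, s_size in vs_list:
            score = scorers[ty](v_size, s_size)
            if score > best_score:
                best_score = score
                best_ty, best_v, best_s = ty, v_size, s_size
    # The two remapping statements from the excerpt.
    if best_ty == "stream_llm":
        best_ty = "vertical_and_slash"
    if best_ty == "block_sparse":
        best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096
    return best_ty, best_v, best_s


def make_scorers(winner: str) -> Dict[str, Scorer]:
    """Hypothetical stub scorers: `winner` scores 1.0, every other type 0.0."""
    return {ty: (lambda v, s, w=(ty == winner): 1.0 if w else 0.0) for ty in CANDIDATES}


for winner in CANDIDATES:
    print(winner, "->", pick_pattern(make_scorers(winner)))
# stream_llm -> ('vertical_and_slash', 100, 800)
# vertical_and_slash -> ('vertical_and_slash', 30, 800)
# block_sparse -> ('vertical_and_slash', 1000, 6096)
```

Whichever pattern type wins the search, the saved type is vertical_and_slash, so any router keyed on that field can only select the VS path.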

Is this a bug, or is it intended? The experiment docs cite the use of this search_pattern function.


On the other hand, search_pattern_v2:

```python
def search_pattern_v2(q, k, v, head):
```

does appear to save the pattern type under specific names, for routing to the different pattern implementations.
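For comparison, routing on a saved pattern name can be sketched as a small dispatch table. This is a hypothetical illustration: the per-head config layout `(pattern_name, v_size, s_size)` and the `*_kernel` placeholder functions are assumptions, not MInference's actual config format or API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-head config entry: (pattern_name, v_size, s_size).
HeadConfig = Tuple[str, int, int]


# Placeholder kernels: each only reports which implementation would run.
def stream_llm_kernel(v_size: int, s_size: int) -> str:
    return f"stream_llm(v={v_size}, s={s_size})"


def vertical_and_slash_kernel(v_size: int, s_size: int) -> str:
    return f"vertical_and_slash(v={v_size}, s={s_size})"


def block_sparse_kernel(v_size: int, s_size: int) -> str:
    return f"block_sparse(v={v_size}, s={s_size})"


# Routing table keyed by the pattern name stored in each head's config.
DISPATCH: Dict[str, Callable[[int, int], str]] = {
    "stream_llm": stream_llm_kernel,
    "vertical_and_slash": vertical_and_slash_kernel,
    "block_sparse": block_sparse_kernel,
}


def route(head_configs: List[HeadConfig]) -> List[str]:
    """Call the kernel named in each head's config entry."""
    return [DISPATCH[name](v_size, s_size) for name, v_size, s_size in head_configs]


# A config where every head was remapped to VS only ever reaches one kernel...
print(route([("vertical_and_slash", 100, 800), ("vertical_and_slash", 1000, 6096)]))
# ...while a config that keeps the searched names reaches several.
print(route([("stream_llm", 100, 800), ("block_sparse", 8, 1)]))
```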


Do we need to use search_pattern_v2 to replicate the paper's results? Or are the vertical_and_slash settings actually enough to pass needle-in-a-haystack on long sequences?

@fmmoret fmmoret added the question Further information is requested label Jul 29, 2024
@iofu728
Contributor

iofu728 commented Jul 30, 2024

Hi @fmmoret, thanks for your feedback.

The configs provided in the repo can reproduce the results from the paper. This means that the vertical_and_slash settings are sufficient to pass the Needle In A Haystack test for long sequences.

The search_pattern function reroutes to vertical_and_slash because our tests have shown that this setting offers better generalization and efficiency across different context windows and tasks.

@iofu728 iofu728 self-assigned this Jul 30, 2024
@PengWenChen

PengWenChen commented Oct 16, 2024

Hi @iofu728 @fmmoret,

Why are best_v and best_s set to 1000 and 6096 when the search result is block_sparse (line 216)?

```python
best_ty, best_v, best_s = "vertical_and_slash", 1000, 6096
```

Also, how were these initial v, s values chosen? Where do they come from?

```python
if ty == "stream_llm":
    vs_list = [(100, 800)]
elif ty == "vertical_and_slash":
    vs_list = [(30, 800), (100, 750), (500, 700), (3500, 100)]
```

@iofu728
Contributor

iofu728 commented Oct 17, 2024

Hi @PengWenChen, thanks for your question.

The sparse search is built on Algorithm 1 in the paper, aligning the different patterns with the kernel's runtime. Block sparse falls back to the VS pattern because our tests showed that this adjustment achieves better generalization across different lengths and tasks. This pattern adjustment was made empirically.
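To give a feel for what `score = fc(v_size, s_size)` might measure for a vertical_and_slash candidate, here is a simplified toy sketch. It assumes the score is the fraction of causal softmax attention mass that a vertical+slash mask retains (a recall-style score); `causal_softmax` and `vs_recall` are hypothetical names. It is not the paper's Algorithm 1 or the repo's implementation: it uses every query row of a tiny pure-Python matrix and does not model kernel cost, which the maintainer says the real search aligns across patterns.

```python
import math
import random
from typing import List, Set

Matrix = List[List[float]]


def causal_softmax(scores: Matrix) -> Matrix:
    """Row-wise softmax over causal entries (j <= i); future entries get 0."""
    out = []
    for i, row in enumerate(scores):
        m = max(row[: i + 1])
        exps = [math.exp(x - m) for x in row[: i + 1]]
        total = sum(exps)
        out.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return out


def vs_recall(attn: Matrix, v_size: int, s_size: int) -> float:
    """Mean fraction of each row's attention mass kept by a vertical+slash mask."""
    n = len(attn)
    # Vertical lines: the v_size key columns with the most total attention.
    col_mass = [sum(attn[i][j] for i in range(n)) for j in range(n)]
    cols: Set[int] = set(sorted(range(n), key=lambda j: -col_mass[j])[:v_size])
    # Slash lines: the s_size diagonals (offset d = i - j) with the most attention.
    diag_mass = [sum(attn[i][i - d] for i in range(d, n)) for d in range(n)]
    diags: Set[int] = set(sorted(range(n), key=lambda d: -diag_mass[d])[:s_size])
    kept = 0.0
    for i in range(n):
        kept += sum(attn[i][j] for j in range(i + 1) if j in cols or (i - j) in diags)
    return kept / n  # each causal row sums to 1, so this is the mean per-row recall


random.seed(0)
n = 16
scores = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
for i in range(n):
    scores[i][0] += 4.0  # toy "attention sink" on the first token
attn = causal_softmax(scores)
candidates = [(1, 6), (3, 4), (6, 1)]  # toy (v_size, s_size) budgets
best = max(candidates, key=lambda vs: vs_recall(attn, *vs))
print("best (v, s):", best)
```

Because a larger budget always keeps at least as much mass, a score like this is only meaningful when candidates are compared at a similar cost, which is presumably why the candidate lists trade vertical size against slash size.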
