@xjmxyt commented Dec 3, 2025

When running gpt_oss/triton/attention.py with pytest, the following tests fail:

FAILED attention_sink.py::test_eq[5-None-0.125-64-8-8-128-1-1] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-None-0.125-64-8-8-128-1-2] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-None-0.125-64-8-8-128-128-1] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-None-0.125-64-8-8-128-128-2] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-None-0.125-64-8-8-32-1-1] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-None-0.125-64-8-8-32-1-2] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-128-0.125-64-8-8-128-1-1] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-128-0.125-64-8-8-128-1-2] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-128-0.125-64-8-8-128-128-1] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-128-0.125-64-8-8-128-128-2] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-128-0.125-64-8-8-32-1-1] - AssertionError: Tensor-likes are not close!
FAILED attention_sink.py::test_eq[5-128-0.125-64-8-8-32-1-2] - AssertionError: Tensor-likes are not close!

This PR fixes those failures. The changes are minor:

  1. Clamp the loop bound with hi = tl.minimum(hi, N_KV_CTX) to avoid out-of-bounds reads of K (see the sketch after this list).
  2. The kernel implementation does not match the reference implementation; update the kernel implementation to match.
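A minimal sketch of the bound-clamping idea, with hypothetical kernel and argument names (K_ptr, Out_ptr, N_KV_CTX, BLOCK_N are illustrative, not the exact signature in gpt_oss/triton/attention.py): the loop's upper bound hi is clamped to the KV length before any K block is loaded, so the trailing block cannot read past the end of K.

# Hypothetical sketch of the bound-clamping fix, not the actual attention kernel.
import torch
import triton
import triton.language as tl

@triton.jit
def _sum_k_blocks(K_ptr, Out_ptr, hi, N_KV_CTX, BLOCK_N: tl.constexpr):
    acc = tl.zeros((BLOCK_N,), dtype=tl.float32)
    # The fix: never iterate past the real number of KV tokens.
    hi = tl.minimum(hi, N_KV_CTX)
    for start_n in range(0, hi, BLOCK_N):
        offs_n = start_n + tl.arange(0, BLOCK_N)
        # A per-element mask still guards the final partial block.
        k = tl.load(K_ptr + offs_n, mask=offs_n < N_KV_CTX, other=0.0)
        acc += k
    tl.store(Out_ptr + tl.arange(0, BLOCK_N), acc)

# Usage (requires a GPU): hi may exceed the KV length, e.g. a padded block
# boundary; the clamp keeps all loads in bounds.
k = torch.randn(100, device="cuda")
out = torch.empty(32, device="cuda")
_sum_k_blocks[(1,)](k, out, hi=128, N_KV_CTX=k.numel(), BLOCK_N=32)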

After those fixes, all tests in gpt_oss/triton/attention.py pass.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines 182 to 186
pos_keys = torch.arange(num_keys, device=query.device)
pos_queries = torch.arange(num_queries, device=query.device) + start_q
mask = pos_keys[None, :] > pos_queries[:, None]
mask = mask | (pos_keys[None, :] < start_q)
mask = mask.float().masked_fill(mask, float("-inf"))

P1: Mask now drops cached prefix tokens

The new mask mask = mask | (pos_keys[None, :] < start_q) causes the reference path to drop every key with index < start_q. During cached decoding we set start_q to the cache offset (see AttentionBlock.forward in gpt_oss/triton/model.py lines 218-253) while passing the full cache as k/v, so this change makes n_ctx==1 or small-context calls ignore all previously cached tokens and attend only to the current block. That silently breaks causal attention whenever offset > 0, producing wrong outputs for generation with a warm cache.
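A small torch-only sketch of the failure mode (values are illustrative, not taken from the test suite): with a cache offset start_q > 0 and a single decoded token, the extra term masks every cached key.

import torch

start_q, num_queries, num_keys = 4, 1, 5  # one decoded token, four cached keys
pos_keys = torch.arange(num_keys)
pos_queries = torch.arange(num_queries) + start_q

causal = pos_keys[None, :] > pos_queries[:, None]   # only future keys masked
broken = causal | (pos_keys[None, :] < start_q)     # also masks the cached prefix

print(causal)  # tensor([[False, False, False, False, False]]) -> all five keys visible
print(broken)  # tensor([[ True,  True,  True,  True, False]]) -> only the current token visible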

@xjmxyt (author) commented Dec 3, 2025

Solved this by matching the kernel implementation to the reference instead of modifying the reference.
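For context, a hedged sketch of what keeping the reference unchanged amounts to; the assumption is that the pre-PR reference uses only the causal check from the quoted snippet, without the added prefix-masking line, before converting it into the additive -inf mask.

import torch

num_keys, num_queries, start_q, head_dim = 6, 1, 5, 4  # warm cache of five tokens, one new query
query = torch.randn(num_queries, head_dim)

pos_keys = torch.arange(num_keys, device=query.device)
pos_queries = torch.arange(num_queries, device=query.device) + start_q
mask = pos_keys[None, :] > pos_queries[:, None]        # causal check only
mask = mask.float().masked_fill(mask, float("-inf"))   # 0 where visible, -inf for future keys
print(mask)  # tensor([[0., 0., 0., 0., 0., 0.]]) -> the cached prefix stays visible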
