
Conversation

@titaiwangms
Contributor

This pull request updates the attention kernel selection logic and clarifies support for unidirectional (causal) attention in the CUDA attention implementation. The main changes focus on improving documentation, removing outdated comments, and explicitly setting the kernel type for better maintainability and clarity.

Kernel selection and configuration improvements:

  • Explicitly set the kernel_type field to AttentionKernel_Unfused in the AttentionData structure to make clear which kernel is used and to improve future extensibility (see the sketch below).
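
A minimal C++ sketch of the idea, using simplified stand-in types rather than the actual onnxruntime definitions (the real AttentionData and AttentionKernelType carry many more fields and enum values):

```cpp
#include <cstdio>

// Simplified stand-ins for the onnxruntime types; the real enum also
// includes fused TRT, flash, and memory-efficient kernel variants.
enum AttentionKernelType {
  AttentionKernel_Default,
  AttentionKernel_Unfused,
};

struct AttentionData {
  // ... workspace pointers, mask data, etc. elided ...
  AttentionKernelType kernel_type = AttentionKernel_Default;
};

int main() {
  AttentionData data;
  // The PR sets this explicitly rather than relying on a default,
  // so the chosen kernel path is visible at the call site.
  data.kernel_type = AttentionKernel_Unfused;
  std::printf("kernel_type=%d\n", static_cast<int>(data.kernel_type));
  return 0;
}
```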

Documentation and code clarity:

  • Added comments clarifying that unidirectional (causal) attention is supported by several attention kernel implementations, and that the TRT fused runner is used only for non-unidirectional cases, as enforced elsewhere in the selection logic (a sketch follows this list).
  • Removed outdated TODO comments regarding parameter continuation and kernel selection, as these are now handled more explicitly in the code. [1] [2]
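
A hedged sketch of the invariant described above; the function and parameter names here are illustrative, not the actual onnxruntime symbols:

```cpp
#include <cassert>

// Illustrative guard (names assumed): the TRT fused runner is chosen only
// when attention is not unidirectional; causal requests fall back to
// kernels that apply the causal mask themselves.
bool ShouldUseTrtFusedRunner(bool has_fused_runner, bool is_unidirectional) {
  return has_fused_runner && !is_unidirectional;
}

int main() {
  assert(ShouldUseTrtFusedRunner(/*has_fused_runner=*/true, /*is_unidirectional=*/false));
  assert(!ShouldUseTrtFusedRunner(/*has_fused_runner=*/true, /*is_unidirectional=*/true));
  return 0;
}
```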


Copilot AI left a comment


Pull request overview

This PR improves the attention kernel selection logic for the CUDA Attention operator by explicitly setting kernel types and removing outdated code. The changes enhance code clarity and maintainability while enabling proper support for causal (unidirectional) attention with the unfused kernel path.

Changes:

  • Explicitly set kernel_type to AttentionKernel_Unfused in the Attention operator to clarify kernel selection
  • Remove outdated TODO comments about parameter handling and kernel selection that are now properly implemented
  • Relax the assertion in PrepareQkv_MHA_NoPast that incorrectly prevented causal attention, replacing it with a clarifying comment about which kernels support unidirectional attention (see the sketch after this list)
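
A simplified before/after sketch of that relaxed check; the helper name comes from the PR, but the parameter type and surrounding details are assumptions, not the real attention_prepare_qkv.cu code:

```cpp
struct AttentionParameters {
  bool is_unidirectional;  // true for causal attention
};

// Sketch of the helper after the change (simplified).
void PrepareQkv_MHA_NoPast(const AttentionParameters& params) {
  // Previously something like:
  //   assert(!params.is_unidirectional);
  // which rejected causal attention even though the unfused, flash, and
  // memory-efficient kernels all honor is_unidirectional. The check is
  // dropped; only the TRT fused runner needs non-causal inputs, and that
  // is enforced at kernel-selection time.
  (void)params;
  // ... pack Q, K, V into the layout the selected kernel expects ...
}
```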

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • onnxruntime/core/providers/cuda/llm/attention.cc: Removes obsolete TODOs and explicitly sets kernel_type to AttentionKernel_Unfused, making kernel selection more explicit and maintainable.
  • onnxruntime/contrib_ops/cuda/bert/attention_prepare_qkv.cu: Removes the overly restrictive assertion blocking causal attention and adds an explanatory comment about kernel support for the is_unidirectional flag.


