
Conversation

@MagellaX (Owner) commented Aug 24, 2025

Summary

  • switch FlashAttention to torch.nn.attention.sdpa_kernel(SDPBackend.FLASH_ATTENTION)
  • fallback to torch.backends.cuda.sdp_kernel on TypeError for older PyTorch
  • remove redundant attention invocation so SDPA runs once (a sketch of the combined change follows below)
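For context, here is a minimal sketch of the pattern the summary describes, written as a standalone helper; q, k, v and the function name are illustrative, not the repository's actual code, and the real module additionally guards on q.device.type == "cuda" before selecting a backend.

import torch
import torch.nn.functional as F


def flash_attention(q, k, v):
    # Pick a kernel-selection context, preferring the newer API.
    try:
        # Newer PyTorch: explicit backend selection via torch.nn.attention
        from torch.nn.attention import SDPBackend, sdpa_kernel
        ctx = sdpa_kernel(SDPBackend.FLASH_ATTENTION)
    except (ImportError, AttributeError, TypeError):
        # Older PyTorch: legacy per-backend flags on torch.backends.cuda
        ctx = torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        )
    with ctx:
        # SDPA is invoked exactly once; the redundant second call is gone.
        return F.scaled_dot_product_attention(q, k, v)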

Testing

  • pytest -q (fails: TabError in accuracy_test.py; ModuleNotFoundError: No module named 'yaml')

https://chatgpt.com/codex/tasks/task_e_68ab6a18b2448322a01c023171329cbb


Summary by cubic

Use the new torch.nn.attention.sdpa_kernel(SDPBackend.FLASH_ATTENTION) with a fallback to torch.backends.cuda.sdp_kernel for older PyTorch, and remove the redundant attention call so SDPA runs once. This improves version compatibility and avoids extra compute.

@cubic-dev-ai (bot, Contributor) left a comment


1 issue found across 1 file


            except (AttributeError, RuntimeError):
                sdpa_ctx = nullcontext()
            # Fallback for older PyTorch releases
            sdpa_ctx = torch.backends.cuda.sdp_kernel(

Calling torch.backends.cuda.sdp_kernel without its own error handling can raise and crash on environments where the kernel is absent, removing the prior graceful fallback.

Prompt for AI agents
Address the following comment on stream_attention/core/flashattention_v3.py at line 88:

<comment>Calling torch.backends.cuda.sdp_kernel without its own error handling can raise and crash on environments where the kernel is absent, removing the prior graceful fallback.</comment>

<file context>
@@ -80,15 +80,14 @@ def forward(
         if _use_flash_sdpa() and q.device.type == "cuda":
             try:
                 # Prefer the newer torch.nn.attention API when available
-                sdpa_ctx = torch.nn.attention.sdpa_kernel(SDPBackend.FLASH_ATTENTION)
+                sdpa_ctx = torch.nn.attention.sdpa_kernel(
+                    SDPBackend.FLASH_ATTENTION
+                )
             except (AttributeError, TypeError):
-                try:
</file context>
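One way to satisfy the reviewer's request, sketched as a standalone helper (in the actual file this logic sits inline in FlashAttentionV3.forward; the helper name is hypothetical): the legacy torch.backends.cuda.sdp_kernel call keeps its own error handling, with contextlib.nullcontext() as the last resort.

from contextlib import nullcontext

import torch


def _select_sdpa_context():
    """Pick an SDPA kernel-selection context, degrading gracefully."""
    try:
        # Prefer the newer torch.nn.attention API when available
        from torch.nn.attention import SDPBackend, sdpa_kernel
        return sdpa_kernel(SDPBackend.FLASH_ATTENTION)
    except (ImportError, AttributeError, TypeError):
        pass
    try:
        # Fallback for older PyTorch releases, wrapped in its own handler
        return torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        )
    except (AttributeError, RuntimeError):
        # Neither selector is usable here; let PyTorch choose the backend.
        return nullcontext()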

@MagellaX (Owner, Author) commented:
@cursoragent look carefully and vigilantly at this issue, and properly fix it!!

@cursor (bot) commented Aug 24, 2025

I enhanced the robustness of the FlashAttention module by adding fallback mechanisms for SDPA kernel selection and made PyYAML an optional dependency for configuration loading. I also performed a minor cleanup in a benchmark script.

  • In stream_attention/core/flashattention_v3.py, I improved the FlashAttentionV3 module's forward method by wrapping torch.backends.cuda.sdp_kernel in a try/except block to handle environments where it is unavailable, and by wrapping F.scaled_dot_product_attention in another try/except that catches RuntimeError. This ensures graceful degradation if a forced FlashAttention kernel fails, allowing PyTorch to select a compatible kernel instead.
  • In stream_attention/core/config.py, I refactored StreamAttentionConfig.from_yaml to lazy-load PyYAML only when the method is called, making PyYAML an optional dependency, and added an ImportError check (see the sketch after this list).
  • In stream_attention/benchmarks/accuracy_test.py, I removed a duplicate main() function call for minor code cleanup.
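A rough sketch of the lazy-import pattern described for StreamAttentionConfig.from_yaml; the config fields shown are hypothetical, not the project's actual ones.

from dataclasses import dataclass


@dataclass
class StreamAttentionConfig:
    # Hypothetical fields for illustration only.
    num_heads: int = 8
    head_dim: int = 64

    @classmethod
    def from_yaml(cls, path):
        try:
            import yaml  # imported only when YAML loading is actually requested
        except ImportError as exc:
            raise ImportError(
                "PyYAML is required for StreamAttentionConfig.from_yaml; "
                "install it with `pip install pyyaml`."
            ) from exc
        with open(path, "r", encoding="utf-8") as fh:
            data = yaml.safe_load(fh) or {}
        return cls(**data)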

3 tasks completed

  • Harden SDPA fallback with safe error handling
  • Fix TabError in stream_attention/benchmarks/accuracy_test.py
  • Resolve PyYAML import issue in config
  • Run test suite and ensure all tests pass

@MagellaX MagellaX merged commit 1b3f176 into main Aug 24, 2025
2 checks passed