Conversation

@DajanaV (Collaborator) commented Nov 18, 2025

Mirrored from ggml-org/llama.cpp#17356

Fixes #17297

To use subgroup operations on Intel, we have to force full subgroups; otherwise the subgroup size and the number of threads that are actually active may vary. I don't think this change should have any negative effect on other drivers.
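
For reference, "forcing full subgroups" at the Vulkan API level presumably means setting VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT on the compute stage. A minimal sketch, assuming Vulkan 1.3 (or VK_EXT_subgroup_size_control); this is not the actual ggml-vulkan pipeline-creation code:

```cpp
#include <vulkan/vulkan.h>

// Sketch only: creating a compute pipeline that requires full subgroups.
// All handles passed in are assumed to have been created elsewhere.
VkResult create_full_subgroup_pipeline(VkDevice device,
                                       VkShaderModule shader_module,
                                       VkPipelineLayout layout,
                                       VkPipeline * out_pipeline) {
    VkPipelineShaderStageCreateInfo stage = {};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    // With this flag the driver must launch fully populated subgroups, so
    // the set of active invocations seen by subgroup ops is fixed. The
    // workgroup size must then be a multiple of the subgroup size.
    stage.flags  = VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT;
    stage.stage  = VK_SHADER_STAGE_COMPUTE_BIT;
    stage.module = shader_module;
    stage.pName  = "main";

    VkComputePipelineCreateInfo info = {};
    info.sType  = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
    info.stage  = stage;
    info.layout = layout;

    return vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &info,
                                    nullptr, out_pipeline);
}
```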

@loci-agentic-ai

Access the complete analysis in the LOCI Dashboard

Summary of the pull request changes, based on the GitHub code review analysis:

Pull Request Summary: Vulkan Flash Attention Intel Subgroup Fix

Overview

This pull request addresses Intel GPU subgroup crashes in Vulkan flash attention by requiring full subgroups for all flash attention pipelines. The change modifies four call sites in the ggml_vk_load_shaders() function, replacing the conditional full-subgroup requirement (FAPATH==FA_COOPMAT1) with an unconditional one (true).

Code Changes Analysis

The modifications affect four flash attention pipeline variants in ggml/src/ggml-vulkan/ggml-vulkan.cpp (a reduced sketch of the edit follows the list):

  • flash_attn_f32_f16_aligned_f32acc
  • flash_attn_f32_f16_aligned_f16acc
  • flash_attn_f32_f16_f32acc
  • flash_attn_f32_f16_f16acc

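A hypothetical reduction of the edit at each call site. The real calls in ggml_vk_load_shaders() pass many more arguments to ggml_vk_create_pipeline(); only FAPATH and FA_COOPMAT1 appear in the PR, and the other enumerators below are placeholders:

```cpp
// Hypothetical reduction of the change; FA_SCALAR and FA_COOPMAT2 are
// placeholder enumerators, not names taken from the PR.
enum fa_path { FA_SCALAR, FA_COOPMAT1, FA_COOPMAT2 };

// Before: only the coopmat1 path required full subgroups.
bool require_full_subgroups_before(fa_path path) {
    return path == FA_COOPMAT1;
}

// After: every flash attention pipeline requires full subgroups, so a driver
// can no longer launch partially populated subgroups for these shaders.
bool require_full_subgroups_after(fa_path /*path*/) {
    return true;
}
```
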
Technical Impact

Correctness: Eliminates the Intel GPU crashes by guaranteeing fully populated subgroups, making subgroup behavior consistent regardless of driver. This defensive approach prioritizes stability over micro-optimization.

Performance: Requiring full subgroups constrains how the driver may launch invocations, which can add slight overhead on some hardware. In exchange, the change eliminates the Intel crashes and yields more predictable execution patterns.

Compatibility: Maintains functionality across all GPU vendors (Intel, NVIDIA, AMD) while simplifying the codebase by removing conditional subgroup logic.
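
One related detail, noted here as an assumption rather than something stated in the PR: the Vulkan flag for requiring full compute subgroups is gated on the computeFullSubgroups device feature, so a backend would typically verify support before using it. A sketch, not the backend's actual feature-detection code:

```cpp
#include <vulkan/vulkan.h>

// Check whether the device supports forcing fully populated compute
// subgroups (VK_EXT_subgroup_size_control, promoted to core in Vulkan 1.3).
bool supports_full_subgroups(VkPhysicalDevice physical_device) {
    VkPhysicalDeviceSubgroupSizeControlFeatures subgroup_features = {};
    subgroup_features.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_SIZE_CONTROL_FEATURES;

    VkPhysicalDeviceFeatures2 features2 = {};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &subgroup_features;

    vkGetPhysicalDeviceFeatures2(physical_device, &features2);
    return subgroup_features.computeFullSubgroups == VK_TRUE;
}
```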

Assessment

This is a conservative fix for a critical compatibility issue. It also simplifies maintenance by giving all flash attention variants uniform behavior. Any minor performance cost on some hardware is justified by the crash prevention.

The pull request demonstrates sound engineering judgment by choosing reliability over potential micro-optimizations in GPU compute operations.

@loci-dev force-pushed the main branch 27 times, most recently from 819d372 to ab559ce on November 24, 2025 at 20:10
@loci-dev force-pushed the main branch 9 times, most recently from 53eeb3f to 2531f8a on November 26, 2025 at 08:11