fix(qwen35): guard baked head dims, not the runtime value-head count#536

Merged
xiaguan merged 1 commit into openinfer-project:main from FeathBow:fix/qwen35-gdn-guard-value-heads
Jul 4, 2026
Conversation

FeathBow commented Jul 3, 2026

Collaborator

Description

Fixes #517.

The qwen35 config guard required linear_num_value_heads == 32, which rejected Qwen3.5-27B (48 value heads) at startup. That guard was too strict: the GDN Triton-AOT kernels take the head counts as runtime arguments, so the only constraints baked in at compile time are the head dims and tile sizes.

This PR removes the value-head-count guard, keeps the real GDN dim guard (128/128), and adds fail-loud checks for the other baked assumptions the crate already depends on: full-attention head_dim == 256 and divisible linear key/value head grouping. The Triton kernel source's stale H=32 comment and unused QWEN35_GDR_HEADS constant are removed to match.
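As a rough illustration of the guard shape described above, here is a minimal sketch in Rust. The struct, its field names, the constants, and the `validate` function are hypothetical stand-ins, not the crate's actual code; only the checked values (GDN dims 128/128, full-attention head_dim 256, value heads divisible by key heads) come from this description.

```rust
/// Hypothetical subset of the qwen35 config. Field names are assumptions
/// chosen for illustration, not the crate's real types.
#[derive(Debug, Clone)]
pub struct Qwen35Config {
    pub head_dim: usize,
    pub linear_key_head_dim: usize,
    pub linear_value_head_dim: usize,
    pub linear_num_key_heads: usize,
    pub linear_num_value_heads: usize,
}

// Dims the kernels are assumed to be compiled for, per the PR description.
const GDN_HEAD_DIM: usize = 128;
const FULL_ATTN_HEAD_DIM: usize = 256;

/// Fail loudly on baked assumptions. The value-head *count* is deliberately
/// not checked against a fixed number, since it is a runtime argument.
pub fn validate(cfg: &Qwen35Config) -> Result<(), String> {
    if cfg.linear_key_head_dim != GDN_HEAD_DIM || cfg.linear_value_head_dim != GDN_HEAD_DIM {
        return Err(format!(
            "GDN kernels expect key/value head dims {}/{}, got {}/{}",
            GDN_HEAD_DIM, GDN_HEAD_DIM, cfg.linear_key_head_dim, cfg.linear_value_head_dim
        ));
    }
    if cfg.head_dim != FULL_ATTN_HEAD_DIM {
        return Err(format!(
            "full attention expects head_dim {}, got {}",
            FULL_ATTN_HEAD_DIM, cfg.head_dim
        ));
    }
    // The zero check short-circuits first, so the modulo never divides by zero.
    if cfg.linear_num_key_heads == 0
        || cfg.linear_num_value_heads % cfg.linear_num_key_heads != 0
    {
        return Err(format!(
            "linear value heads ({}) must be a positive multiple of key heads ({})",
            cfg.linear_num_value_heads, cfg.linear_num_key_heads
        ));
    }
    Ok(())
}
```

Under these assumptions, a config with 48 value heads and 16 key heads passes, as does one with 32 value heads, while a wrong head dim or a non-divisible grouping returns an error at load time instead of failing later inside a kernel.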

Test Env

  • Synthetic config with 48 value heads loads.
  • The published Qwen3.5-27B config.json passes all three load ensures (linear dims 128/128, head_dim 256, 48 % 16 == 0).
  • 48-value-head GDN numeric check: chunkwise Triton prefill matches stepwise CUDA decode over 96 steps within tolerance.
  • 4B Qwen3.5 gates unchanged.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

@xiaguan xiaguan merged commit 35ec82e into openinfer-project:main Jul 4, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

qwen35: GDN Triton-AOT kernels are specialized to the 4B value-head count (blocks Qwen3.5-27B)