fix(qwen35): guard baked head dims, not the runtime value-head count by FeathBow · Pull Request #536 · openinfer-project/openinfer

FeathBow · 2026-07-03T15:41:20Z

Description

Fixes #517.

The qwen35 config guard required linear_num_value_heads == 32, which rejected Qwen3.5-27B (48 value heads) at startup. That guard was too strict: the GDN Triton-AOT kernels take the head counts as runtime arguments, and the constraints actually baked into them are the head dims and tile sizes.

This PR removes the value-head-count guard, keeps the real GDN dim guard (128/128), and adds fail-loud checks for the other baked assumptions the crate already depends on: full-attention head_dim == 256 and a linear value-head count divisible by the linear key-head count. The stale H=32 comment in the Triton kernel source and the unused QWEN35_GDR_HEADS constant are removed to match.
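For readers who want a concrete picture of the guard shape described above, here is a minimal sketch. It is not the crate's actual code. Only linear_num_value_heads is named in this PR; the struct, the other field names (borrowed from the usual Hugging Face config keys), the constants, and the validate function are assumptions made for illustration.

```rust
/// Hypothetical subset of the qwen35 config. Only `linear_num_value_heads`
/// is named in the PR; the remaining field names are assumed.
#[derive(Debug, Clone)]
struct Qwen35Config {
    head_dim: usize,
    linear_key_head_dim: usize,
    linear_value_head_dim: usize,
    linear_num_key_heads: usize,
    linear_num_value_heads: usize,
}

/// Head dim baked into the GDN kernels (the 128/128 guard from the PR).
const GDN_HEAD_DIM: usize = 128;
/// Head dim assumed by the full-attention path (256 per the PR).
const FULL_ATTN_HEAD_DIM: usize = 256;

/// Fail loudly on baked assumptions; deliberately does NOT pin the
/// value-head count, since that is a runtime argument to the kernels.
fn validate(cfg: &Qwen35Config) -> Result<(), String> {
    if cfg.linear_key_head_dim != GDN_HEAD_DIM || cfg.linear_value_head_dim != GDN_HEAD_DIM {
        return Err(format!(
            "GDN kernels are built for key/value head dims {GDN_HEAD_DIM}/{GDN_HEAD_DIM}, got {}/{}",
            cfg.linear_key_head_dim, cfg.linear_value_head_dim
        ));
    }
    if cfg.head_dim != FULL_ATTN_HEAD_DIM {
        return Err(format!(
            "full attention expects head_dim == {FULL_ATTN_HEAD_DIM}, got {}",
            cfg.head_dim
        ));
    }
    // Value heads must split evenly across key heads (e.g. 48 % 16 == 0).
    if cfg.linear_num_key_heads == 0
        || cfg.linear_num_value_heads % cfg.linear_num_key_heads != 0
    {
        return Err(format!(
            "linear value heads ({}) must be a multiple of linear key heads ({})",
            cfg.linear_num_value_heads, cfg.linear_num_key_heads
        ));
    }
    Ok(())
}
```

Under these assumptions, a config with 48 value heads and 16 key heads passes, as does one with 32 value heads. A config with head_dim 128, or with value heads not divisible by key heads, returns an error at load time rather than failing later inside a kernel.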

Test Env

Synthetic config with 48 value heads loads.
The published Qwen3.5-27B config.json passes all three load ensures (linear dims 128/128, head_dim 256, 48 % 16 == 0).
48-value-head GDN numeric check: chunkwise Triton prefill matches stepwise CUDA decode over 96 steps within tolerance.
4B Qwen3.5 gates unchanged.
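
The numeric check above compares two evaluation orders of the same recurrence. The toy sketch below illustrates that idea only. It uses a scalar gated recurrence s_t = a_t * s_{t-1} + b_t as a stand-in, not the actual gated delta rule, and every name in it (stepwise, chunkwise, max_abs_diff, the chunk size, the synthetic inputs) is hypothetical.

```rust
/// Reference: advance the state one step at a time, as a decode loop would.
fn stepwise(a: &[f64], b: &[f64]) -> Vec<f64> {
    let mut s = 0.0;
    a.iter()
        .zip(b)
        .map(|(&a_t, &b_t)| {
            s = a_t * s + b_t;
            s
        })
        .collect()
}

/// Chunked evaluation: within each chunk, combine the carried-in state
/// (scaled by the cumulative gate product) with a zero-initialised local scan.
fn chunkwise(a: &[f64], b: &[f64], chunk: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(a.len());
    let mut carry = 0.0;
    for (ac, bc) in a.chunks(chunk).zip(b.chunks(chunk)) {
        let mut decay = 1.0; // product of gates from chunk start to here
        let mut local = 0.0; // state the chunk would reach from zero
        for (&a_t, &b_t) in ac.iter().zip(bc) {
            decay *= a_t;
            local = a_t * local + b_t;
            out.push(decay * carry + local);
        }
        if let Some(&last) = out.last() {
            carry = last;
        }
    }
    out
}

/// Largest elementwise absolute difference between two equal-length slices.
fn max_abs_diff(x: &[f64], y: &[f64]) -> f64 {
    x.iter()
        .zip(y)
        .map(|(p, q)| (p - q).abs())
        .fold(0.0, f64::max)
}

/// Deterministic synthetic gates in (0, 1) and bounded inputs.
fn synthetic(steps: usize) -> (Vec<f64>, Vec<f64>) {
    let a = (0..steps).map(|t| 0.9 + 0.05 * (t as f64).sin()).collect();
    let b = (0..steps).map(|t| (0.3 * t as f64).cos()).collect();
    (a, b)
}
```

With 96 synthetic steps and a chunk size of 16, the two paths should agree to within a small tolerance. The difference is nonzero in general because the floating-point operations are applied in a different order, which is why such checks compare against a tolerance rather than for exact equality.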

Type of Change

Bug fix (non-breaking change which fixes an issue)

fix(qwen35): guard baked head dims, not the runtime value-head count

9069bd5

xiaguan merged commit 35ec82e into openinfer-project:main Jul 4, 2026
1 check passed

xiaguan merged 1 commit into openinfer-project:main from FeathBow:fix/qwen35-gdn-guard-value-heads
