feat(higgs-audio): crate skeleton + config/weights + backbone forward + parity gate #409
ywh555hhh wants to merge 6 commits into
- openinfer-higgs-audio crate with default (no GPU) and higgs-audio features
- GPU deps (openinfer-core, openinfer-kernels, openinfer-kv-cache, cudarc) are optional, gated behind the higgs-audio feature
- Pure-logic modules (config, weights) compile on Mac without GPU
- backbone.rs placeholder for GPU forward (395.4)
Refs: openinfer-project#395
Add openinfer-higgs-audio to members[] and [workspace.dependencies]. Refs: openinfer-project#395
- HiggsConfig with TextConfig, AudioEncoderConfig, and top-level fields
- Nested rope_parameters.rope_theta resolution (1_000_000)
- from_path() loads config.json from a model directory
- Unit test asserts all known facts against real checkpoint config
Refs: openinfer-project#395
- map_backbone() maps body.* → BackboneSlot::Layer, body.norm → FinalNorm, tied.embedding.text_embedding → EmbedTokens, tied.head.text_head → LmHead
- Non-backbone tensors (audio/codec/modality) return None
- Unit tests: 398 backbone tensors (36×11 + embed + norm), all audio/codec skipped, all 36 layers have all 11 components
Refs: openinfer-project#395
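The mapping described in this commit could look roughly like the sketch below. The variant names and tensor-name prefixes come from the commit message; the payload of `BackboneSlot::Layer`, the exact-match rule for `body.norm.weight`, and the prefix match for the `tied.*` names are assumptions, and the real crate may differ.

```rust
/// Destination of a backbone tensor. Variant names follow the commit message;
/// the fields carried by `Layer` are an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BackboneSlot {
    /// Per-layer tensor: layer index plus the remaining dotted name.
    Layer { index: usize, rest: String },
    FinalNorm,
    EmbedTokens,
    LmHead,
}

/// Map a Higgs checkpoint tensor name to a backbone slot.
/// Returns `None` for anything outside the text backbone.
pub fn map_backbone(name: &str) -> Option<BackboneSlot> {
    if let Some(tail) = name.strip_prefix("body.layers.") {
        // Expect `{i}.{rest}`; a missing dot or a non-numeric index is rejected.
        let (idx, rest) = tail.split_once('.')?;
        let index = idx.parse::<usize>().ok()?;
        if rest.is_empty() {
            return None;
        }
        return Some(BackboneSlot::Layer {
            index,
            rest: rest.to_string(),
        });
    }
    if name == "body.norm.weight" {
        return Some(BackboneSlot::FinalNorm);
    }
    // Assumption: prefix match, since the commit does not say whether these
    // names carry a suffix such as `.weight`.
    if name.starts_with("tied.embedding.text_embedding") {
        return Some(BackboneSlot::EmbedTokens);
    }
    if name.starts_with("tied.head.text_head") {
        return Some(BackboneSlot::LmHead);
    }
    // Audio, codec, and modality tensors fall through here.
    None
}
```

With this shape, a hypothetical name such as `body.layers.3.self_attn.q_proj.weight` maps to layer 3 with `rest = "self_attn.q_proj.weight"`, and any name without a recognized prefix is skipped.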
- HiggsBackbone::from_safetensors() loads weights using body.* naming
(body.layers.{i}.{rest}, body.norm.weight, tied.embedding.text_embedding)
- Uses text_config from HiggsConfig as architecture source
- forward() runs full 36-layer Qwen3 prefill with paged KV attention
- Mirrors openinfer-qwen3-4b kernel ops: RMSNorm, fused QKV GEMM,
FlashInfer paged attention, SwiGLU MLP, residual adds
- last_token_logits() and compute_all_position_logits() for parity testing
- No LoRA, no TP, no CUDA graph, no decode — single-sequence prefill only
Refs: openinfer-project#395
- tests/backbone_parity.rs compares Higgs backbone logits against pre-computed golden (backbone_golden.safetensors)
- Top-64 logprobs comparison: regret, mean delta, p99 delta
- Tolerances: mean ≤ 0.06 nat, p99 ≤ 0.20 nat (same pattern as qwen3)
- Requires higgs-audio feature (GPU only)
- Skipped cleanly when model or golden file absent
Refs: openinfer-project#395
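For readers unfamiliar with this style of gate, here is a minimal CPU sketch of a top-k logprob comparison for a single position. The function names, the nearest-rank percentile, and the choice to take the top-k set from the golden logits are assumptions; the regret metric is not shown because the commit does not define it. Only the two tolerances are taken from the commit message.

```rust
/// Numerically stable log-softmax over one row of logits.
pub fn log_softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let sum: f32 = logits.iter().map(|&x| (x - max).exp()).sum();
    let log_z = max + sum.ln();
    logits.iter().map(|&x| x - log_z).collect()
}

/// Indices of the `k` largest entries, highest first.
pub fn top_k_indices(values: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..values.len()).collect();
    idx.sort_by(|&a, &b| values[b].total_cmp(&values[a]));
    idx.truncate(k);
    idx
}

/// Summary of absolute logprob differences, in nats.
pub struct ParityStats {
    pub mean_delta: f32,
    pub p99_delta: f32,
}

/// Compare test logits against golden logits on the golden top-k tokens.
pub fn parity_stats(golden_logits: &[f32], test_logits: &[f32], k: usize) -> ParityStats {
    assert_eq!(golden_logits.len(), test_logits.len(), "vocab size mismatch");
    let golden = log_softmax(golden_logits);
    let test = log_softmax(test_logits);
    let mut deltas: Vec<f32> = top_k_indices(&golden, k)
        .into_iter()
        .map(|i| (golden[i] - test[i]).abs())
        .collect();
    assert!(!deltas.is_empty(), "need at least one token to compare");
    deltas.sort_by(|a, b| a.total_cmp(b));
    let mean_delta = deltas.iter().sum::<f32>() / deltas.len() as f32;
    // Nearest-rank p99: smallest value with at least 99% of samples at or below it.
    let rank = ((deltas.len() as f32) * 0.99).ceil() as usize;
    let p99_delta = deltas[rank.max(1) - 1];
    ParityStats { mean_delta, p99_delta }
}

/// Tolerances quoted in the commit: mean <= 0.06 nat, p99 <= 0.20 nat.
pub fn passes_gate(stats: &ParityStats) -> bool {
    stats.mean_delta <= 0.06 && stats.p99_delta <= 0.20
}
```

Calling `parity_stats(&golden, &test, 64)` for one position and then `passes_gate` mirrors the top-64 comparison named above; how the real test aggregates across positions is not stated in the commit.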
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21abb93f25
fn test_model_path() -> PathBuf {
    let env_path = std::env::var("OPENINFER_TEST_MODEL_PATH")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("docs/private/higgs-audio-v3-tts-4b"));
Skip private-model unit tests on clean checkout
When OPENINFER_TEST_MODEL_PATH is unset on a clean checkout, docs/private/ is gitignored, but the new crate is a workspace member so cargo test --release --workspace --lib runs this unit test and immediately panics trying to read docs/private/higgs-audio-v3-tts-4b/config.json (the same fallback in weights.rs later panics on the safetensors index). Please make these fixture-dependent tests skip unless an explicit Higgs checkpoint or checked-in fixture exists.
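One way to address this, sketched under assumptions: resolve the checkpoint directory to an `Option` and return early from the test when the required file is missing. The helper `resolve_model_dir`, the constant name, and the test name are hypothetical; only the env var and the fallback path come from the diff above.

```rust
use std::path::PathBuf;

/// Fallback checkpoint location used by the PR when the env var is unset.
const DEFAULT_MODEL_DIR: &str = "docs/private/higgs-audio-v3-tts-4b";

/// Resolve the checkpoint directory, or `None` when `required_file` is not
/// present in it. Taking the env value as a parameter keeps this testable.
pub fn resolve_model_dir(env_value: Option<String>, required_file: &str) -> Option<PathBuf> {
    let dir = env_value
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_MODEL_DIR));
    if dir.join(required_file).is_file() {
        Some(dir)
    } else {
        None
    }
}

/// Variant of `test_model_path()` that reports a missing fixture as `None`.
pub fn test_model_path(required_file: &str) -> Option<PathBuf> {
    resolve_model_dir(std::env::var("OPENINFER_TEST_MODEL_PATH").ok(), required_file)
}

/// Shape of a fixture-dependent test: skip instead of panicking on a read.
#[test]
fn config_fixture_is_present_or_skipped() {
    let Some(dir) = test_model_path("config.json") else {
        eprintln!("skipping: no Higgs checkpoint found");
        return;
    };
    assert!(dir.join("config.json").is_file());
}
```

An early return is reported as a pass rather than a skip by the default test harness; marking the tests `#[ignore]` is an alternative if the distinction matters in CI.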
);

// 5+6. Residual add + MLP RMSNorm (fused)
openinfer_kernels::ops::fused_add_rms_norm_batch_into(
Use the round fused add/RMSNorm variant
For the Higgs GPU path that mirrors Qwen3, using FlashInfer's non-round fused add/norm changes the residual value used for the RMS reduction: the shared kernel notes it keeps the pre-BF16-round add in memory, while the Qwen3 prefill/unified paths call fused_add_rms_norm_round_batch_into to match hidden = bf16(hidden + residual). On real Higgs/Qwen-style bf16 weights this introduces per-layer logits drift across all 36 layers and can fail the golden parity gate; call the round variant here.
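To make the numerical point concrete, here is a scalar CPU illustration of how rounding the residual sum to BF16 changes the value that feeds the RMS reduction. It is not the kernel code: `round_to_bf16`, `add_unrounded`, `add_rounded`, and `rms` are hypothetical helpers, and which value each kernel variant actually reduces over is as described in the comment above, not verified here.

```rust
/// Round an f32 to the nearest BF16 value (ties to even), returned as f32.
/// NaN handling is omitted; this sketch assumes finite inputs.
pub fn round_to_bf16(x: f32) -> f32 {
    let bits = x.to_bits();
    // Lowest bit that survives truncation decides the tie direction.
    let lsb = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7FFF + lsb) & 0xFFFF_0000;
    f32::from_bits(rounded)
}

/// Residual add kept in full f32 precision.
pub fn add_unrounded(hidden: &[f32], residual: &[f32]) -> Vec<f32> {
    hidden.iter().zip(residual).map(|(h, r)| h + r).collect()
}

/// Residual add with `hidden = bf16(hidden + residual)`.
pub fn add_rounded(hidden: &[f32], residual: &[f32]) -> Vec<f32> {
    hidden
        .iter()
        .zip(residual)
        .map(|(h, r)| round_to_bf16(h + r))
        .collect()
}

/// Root mean square with epsilon, the denominator of RMSNorm.
pub fn rms(values: &[f32], eps: f32) -> f32 {
    let mean_sq = values.iter().map(|v| v * v).sum::<f32>() / values.len() as f32;
    (mean_sq + eps).sqrt()
}
```

For example, with `hidden = [1.0, 256.0]` and `residual = [0.00390625, 1.0]` (all exactly representable in BF16), the unrounded sums are `[1.00390625, 257.0]` while the rounded sums are `[1.0, 256.0]`, so the two paths normalize by different RMS values. Repeated over 36 layers, small differences of this kind are the drift the comment refers to.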
Closes #408 (sub-issue of #395).
Summary
New `openinfer-higgs-audio` crate for the Higgs Audio model backbone (text-only path). The backbone is 36-layer Qwen3-isomorphic, using Higgs checkpoint `body.*` weight naming and nested `text_config`.
What is included
395.1 — Crate skeleton + feature wiring
- `higgs-audio`: default = pure logic (Mac-buildable), feature pulls GPU stack
- `Cargo.toml`
395.2 — Config parsing
- `HiggsConfig` + `TextConfig` with nested `rope_parameters.rope_theta = 1_000_000`
395.3 — Weight name mapping
- `map_backbone()`: `body.layers.{i}.{rest}` → `BackboneSlot::Layer`
- Non-backbone tensors → `None`
395.4 — Backbone forward
- `HiggsBackbone::from_safetensors()` — loads weights via `body.*` naming
- `forward()` — 36-layer Qwen3 text prefill (RMSNorm → fused QKV → FlashInfer paged attention → SwiGLU MLP)
- Mirrors `openinfer-qwen3-4b`; only the weight-name prefix and config source differ
395.5 — Backbone parity gate
- `tests/backbone_parity.rs` (`required-features = ["higgs-audio"]`)
Guardrails
- `openinfer-higgs-audio/` + 2 lines in root `Cargo.toml`
- `openinfer-core`, `openinfer-kernels`, etc.)
- `mod.rs`, no `unwrap()` on external input, no weight transposition
Mac verification