feat(higgs-audio): crate skeleton + config/weights + backbone forward + parity gate #409

Draft
ywh555hhh wants to merge 6 commits into openinfer-project:main from ywh555hhh:feat/higgs-audio-v1

@ywh555hhh ywh555hhh commented Jun 16, 2026

Contributor

Closes #408 (sub-issue of #395).

Summary

New openinfer-higgs-audio crate for the Higgs Audio model backbone (text-only path).
The backbone is 36-layer Qwen3-isomorphic, using Higgs checkpoint body.* weight
naming and nested text_config.

What is included

395.1 — Crate skeleton + feature wiring

  • Feature-gated higgs-audio: default = pure logic (Mac-buildable), feature pulls GPU stack
  • Workspace registration in root Cargo.toml

395.2 — Config parsing

  • HiggsConfig + TextConfig with nested rope_parameters.rope_theta = 1_000_000
  • 11 architecture fact assertions (hidden_size=2560, 36 layers, GQA 32/8, head_dim=128, etc.)
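
As an illustration of the config checks listed above, here is a minimal std-only sketch. The struct and method names (`TextConfigSketch`, `RopeParameters`, `check_architecture`) and the Hugging Face style field names are assumptions made for this example, not the crate's actual types; only the numeric values come from this description, and only a subset of the 11 facts is shown.

```rust
/// Hypothetical stand-in for the nested `rope_parameters` object.
#[derive(Debug, Clone, Default)]
pub struct RopeParameters {
    pub rope_theta: Option<f64>,
}

/// Hypothetical, trimmed-down text config. Field names are assumptions.
#[derive(Debug, Clone)]
pub struct TextConfigSketch {
    pub hidden_size: usize,
    pub num_hidden_layers: usize,
    pub num_attention_heads: usize,
    pub num_key_value_heads: usize,
    pub head_dim: usize,
    pub rope_parameters: Option<RopeParameters>,
}

impl TextConfigSketch {
    /// Builds a config from the values quoted in the PR description.
    pub fn from_pr_description() -> Self {
        Self {
            hidden_size: 2560,
            num_hidden_layers: 36,
            num_attention_heads: 32,
            num_key_value_heads: 8,
            head_dim: 128,
            rope_parameters: Some(RopeParameters { rope_theta: Some(1_000_000.0) }),
        }
    }

    /// Resolves `rope_theta` through the nested `rope_parameters` object.
    pub fn rope_theta(&self) -> Option<f64> {
        self.rope_parameters.as_ref().and_then(|p| p.rope_theta)
    }

    /// Number of query heads that share one KV head under GQA.
    pub fn gqa_group_size(&self) -> usize {
        self.num_attention_heads / self.num_key_value_heads
    }

    /// Returns the first architecture fact that does not match, if any.
    pub fn check_architecture(&self) -> Result<(), String> {
        let facts = [
            ("hidden_size", self.hidden_size, 2560),
            ("num_hidden_layers", self.num_hidden_layers, 36),
            ("num_attention_heads", self.num_attention_heads, 32),
            ("num_key_value_heads", self.num_key_value_heads, 8),
            ("head_dim", self.head_dim, 128),
        ];
        for (name, actual, expected) in facts {
            if actual != expected {
                return Err(format!("{name}: expected {expected}, got {actual}"));
            }
        }
        match self.rope_theta() {
            Some(theta) if theta == 1_000_000.0 => Ok(()),
            other => Err(format!("rope_theta: expected 1000000, got {other:?}")),
        }
    }
}
```

Returning a `Result` instead of panicking keeps the check usable outside tests; a unit test can simply assert that `check_architecture()` is `Ok`.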

395.3 — Weight name mapping

  • map_backbone(): body.layers.{i}.{rest} → BackboneSlot::Layer
  • All audio/codec tensors correctly return None
  • Tests: 398 backbone tensors (36×11 + embed + norm), all 36 layers with 11 components
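
The mapping rules above can be sketched as a pure string function. The variant payload (`index`, `rest`), the function signature, and the exact matching of the `tied.*` names are assumptions for illustration; the tensor-name prefixes are the ones quoted in the commit messages of this PR.

```rust
/// Hypothetical slot type; the real enum's payload may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BackboneSlot {
    EmbedTokens,
    LmHead,
    FinalNorm,
    Layer { index: usize, rest: String },
}

/// Maps a checkpoint tensor name to a backbone slot.
/// Anything that is not a backbone tensor (audio, codec, ...) yields `None`.
pub fn map_backbone(name: &str) -> Option<BackboneSlot> {
    match name {
        "tied.embedding.text_embedding" => return Some(BackboneSlot::EmbedTokens),
        "tied.head.text_head" => return Some(BackboneSlot::LmHead),
        "body.norm.weight" => return Some(BackboneSlot::FinalNorm),
        _ => {}
    }
    // Expect `body.layers.{i}.{rest}` with a numeric layer index.
    let tail = name.strip_prefix("body.layers.")?;
    let (index, rest) = tail.split_once('.')?;
    let index: usize = index.parse().ok()?;
    if rest.is_empty() {
        return None;
    }
    Some(BackboneSlot::Layer { index, rest: rest.to_string() })
}
```

With 36 layers of 11 tensors each plus the embedding and the final norm, a mapper like this would accept 36 × 11 + 2 = 398 names, which matches the count asserted in the tests.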

395.4 — Backbone forward

  • HiggsBackbone::from_safetensors() — loads weights via body.* naming
  • forward() — 36-layer Qwen3 text prefill (RMSNorm → fused QKV → FlashInfer paged attention → SwiGLU MLP)
  • Mirrors openinfer-qwen3-4b; only the weight-name prefix and config source differ

395.5 — Backbone parity gate

  • tests/backbone_parity.rs (required-features = ["higgs-audio"])
  • Top-64 logprobs: regret + mean (≤ 0.06 nat) + p99 (≤ 0.20 nat)
  • Clean skip when model or golden file is absent
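
For readers unfamiliar with this kind of gate, the mean and p99 thresholds can be sketched as follows. This is a simplified assumption about how the deltas are aggregated (absolute differences over a flat slice of log-probs, nearest-rank percentile); the names `DeltaStats`, `delta_stats`, and `passes_gate` are hypothetical, and the regret metric is not sketched because its definition is not given here.

```rust
/// Summary of absolute log-prob differences, in nats.
#[derive(Debug, Clone, Copy)]
pub struct DeltaStats {
    pub mean: f64,
    pub p99: f64,
}

/// Computes mean and nearest-rank p99 of |reference - candidate|.
/// Returns `None` for empty, mismatched, or NaN-containing inputs.
pub fn delta_stats(reference: &[f64], candidate: &[f64]) -> Option<DeltaStats> {
    if reference.is_empty() || reference.len() != candidate.len() {
        return None;
    }
    let mut deltas: Vec<f64> = reference
        .iter()
        .zip(candidate)
        .map(|(r, c)| (r - c).abs())
        .collect();
    if deltas.iter().any(|d| d.is_nan()) {
        return None;
    }
    deltas.sort_by(|a, b| a.total_cmp(b));
    let n = deltas.len();
    let mean = deltas.iter().sum::<f64>() / n as f64;
    // Nearest-rank percentile: rank = ceil(0.99 * n), done in integers.
    let rank = (n * 99 + 99) / 100;
    let p99 = deltas[rank.max(1) - 1];
    Some(DeltaStats { mean, p99 })
}

/// Applies the tolerances quoted in this PR: mean <= 0.06, p99 <= 0.20.
pub fn passes_gate(stats: &DeltaStats) -> bool {
    stats.mean <= 0.06 && stats.p99 <= 0.20
}
```

Checking both statistics catches two different failure modes: a uniform small drift raises the mean, while a few badly wrong positions show up in the p99.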

Guardrails

  • Only touches openinfer-higgs-audio/ + 2 lines in root Cargo.toml
  • Zero shared-crate modifications (openinfer-core, openinfer-kernels, etc.)
  • No mod.rs, no unwrap() on external input, no weight transposition

Mac verification

cargo fmt --all --check     ✅
cargo metadata --no-deps    ✅
cargo build -p openinfer-higgs-audio   ✅
cargo test --lib (3/3)      ✅

- openinfer-higgs-audio crate with default (no GPU) and higgs-audio features
- GPU deps (openinfer-core, openinfer-kernels, openinfer-kv-cache, cudarc)
  are optional, gated behind the higgs-audio feature
- Pure-logic modules (config, weights) compile on Mac without GPU
- backbone.rs placeholder for GPU forward (395.4)

Refs: openinfer-project#395
Add openinfer-higgs-audio to members[] and [workspace.dependencies].

Refs: openinfer-project#395
- HiggsConfig with TextConfig, AudioEncoderConfig, and top-level fields
- Nested rope_parameters.rope_theta resolution (1_000_000)
- from_path() loads config.json from a model directory
- Unit test asserts all known facts against real checkpoint config

Refs: openinfer-project#395
- map_backbone() maps body.* → BackboneSlot::Layer, body.norm → FinalNorm,
  tied.embedding.text_embedding → EmbedTokens, tied.head.text_head → LmHead
- Non-backbone tensors (audio/codec/modality) return None
- Unit tests: 398 backbone tensors (36×11 + embed + norm), all audio/codec
  skipped, all 36 layers have all 11 components

Refs: openinfer-project#395
- HiggsBackbone::from_safetensors() loads weights using body.* naming
  (body.layers.{i}.{rest}, body.norm.weight, tied.embedding.text_embedding)
- Uses text_config from HiggsConfig as architecture source
- forward() runs full 36-layer Qwen3 prefill with paged KV attention
- Mirrors openinfer-qwen3-4b kernel ops: RMSNorm, fused QKV GEMM,
  FlashInfer paged attention, SwiGLU MLP, residual adds
- last_token_logits() and compute_all_position_logits() for parity testing
- No LoRA, no TP, no CUDA graph, no decode — single-sequence prefill only

Refs: openinfer-project#395
- tests/backbone_parity.rs compares Higgs backbone logits against
  pre-computed golden (backbone_golden.safetensors)
- Top-64 logprobs comparison: regret, mean delta, p99 delta
- Tolerances: mean ≤ 0.06 nat, p99 ≤ 0.20 nat (same pattern as qwen3)
- Requires higgs-audio feature (GPU only)
- Skipped cleanly when model or golden file absent

Refs: openinfer-project#395

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21abb93f25

fn test_model_path() -> PathBuf {
    let env_path = std::env::var("OPENINFER_TEST_MODEL_PATH")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from("docs/private/higgs-audio-v3-tts-4b"));


P1: Skip private-model unit tests on clean checkout

When OPENINFER_TEST_MODEL_PATH is unset on a clean checkout, the fallback path under docs/private/ does not exist because that directory is gitignored. The new crate is a workspace member, so cargo test --release --workspace --lib runs this unit test, which immediately panics trying to read docs/private/higgs-audio-v3-tts-4b/config.json (the same fallback in weights.rs later panics on the safetensors index). Please make these fixture-dependent tests skip unless an explicit Higgs checkpoint or checked-in fixture exists.
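
One possible shape for such a skip, sketched with the standard library only. The helper `resolve_model_dir` and the test name are hypothetical, and this version deliberately drops the docs/private/ fallback so that only an explicitly configured checkpoint is used; keeping the fallback behind an existence check would be an equally valid choice.

```rust
use std::env;
use std::path::PathBuf;

/// Returns the checkpoint directory only when it is explicitly given and
/// actually contains `config.json`; otherwise returns `None`.
pub fn resolve_model_dir(explicit: Option<&str>) -> Option<PathBuf> {
    let dir = PathBuf::from(explicit?);
    dir.join("config.json").is_file().then_some(dir)
}

/// Reads the model path from the environment variable named in this PR.
pub fn test_model_path() -> Option<PathBuf> {
    resolve_model_dir(env::var("OPENINFER_TEST_MODEL_PATH").ok().as_deref())
}

#[test]
fn config_matches_known_facts() {
    // Skip (rather than panic) when no usable checkpoint is configured.
    let Some(model_dir) = test_model_path() else {
        eprintln!("skipping: OPENINFER_TEST_MODEL_PATH unset or config.json missing");
        return;
    };
    // Placeholder for the real assertions against the parsed config.
    assert!(model_dir.join("config.json").is_file());
}
```

An early `return` reports the test as passed rather than ignored, so the `eprintln!` is what makes the skip visible when running with `--nocapture`.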

);

// 5+6. Residual add + MLP RMSNorm (fused)
openinfer_kernels::ops::fused_add_rms_norm_batch_into(


P2: Use the round fused add/RMSNorm variant

For the Higgs GPU path that mirrors Qwen3, using FlashInfer's non-round fused add/norm changes the residual value used for the RMS reduction: the shared kernel notes it keeps the pre-BF16-round add in memory, while the Qwen3 prefill/unified paths call fused_add_rms_norm_round_batch_into to match hidden = bf16(hidden + residual). On real Higgs/Qwen-style bf16 weights this introduces per-layer logits drift across all 36 layers and can fail the golden parity gate; call the round variant here.
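
The numerical effect described here can be illustrated on the CPU. The sketch below is not the kernel code: it emulates BF16 rounding (round to nearest even) on `f32` values and compares the RMS of `hidden + residual` with and without rounding the sum first. All function names are hypothetical, and the epsilon and weight scaling of a real RMSNorm are omitted.

```rust
/// Rounds an f32 to the nearest BF16-representable value (ties to even),
/// returned as f32. NaN inputs are passed through unchanged.
pub fn round_to_bf16(x: f32) -> f32 {
    if x.is_nan() {
        return x;
    }
    let bits = x.to_bits();
    // Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    let bias = 0x7FFF + ((bits >> 16) & 1);
    f32::from_bits(bits.wrapping_add(bias) & 0xFFFF_0000)
}

/// Root mean square of a sequence of `len` values.
fn rms(values: impl Iterator<Item = f32>, len: usize) -> f32 {
    let sum_sq: f32 = values.map(|v| v * v).sum();
    (sum_sq / len as f32).sqrt()
}

/// RMS of the raw f32 sum (analogous to the non-round variant).
pub fn rms_without_round(hidden: &[f32], residual: &[f32]) -> f32 {
    assert_eq!(hidden.len(), residual.len());
    rms(hidden.iter().zip(residual).map(|(h, r)| h + r), hidden.len())
}

/// RMS of bf16(hidden + residual) (analogous to the round variant).
pub fn rms_after_bf16_round(hidden: &[f32], residual: &[f32]) -> f32 {
    assert_eq!(hidden.len(), residual.len());
    rms(
        hidden.iter().zip(residual).map(|(h, r)| round_to_bf16(h + r)),
        hidden.len(),
    )
}
```

With `hidden = [1.0]` and `residual = [0.001]`, the sum rounds back to 1.0 in BF16, so the two reductions see different inputs. A discrepancy of this kind at every layer is the drift the comment warns about.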

@ywh555hhh ywh555hhh marked this pull request as draft June 16, 2026 12:28