v0.6.0 docs, website, and setup onnx rename by peteonrails · Pull Request #212 · peteonrails/voxtype

peteonrails · 2026-02-17T02:21:48Z

Summary

Follow-up to #210. These changes were pushed to release/0.6.0 after the PR was merged.

- Rename setup parakeet to setup onnx with hidden backwards-compatible alias
- Full documentation pass: model selection guide rewrite, user manual update, new meeting mode guide, README update
- Website: v0.6.0 news article, model grid with all ONNX engines, download URLs bumped to 0.6.0

Test plan

- cargo test passes
- cargo clippy clean
- Website renders correctly (news article, model grid, download URLs)

The ONNX binaries now include both Parakeet and Moonshine engines, so the "parakeet" name no longer fits. Binary names change from voxtype-parakeet-{avx2,avx512,cuda,rocm} to voxtype-onnx-{avx2,avx512,cuda,rocm}. Backward compatible: symlink detection checks both old and new names, and gpu setup looks for both voxtype-onnx-* and voxtype-parakeet-* files on disk so existing v0.5.6 installations keep working. Cargo features, engine config, and CLI commands (setup parakeet) are unchanged.
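The dual-name lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual detection code: the function name `onnx_variant` and the `PREFIXES` constant are hypothetical, and only the two binary-name prefixes come from the commit message.

```rust
/// Prefixes a variant binary may carry: the new name first, then the legacy one.
/// (Hypothetical constant; only the prefix strings come from the commit message.)
const PREFIXES: [&str; 2] = ["voxtype-onnx-", "voxtype-parakeet-"];

/// Returns the variant suffix (e.g. "avx2", "cuda") if `file_name` matches
/// either naming scheme, or `None` if it matches neither.
pub fn onnx_variant(file_name: &str) -> Option<&str> {
    PREFIXES
        .iter()
        // Try each prefix in order; the first one that matches wins.
        .find_map(|prefix| file_name.strip_prefix(*prefix))
        // A bare prefix with no variant suffix is not a usable binary name.
        .filter(|suffix| !suffix.is_empty())
}
```

Checking the new prefix first means a directory that contains both generations of binaries resolves to the same variant name either way.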

Integrate Alibaba's SenseVoice model via ONNX Runtime for local transcription. SenseVoice is a CTC encoder-only model supporting zh/en/ja/ko/yue with a single forward pass. The preprocessing pipeline converts audio to 80-dim Fbank features, stacks via LFR to 560-dim, then CMVN-normalizes before ONNX inference. Cherry-picked from spike/sensevoice-onnx (3320058).
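The LFR and CMVN steps of that pipeline might look roughly like the sketch below. Several details are assumptions rather than facts from the commit: stacking 7 frames is implied by 80 to 560 dimensions, but the stride (`n`), the tail padding strategy, the absence of left padding, and the pre-negated mean convention in `apply_cmvn` are illustrative choices. The function names are hypothetical.

```rust
/// Low frame rate (LFR) stacking: concatenate `m` consecutive frames into one
/// output frame, advancing `n` input frames per output.
/// Assumption: the tail is padded by repeating the last frame.
pub fn apply_lfr(frames: &[Vec<f32>], m: usize, n: usize) -> Vec<Vec<f32>> {
    let mut out = Vec::new();
    if frames.is_empty() || m == 0 || n == 0 {
        return out;
    }
    let last = frames.len() - 1;
    let mut start = 0;
    while start < frames.len() {
        let mut stacked = Vec::with_capacity(m * frames[0].len());
        for offset in 0..m {
            // Clamp to the last frame once we run past the end of the input.
            let idx = (start + offset).min(last);
            stacked.extend_from_slice(&frames[idx]);
        }
        out.push(stacked);
        start += n;
    }
    out
}

/// CMVN with a pre-negated mean (assumed convention): y = (x + neg_mean) * inv_std.
pub fn apply_cmvn(features: &mut [Vec<f32>], neg_mean: &[f32], inv_std: &[f32]) {
    for frame in features.iter_mut() {
        for ((x, m), s) in frame.iter_mut().zip(neg_mean).zip(inv_std) {
            *x = (*x + m) * s;
        }
    }
}
```

With 80-dim input frames and `m = 7`, each stacked frame has 560 values, matching the dimensions in the commit message.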


Move Fbank feature extraction from sensevoice.rs into shared fbank.rs with parameterizable FbankConfig (window type, frame length/shift, pre-emphasis). Add CTC greedy decoder in ctc.rs. Both modules will be reused by Paraformer, Dolphin, Omnilingual, and FireRedASR engines.
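For readers unfamiliar with CTC greedy decoding, a minimal sketch follows. It is not the contents of ctc.rs: the function name, the nested-`Vec` logits layout, and the `blank_id` parameter are assumptions made for illustration. The technique itself is standard: take the argmax per frame, collapse consecutive repeats, then drop blanks.

```rust
/// Greedy CTC decode: argmax per frame, collapse consecutive repeats, drop blanks.
/// `logits` is assumed to be laid out as [time][vocab].
pub fn ctc_greedy_decode(logits: &[Vec<f32>], blank_id: usize) -> Vec<usize> {
    let mut ids = Vec::new();
    let mut prev: Option<usize> = None;
    for frame in logits {
        // Index of the highest-scoring token in this frame (first one wins ties).
        let best = frame
            .iter()
            .enumerate()
            .fold(None, |acc: Option<(usize, f32)>, (i, &v)| match acc {
                Some((_, max)) if max >= v => acc,
                _ => Some((i, v)),
            })
            .map(|(i, _)| i);
        // Skip empty frames without touching the repeat tracker.
        let Some(best) = best else { continue };
        // Emit only when the token is not blank and differs from the previous frame.
        if best != blank_id && prev != Some(best) {
            ids.push(best);
        }
        prev = Some(best);
    }
    ids
}
```

Because `prev` is updated on blank frames too, a blank between two identical tokens lets the same token be emitted twice, which is how CTC represents genuinely repeated symbols.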

Consolidate ONNX dependencies under onnx-common feature flag. Add TranscriptionEngine variants, config structs, CLI parsing, daemon match arms, VAD auto-selection, and notification icons for Paraformer, Dolphin, Omnilingual, and FireRedASR engines.

Four new ONNX-based transcription backends using the shared fbank and CTC decoder infrastructure:
- Paraformer: FunASR CTC encoder (zh/en), same preprocessing as SenseVoice with LFR from model metadata
- Dolphin: dictation-optimized CTC encoder with Hann window, 31.25ms frame, no LFR or pre-emphasis
- Omnilingual: FunASR 50+ language model with 20ms frame shift and per-utterance instance normalization
- FireRedASR: autoregressive encoder-decoder (sherpa-onnx exports) following the Moonshine pattern for greedy decoding
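The per-utterance instance normalization mentioned for Omnilingual could be sketched as below. This assumes the normalization is applied to the raw samples of one utterance and uses a small epsilon for numerical stability; neither detail is stated in the commit message, and the function name is hypothetical.

```rust
/// Normalize one utterance to zero mean and (approximately) unit variance.
/// Assumption: applied to raw samples; the epsilon value is illustrative.
pub fn instance_normalize(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let n = samples.len() as f32;
    let mean = samples.iter().sum::<f32>() / n;
    // Population variance over the whole utterance.
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + 1e-5).sqrt();
    for x in samples.iter_mut() {
        *x = (*x - mean) * inv_std;
    }
}
```

Unlike CMVN with stored statistics, this needs no model metadata because the statistics come from the utterance itself.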

Introduces continuous meeting transcription with chunked processing, speaker attribution, and export capabilities (Pro feature).

New modules: meeting/chunk.rs, meeting/data.rs, meeting/state.rs, meeting/storage.rs, meeting/export/{json,markdown,txt}.rs
CLI: voxtype meeting {start,stop,pause,resume,status,list,show,export}

Adapted to multi-engine architecture (accepts Config instead of WhisperConfig for engine-agnostic transcriber creation).

Adds meeting lifecycle management to the daemon: start, stop, pause, resume, and chunk processing. Uses file-based IPC for state communication with CLI. Adapted send_notification calls for multi-engine signature and MeetingDaemon::new to accept full Config.
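One way to picture the start/stop/pause/resume lifecycle is as a small state machine. The sketch below is illustrative only: the `MeetingState` and `MeetingCommand` enums, the `transition` function, and the exact set of allowed transitions are assumptions, not the contents of meeting/state.rs.

```rust
/// Hypothetical lifecycle states for a meeting session.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MeetingState {
    Idle,
    Recording,
    Paused,
}

/// Hypothetical lifecycle commands, mirroring the CLI verbs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MeetingCommand {
    Start,
    Stop,
    Pause,
    Resume,
}

/// Apply a command to the current state, rejecting transitions that make no sense.
pub fn transition(state: MeetingState, cmd: MeetingCommand) -> Result<MeetingState, String> {
    use MeetingCommand::*;
    use MeetingState::*;
    match (state, cmd) {
        (Idle, Start) => Ok(Recording),
        (Recording, Pause) => Ok(Paused),
        (Paused, Resume) => Ok(Recording),
        (Recording, Stop) | (Paused, Stop) => Ok(Idle),
        (s, c) => Err(format!("cannot apply {c:?} while {s:?}")),
    }
}
```

Keeping the transition table in one pure function makes it straightforward to unit test independently of the daemon and the IPC layer.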

Introduces loopback audio capture alongside microphone for You/Remote speaker attribution. Adds diarization module with simple energy-based speaker detection.
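A simple energy-based attribution between the two capture streams might work along these lines. The sketch compares RMS energy of the microphone and loopback buffers for the same time window; the `Speaker` enum, the `attribute` function, the silence threshold, and the tie-breaking rule are all assumptions for illustration, not the diarization module's actual logic.

```rust
/// Hypothetical attribution labels for one audio window.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Speaker {
    You,
    Remote,
    Silence,
}

/// Root-mean-square energy of a buffer; 0.0 for an empty buffer.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    (samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32).sqrt()
}

/// Label a window by whichever stream is louder, or `Silence` if both are quiet.
/// Assumption: ties go to the microphone ("You").
pub fn attribute(mic: &[f32], loopback: &[f32], silence_threshold: f32) -> Speaker {
    let (mic_rms, loop_rms) = (rms(mic), rms(loopback));
    if mic_rms < silence_threshold && loop_rms < silence_threshold {
        Speaker::Silence
    } else if mic_rms >= loop_rms {
        Speaker::You
    } else {
        Speaker::Remote
    }
}
```

An approach like this cannot separate multiple remote speakers, which is presumably where the ML-based diarization phase comes in.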

Adds ML-based speaker embedding extraction using ONNX Runtime for improved speaker diarization accuracy. Includes spectral clustering for speaker assignment. Uses existing onnx-common deps via ml-diarization feature flag.

Adds meeting summarization using local Ollama LLM: key points, action items, and speaker attribution. Includes configurable prompt templates and async processing.

Comprehensive tests for meeting data types, storage, state transitions, chunk processing, and export formats. Updates smoke test documentation with meeting mode test procedures.

Auto-fix push_str single-char to push, unneeded returns, derivable impls, collapsed if-else, redundant closures. Manual fixes: rename ExportFormat::from_str to parse, remove wildcard-with-pattern in match arm.

…edASR The setup model command only knew about Whisper, Parakeet, Moonshine, and SenseVoice. Add model catalog entries, download logic, and interactive menu sections for the four remaining ONNX engines. Uses generic shared handlers (validate_onnx_ctc_model, download_onnx_model, handle_onnx_engine_selection, update_config_engine) to avoid duplicating ~200 lines per engine.

FireRedASR dropped: the autoregressive encoder-decoder architecture is too complex for the value it adds, v2 has no ONNX export, and the 1.74GB model serves a Chinese-primary niche. Replaced SenseVoice, Paraformer, Dolphin, Omnilingual, CTC, and Fbank implementations with improved versions that extract shared preprocessing into dedicated modules (ctc.rs for CTC decoding, fbank.rs for mel filterbank extraction).

Meeting mode ships as a standard feature, not Pro-gated. Remove license.rs module, Pro feature checks, and related error variants. Also incorporates code quality improvements from meeting mode review: storage path handling, summary module cleanup, VAD integration test coverage for all ONNX engine variants. Cherry-picked from feature/meeting-mode (22873ef, 51548ab).

Dolphin: Add Fbank preprocessing pipeline. The model expects [N,T,80] Fbank features, not raw waveform. Add CMVN normalization from model metadata (mean/invstd keys with already-negated values). Fix input tensor name (x_len) and type (i64). Add lob_probs output name.

Paraformer: Fix BPE marker stripping for 3D logits path. Add ctc_decode_to_ids() that returns token IDs, then route through tokens_to_text() which handles @@ marker removal and special token filtering. Previously the CTC greedy decode path left @@ artifacts and </s> tokens in the output.

Both engines now pass smoke tests with correct transcription output.
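The @@ marker removal and special token filtering could be sketched as follows. This is not the project's tokens_to_text(): the signature, the list of special tokens, and the space-joining rule are assumptions. It follows the common BPE convention in which a trailing "@@" means the piece continues into the next token.

```rust
/// Join BPE pieces into text: a trailing "@@" glues a piece to the next token,
/// and special tokens are dropped. (Hypothetical signature and token list.)
pub fn tokens_to_text(tokens: &[&str]) -> String {
    const SPECIAL: [&str; 4] = ["<blank>", "<s>", "</s>", "<unk>"];
    let mut text = String::new();
    // True when the next token should be appended without a leading space.
    let mut glue_next = true;
    for tok in tokens.iter().copied().filter(|t| !SPECIAL.contains(t)) {
        if !glue_next {
            text.push(' ');
        }
        match tok.strip_suffix("@@") {
            Some(piece) => {
                text.push_str(piece);
                glue_next = true;
            }
            None => {
                text.push_str(tok);
                glue_next = false;
            }
        }
    }
    text
}
```

Joining complete words with a space suits English output; a real implementation for zh/en models would likely need different spacing rules for CJK characters.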

Cover all 7 engine variants (Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, Omnilingual) with quick validation, daemon integration, error handling, and performance comparison procedures.

Dockerfile.onnx and Dockerfile.onnx-cuda now build with all ONNX engines (sensevoice, paraformer, dolphin, omnilingual) instead of just parakeet + moonshine. Added .worktrees/ to .dockerignore to prevent 16GB of worktree data from being sent as build context over SSH.

The ONNX binaries now include six engines, not just Parakeet, so the command name should reflect that. Existing scripts that call `voxtype setup parakeet` will continue to work via the hidden alias.
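The idea of a hidden alias can be illustrated without any CLI framework: the old name still resolves to the new command but is left out of the help listing. The sketch below is a standalone illustration with hypothetical types (`Subcommand`, `resolve`, `help_names`); the commit message does not say how the alias is actually declared.

```rust
/// Hypothetical description of a `setup` subcommand.
struct Subcommand {
    name: &'static str,
    /// Accepted on the command line but never shown in help output.
    hidden_aliases: &'static [&'static str],
}

const SETUP_SUBCOMMANDS: &[Subcommand] = &[
    Subcommand { name: "onnx", hidden_aliases: &["parakeet"] },
    Subcommand { name: "model", hidden_aliases: &[] },
];

/// Map a user-supplied name (canonical or alias) to the canonical subcommand name.
pub fn resolve(arg: &str) -> Option<&'static str> {
    SETUP_SUBCOMMANDS
        .iter()
        .find(|c| c.name == arg || c.hidden_aliases.iter().any(|a| *a == arg))
        .map(|c| c.name)
}

/// Names to list in help output: canonical names only.
pub fn help_names() -> Vec<&'static str> {
    SETUP_SUBCOMMANDS.iter().map(|c| c.name).collect()
}
```

Here `resolve("parakeet")` yields the canonical `onnx` name, while `help_names()` never mentions the legacy spelling.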

- Rewrite model selection guide for all 7 engines with decision tree, per-engine details, hardware recommendations, and troubleshooting.
- Update user manual with all engine sections, meeting mode commands, setup onnx documentation, and config examples.
- Add meeting mode guide covering commands, configuration, storage, speaker diarization, AI summarization, and export formats.
- Update README with engine comparison table, meeting mode usage, architecture diagram, and --engine CLI flag.
- Add v0.6.0 news article to website with engine table, meeting mode, speaker attribution, export formats, and setup onnx rename.
- Update website homepage model grid with all 6 ONNX engines, remove "experimental" from Parakeet, bump download URLs to v0.6.0.

peteonrails added 23 commits February 16, 2026 09:51

Add CJK test audio fixtures for SenseVoice validation

f3097e8

Cherry-picked from spike/sensevoice-onnx (5ae82b8).

Add SenseVoice to setup model menu and fix SentencePiece artifacts

851e932

Cherry-picked from spike/sensevoice-onnx (33ec709).

Add dual audio capture and simple speaker attribution (Phase 2)

6d54c23

Introduces loopback audio capture alongside microphone for You/Remote speaker attribution. Adds diarization module with simple energy-based speaker detection.

Add ML speaker diarization with ONNX (Phase 3)

9483b3f

Adds ML-based speaker embedding extraction using ONNX Runtime for improved speaker diarization accuracy. Includes spectral clustering for speaker assignment. Uses existing onnx-common deps via ml-diarization feature flag.

Add AI summarization with Ollama integration (Phase 5)

1fb4c82

Adds meeting summarization using local Ollama LLM: key points, action items, and speaker attribution. Includes configurable prompt templates and async processing.

Add meeting mode unit tests and smoke test procedures

4ae9a8c

Comprehensive tests for meeting data types, storage, state transitions, chunk processing, and export formats. Updates smoke test documentation with meeting mode test procedures.

Fix clippy warnings across codebase

392013c

Auto-fix push_str single-char to push, unneeded returns, derivable impls, collapsed if-else, redundant closures. Manual fixes: rename ExportFormat::from_str to parse, remove wildcard-with-pattern in match arm.

Bump version to 0.6.0

f0b1614

Add multi-engine transcription smoke tests

2539084

Cover all 7 engine variants (Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, Omnilingual) with quick validation, daemon integration, error handling, and performance comparison procedures.

Rename setup parakeet to setup onnx, keep parakeet as hidden alias

05abcfd

The ONNX binaries now include six engines, not just Parakeet, so the command name should reflect that. Existing scripts that call `voxtype setup parakeet` will continue to work via the hidden alias.

peteonrails closed this Feb 17, 2026

peteonrails wants to merge 23 commits into main from release/0.6.0
