v0.6.0: Meeting mode, SenseVoice, and 4 new ASR engines#210
Merged
peteonrails merged 22 commits into main on Feb 17, 2026
Conversation
The ONNX binaries now include both the Parakeet and Moonshine engines, so the "parakeet" name no longer fits. Binary names change from `voxtype-parakeet-{avx2,avx512,cuda,rocm}` to `voxtype-onnx-{avx2,avx512,cuda,rocm}`.
Backward compatible: symlink detection checks both the old and new names, and GPU setup looks for both `voxtype-onnx-*` and `voxtype-parakeet-*` files on disk, so existing v0.5.6 installations keep working. Cargo features, engine config, and CLI commands (`setup parakeet`) are unchanged.
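The dual-name lookup described above can be sketched roughly as follows; the helper name and search order are illustrative assumptions, not the actual voxtype code:

```rust
use std::path::{Path, PathBuf};

/// Return the first existing binary, preferring the new voxtype-onnx-*
/// name but falling back to the legacy voxtype-parakeet-* name so that
/// v0.5.6 installations keep working.
fn find_onnx_binary(dir: &Path, variant: &str) -> Option<PathBuf> {
    for prefix in ["voxtype-onnx", "voxtype-parakeet"] {
        let candidate = dir.join(format!("{prefix}-{variant}"));
        if candidate.exists() {
            return Some(candidate);
        }
    }
    None
}

fn main() {
    // On a machine with neither name installed, lookup returns None.
    let dir = std::env::temp_dir();
    println!("{:?}", find_onnx_binary(&dir, "avx2"));
}
```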
Integrate Alibaba's SenseVoice model via ONNX Runtime for local transcription. SenseVoice is a CTC encoder-only model supporting zh/en/ja/ko/yue with a single forward pass. The preprocessing pipeline converts audio to 80-dim Fbank features, stacks via LFR to 560-dim, then CMVN-normalizes before ONNX inference. Cherry-picked from spike/sensevoice-onnx (3320058).
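The LFR (low frame rate) stacking step that turns 80-dim Fbank frames into 560-dim frames can be sketched like this; the function and its padding behavior are illustrative, though the parameter values (m=7, n=6, so 7 × 80 = 560) match the SenseVoice setup described above:

```rust
/// Stack `lfr_m` consecutive feature frames into one wide frame,
/// advancing by `lfr_n` frames each step (repeating the last frame
/// when the window runs off the end). With 80-dim input and m=7 this
/// yields 560-dim output frames.
fn lfr_stack(frames: &[Vec<f32>], lfr_m: usize, lfr_n: usize) -> Vec<Vec<f32>> {
    let t = frames.len();
    let mut out = Vec::new();
    let mut i = 0;
    while i < t {
        let mut stacked = Vec::with_capacity(lfr_m * frames[0].len());
        for j in 0..lfr_m {
            let idx = (i + j).min(t - 1); // pad by repeating the last frame
            stacked.extend_from_slice(&frames[idx]);
        }
        out.push(stacked);
        i += lfr_n;
    }
    out
}

fn main() {
    let frames: Vec<Vec<f32>> = (0..20).map(|k| vec![k as f32; 80]).collect();
    let stacked = lfr_stack(&frames, 7, 6);
    // 20 input frames stepped by 6 -> 4 output frames of dim 560
    println!("{} frames of dim {}", stacked.len(), stacked[0].len());
}
```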
Cherry-picked from spike/sensevoice-onnx (5ae82b8).
Cherry-picked from spike/sensevoice-onnx (33ec709).
Move Fbank feature extraction from sensevoice.rs into shared fbank.rs with parameterizable FbankConfig (window type, frame length/shift, pre-emphasis). Add CTC greedy decoder in ctc.rs. Both modules will be reused by Paraformer, Dolphin, Omnilingual, and FireRedASR engines.
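A CTC greedy decoder of the kind ctc.rs provides reduces, at its core, to argmax per frame, collapsing repeats, and dropping the blank token. A minimal sketch (not the module's actual API):

```rust
/// Greedy CTC decode: take the argmax token per frame, collapse
/// consecutive repeats, and drop the blank token.
fn ctc_greedy_decode(logits: &[Vec<f32>], blank_id: usize) -> Vec<usize> {
    let mut ids = Vec::new();
    let mut prev = usize::MAX;
    for frame in logits {
        let best = frame
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        if best != prev && best != blank_id {
            ids.push(best);
        }
        prev = best;
    }
    ids
}

fn main() {
    // Per-frame argmax: 1 1 0 2 2 -> collapse repeats, drop blank (id 0)
    let logits = vec![
        vec![0.1, 0.9, 0.0],
        vec![0.2, 0.8, 0.0],
        vec![0.9, 0.0, 0.1],
        vec![0.1, 0.0, 0.9],
        vec![0.0, 0.1, 0.9],
    ];
    println!("{:?}", ctc_greedy_decode(&logits, 0)); // [1, 2]
}
```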
Consolidate ONNX dependencies under onnx-common feature flag. Add TranscriptionEngine variants, config structs, CLI parsing, daemon match arms, VAD auto-selection, and notification icons for Paraformer, Dolphin, Omnilingual, and FireRedASR engines.
Four new ONNX-based transcription backends using the shared fbank and CTC decoder infrastructure: - Paraformer: FunASR CTC encoder (zh/en), same preprocessing as SenseVoice with LFR from model metadata - Dolphin: dictation-optimized CTC encoder with Hann window, 31.25ms frame, no LFR or pre-emphasis - Omnilingual: FunASR 50+ language model with 20ms frame shift and per-utterance instance normalization - FireRedASR: autoregressive encoder-decoder (sherpa-onnx exports) following the Moonshine pattern for greedy decoding
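The per-utterance instance normalization mentioned for Omnilingual amounts to normalizing each feature dimension to zero mean and unit variance over the whole utterance. A sketch under that assumption (the engine's actual implementation may differ in epsilon and layout):

```rust
/// Normalize each feature dimension to zero mean / unit variance
/// across all frames of the utterance.
fn instance_normalize(feats: &mut [Vec<f32>]) {
    let t = feats.len() as f32;
    let dims = feats[0].len();
    for d in 0..dims {
        let mean = feats.iter().map(|f| f[d]).sum::<f32>() / t;
        let var = feats.iter().map(|f| (f[d] - mean).powi(2)).sum::<f32>() / t;
        let inv_std = 1.0 / (var.sqrt() + 1e-5); // epsilon is an assumption
        for f in feats.iter_mut() {
            f[d] = (f[d] - mean) * inv_std;
        }
    }
}

fn main() {
    let mut feats = vec![vec![1.0f32, 10.0], vec![3.0, 20.0]];
    instance_normalize(&mut feats);
    // Each dimension now has mean ~0 and unit variance.
    println!("{:?}", feats);
}
```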
Introduces continuous meeting transcription with chunked processing,
speaker attribution, and export capabilities (Pro feature).
New modules: meeting/chunk.rs, meeting/data.rs, meeting/state.rs,
meeting/storage.rs, meeting/export/{json,markdown,txt}.rs
CLI: voxtype meeting {start,stop,pause,resume,status,list,show,export}
Adapted to multi-engine architecture (accepts Config instead of
WhisperConfig for engine-agnostic transcriber creation).
Adds meeting lifecycle management to the daemon: start, stop, pause, resume, and chunk processing. Uses file-based IPC for state communication with CLI. Adapted send_notification calls for multi-engine signature and MeetingDaemon::new to accept full Config.
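The file-based IPC described above can be sketched as a daemon that writes a small state file and a CLI that reads it back; the path and the plain-text format here are illustrative assumptions, not voxtype's actual protocol:

```rust
use std::fs;
use std::path::PathBuf;

// Hypothetical location for the shared state file.
fn state_file() -> PathBuf {
    std::env::temp_dir().join("voxtype-meeting-state-demo")
}

/// Daemon side: write the current lifecycle state, using a temp file
/// plus rename so readers never see a partially written file.
fn write_state(state: &str) -> std::io::Result<()> {
    let tmp = state_file().with_extension("tmp");
    fs::write(&tmp, state)?;
    fs::rename(&tmp, state_file())
}

/// CLI side: read the state back, defaulting to "stopped" if absent.
fn read_state() -> String {
    fs::read_to_string(state_file()).unwrap_or_else(|_| "stopped".into())
}

fn main() -> std::io::Result<()> {
    write_state("recording")?;
    println!("{}", read_state());
    Ok(())
}
```

The rename step matters for any polling reader: `fs::rename` within the same directory is atomic on POSIX filesystems, so the CLI always sees either the old state or the new one, never a torn write.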
Introduces loopback audio capture alongside microphone for You/Remote speaker attribution. Adds diarization module with simple energy-based speaker detection.
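Simple energy-based attribution of the kind described can be sketched as comparing RMS energy of the microphone and loopback streams per chunk; the labels and threshold logic are illustrative:

```rust
/// Compare RMS energy of the mic and loopback chunks and attribute the
/// chunk to whichever stream is louder ("You" vs "Remote").
fn attribute_speaker(mic: &[f32], loopback: &[f32]) -> &'static str {
    fn rms(samples: &[f32]) -> f32 {
        (samples.iter().map(|s| s * s).sum::<f32>() / samples.len().max(1) as f32).sqrt()
    }
    if rms(mic) >= rms(loopback) { "You" } else { "Remote" }
}

fn main() {
    let mic = [0.5f32; 100];      // loud local speech
    let loopback = [0.1f32; 100]; // quiet remote audio
    println!("{}", attribute_speaker(&mic, &loopback)); // You
}
```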
Adds ML-based speaker embedding extraction using ONNX Runtime for improved speaker diarization accuracy. Includes spectral clustering for speaker assignment. Uses existing onnx-common deps via ml-diarization feature flag.
Adds meeting summarization using local Ollama LLM: key points, action items, and speaker attribution. Includes configurable prompt templates and async processing.
Comprehensive tests for meeting data types, storage, state transitions, chunk processing, and export formats. Updates smoke test documentation with meeting mode test procedures.
Auto-fix: `push_str` with a single char to `push`, unneeded returns, derivable impls, collapsed if-else, redundant closures. Manual fixes: rename `ExportFormat::from_str` to `parse`, remove wildcard-with-pattern in match arm.
…edASR The setup model command only knew about Whisper, Parakeet, Moonshine, and SenseVoice. Add model catalog entries, download logic, and interactive menu sections for the four remaining ONNX engines. Uses generic shared handlers (`validate_onnx_ctc_model`, `download_onnx_model`, `handle_onnx_engine_selection`, `update_config_engine`) to avoid duplicating ~200 lines per engine.
FireRedASR dropped: autoregressive encoder-decoder architecture is too complex for the value, v2 has no ONNX export, and the 1.74GB model is Chinese-primary niche. Replaced SenseVoice, Paraformer, Dolphin, Omnilingual, CTC, and Fbank implementations with improved versions that extract shared preprocessing into dedicated modules (ctc.rs for CTC decoding, fbank.rs for mel filterbank extraction).
Meeting mode ships as a standard feature, not Pro-gated. Remove license.rs module, Pro feature checks, and related error variants. Also incorporates code quality improvements from meeting mode review: storage path handling, summary module cleanup, VAD integration test coverage for all ONNX engine variants. Cherry-picked from feature/meeting-mode (22873ef, 51548ab).
Dolphin: Add Fbank preprocessing pipeline. The model expects [N,T,80] Fbank features, not raw waveform. Add CMVN normalization from model metadata (mean/invstd keys with already-negated values). Fix input tensor name (`x_len`) and type (i64). Add `lob_probs` output name.
Paraformer: Fix BPE marker stripping for the 3D logits path. Add `ctc_decode_to_ids()`, which returns token IDs, then route through `tokens_to_text()`, which handles `@@` marker removal and special token filtering. Previously the CTC greedy decode path left `@@` artifacts and `</s>` tokens in the output.
Both engines now pass smoke tests with correct transcription output.
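The `@@` marker removal and special-token filtering fixed here can be sketched as follows; the token strings and special-token set are illustrative, not the engine's actual vocabulary:

```rust
/// Join BPE pieces into text: a trailing "@@" marks a word
/// continuation (no space after it), and special tokens like "</s>"
/// are dropped entirely.
fn tokens_to_text(tokens: &[&str]) -> String {
    let mut text = String::new();
    for tok in tokens {
        if matches!(*tok, "<s>" | "</s>" | "<blank>" | "<unk>") {
            continue; // filter special tokens
        }
        if let Some(stem) = tok.strip_suffix("@@") {
            text.push_str(stem); // continuation: no trailing space
        } else {
            text.push_str(tok);
            text.push(' ');
        }
    }
    text.trim_end().to_string()
}

fn main() {
    // Without the fix, output would contain "hel@@ lo ... </s>".
    println!("{}", tokens_to_text(&["hel@@", "lo", "world", "</s>"])); // hello world
}
```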
Cover all 7 engine variants (Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, Omnilingual) with quick validation, daemon integration, error handling, and performance comparison procedures.
Dockerfile.onnx and Dockerfile.onnx-cuda now build with all ONNX engines (sensevoice, paraformer, dolphin, omnilingual) instead of just parakeet + moonshine. Added .worktrees/ to .dockerignore to prevent 16GB of worktree data from being sent as build context over SSH.
The ONNX binaries now include six engines beyond Parakeet, so the command name should reflect that. Existing scripts that call `voxtype setup parakeet` continue to work via a hidden alias.
Summary
`voxtype-parakeet-*` binaries renamed to `voxtype-onnx-*` (they now include all ONNX engines, not just Parakeet). 21 commits, 65 files changed, ~12,700 lines added across meeting mode, new engines, shared preprocessing, CLI commands, configuration, and tests.
Test plan
`cargo test` passes (526 tests); `cargo clippy` clean.