v0.6.0: Meeting mode, SenseVoice, and 4 new ASR engines by peteonrails · Pull Request #210 · peteonrails/voxtype

peteonrails · 2026-02-16T21:33:17Z

Summary

- Meeting mode: Continuous transcription with speaker attribution, pause/resume, export (text/markdown/JSON/SRT/VTT), and AI summarization via Ollama
- 4 new ONNX engines: SenseVoice (zh/en/ja/ko/yue with auto language detection), Paraformer, Dolphin, Omnilingual (50+ languages), plus shared fbank/CTC infrastructure
- ML speaker diarization: ONNX embedding-based speaker identification as an alternative to simple energy-based attribution
- Dual audio capture: Mic + system audio loopback for capturing remote meeting participants
- Binary rename: voxtype-parakeet-* binaries renamed to voxtype-onnx-* (they now include all ONNX engines, not just Parakeet)
- Removed: FireRedASR (license incompatible), Pro license gate
- Updated: Docker build configs for all ONNX engines, smoke test procedures, model selection setup

21 commits, 65 files changed, ~12,700 lines added across meeting mode, new engines, shared preprocessing, CLI commands, configuration, and tests.

Test plan

- cargo test passes (526 tests)
- cargo clippy clean
- All 7 binary variants built and version-verified
- AVX-512 instruction validation (AVX2 and Vulkan clean)
- Multi-engine transcription smoke tests (Whisper EN, SenseVoice EN, SenseVoice ZH)
- Signal handling and daemon lifecycle tests
- CLI commands and config compatibility tests
- Extended manual testing of meeting mode across real meetings
- SenseVoice English accuracy comparison against sherpa-onnx reference

The ONNX binaries now include both Parakeet and Moonshine engines, so the "parakeet" name no longer fits. Binary names change from voxtype-parakeet-{avx2,avx512,cuda,rocm} to voxtype-onnx-{avx2,avx512,cuda,rocm}. Backward compatible: symlink detection checks both old and new names, and gpu setup looks for both voxtype-onnx-* and voxtype-parakeet-* files on disk so existing v0.5.6 installations keep working. Cargo features, engine config, and CLI commands (setup parakeet) are unchanged.
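The fallback described above can be sketched as a lookup that tries the new name first and the legacy name second. This is a minimal illustration, not the project's code: `candidate_names` and `find_onnx_binary` are hypothetical names, and the existence check is passed in as a closure so the logic can be exercised without touching disk.

```rust
use std::path::{Path, PathBuf};

/// Candidate file names for one ONNX binary variant, new name first.
/// `variant` is a suffix such as "avx2", "avx512", "cuda", or "rocm".
fn candidate_names(variant: &str) -> [String; 2] {
    [
        format!("voxtype-onnx-{variant}"),
        // Legacy name, kept so older installations are still found.
        format!("voxtype-parakeet-{variant}"),
    ]
}

/// Return the first candidate in `dir` for which `exists` reports true.
/// (Hypothetical helper; the real lookup may differ.)
fn find_onnx_binary(
    dir: &Path,
    variant: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    candidate_names(variant)
        .iter()
        .map(|name| dir.join(name))
        .find(|path| exists(path.as_path()))
}
```

In real use the closure would typically be `|p| p.exists()`; because the new name is tried first, it wins when both files are present.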

Integrate Alibaba's SenseVoice model via ONNX Runtime for local transcription. SenseVoice is a CTC encoder-only model supporting zh/en/ja/ko/yue with a single forward pass. The preprocessing pipeline converts audio to 80-dim Fbank features, stacks via LFR to 560-dim, then CMVN-normalizes before ONNX inference. Cherry-picked from spike/sensevoice-onnx (3320058).
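The LFR and CMVN steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in sensevoice.rs: stacking 7 frames follows from 80 × 7 = 560, but the stride of 6, the padding convention (repeat the first frame on the left, the last frame at the tail), and CMVN statistics stored as a negated mean plus inverse standard deviation are assumptions. `apply_lfr` and `apply_cmvn` are hypothetical names.

```rust
/// Low frame rate stacking: concatenate `m` consecutive frames, advancing `n` frames per row.
/// With 80-dim input and m = 7, each output row has 560 values.
fn apply_lfr(frames: &[Vec<f32>], m: usize, n: usize) -> Vec<Vec<f32>> {
    if frames.is_empty() || m == 0 || n == 0 {
        return Vec::new();
    }
    let left_pad = (m - 1) / 2; // virtual copies of the first frame
    let out_len = (frames.len() + n - 1) / n; // ceil(T / n)
    let last = frames.len() - 1;
    let mut out = Vec::with_capacity(out_len);
    for i in 0..out_len {
        let mut row = Vec::with_capacity(m * frames[0].len());
        for j in 0..m {
            // Index into the virtually padded sequence, clamped to real frames.
            let src = (i * n + j).saturating_sub(left_pad).min(last);
            row.extend_from_slice(&frames[src]);
        }
        out.push(row);
    }
    out
}

/// CMVN in place: x = (x + neg_mean) * inv_stddev, per feature dimension.
/// Assumes the statistics are stored as negated mean and inverse stddev.
fn apply_cmvn(rows: &mut [Vec<f32>], neg_mean: &[f32], inv_stddev: &[f32]) {
    for row in rows.iter_mut() {
        for ((x, nm), is) in row.iter_mut().zip(neg_mean).zip(inv_stddev) {
            *x = (*x + nm) * is;
        }
    }
}
```

For 10 input frames with m = 7 and n = 6, this yields 2 rows of 560 values each, which are then normalized before being handed to ONNX inference.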

Move Fbank feature extraction from sensevoice.rs into shared fbank.rs with parameterizable FbankConfig (window type, frame length/shift, pre-emphasis). Add CTC greedy decoder in ctc.rs. Both modules will be reused by Paraformer, Dolphin, Omnilingual, and FireRedASR engines.
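For readers unfamiliar with CTC greedy decoding, a minimal sketch follows. It takes the highest-scoring token per frame, collapses consecutive repeats, and drops the blank token. The function name and signature are assumptions for illustration and are not taken from ctc.rs.

```rust
/// Greedy CTC decoding over per-frame scores (logits or log-probabilities).
/// Returns token IDs with repeats collapsed and blanks removed.
fn ctc_greedy_decode(logits: &[Vec<f32>], blank_id: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev: Option<usize> = None;
    for frame in logits {
        // Arg-max over the vocabulary dimension; empty frames are skipped.
        let best = frame
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(idx, _)| idx);
        let Some(best) = best else { continue };
        // Emit only when the token changes and is not the blank.
        if best != blank_id && Some(best) != prev {
            out.push(best);
        }
        prev = Some(best);
    }
    out
}
```

A blank between two identical tokens resets the repeat check, so a per-frame sequence of 1, 1, blank, 1 decodes to two separate 1 tokens.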

Consolidate ONNX dependencies under onnx-common feature flag. Add TranscriptionEngine variants, config structs, CLI parsing, daemon match arms, VAD auto-selection, and notification icons for Paraformer, Dolphin, Omnilingual, and FireRedASR engines.

Four new ONNX-based transcription backends using the shared fbank and CTC decoder infrastructure:

- Paraformer: FunASR CTC encoder (zh/en), same preprocessing as SenseVoice with LFR from model metadata
- Dolphin: dictation-optimized CTC encoder with Hann window, 31.25ms frame, no LFR or pre-emphasis
- Omnilingual: FunASR 50+ language model with 20ms frame shift and per-utterance instance normalization
- FireRedASR: autoregressive encoder-decoder (sherpa-onnx exports) following the Moonshine pattern for greedy decoding
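Two of the per-engine parameters above translate directly into small computations. The sketch below converts a frame duration to a sample count and applies per-utterance instance normalization (zero mean, unit variance over the whole utterance). The 16 kHz sample rate in the usage note and the epsilon value are assumptions, and both function names are hypothetical.

```rust
/// Number of samples in a frame of `ms` milliseconds at `sample_rate` Hz.
fn frame_samples(sample_rate: u32, ms: f32) -> usize {
    (sample_rate as f32 * ms / 1000.0).round() as usize
}

/// Per-utterance instance normalization in place: subtract the utterance mean
/// and divide by the utterance standard deviation.
fn instance_normalize(samples: &mut [f32]) {
    if samples.is_empty() {
        return;
    }
    let n = samples.len() as f32;
    let mean = samples.iter().sum::<f32>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n;
    // Small epsilon (assumed value) guards against division by zero on silence.
    let inv_std = 1.0 / (var + 1e-5).sqrt();
    for x in samples.iter_mut() {
        *x = (*x - mean) * inv_std;
    }
}
```

Assuming 16 kHz audio, a 31.25ms frame is 500 samples and a 20ms shift is 320 samples.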

Introduces continuous meeting transcription with chunked processing, speaker attribution, and export capabilities (Pro feature).

New modules: meeting/chunk.rs, meeting/data.rs, meeting/state.rs, meeting/storage.rs, meeting/export/{json,markdown,txt}.rs

CLI: voxtype meeting {start,stop,pause,resume,status,list,show,export}

Adapted to multi-engine architecture (accepts Config instead of WhisperConfig for engine-agnostic transcriber creation).
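The PR summary also lists SRT and VTT among the export formats. As an illustration of what a subtitle export has to produce, the sketch below formats cue timestamps: SRT uses a comma before the milliseconds and WebVTT uses a period. The function names are hypothetical, and how voxtype structures its exporters is not shown in this PR text.

```rust
/// Format milliseconds as HH:MM:SS<sep>mmm.
/// Pass ',' for SRT and '.' for WebVTT.
fn format_timestamp(ms: u64, millis_sep: char) -> String {
    let h = ms / 3_600_000;
    let m = (ms / 60_000) % 60;
    let s = (ms / 1000) % 60;
    let milli = ms % 1000;
    format!("{h:02}:{m:02}:{s:02}{millis_sep}{milli:03}")
}

/// Build one SRT cue: sequence number, time range, text, trailing newline.
fn srt_cue(index: usize, start_ms: u64, end_ms: u64, text: &str) -> String {
    format!(
        "{index}\n{} --> {}\n{text}\n",
        format_timestamp(start_ms, ','),
        format_timestamp(end_ms, ',')
    )
}
```

Cues in an SRT file are separated by a blank line, so a caller would join the results of `srt_cue` with `"\n"`.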

Adds meeting lifecycle management to the daemon: start, stop, pause, resume, and chunk processing. Uses file-based IPC for state communication with CLI. Adapted send_notification calls for multi-engine signature and MeetingDaemon::new to accept full Config.
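The lifecycle above maps naturally onto a small state machine. The sketch below is an assumption about its shape, not the contents of meeting/state.rs: the type names, the three states, and the choice to reject invalid commands with `None` are all illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MeetingState {
    Idle,
    Recording,
    Paused,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MeetingCommand {
    Start,
    Stop,
    Pause,
    Resume,
}

/// Return the next state, or None when the command is invalid in the current state.
fn transition(state: MeetingState, cmd: MeetingCommand) -> Option<MeetingState> {
    use MeetingCommand::*;
    use MeetingState::*;
    match (state, cmd) {
        (Idle, Start) => Some(Recording),
        (Recording, Pause) => Some(Paused),
        (Paused, Resume) => Some(Recording),
        // Stopping is allowed from both active states.
        (Recording, Stop) | (Paused, Stop) => Some(Idle),
        _ => None,
    }
}
```

Returning `None` for invalid combinations lets the CLI report errors such as pausing a meeting that was never started, instead of silently ignoring the command.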

…edASR

The setup model command only knew about Whisper, Parakeet, Moonshine, and SenseVoice. Add model catalog entries, download logic, and interactive menu sections for the four remaining ONNX engines. Uses generic shared handlers (validate_onnx_ctc_model, download_onnx_model, handle_onnx_engine_selection, update_config_engine) to avoid duplicating ~200 lines per engine.

FireRedASR dropped: the autoregressive encoder-decoder architecture is too complex for the value it adds, v2 has no ONNX export, and the 1.74GB model serves a niche, primarily Chinese use case. Replaced the SenseVoice, Paraformer, Dolphin, Omnilingual, CTC, and Fbank implementations with improved versions that extract shared preprocessing into dedicated modules (ctc.rs for CTC decoding, fbank.rs for mel filterbank extraction).

Meeting mode ships as a standard feature, not Pro-gated. Remove license.rs module, Pro feature checks, and related error variants. Also incorporates code quality improvements from the meeting mode review: storage path handling, summary module cleanup, and VAD integration test coverage for all ONNX engine variants. Cherry-picked from feature/meeting-mode (22873ef, 51548ab).

Dolphin: Add Fbank preprocessing pipeline. The model expects [N,T,80] Fbank features, not raw waveform. Add CMVN normalization from model metadata (mean/invstd keys with already-negated values). Fix input tensor name (x_len) and type (i64). Add lob_probs output name.

Paraformer: Fix BPE marker stripping for 3D logits path. Add ctc_decode_to_ids() that returns token IDs, then route through tokens_to_text() which handles @@ marker removal and special token filtering. Previously the CTC greedy decode path left @@ artifacts and </s> tokens in the output.

Both engines now pass smoke tests with correct transcription output.
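The Paraformer fix can be illustrated with a sketch of the text assembly step. It assumes the common BPE convention in which a piece ending in `@@` continues into the next piece without a space, and it assumes a particular set of special tokens. The signature below works on token strings for simplicity and is not the signature of the project's tokens_to_text().

```rust
/// Join BPE pieces into text. A piece ending in "@@" is glued to the next piece;
/// special tokens are dropped. (Assumed convention and token set.)
fn tokens_to_text(tokens: &[&str]) -> String {
    const SPECIAL: [&str; 4] = ["<s>", "</s>", "<blank>", "<unk>"];
    let mut text = String::new();
    let mut join_next = false; // true when the previous piece ended with "@@"
    for tok in tokens.iter().filter(|t| !SPECIAL.contains(*t)) {
        let (piece, continues) = match tok.strip_suffix("@@") {
            Some(stem) => (stem, true),
            None => (*tok, false),
        };
        // Separate words with a space unless the previous piece continues.
        if !text.is_empty() && !join_next {
            text.push(' ');
        }
        text.push_str(piece);
        join_next = continues;
    }
    text
}
```

With this convention, the pieces `hel@@`, `lo`, `wor@@`, `ld`, `</s>` become "hello world", with neither the markers nor the end-of-sentence token left in the output.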

Dockerfile.onnx and Dockerfile.onnx-cuda now build with all ONNX engines (sensevoice, paraformer, dolphin, omnilingual) instead of just parakeet + moonshine. Added .worktrees/ to .dockerignore to prevent 16GB of worktree data from being sent as build context over SSH.

peteonrails added 22 commits February 16, 2026 09:51

Add CJK test audio fixtures for SenseVoice validation

f3097e8

Cherry-picked from spike/sensevoice-onnx (5ae82b8).

Add SenseVoice to setup model menu and fix SentencePiece artifacts

851e932

Cherry-picked from spike/sensevoice-onnx (33ec709).

Add dual audio capture and simple speaker attribution (Phase 2)

6d54c23

Introduces loopback audio capture alongside microphone for You/Remote speaker attribution. Adds diarization module with simple energy-based speaker detection.
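A simple energy-based attribution can be sketched by comparing the RMS energy of the microphone chunk with that of the loopback chunk. This is an assumed approach for illustration: the `Silence` variant, the threshold parameter, and the tie-breaking rule are not described in the commit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Speaker {
    You,
    Remote,
    Silence, // hypothetical: neither channel is above the threshold
}

/// Root-mean-square energy of a chunk of samples.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    (samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32).sqrt()
}

/// Attribute a chunk to whichever channel carries more energy.
fn attribute(mic: &[f32], loopback: &[f32], silence_threshold: f32) -> Speaker {
    let (mic_rms, loop_rms) = (rms(mic), rms(loopback));
    if mic_rms < silence_threshold && loop_rms < silence_threshold {
        Speaker::Silence
    } else if mic_rms >= loop_rms {
        Speaker::You
    } else {
        Speaker::Remote
    }
}
```

This approach cannot separate several remote participants from each other, which is the gap the ML diarization phase addresses.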

Add ML speaker diarization with ONNX (Phase 3)

9483b3f

Adds ML-based speaker embedding extraction using ONNX Runtime for improved speaker diarization accuracy. Includes spectral clustering for speaker assignment. Uses existing onnx-common deps via ml-diarization feature flag.
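Embedding-based diarization rests on comparing speaker embeddings, usually by cosine similarity. The sketch below shows that building block together with a deliberately simpler assignment rule (nearest stored embedding above a threshold) in place of the spectral clustering the commit describes. All names and the threshold are assumptions.

```rust
/// Cosine similarity between two embeddings; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Assign `embedding` to the most similar known speaker, or register a new
/// speaker when no stored embedding reaches `threshold`. Returns the speaker index.
/// Note: this is threshold-based online assignment, not spectral clustering.
fn assign_speaker(speakers: &mut Vec<Vec<f32>>, embedding: &[f32], threshold: f32) -> usize {
    let best = speakers
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine_similarity(c, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((i, sim)) if sim >= threshold => i,
        _ => {
            speakers.push(embedding.to_vec());
            speakers.len() - 1
        }
    }
}
```

Spectral clustering would instead build the full pairwise similarity matrix over all segments and cluster it in one pass, which tends to be more robust than this greedy rule.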

Add AI summarization with Ollama integration (Phase 5)

1fb4c82

Adds meeting summarization using local Ollama LLM: key points, action items, and speaker attribution. Includes configurable prompt templates and async processing.
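A configurable prompt template could work roughly as sketched below: speaker-attributed segments are flattened into lines and substituted into a template string. The `{transcript}` placeholder and both function names are hypothetical, and the request to Ollama itself is left out.

```rust
/// Format speaker-attributed segments as "Speaker: text" lines.
fn format_transcript(segments: &[(&str, &str)]) -> String {
    segments
        .iter()
        .map(|(speaker, text)| format!("{speaker}: {text}"))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Fill a summary prompt template.
/// `{transcript}` is an assumed placeholder name, not taken from voxtype.
fn render_prompt(template: &str, transcript: &str) -> String {
    template.replace("{transcript}", transcript)
}
```

The rendered string would then be sent as the prompt of a generation request to the local Ollama server.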

Add meeting mode unit tests and smoke test procedures

4ae9a8c

Comprehensive tests for meeting data types, storage, state transitions, chunk processing, and export formats. Updates smoke test documentation with meeting mode test procedures.

Fix clippy warnings across codebase

392013c

Auto-fix push_str single-char to push, unneeded returns, derivable impls, collapsed if-else, redundant closures. Manual fixes: rename ExportFormat::from_str to parse, remove wildcard-with-pattern in match arm.
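The rename of ExportFormat::from_str is probably a response to clippy's `should_implement_trait` lint, which flags inherent methods whose names match a standard trait method such as `FromStr::from_str`. A minimal sketch of the renamed form follows. The variant names, accepted spellings, and `Option` return type are assumptions based on the export formats listed in the PR summary.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExportFormat {
    Text,
    Markdown,
    Json,
    Srt,
    Vtt,
}

impl ExportFormat {
    /// Inherent parser named `parse` so it does not look like
    /// `std::str::FromStr::from_str` to clippy.
    fn parse(s: &str) -> Option<Self> {
        match s.to_ascii_lowercase().as_str() {
            "text" | "txt" => Some(Self::Text),
            "markdown" | "md" => Some(Self::Markdown),
            "json" => Some(Self::Json),
            "srt" => Some(Self::Srt),
            "vtt" => Some(Self::Vtt),
            _ => None,
        }
    }
}
```

The alternative fix is to implement `std::str::FromStr` for the enum, which would also enable `"srt".parse::<ExportFormat>()` at call sites.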

Bump version to 0.6.0

f0b1614

Add multi-engine transcription smoke tests

2539084

Cover all 7 engine variants (Whisper, Parakeet, Moonshine, SenseVoice, Paraformer, Dolphin, Omnilingual) with quick validation, daemon integration, error handling, and performance comparison procedures.

Rename setup parakeet to setup onnx, keep parakeet as hidden alias

05abcfd

The ONNX binaries now include six engines beyond Parakeet, so the command name should reflect that. Existing scripts that use `voxtype setup parakeet` will continue to work via the hidden alias.

peteonrails merged commit dd23552 into main Feb 17, 2026
5 checks passed

This was referenced Feb 17, 2026

v0.6.0 docs, website, and setup onnx rename #212

Closed

Update documentation and website for v0.6.0 #213

Merged

peteonrails merged 22 commits into main from release/0.6.0
