WIP: VAD with Whisper and Silero backends #147

Draft
peteonrails wants to merge 1 commit into main from feature/vad-silero-compat
@peteonrails
Owner

Summary

Implements Voice Activity Detection with engine-specific backends:

  • WhisperVad: Uses whisper-rs built-in VAD (GGML Silero model) - works now
  • SileroVad: Uses voice_activity_detector crate for Parakeet builds - blocked on ort compatibility
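One way to picture the engine-specific split is a small selection function. This is a minimal sketch, not the code on this branch: `Engine`, `VadBackend`, `select_backend`, and the `silero_available` flag are hypothetical names, and falling back to the energy-based VAD while SileroVad is blocked is an assumption about how the pieces could fit together.

```rust
/// Speech engines mentioned in this PR (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    Whisper,
    Parakeet,
}

/// VAD backends discussed in this PR (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VadBackend {
    /// whisper-rs built-in VAD (GGML Silero model).
    Whisper,
    /// voice_activity_detector crate (ONNX Silero model).
    Silero,
    /// Energy-based VAD with no external dependencies.
    Energy,
}

/// Picks a backend for the given engine.
///
/// `silero_available` is an assumed flag that stays false while the
/// ort compatibility problem is unresolved.
pub fn select_backend(engine: Engine, silero_available: bool) -> VadBackend {
    match engine {
        // Whisper builds have no ort dependency, so the built-in VAD is usable.
        Engine::Whisper => VadBackend::Whisper,
        // Parakeet builds use Silero only once it compiles against their ort.
        Engine::Parakeet if silero_available => VadBackend::Silero,
        // Assumed fallback: the energy-based detector.
        Engine::Parakeet => VadBackend::Energy,
    }
}
```

With this shape, unblocking SileroVad would only change the value passed for `silero_available`, not the callers.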

Status: Blocked

The voice_activity_detector 0.2.0 crate uses an older ort API that conflicts with the ort version required by parakeet-rs. The build fails with API mismatch errors when the parakeet feature is enabled.

Unblocking options

  1. Wait for voice_activity_detector to update for newer ort versions
  2. Find an alternative Silero ONNX implementation compatible with our ort version
  3. Contribute ort compatibility fix upstream to voice_activity_detector

Related

  • See feature/vad-silence-filter for the working energy-based VAD (no external dependencies)
  • WhisperVad from this branch can be cherry-picked independently since it has no ort dependency
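For readers unfamiliar with the energy-based approach, the idea can be sketched as an RMS threshold over a frame of samples. This is an illustrative sketch only, not the code in feature/vad-silence-filter: `EnergyVad`, `rms`, and the threshold value used in the usage note are assumptions.

```rust
/// Root-mean-square energy of one frame of mono samples in [-1.0, 1.0].
/// Returns 0.0 for an empty frame.
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Hypothetical energy-based detector: a frame counts as speech when its
/// RMS energy reaches the configured threshold.
pub struct EnergyVad {
    threshold: f32,
}

impl EnergyVad {
    /// `threshold` is an RMS level; a suitable value depends on the
    /// microphone and noise floor and is not taken from this PR.
    pub fn new(threshold: f32) -> Self {
        Self { threshold }
    }

    /// Returns true if the frame's RMS energy is at or above the threshold.
    pub fn is_speech(&self, frame: &[f32]) -> bool {
        rms(frame) >= self.threshold
    }
}
```

For example, `EnergyVad::new(0.01)` would treat an all-zero frame as silence and a frame of constant amplitude 0.5 as speech. A detector like this needs no model file or ort, which is why it is unaffected by the blocker above, but it cannot tell speech from other loud sounds.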

Implements Voice Activity Detection with two backends:
- WhisperVad: Uses whisper-rs built-in VAD (GGML Silero model)
- SileroVad: Uses voice_activity_detector crate (ONNX Silero model)

WhisperVad works, but SileroVad is blocked on ort API compatibility.
The voice_activity_detector 0.2.0 crate uses an older ort API that
conflicts with parakeet-rs's ort version.

This branch preserves the work for when:
- voice_activity_detector updates to support newer ort
- Or we find an alternative Silero ONNX implementation

See feature/vad-silence-filter for the working energy-based VAD.