Filtering the training data #151
ANonEntity
started this conversation in
Ideas
Whisper tends to transcribe/translate silence as "Thank you for watching!", "Please subscribe to my channel!" and so on, presumably because the training data contains YouTube captions. It seems like the only way to solve this is to remove these lines from the training data and then retrain or fine-tune Whisper.
Would it be possible to detect unvoiced lines like these automatically? Maybe by running all the training data through a voice activity detector (VAD) and dropping caption lines whose audio contains no speech?
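To make the idea concrete, here is a rough sketch of what such a filter might look like. It is not how Whisper's training data was actually processed. The `Caption` type, the function names, and all thresholds are made up for illustration, and a simple frame-energy check stands in for a real VAD (an energy check would still count music or background noise as "voiced", so a trained VAD would likely be needed in practice).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Caption:
    """Hypothetical caption record: a time span in seconds plus its text."""
    start: float
    end: float
    text: str


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def voiced_fraction(samples: Sequence[float], sample_rate: int, caption: Caption,
                    frame_ms: int = 30, rms_threshold: float = 0.01) -> float:
    """Fraction of frames inside the caption span whose energy exceeds the threshold.

    Assumption: samples are floats in [-1.0, 1.0]. The threshold is arbitrary
    and the energy test is only a placeholder for a real VAD decision.
    """
    lo = int(caption.start * sample_rate)
    hi = int(caption.end * sample_rate)
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    energies = frame_rms(samples[lo:hi], frame_len)
    if not energies:
        return 0.0
    return sum(e > rms_threshold for e in energies) / len(energies)


def filter_captions(samples: Sequence[float], sample_rate: int,
                    captions: Sequence[Caption],
                    min_voiced: float = 0.2) -> List[Caption]:
    """Keep only captions whose audio span looks at least partly voiced."""
    return [c for c in captions
            if voiced_fraction(samples, sample_rate, c) >= min_voiced]
```

Captions that fall over silent audio (the "Thank you for watching!" case) would be dropped, while captions over audible audio are kept. The `min_voiced` cutoff would need tuning on real data.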