What you're seeing is expected, and it stems from an architectural difference between Large-v3/Turbo and earlier Whisper models.

Short answer

For transcription with Large-v3 and Turbo, you should generally use the same transcription prefix that Whisper itself uses. However, if you're manually computing sequence probabilities by calling model.decoder(...), you may be missing an additional token that the decoder expects internally, which is why you're seeing "<|startoflm|>" at position 1.
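A separate possibility worth ruling out, which the thread does not confirm: Large-v3 and Turbo have one more language token than earlier multilingual models, so every special token after the language block sits one id higher. If token ids from such a model are decoded with a tokenizer built for the older language count, the id for <|transcribe|> reads as <|startoflm|>. The sketch below reproduces only that off-by-one effect. The placeholder language names, the `base` offset, and the helper `special_tokens` are illustrative assumptions, not Whisper's actual tokenizer code.

```python
def special_tokens(num_languages, base=50257):
    """Map special-token names to ids, in the order the multilingual tokenizer lists them.

    Assumptions: language tokens are placeholders (<|lang0|>, ...), `base` is the
    id of the first special token, and timestamp tokens are left out.
    """
    names = ["<|endoftext|>", "<|startoftranscript|>"]
    names += [f"<|lang{i}|>" for i in range(num_languages)]
    names += ["<|translate|>", "<|transcribe|>", "<|startoflm|>",
              "<|startofprev|>", "<|nospeech|>", "<|notimestamps|>"]
    return {name: base + i for i, name in enumerate(names)}


older = special_tokens(99)    # language count before large-v3
newer = special_tokens(100)   # large-v3 and turbo add one language

# Reverse lookup for the older table: id -> token name.
id_to_name_older = {token_id: name for name, token_id in older.items()}

# The id that the newer vocabulary assigns to <|transcribe|> ...
transcribe_id = newer["<|transcribe|>"]
# ... is read as a different token by the older table.
misread = id_to_name_older[transcribe_id]
print(transcribe_id, misread)
```

In openai-whisper, get_tokenizer accepts a num_languages argument and the loaded model exposes num_languages, so passing num_languages=model.num_languages when building the tokenizer keeps the two tables in step.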
Hi all, I'm trying to determine the probability of a sequence of "target tokens" for a given audio recording of speech. To do so, I'm passing in this sequence of tokens and the encoded Mel spectrogram to the decoder. My problem is I'm unsure what special tokens to prepend to the sequence to indicate to the model to transcribe in English. In particular, for the "turbo" and "large-v3" models, I see the token "<|startoflm|>" has a very high probability output from the decoder, so it seems the model expects this token, but I don't know how to use it. I also don't see this token predicted with high probability when using other models, e.g., "base".
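For readers following along, the scoring step described above can be sketched without any model at all. This is a minimal illustration, not the code from the original post: the logits are toy values standing in for decoder output, and `sequence_log_prob` and `prefix_len` are hypothetical names. The key detail is the alignment: the logits at input position t give the distribution over the token at position t + 1, so the first target token is scored from the row of the last prefix token.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_prob(logits, prefix_len, target_tokens):
    """Sum of log P(target[i] | prefix, target[:i]) under teacher forcing.

    logits[t] is the output at input position t, i.e. the prediction for
    the token at position t + 1 of the sequence prefix + target_tokens.
    """
    total = 0.0
    for i, token in enumerate(target_tokens):
        row = logits[prefix_len - 1 + i]
        total += log_softmax(row)[token]
    return total


# Toy setup: vocabulary of 4 ids, a 2-token prefix, targets [3, 1].
toy_logits = [
    [0.0, 0.0, 0.0, 0.0],            # position 0: predicts the 2nd prefix token (not scored)
    [0.0, 0.0, 0.0, math.log(3.0)],  # position 1: predicts the 1st target token
    [0.0, math.log(5.0), 0.0, 0.0],  # position 2: predicts the 2nd target token
    [0.0, 0.0, 0.0, 0.0],            # position 3: predicts what follows (not scored)
]

score = sequence_log_prob(toy_logits, prefix_len=2, target_tokens=[3, 1])
print(score)
```

With a real model, the rows would come from the decoder output for the concatenated prefix and target tokens. A wrong or missing prefix token shifts every conditional probability, which is why the choice of special tokens matters here.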
What prefix special tokens should I use for the turbo and large models? Below are snippets of code to show what I'm doing. If there is documentation, please point me there. Thanks!
Outputs: