Whisper without translation #652
@Plemeur Hi, you need to detect the language first, then set the text prefix to the detected language. You can't do it by setting the prompt alone.
I was hoping to piggyback off this thread in case @Plemeur you were able to find a workaround for your stated use case. I am also having issues using the Whisper Triton server in multilingual contexts. Specifically, my use case is transcribing English speech to English text, but occasionally Arabic speech is anticipated, and we would want this to be translated and then transcribed to English text. Having played around with the text prompt, I was able to get it to work for either one of the two use cases, but not both. As I do not know in advance which scenario is expected, the text prompt cannot be preset. Strangely, using HuggingFace Whisper, I found that setting it to transcribe to English was sufficient for it to translate + transcribe speech of any language to English. In contrast, Whisper Triton returns output such as '[Arabic]' or '[Speaking foreign language]'. @yuekaizhang could you please expand on your comment about 'detect language first'? Is there a way to have this detected on the fly?
@hjaved202 In general, for a sentence that mixes English and Arabic, if you want the output to be entirely in English, you can try using <|startoftranscript|><|en|><|transcribe|><|notimestamps|> or <|startoftranscript|><|en|><|translate|><|notimestamps|>. If you want the output to include both languages, you can try <|startoftranscript|><|en|><|ar|><|transcribe|><|notimestamps|>. The detect_language feature refers to first using the Whisper model to compute and obtain the language code, and then incorporating it into the prompt for a second pass. For more details, you can check the official Whisper repository.
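The two-pass approach described above can be sketched as follows. This is a minimal illustration, not the Triton server's actual client code: the `build_text_prefix` helper and the `lang_code` value are hypothetical, and the language-detection call itself (which requires a loaded Whisper model and a mel spectrogram) is only shown as a comment.

```python
def build_text_prefix(lang_code: str, task: str = "transcribe") -> str:
    """Build a Whisper decoder text prefix for one detected language.

    lang_code: ISO language code returned by language detection, e.g. "en".
    task: "transcribe" (keep the source language) or "translate" (to English).
    """
    return f"<|startoftranscript|><|{lang_code}|><|{task}|><|notimestamps|>"


# Pass 1 (pseudocode, depends on how you run Whisper; with the official
# openai-whisper package it would look roughly like):
#   _, probs = model.detect_language(mel)
#   lang_code = max(probs, key=probs.get)
lang_code = "ja"  # assume detection returned Japanese for this segment

# Pass 2: decode with the prefix built from the detected language.
prefix = build_text_prefix(lang_code)
print(prefix)  # <|startoftranscript|><|ja|><|transcribe|><|notimestamps|>
```

Because the prefix is constructed per request from the detected code, the same client logic handles both the English-only and the Arabic-to-English scenarios without presetting the prompt.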
Hello,
I have been trying to use the Whisper Triton server to transcribe English and Japanese, but when I set multiple languages in the text prefix,
<|en|><|ja|>
it always translates into the second language. I am seeing other people reporting the same issue on different repos related to large-v3, with various tips and tricks to make it "work".
Is this a limitation of large-v3? Did anyone get a good result using this Triton server on multilingual speech?