Some domain-related terms are not transcribed correctly in whisper-triton. #649
Comments
@yuekaizhang Could you have a look?
Hi @krishnardt, whisper-triton is an accelerated solution; it can't improve Whisper's accuracy. If you can't get correct results with the PyTorch Whisper implementation, whisper-triton can't help either.
Try
@yuekaizhang I edited in the wrong place. I got the output correctly... I have a few other hotwords and added them as comma-separated values. It is working fine, but won't it increase the latency? I tried it on 2 minutes of data, which means that for 4 requests the prefix would be added each time; if the hotword list is bigger, it may increase the latency. That is what I am thinking. Please correct me if I am wrong.
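For what it's worth, the latency concern can be bounded: the reference Whisper implementation caps the prompt/prefix window at roughly 224 tokens, so a hotword list can be truncated before it grows decode cost without limit. A rough sketch (the function name and the word-split stand-in for real tokenization are illustrative, not whisper-triton's actual code):

```python
# Illustrative only: whisper-triton's real prefix handling may differ.
def build_hotword_prompt(hotwords, max_tokens=224):
    """Join hotwords into a single prompt string, capped at roughly
    Whisper's ~224-token prompt window so a long hotword list cannot
    grow per-request decode latency without bound."""
    prompt = ", ".join(hotwords)
    pieces = prompt.split()  # crude stand-in for the real tokenizer
    return " ".join(pieces[:max_tokens])

print(build_hotword_prompt(["Atomberg", "smart fan", "IoT"]))
# → Atomberg, smart fan, IoT
```

With a cap like this, each of the 4 requests pays a small fixed prompt cost rather than one proportional to the full hotword list.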
Hi, during conversion I set the input words/tokens to 340. Server error: inference error: At: ^CTraceback (most recent call last):
@krishnardt
Hi,
Is there any way to correct the above-mentioned examples while transcribing through whisper-triton?
The model is not able to transcribe a few words properly even though they are spelt normally.
For example: "Atomberg" is transcribed as "Atombuck".
I tried to add custom tokens to the tokenizer (tiktoken) by modifying its tokenizer.py code as in the image below, without disturbing the flow, but I am getting worse output compared to the version without custom tokens.
I followed K2 Sherpa's approach to generate the model and ran the Triton server.
Can someone guide me on how to resolve this issue?
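One plausible explanation for the worse output after adding custom tokens (an assumption, not a confirmed diagnosis): extending the tiktoken vocabulary does not add trained rows to the model's embedding and output-projection matrices, so the new token id maps to weights the model never learned. A minimal sketch of the size mismatch (the vocab number is illustrative):

```python
# Illustrative sketch (numbers are assumptions, not the actual model config):
# adding a token to the tokenizer alone does not resize the checkpoint's
# embedding table, so an appended id points at an untrained/out-of-range row.

WHISPER_VOCAB = 51865  # approximate multilingual Whisper vocab size

def token_id_is_trained(token_id: int, vocab_size: int = WHISPER_VOCAB) -> bool:
    """A custom token appended at `vocab_size` falls outside the trained rows."""
    return 0 <= token_id < vocab_size

custom_token_id = WHISPER_VOCAB  # appended past the end of the vocabulary
print(token_id_is_trained(custom_token_id))  # → False
```

If this is the cause, the custom token would need the model's embedding and output layers resized and fine-tuned, not just a tokenizer change; otherwise hotword prompting (as discussed above) is the lighter-weight option.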