PEBKAC
Thanks, @Purfview, for that helpful and insightful comment. Would you be willing to enlighten me about what I'm doing wrong that makes 2 or 3 models fail at some tasks when identical code works for the other 15 (for English) or 9 (for non-English languages) models I tested? If I'm the problem, as you suggest, I'm happy to learn how to make things work better.
Introducing FWEval
I have written a small tool that I use to systematically evaluate processing speed and accuracy of different models within faster_whisper. You can check it out in my FWEval repository. I hope others will find it interesting and useful.
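For anyone curious about the methodology, the kind of loop it runs is roughly sketched below. This is a simplification: the model list, file names, and the jiwer scorer here are placeholders of my own choosing, not necessarily the tool's exact choices.

```python
# Simplified sketch of a speed/accuracy benchmark loop for faster_whisper.
# MODELS, AUDIO, REFERENCE, and the jiwer dependency are placeholders.
import time

from faster_whisper import WhisperModel
from jiwer import wer  # word error rate; pip install jiwer

MODELS = ["tiny", "base.en", "small", "distil-large-v2"]
AUDIO = "sample.wav"
REFERENCE = open("sample_reference.txt", encoding="utf-8").read()

for name in MODELS:
    model = WhisperModel(name, device="cpu", compute_type="int8")
    start = time.perf_counter()
    segments, _info = model.transcribe(AUDIO, beam_size=5)
    # transcribe() is lazy: decoding runs as the generator is consumed,
    # so the join has to happen inside the timed region.
    text = " ".join(seg.text.strip() for seg in segments)
    elapsed = time.perf_counter() - start
    print(f"{name:>16}: {elapsed:6.1f}s  WER={wer(REFERENCE, text):.3f}")
```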
Issues I have discovered
Overall, I've found Faster Whisper to be fantastic. I'm particularly pleased that, for each file I've tested, the transcripts within each model are identical regardless of what hardware and software (computer, OS, and device) I use. They differ from model to model, of course, but each model behaves consistently across very different environments.
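If you want to verify that consistency yourself, one quick way is to hash the transcript and compare digests across machines; identical digests mean byte-identical output. A minimal sketch ("sample.wav" is a placeholder):

```python
# Hash a model's transcript so digests can be compared across machines.
import hashlib

from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _info = model.transcribe("sample.wav", beam_size=5)

# Joining the segments consumes the generator and runs the decode.
text = " ".join(seg.text.strip() for seg in segments)
print(hashlib.sha256(text.encode("utf-8")).hexdigest())
```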
However, I have found a few issues. Among them:

1. For the most part, the English-specific versions of the Faster Whisper models do not appear to be measurably better than the multilingual versions. Am I missing something here?
2. The Distil-Large-v2, Distil-Medium.en, and Distil-Small.en models are wildly inaccurate under most circumstances I've tested, even with high-quality English audio. The Tiny model likewise needs very high-quality audio to be acceptably accurate. I consistently find these models unreliable. The Large and Large-v3 models also appear to be more sensitive to audio quality than the other models. Are others finding this as well?
3. The Distil-Large-v2 and Distil-Large-v3 models only work for English data, despite claiming to be multilingual: they produce English transcripts even when given non-English data and I expect transcripts in the language of the source file.
4. And (based on using a different program) when you explicitly request an English translation of a non-English file, the Large-v3-Turbo and Turbo models do not perform a translation; instead they return a native-language transcript that differs from the native-language transcript you get when you don't request a translation. (See the sketch after this list for how such a request looks.)
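For reference, here is roughly how items 3 and 4 look when invoked through faster_whisper directly. The Turbo observation above came from a different front end, but it should correspond to a call like this; "interview_fr.wav" and the French language code are placeholders:

```python
# Explicitly requesting an English translation of non-English audio.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3-turbo", device="cpu", compute_type="int8")

# task="translate" should produce English output from non-English audio.
# What I observe with the Turbo models is a French transcript instead,
# and one that differs from the task="transcribe" transcript of the file.
segments, info = model.transcribe("interview_fr.wav", task="translate",
                                  language="fr", beam_size=5)
print(info.language, info.language_probability)
for seg in segments:
    print(seg.text)
```

The Distil issue in item 3 is the same call pattern with `task="transcribe"` and an explicit `language=`, which still comes back in English.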
If anyone knows ways to resolve any of these issues, I'd love to hear about them. When it's just a few models out of the larger set of options that misbehave, I'm somewhat less inclined to assume it's my fault, but I'm certainly open to that possibility.