Collaboration effort: regarding speech recognition #269
Comments
I haven't tried Sayboard, but I am surprised that it has "superior" speech recognition, as it also uses Vosk. Anyway, I do think that Dicio's speech recognition is not good: most of the time (80% or more) it doesn't understand what I'm saying. In comparison, FUTO's speech-to-text voice input is much more accurate, so I hope that Dicio switches to it (or at least offers it as an option). See #197
Externalizing the STT (ASR) to an external app (the default engine on Android, via the STT API) is on the roadmap. In the Linux world there are many very good FOSS STT engines.
@atototenten thanks for the kind words :-) Are you sure Sayboard works better? Because as
Maybe it's just my experience. I can't be very sure, since the speech part of both is less than okay, in my opinion. Otherwise I agree with @paolo-caroni. Thanks.
I use Whisper from F-Droid for STT: https://f-droid.org/en/packages/org.woheller69.whisper/ and Sherpa for TTS: https://f-droid.org/en/packages/org.woheller69.ttsengine/ Both work offline and surprisingly well. After playing with Dicio for half an hour, its STT engine seems worse. Even the example sentences from the skills don't get recognized. For example, the Calculator sentence "What is five times four minus a million?" becomes "what is five times for minors a million", and the answer is "5 million." Thanks, but that's useless. Whisper output: "What is 5 times 4 minus a million?", including correct capitalization, digits and question mark. Bingo! It would also be good not to have to download separate language models if I already have a working one on my phone.
Hi,
I sincerely admire the project's author, and want them and their creations to succeed.
I'm a user of another FOSS app, github.com/ElishaAz/Sayboard (a speech-to-text virtual keyboard), which has superior speech recognition, in my opinion.
Since the most important and difficult component of both projects is speech, why don't the projects join forces on a single TTS and STT library project?
I found that the Linux world is also struggling with speech recognition, which is indeed quite difficult.
I think Mozilla also has an interest in speech, since they have an incomplete TTS engine project active. They also released Orbit (a Mistral LLM-based virtual assistant) for Firefox, which would gain immensely from a speech interface for communicating with humans.
Hoping for positive action.
Thanks,
well-wisher anon.