[Feature request] - Integrate Voicetyping #115

Open
Ulf3000 opened this issue Jan 31, 2025 · 1 comment
Labels: discussion, enhancement (New feature or request)

Comments

Ulf3000 commented Jan 31, 2025

Given that your tool is exceptionally well programmed and works seamlessly across applications, it would be great to add voice typing, with the audio processed by Gemini or other suitable LLMs.
It would be fantastic to have a dedicated button for this in the app, rather than relying on inferior voice-typing solutions. Perhaps my perspective is wrong; if so, please correct me.

theJayTea (Owner) commented

This is an interesting request that we could think about adding in the future.

There's actually a very good dedicated speech-to-text model from OpenAI called Whisper. However, running it locally requires ~4 GB of VRAM/RAM, and very few users would be able to run it alongside a local LLM. I found one project that does what you requested using Whisper:
https://github.com/savbell/whisper-writer

A free and accessible way to get state-of-the-art transcription would be to use the Gemini API and ask the multimodal Gemini 2.0 model for a transcript. However, I'm unsure what the latency would be like.
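To make the Gemini idea concrete, here is a minimal sketch using only the Python standard library. It assumes the public REST `generateContent` endpoint, inline base64 audio (suitable for short clips), the `gemini-2.0-flash` model name, and an API key supplied by the caller. The function names and the prompt wording are hypothetical, not part of the app. It also times the round trip, since latency is the open question here.

```python
import base64
import json
import time
import urllib.request

# Assumption: Gemini REST endpoint layout; verify against the current Gemini API docs.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "{model}:generateContent"
)


def build_transcription_request(audio_bytes: bytes, mime_type: str = "audio/wav") -> dict:
    """Build a generateContent body: a text prompt plus the audio as inline base64 data."""
    return {
        "contents": [
            {
                "parts": [
                    # Hypothetical prompt wording; tune as needed.
                    {"text": "Transcribe this audio verbatim. Return only the transcript."},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


def extract_transcript(response: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts).strip()


def transcribe(audio_bytes: bytes, api_key: str, model: str = "gemini-2.0-flash",
               mime_type: str = "audio/wav") -> tuple[str, float]:
    """Send audio to Gemini; return (transcript, round-trip latency in seconds)."""
    body = json.dumps(build_transcription_request(audio_bytes, mime_type)).encode("utf-8")
    request = urllib.request.Request(
        GEMINI_URL.format(model=model),
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=30) as resp:
        payload = json.load(resp)
    latency = time.perf_counter() - start
    return extract_transcript(payload), latency
```

Usage would look like `text, seconds = transcribe(open("clip.wav", "rb").read(), api_key)`, where `clip.wav` is a hypothetical recording captured when the dedicated button is pressed. Logging `seconds` over a few real clips would answer the latency question before committing to the feature.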

This is not something I can immediately work on, and I'd also like to hear what others think about this proposal first.

@theJayTea theJayTea added enhancement New feature or request discussion labels Feb 3, 2025