Overview
Let users define multiple dictation modes (intents), each triggered by a customizable keyboard shortcut, so they can switch between different languages, models, or processing pipelines during dictation.
Problem Statement
As a multilingual user, I frequently switch between languages while dictating. While the current Auto mode works well for longer sentences, it often misidentifies the language in shorter phrases (e.g., "Hola" vs "Hello", "Bonjour" vs "Good morning"). This leads to incorrect transcription and requires manual correction.
Proposed Solution
Core Concept: Dictation Intents
Introduce "Intents" – configurable profiles that define:
- Input language(s)
- Transcription model (local/remote)
- Post-processing hooks (e.g., "send to LLM with prompt: 'Format this as a professional email'")
- Audio input source (if multiple microphones)
- possibly more...
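To make the shape of an Intent concrete, here is a minimal sketch of how such a profile might be modeled. All names here (`Intent`, `PostProcessHook`, the field names, and the `"local"`/`"remote"` values) are hypothetical and chosen for illustration only; they are not taken from the app's codebase.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PostProcessHook:
    """Hypothetical post-processing step, e.g. sending the transcript to an LLM."""
    kind: str    # assumed hook type, e.g. "llm"
    prompt: str  # instruction passed along with the transcript


@dataclass(frozen=True)
class Intent:
    """Hypothetical profile bundling the settings listed above."""
    name: str
    languages: Tuple[str, ...]               # input language code(s), e.g. ("es",)
    model: str = "local"                     # assumed values: "local" or "remote"
    hooks: Tuple[PostProcessHook, ...] = ()  # applied in order after transcription
    audio_source: Optional[str] = None       # None means the default microphone


# Example: an Intent that reformats dictation as a professional email.
email_intent = Intent(
    name="Professional Email",
    languages=("en",),
    model="remote",
    hooks=(PostProcessHook("llm", "Format this as a professional email"),),
)
```

Keeping the profile immutable (`frozen=True`) is one possible design choice: switching Intents then means swapping a whole profile rather than mutating shared settings mid-dictation.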
Keyboard Shortcut Implementation
Allow users to assign keyboard shortcuts to switch between Intents:
Basic Example (Language Switching):
- Cmd+1 → Spanish Intent (transcribe in Spanish, no post-processing)
- Cmd+2 → English Intent (transcribe in English, no post-processing)
- Cmd+3 → "Meeting Notes" Intent:
  - Input: English
  - Model: Local Whisper (fast, offline)
  - Post-process: Send to GPT with prompt: "Extract action items and format as bullet points"
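The shortcut-to-Intent mapping in this example could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: `IntentProfile`, `IntentSwitcher`, `handle_shortcut`, and the model labels are hypothetical names, and real global hotkey registration (which is platform-specific) is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IntentProfile:
    """Simplified, hypothetical Intent: just enough fields for this example."""
    name: str
    language: str
    model: str = "default"                     # assumed label, not a real model ID
    post_process_prompt: Optional[str] = None  # None means no post-processing


class IntentSwitcher:
    """Tracks the active Intent and switches it when a bound shortcut fires."""

    def __init__(self, bindings: Dict[str, IntentProfile], default: IntentProfile):
        self._bindings = {self._normalize(k): v for k, v in bindings.items()}
        self.active = default

    @staticmethod
    def _normalize(shortcut: str) -> str:
        # Treat "Cmd+1", "cmd + 1", and "CMD+1" as the same shortcut.
        return "+".join(part.strip().lower() for part in shortcut.split("+"))

    def handle_shortcut(self, shortcut: str) -> IntentProfile:
        # Unbound shortcuts leave the active Intent unchanged.
        intent = self._bindings.get(self._normalize(shortcut))
        if intent is not None:
            self.active = intent
        return self.active


spanish = IntentProfile(name="Spanish", language="es")
english = IntentProfile(name="English", language="en")
meeting_notes = IntentProfile(
    name="Meeting Notes",
    language="en",
    model="local-whisper",
    post_process_prompt="Extract action items and format as bullet points",
)

switcher = IntentSwitcher(
    bindings={"Cmd+1": spanish, "Cmd+2": english, "Cmd+3": meeting_notes},
    default=english,
)
```

Calling `switcher.handle_shortcut("Cmd+3")` would make "Meeting Notes" the active Intent, and any later dictation would use its language, model, and prompt until another bound shortcut is pressed.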
Benefits
- Immediate language switching without relying on auto-detection
- Extensible architecture that can grow with user needs
- Power-user workflow enhancement through customizable post-processing, possibly using third-party tools directly
- Accessibility for users who need to switch contexts rapidly
Alternatives Considered
While the immediate need is language switching, abstracting this into "Intents" provides a future-proof solution that can accommodate a wide range of user workflows without requiring repeated feature requests for each new use case.