High-performance local speech-to-keyboard app for Mac that uses OpenAI's Whisper models. It transcribes about 15 seconds of audio in 1 second! Just press cmd+opt to start/stop recording.
brew install portaudio
git clone https://github.com/jeffzwang/mac-whisperer
cd mac-whisperer
pip install -r requirements.txt
python whisper-dictation.py
On Mac, press cmd+opt to start/stop recording.
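To illustrate what "start/stop" on a single shortcut means, here is a minimal sketch of toggle logic in plain Python. It is not taken from this app's source: the class ToggleHotkey, its methods, and the key names "cmd" and "opt" are hypothetical, and a real app would feed it events from a global keyboard listener.

```python
class ToggleHotkey:
    """Flip a recording flag each time the full key combination is pressed.

    Hypothetical helper for illustration only; not part of whisper-dictation.py.
    """

    def __init__(self, combo, on_toggle):
        self.combo = frozenset(combo)   # keys that must be held together
        self.pressed = set()            # keys currently held down
        self.armed = True               # prevents repeat firing while held
        self.recording = False
        self.on_toggle = on_toggle      # callback receiving the new state

    def press(self, key):
        self.pressed.add(key)
        # Fire once when the whole combination is down.
        if self.armed and self.combo <= self.pressed:
            self.armed = False
            self.recording = not self.recording
            self.on_toggle(self.recording)

    def release(self, key):
        self.pressed.discard(key)
        # Re-arm once the combination is no longer fully held.
        if not self.combo <= self.pressed:
            self.armed = True


events = []
hotkey = ToggleHotkey({"cmd", "opt"}, events.append)

# First press of cmd+opt starts recording.
hotkey.press("cmd")
hotkey.press("opt")
hotkey.press("opt")      # key repeat while held: ignored
hotkey.release("opt")
hotkey.release("cmd")

# Second press of cmd+opt stops recording.
hotkey.press("cmd")
hotkey.press("opt")

print(events)
```

The armed flag is a design choice in this sketch: without it, key-repeat events while the shortcut is held would flip the state repeatedly.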
Fork of foges/whisper-dictation that 1) uses whispercpp for better performance on M1 Macs (about 2x) and 2) has more ergonomic start/stop controls. It currently uses the base.en model by default. File an issue if you have trouble setting it up. I broke my wrist recently, so this has been clutch for me.
The PortAudio library is required for this app to work.
brew install portaudio
The app requires accessibility permissions to register global hotkeys and permission to access your microphone for speech recognition.
You can specify the model, shortcut, and language as shown below:
python whisper-dictation.py -m large -k cmd_r+shift -l en
The models are multilingual (apart from the English-only .en variants such as base.en), and you can specify a two-letter language code (e.g., "no" for Norwegian) with the -l or --language option. Specifying the language can improve recognition accuracy, especially for smaller model sizes.
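For readers curious how options like these are typically handled, the following is a rough sketch using Python's standard argparse module. It is an assumption about structure, not the script's actual code: the short flags -m, -k, -l and the long flag --language come from this README, while the long names --model and --key-combination, the defaults, and the helper split_combo are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse dictation options (illustrative; long flag names are assumed)."""
    parser = argparse.ArgumentParser(description="Dictation options sketch")
    parser.add_argument("-m", "--model", default="base.en",
                        help="Whisper model name, e.g. base.en or large")
    parser.add_argument("-k", "--key-combination", default="cmd+opt",
                        help="keys joined by '+', e.g. cmd_r+shift")
    parser.add_argument("-l", "--language", default=None,
                        help="two-letter language code, e.g. en or no")
    return parser.parse_args(argv)


def split_combo(combo):
    """Split a shortcut string such as 'cmd_r+shift' into its key names."""
    return [part.strip() for part in combo.split("+") if part.strip()]


# Same arguments as the example command above.
args = parse_args(["-m", "large", "-k", "cmd_r+shift", "-l", "en"])
print(args.model, split_combo(args.key_combination), args.language)
```

Passing an explicit argument list to parse_args keeps the sketch runnable outside a terminal; called with no arguments, it reads from the command line as usual.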
To have the app run automatically when your computer starts, follow these steps:
- Open System Preferences.
- Go to Users & Groups.
- Click on your username, then select the Login Items tab.
- Click the + button and add the run.sh script from the whisper-dictation folder.