Whisper Dictate - Global Key Binding

A Python CLI for voice dictation that uses the Whisper API (or any OpenAI-compatible provider), with global key binding support on Arch Linux.

Features

🎤 Toggle Recording: Single key press to start/stop recording
🧠 AI Transcription: Uses Whisper API (OpenAI, Groq, Together AI, DeepInfra, local whisper.cpp, and more) for accurate speech-to-text
📋 Clipboard Integration: Automatically copies transcription to clipboard
🔔 System Notifications: Visual feedback via notify-send
⚡ Fast Response: Minimal latency for real-time usage
📊 Persistent History: SQLite database stores all transcriptions and logs
🔍 CLI Management: Full command-line interface for managing transcriptions and logs

Prerequisites

FFmpeg (Required for MP3 audio support)

FFmpeg is required for MP3 encoding and conversion. Install it via your package manager:

# Arch Linux
sudo pacman -S ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

Quick Start

Package Manager

This project uses uv for fast Python package management. Install it if you haven't already:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

If you prefer pip, you can still use pip install -e . instead of uv sync.

1. Install the Package

# Using uv (recommended)
uv sync

# Or using pip
pip install -e .

2. Configure Your Whisper Provider

Create a .env file in the project directory (copy from .env.example):

cp .env.example .env

Then edit .env with your provider settings. By default, it uses OpenAI:

WHISPER_PROVIDER=openai
WHISPER_API_KEY="your-api-key-here"
WHISPER_MODEL=whisper-1

Or use any OpenAI-compatible provider — just change the environment variables, no code changes needed:

# Example: Groq
WHISPER_PROVIDER=groq
WHISPER_API_KEY="your-groq-api-key"
WHISPER_MODEL=whisper-large-v3-turbo

# Example: DeepInfra
WHISPER_PROVIDER=deepinfra
WHISPER_API_KEY="your-deepinfra-api-key"
WHISPER_MODEL=Whisper/whisper-large-v3
WHISPER_BASE_URL=https://api.deepinfra.com/v1/openai

See .env.example for all supported providers and configuration options.

3. Add a Global Key Binding for the CLI

Run the setup script to configure i3:

# This will modify your i3 config to add the key binding
./setup_i3.sh

Or manually add to your i3 config (~/.config/i3/config):

# Bind whisper dictate (using mod+z)
bindsym $mod+z exec whisper-dictate dictate

4. Test the CLI

# Check system info
whisper-dictate info

# Run a quick dictation test
whisper-dictate dictate

CLI Usage

The whisper-dictate CLI provides multiple subcommands for dictation, logs management, history management, and audio maintenance.

Global Options

Option	Description
`--log-level LEVEL`	Set logging level (DEBUG, INFO, WARNING, ERROR). Default: INFO

Main Commands

Dictation

whisper-dictate dictate [--duration SECONDS]

Record audio and transcribe it to text.

Options:

--duration SECONDS - Optional recording duration (default: unlimited, stop with Ctrl+C)

Example:

# Record until Ctrl+C
whisper-dictate dictate

# Record for 30 seconds
whisper-dictate dictate --duration 30

# With debug logging
whisper-dictate --log-level DEBUG dictate

System Information

whisper-dictate info

Display system information including audio devices, clipboard tools, and configuration.

Example:

whisper-dictate info
# Output:
# 🔍 System Information:
# ========================================
# 
# 🎤 Audio Devices:
#   • Built-in Microphone
#   • USB Microphone
# 
# 📋 Clipboard Tools:
#   • xclip
# 
# ⚙️  Configuration:
#   • model: base
#   • language: auto
# 
# 📊 Logging:
#   • Log file: /home/user/.local/share/whisper-dictate/whisper-dictate.log
#   • View logs: tail -f /home/user/.local/share/whisper-dictate/whisper-dictate.log

Logs Management

The CLI provides comprehensive log management with database-backed logging and configurable retention.

List Logs

whisper-dictate logs list [OPTIONS]

Query application logs with filters.

Options:

--level LEVEL - Filter by log level (DEBUG, INFO, WARNING, ERROR)
--source SOURCE - Filter by source module (e.g., whisper_dictate.audio)
--from-time TIME - Filter from timestamp (ISO format: YYYY-MM-DD HH:MM:SS)
--to-time TIME - Filter to timestamp (ISO format: YYYY-MM-DD HH:MM:SS)
--limit N - Maximum number of logs to display (default: 100)

Examples:

# List recent logs
whisper-dictate logs list

# Show only errors
whisper-dictate logs list --level ERROR

# Filter by source module
whisper-dictate logs list --source whisper_dictate.audio --limit 50

# Filter by date range
whisper-dictate logs list --from-time "2024-01-01 00:00:00" --to-time "2024-01-31 23:59:59"

Export Logs

whisper-dictate logs export FILENAME [OPTIONS]

Export logs to a file in text or JSON format.

Options:

--format FORMAT - Export format: text or json (default: text)
All filter options from logs list are available

Examples:

# Export to text file
whisper-dictate logs export error_logs.txt --level ERROR

# Export to JSON
whisper-dictate logs export logs.json --format json

Cleanup Logs

whisper-dictate logs cleanup [OPTIONS]

Clean up old logs based on retention policy.

Options:

--days N - Delete logs older than N days (default: use configured retention)

Examples:

# Use default retention (configured in database)
whisper-dictate logs cleanup

# Delete logs older than 7 days
whisper-dictate logs cleanup --days 7
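
To illustrate what a retention cleanup of this kind amounts to, here is a minimal sketch using Python's built-in sqlite3 module. The `logs` table, its `timestamp` column, and the `cleanup_logs` helper are assumptions made for the example; the actual schema used by whisper-dictate may differ.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional


def cleanup_logs(conn: sqlite3.Connection, days: int,
                 now: Optional[datetime] = None) -> int:
    """Delete log rows older than `days` days; return the number removed."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically, so a plain
    # text comparison is enough to find rows older than the cutoff.
    cur = conn.execute("DELETE FROM logs WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo against an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (timestamp TEXT, level TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("2024-01-01 09:00:00", "INFO", "old entry"),
        ("2024-01-30 09:00:00", "ERROR", "recent entry"),
    ],
)
removed = cleanup_logs(conn, days=7, now=datetime(2024, 1, 31, 12, 0, 0))
remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

Passing `now` explicitly keeps the demo deterministic; in normal use it would default to the current time.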

History Management

Search, view, and manage your transcription history stored in the SQLite database.

List History

whisper-dictate history list [OPTIONS]

List recent transcriptions with pagination.

Options:

--limit N - Maximum number to display (default: 20)
--date YYYY-MM-DD - Filter by specific date

Examples:

# List recent transcriptions
whisper-dictate history list

# Show last 10
whisper-dictate history list --limit 10

# Filter by date
whisper-dictate history list --date 2024-03-15

Show Transcription Details

whisper-dictate history show ID [OPTIONS]

Show full details of a specific transcription.

Options:

--audio - Show the audio file path

Examples:

# Show transcription details
whisper-dictate history show 42

# Include audio file path
whisper-dictate history show 42 --audio

Search Transcriptions

whisper-dictate history search QUERY [OPTIONS]

Search transcriptions by text (case-insensitive).

Options:

--limit N - Maximum number of results (default: 20)

Examples:

# Search for "meeting"
whisper-dictate history search "meeting"

# Search with more results
whisper-dictate history search "project" --limit 50
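
A case-insensitive text search over a SQLite table can be sketched as follows. The `transcriptions` table and the `search_transcriptions` helper are hypothetical names chosen for illustration, not the project's confirmed schema. The sketch relies on SQLite's `LIKE` operator, which is case-insensitive for ASCII characters by default.

```python
import sqlite3


def search_transcriptions(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Return (id, text) rows whose text contains `query`, newest id first."""
    # Note: '%' and '_' inside `query` act as LIKE wildcards in this sketch.
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT id, text FROM transcriptions "
        "WHERE text LIKE ? ORDER BY id DESC LIMIT ?",
        (pattern, limit),
    )
    return rows.fetchall()


# Demo against an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcriptions (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO transcriptions (text) VALUES (?)",
    [("Team Meeting at noon",), ("grocery list",), ("MEETING notes",)],
)
matches = search_transcriptions(conn, "meeting")
```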

Delete Transcription

whisper-dictate history delete ID [OPTIONS]

Delete a transcription and its associated audio file.

Options:

--yes - Skip confirmation prompt

Examples:

# Delete with confirmation
whisper-dictate history delete 42

# Delete without confirmation
whisper-dictate history delete 42 --yes

Update Transcription

whisper-dictate history update ID [OPTIONS]

Update a transcription's text and optionally language.

Options:

--text "NEW TEXT" - New transcript text (required)
--language CODE - New language code (optional, e.g., "en", "es")

Examples:

# Update text only
whisper-dictate history update 123 --text "corrected transcription"

# Update text and language
whisper-dictate history update 123 --text "new text" --language en

Audio Management

Cleanup Orphaned Files

whisper-dictate audio cleanup [OPTIONS]

Clean up orphaned audio files not referenced in the database.

Options:

--dry-run - Show what would be deleted without actually deleting (default: True)
--confirm - Actually delete the orphaned files (default: False)

Examples:

# Preview what would be deleted (default)
whisper-dictate audio cleanup

# Actually delete orphaned files
whisper-dictate audio cleanup --confirm
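
The dry-run-by-default behaviour can be sketched like this. The `find_orphans` and `cleanup_orphans` helpers, and the idea that the database yields a set of referenced file names, are assumptions for the example rather than the project's actual code.

```python
import tempfile
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3"}


def find_orphans(audio_dir: Path, referenced: set) -> list:
    """List audio files in `audio_dir` whose names are not in `referenced`."""
    return sorted(
        p for p in audio_dir.iterdir()
        if p.is_file() and p.suffix in AUDIO_SUFFIXES and p.name not in referenced
    )


def cleanup_orphans(audio_dir: Path, referenced: set, confirm: bool = False) -> list:
    """Return the orphaned files; delete them only when `confirm` is True."""
    orphans = find_orphans(audio_dir, referenced)
    if confirm:
        for path in orphans:
            path.unlink()
    return orphans


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    audio_dir = Path(tmp)
    (audio_dir / "kept.mp3").write_bytes(b"")
    (audio_dir / "stale.wav").write_bytes(b"")
    preview = [p.name for p in cleanup_orphans(audio_dir, {"kept.mp3"})]
    still_there = (audio_dir / "stale.wav").exists()   # dry run deletes nothing
    cleanup_orphans(audio_dir, {"kept.mp3"}, confirm=True)
    remaining = sorted(p.name for p in audio_dir.iterdir())
```

Defaulting to a preview means a mistyped command cannot remove recordings; deletion has to be requested explicitly.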

Migration

Migrate Legacy State Files

whisper-dictate migrate [OPTIONS]

Migrate legacy state files to the database.

Options:

--force - Force re-migration even if already completed
--status - Check migration status only

Examples:

# Check migration status
whisper-dictate migrate --status

# Run migration
whisper-dictate migrate

# Force re-migration
whisper-dictate migrate --force

Configuration

Environment Variables

Whisper Provider Configuration

All Whisper settings are configurable via environment variables. Switch providers without touching code:

Variable	Description	Default
`WHISPER_PROVIDER`	Provider: `openai`, `groq`, `together`, `deepinfra`, `custom`	`openai`
`WHISPER_API_KEY`	API key for the selected provider	(required)
`WHISPER_BASE_URL`	Custom API base URL (for `custom` provider)	Provider default
`WHISPER_MODEL`	Model name to use	`whisper-1`
`WHISPER_LANGUAGE`	Language code (e.g., `en`, `es`, `auto`)	`auto`
`WHISPER_TEMPERATURE`	Sampling temperature (0.0-1.0)	`0.0`
`WHISPER_TIMEOUT`	API request timeout in seconds	`60`
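
As a rough illustration of how these variables and their defaults could be read in Python, consider the sketch below. The `WhisperSettings` dataclass and `load_settings` function are hypothetical names, and the sketch reads from a plain mapping rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WhisperSettings:
    provider: str
    model: str
    language: str
    temperature: float
    timeout: float
    base_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> WhisperSettings:
    """Build settings from environment variables, using the table's defaults."""
    temperature = float(env.get("WHISPER_TEMPERATURE", "0.0"))
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("WHISPER_TEMPERATURE must be between 0.0 and 1.0")
    return WhisperSettings(
        provider=env.get("WHISPER_PROVIDER", "openai"),
        model=env.get("WHISPER_MODEL", "whisper-1"),
        language=env.get("WHISPER_LANGUAGE", "auto"),
        temperature=temperature,
        timeout=float(env.get("WHISPER_TIMEOUT", "60")),
        # An unset or empty base URL means "use the provider's default".
        base_url=env.get("WHISPER_BASE_URL") or None,
    )


defaults = load_settings({})
groq = load_settings({"WHISPER_PROVIDER": "groq",
                      "WHISPER_MODEL": "whisper-large-v3-turbo"})
```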

Supported Providers

Provider	`WHISPER_PROVIDER`	Default Model	Notes
OpenAI	`openai`	`whisper-1`	Default provider
Groq	`groq`	`whisper-large-v3-turbo`	Fast inference
Together AI	`together`	`whisper-large-v3`
DeepInfra	`deepinfra`	`Whisper/whisper-large-v3`
whisper.cpp	`custom`	`whisper-large-v3`	Set `WHISPER_BASE_URL` to your server
faster-whisper-server	`custom`	`large-v3`	Set `WHISPER_BASE_URL` to your server

Provider-Specific API Key Fallbacks

If WHISPER_API_KEY is not set, the system falls back to provider-specific env vars:

OPENAI_API_KEY (for openai)
GROQ_API_KEY (for groq)
TOGETHER_API_KEY (for together)
DEEPINFRA_API_KEY (for deepinfra)
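
The fallback order described above can be expressed in a few lines. This is a minimal sketch; `resolve_api_key` and `FALLBACK_KEYS` are illustrative names, not the project's confirmed API.

```python
import os
from typing import Mapping, Optional

# Provider-specific variables consulted when WHISPER_API_KEY is not set.
FALLBACK_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "together": "TOGETHER_API_KEY",
    "deepinfra": "DEEPINFRA_API_KEY",
}


def resolve_api_key(provider: str,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer WHISPER_API_KEY, then the provider's own variable, else None."""
    key = env.get("WHISPER_API_KEY")
    if key:
        return key
    fallback = FALLBACK_KEYS.get(provider)
    if fallback is None:
        return None  # e.g. the `custom` provider has no fallback variable
    return env.get(fallback) or None
```

For example, `resolve_api_key("groq", {"GROQ_API_KEY": "gsk-example"})` returns the Groq key, while setting `WHISPER_API_KEY` as well would take precedence.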

Other Configuration

Variable	Description	Default
`LOG_LEVEL`	Logging level (DEBUG, INFO, WARNING, ERROR)	`INFO`
`LOG_RETENTION_DAYS`	Days to keep logs	`30`
`MIN_FREE_SPACE_MB`	Minimum free disk space for recording	`100`
`MP3_ENABLED`	Enable/disable MP3 conversion	`true`
`MP3_BITRATE`	MP3 encoding bitrate: `64k`, `128k`, `192k`	`128k`
`KEEP_WAV`	Keep original WAV after MP3 conversion	`false`

Database Configuration

The CLI uses a SQLite database stored at:

~/.local/share/whisper-dictate/whisper-dictate.db

Configuration options (in .env):

LOG_RETENTION_DAYS - Days to keep logs (default: 30)
MIN_FREE_SPACE_MB - Minimum free disk space required for recording (default: 100)
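
A free-space check of the kind `MIN_FREE_SPACE_MB` controls can be sketched with the standard library's `shutil.disk_usage`. The `has_free_space` helper is a hypothetical name, and how the CLI actually performs the check is not shown in this README.

```python
import shutil


def has_free_space(path: str, min_free_mb: int = 100) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free_mb` MB free."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= min_free_mb


if not has_free_space(".", min_free_mb=100):
    print("Not enough free disk space to start recording")
```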

Audio Settings

The CLI uses sensible defaults:

Sample rate: 16kHz (optimal for Whisper)
Channels: 1 (mono)
Format: 16-bit WAV (converted to MP3 before upload)

MP3 Conversion

Uncompressed WAV files are large: about 10 MB per minute at 44.1 kHz stereo, and about 1.9 MB per minute at the 16 kHz mono this CLI records. The Whisper API accepts MP3 directly, so by default audio is converted to MP3 before upload, achieving 80-90% file size reduction with no impact on transcription quality for speech.
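
The WAV size figures follow from simple arithmetic: sample rate × channels × bytes per sample × 60 seconds. The helper below is only an illustration (its name is made up here) and ignores the small WAV header.

```python
def wav_bytes_per_minute(sample_rate_hz: int, channels: int,
                         bits_per_sample: int = 16) -> int:
    """Size of one minute of uncompressed PCM audio, in bytes."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * 60


cd_quality = wav_bytes_per_minute(44_100, 2)  # 10_584_000 bytes, about 10 MB
dictation = wav_bytes_per_minute(16_000, 1)   # 1_920_000 bytes, about 1.9 MB
```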

Configuration options (in .env):

MP3_ENABLED - Enable/disable MP3 conversion (default: true)
MP3_BITRATE - MP3 encoding bitrate: '64k', '128k', '192k' (default: '128k')
KEEP_WAV - Keep original WAV after MP3 conversion (default: false)

Example .env configuration:

OPENAI_API_KEY=your-api-key-here
MP3_ENABLED=true
MP3_BITRATE=128k
KEEP_WAV=false

Bitrate Guide:

Bitrate	Quality	Size	Use Case
64k	Good	Smallest	Low bandwidth, voice notes
128k	Excellent	Moderate	Recommended for most users
192k	Best	Largest	High quality requirements
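
To put rough numbers on the Size column, the sketch below estimates the MP3 size per minute for each bitrate, assuming constant-bitrate encoding (1k = 1000 bits per second) and ignoring container overhead. The function name is illustrative. How much smaller this is than the source WAV depends on the recording's sample rate and channel count.

```python
def mp3_bytes_per_minute(bitrate: str) -> int:
    """Approximate size of one minute of constant-bitrate MP3, in bytes."""
    kbps = int(bitrate.rstrip("k"))
    return kbps * 1000 // 8 * 60


sizes = {rate: mp3_bytes_per_minute(rate) for rate in ("64k", "128k", "192k")}
for rate, size in sizes.items():
    print(f"{rate}: about {size / 1_000_000:.2f} MB per minute")
```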

Provider Examples

See .env.example in the project root for ready-to-use configurations for all supported providers. Simply copy the block for your provider into your .env file.

Notification Action Buttons

While recording, the notification displays a "Stop Recording" action button. There are two ways to trigger it, described under How to Use Action Buttons; both require the setup below.

Prerequisites

Install dunst and dmenu (required for action buttons):

# Arch Linux
sudo pacman -S dunst dmenu

# Debian/Ubuntu
sudo apt-get install dunst dmenu

Start dunst notification daemon if not already running:
```
dunst &
```

i3 Configuration for Context Menu

To use the notification action button, you need to configure a keybinding for dunst's context menu in your i3 config (~/.config/i3/config):

# Dunst context menu keybinding (required for notification actions)
bindsym Ctrl+Shift+. exec dunstctl context

Reload i3 after adding:

i3-msg reload

How to Use Action Buttons

Method 1: Click the notification (if supported)

Some dunst configurations support clicking action buttons directly on the notification.

Method 2: Use the context menu

While recording, press Ctrl+Shift+. to open dunst's context menu
Select "Stop Recording" from the menu
The recording will stop and transcription will begin

Note: The context menu keybinding (Ctrl+Shift+.) is configurable in your dunst configuration. Check your dunstrc for the shortcut setting under [global] or [keybind] sections.

Troubleshooting

Dependencies Missing

# Install missing Python packages
uv sync
# Or: pip install -e .

# Install system packages
sudo pacman -S python-pip ffmpeg portaudio

FFmpeg Installation Issues

FFmpeg is required for MP3 audio conversion. If you see warnings about FFmpeg not being found, follow these steps:

Verify FFmpeg is Installed

# Check if FFmpeg is installed and its version
ffmpeg -version

# Should output something like:
# ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers

Install FFmpeg

# Arch Linux
sudo pacman -S ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# macOS
brew install ffmpeg

If FFmpeg is Installed but Not Working

# Check if FFmpeg is in your PATH
which ffmpeg

# Verify FFmpeg can be found by Python
python -c "import subprocess; subprocess.run(['ffmpeg', '-version'])"

Disable MP3 Conversion if FFmpeg is Unavailable

If FFmpeg cannot be installed or is not working, you can disable MP3 conversion to continue using WAV files (note: this will result in larger file sizes):

# Set in your .env file
MP3_ENABLED=false

The system will continue to function with WAV files, but uploads will be larger (about 1.9 MB per minute at 16 kHz mono, 16-bit).

Graceful Degradation

If FFmpeg is missing during runtime:

The system logs a warning message
Returns the original WAV path
Continues to function normally with larger file sizes
Transcription quality remains unchanged
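
The fallback behaviour listed above might look roughly like the following. This is a sketch under stated assumptions, not the project's implementation: `convert_to_mp3` is a hypothetical name, the injectable `which` parameter exists only to make the example testable, and returning the WAV path when FFmpeg exits with an error is an extra assumption. The FFmpeg options used (`-y`, `-loglevel`, `-i`, `-b:a`) are standard.

```python
import logging
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def convert_to_mp3(wav_path: Path, bitrate: str = "128k",
                   which: Callable[[str], Optional[str]] = shutil.which) -> Path:
    """Convert a WAV file to MP3, or return the WAV path if that is not possible."""
    ffmpeg = which("ffmpeg")
    if ffmpeg is None:
        # Graceful degradation: warn and carry on with the larger WAV file.
        logger.warning("FFmpeg not found; using WAV instead: %s", wav_path)
        return wav_path

    mp3_path = wav_path.with_suffix(".mp3")
    result = subprocess.run(
        [ffmpeg, "-y", "-loglevel", "error",
         "-i", str(wav_path), "-b:a", bitrate, str(mp3_path)],
        capture_output=True,
    )
    if result.returncode != 0:
        # Assumption: a failed conversion also falls back to the WAV file.
        logger.warning("FFmpeg failed; using WAV instead: %s", wav_path)
        return wav_path
    return mp3_path


# Simulate a machine without FFmpeg installed.
fallback = convert_to_mp3(Path("clip.wav"), which=lambda name: None)
```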

No Audio Devices

# List available audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"

Clipboard Not Working

# Install clipboard tools
sudo pacman -S xclip xsel wl-clipboard
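
To check by hand which clipboard tool is available, a small script along these lines can help. It is a sketch, not the CLI's actual detection logic: the helper names and the order in which tools are tried are assumptions. Note that the wl-clipboard package provides the `wl-copy` binary.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Commands that read text from stdin and place it on the clipboard.
CLIPBOARD_COMMANDS = [
    ["xclip", "-selection", "clipboard"],
    ["xsel", "--clipboard", "--input"],
    ["wl-copy"],
]


def pick_clipboard_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the first clipboard command whose binary is on PATH, else None."""
    for cmd in CLIPBOARD_COMMANDS:
        if which(cmd[0]):
            return cmd
    return None


def copy_to_clipboard(text: str,
                      which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Copy `text` using the first available tool; return False if none is found."""
    cmd = pick_clipboard_command(which)
    if cmd is None:
        return False
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    return True


print(pick_clipboard_command() or "no clipboard tool found")
```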

Debug Mode

# Run with debug logging
whisper-dictate --log-level DEBUG dictate

View Application Logs

# Tail the log file
tail -f ~/.local/share/whisper-dictate/whisper-dictate.log

# Or use the CLI to query logs
whisper-dictate logs list --level DEBUG

Database Issues

# Check database location
ls -la ~/.local/share/whisper-dictate/

# Check migration status
whisper-dictate migrate --status

# Re-run migration if needed
whisper-dictate migrate --force

Key Files

whisper_dictate/ - Main Python package
- cli.py - Click-based CLI interface
- dictation.py - Core dictation service
- database.py - SQLite database operations
- transcription.py - Whisper API abstraction and provider implementations
- notifications.py - System notifications
- clipboard.py - Clipboard integration
setup.py - Package installation configuration
main.py - CLI entry point script
.env - Environment configuration

Usage Tips

Speak clearly and at a normal pace
Use in a quiet environment for best results
Test with short recordings first
Check system notifications for status updates
Use history commands to review past transcriptions
Use logs commands for debugging issues

The CLI is designed to be fast and responsive, with minimal latency between key press and recording start/stop.
