A Python CLI for voice dictation using Whisper API (OpenAI-compatible providers) with global key binding support on Arch Linux.
- 🎤 Toggle Recording: Single key press to start/stop recording
- 🧠 AI Transcription: Uses Whisper API (OpenAI, Groq, Together AI, DeepInfra, local whisper.cpp, and more) for accurate speech-to-text
- 📋 Clipboard Integration: Automatically copies transcription to clipboard
- 🔔 System Notifications: Visual feedback via notify-send
- ⚡ Fast Response: Minimal latency for real-time usage
- 📊 Persistent History: SQLite database stores all transcriptions and logs
- 🔍 CLI Management: Full command-line interface for managing transcriptions and logs
FFmpeg is required for MP3 encoding and conversion. Install it via your package manager:
# Arch Linux
sudo pacman -S ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
This project uses uv for fast Python package management. Install it if you haven't already:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
If you prefer pip, you can still use pip install -e . instead of uv sync.
# Using uv (recommended)
uv sync
# Or using pip
pip install -e .
Create a .env file in the project directory (copy from .env.example):
cp .env.example .env
Then edit .env with your provider settings. By default, it uses OpenAI:
WHISPER_PROVIDER=openai
WHISPER_API_KEY="your-api-key-here"
WHISPER_MODEL=whisper-1
Or use any OpenAI-compatible provider by changing the environment variables; no code changes are needed:
# Example: Groq
WHISPER_PROVIDER=groq
WHISPER_API_KEY="your-groq-api-key"
WHISPER_MODEL=whisper-large-v3-turbo
# Example: DeepInfra
WHISPER_PROVIDER=deepinfra
WHISPER_API_KEY="your-deepinfra-api-key"
WHISPER_MODEL=Whisper/whisper-large-v3
WHISPER_BASE_URL=https://api.deepinfra.com/v1/openai
See .env.example for all supported providers and configuration options.
Run the setup script to configure i3:
# This will modify your i3 config to add the key binding
./setup_i3.sh
Or manually add the binding to your i3 config (~/.config/i3/config):
# Bind whisper dictate (using mod+z)
bindsym $mod+z exec whisper-dictate dictate
# Check system info
whisper-dictate info
# Run a quick dictation test
whisper-dictate dictate
The whisper-dictate CLI provides multiple subcommands for dictation, log management, history management, and audio maintenance.

| Option | Description |
|---|---|
| --log-level LEVEL | Set logging level (DEBUG, INFO, WARNING, ERROR). Default: INFO |

whisper-dictate dictate [--duration SECONDS]
Record audio and transcribe it to text.
Options:
- --duration SECONDS - Optional recording duration (default: unlimited, stop with Ctrl+C)
Example:
# Record until Ctrl+C
whisper-dictate dictate
# Record for 30 seconds
whisper-dictate dictate --duration 30
# With debug logging
whisper-dictate --log-level DEBUG dictate
whisper-dictate info
Display system information including audio devices, clipboard tools, and configuration.
Example:
whisper-dictate info
# Output:
# 🔍 System Information:
# ========================================
#
# 🎤 Audio Devices:
# • Built-in Microphone
# • USB Microphone
#
# 📋 Clipboard Tools:
# • xclip
#
# ⚙️ Configuration:
# • model: base
# • language: auto
#
# 📊 Logging:
# • Log file: /home/user/.local/share/whisper-dictate/whisper-dictate.log
# • View logs: tail -f /home/user/.local/share/whisper-dictate/whisper-dictate.log
The CLI provides comprehensive log management with database-backed logging and configurable retention.
whisper-dictate logs list [OPTIONS]
Query application logs with filters.
Options:
- --level LEVEL - Filter by log level (DEBUG, INFO, WARNING, ERROR)
- --source SOURCE - Filter by source module (e.g., whisper_dictate.audio)
- --from-time TIME - Filter from timestamp (ISO format: YYYY-MM-DD HH:MM:SS)
- --to-time TIME - Filter to timestamp (ISO format: YYYY-MM-DD HH:MM:SS)
- --limit N - Maximum number of logs to display (default: 100)
Examples:
# List recent logs
whisper-dictate logs list
# Show only errors
whisper-dictate logs list --level ERROR
# Filter by source module
whisper-dictate logs list --source whisper_dictate.audio --limit 50
# Filter by date range
whisper-dictate logs list --from-time "2024-01-01" --to-time "2024-01-31"
whisper-dictate logs export FILENAME [OPTIONS]
Export logs to a file in text or JSON format.
Options:
- --format FORMAT - Export format: text or json (default: text)
- All filter options from logs list are available
Examples:
# Export to text file
whisper-dictate logs export error_logs.txt --level ERROR
# Export to JSON
whisper-dictate logs export logs.json --format json
whisper-dictate logs cleanup [OPTIONS]
Clean up old logs based on retention policy.
Options:
- --days N - Delete logs older than N days (default: use configured retention)
Examples:
# Use default retention (configured in database)
whisper-dictate logs cleanup
# Delete logs older than 7 days
whisper-dictate logs cleanup --days 7
Search, view, and manage your transcription history stored in the SQLite database.
whisper-dictate history list [OPTIONS]
List recent transcriptions with pagination.
Options:
- --limit N - Maximum number to display (default: 20)
- --date YYYY-MM-DD - Filter by specific date
Examples:
# List recent transcriptions
whisper-dictate history list
# Show last 10
whisper-dictate history list --limit 10
# Filter by date
whisper-dictate history list --date 2024-03-15
whisper-dictate history show ID [OPTIONS]
Show full details of a specific transcription.
Options:
- --audio - Show the audio file path
Examples:
# Show transcription details
whisper-dictate history show 42
# Include audio file path
whisper-dictate history show 42 --audio
whisper-dictate history search QUERY [OPTIONS]
Search transcriptions by text (case-insensitive).
Options:
- --limit N - Maximum number of results (default: 20)
Examples:
# Search for "meeting"
whisper-dictate history search "meeting"
# Search with more results
whisper-dictate history search "project" --limit 50
whisper-dictate history delete ID [OPTIONS]
Delete a transcription and its associated audio file.
Options:
- --yes - Skip confirmation prompt
Examples:
# Delete with confirmation
whisper-dictate history delete 42
# Delete without confirmation
whisper-dictate history delete 42 --yes
whisper-dictate history update ID [OPTIONS]
Update a transcription's text and optionally its language.
Options:
- --text "NEW TEXT" - New transcript text (required)
- --language CODE - New language code (optional, e.g., "en", "es")
Examples:
# Update text only
whisper-dictate history update 123 --text "corrected transcription"
# Update text and language
whisper-dictate history update 123 --text "new text" --language en
whisper-dictate audio cleanup [OPTIONS]
Clean up orphaned audio files not referenced in the database.
Options:
- --dry-run - Show what would be deleted without actually deleting (default: True)
- --confirm - Actually delete the orphaned files (default: False)
Examples:
# Preview what would be deleted (default)
whisper-dictate audio cleanup
# Actually delete orphaned files
whisper-dictate audio cleanup --confirm
whisper-dictate migrate [OPTIONS]
Migrate legacy state files to the database.
Options:
- --force - Force re-migration even if already completed
- --status - Check migration status only
Examples:
# Check migration status
whisper-dictate migrate --status
# Run migration
whisper-dictate migrate
# Force re-migration
whisper-dictate migrate --force
All Whisper settings are configurable via environment variables. Switch providers without touching code:

| Variable | Description | Default |
|---|---|---|
| WHISPER_PROVIDER | Provider: openai, groq, together, deepinfra, custom | openai |
| WHISPER_API_KEY | API key for the selected provider | (required) |
| WHISPER_BASE_URL | Custom API base URL (for custom provider) | Provider default |
| WHISPER_MODEL | Model name to use | whisper-1 |
| WHISPER_LANGUAGE | Language code (e.g., en, es, auto) | auto |
| WHISPER_TEMPERATURE | Sampling temperature (0.0-1.0) | 0.0 |
| WHISPER_TIMEOUT | API request timeout in seconds | 60 |
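As a rough illustration of how the variables in this table could be read, here is a minimal sketch. The `WhisperSettings` dataclass and `load_settings` function are hypothetical names chosen for this example, not the project's actual code; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WhisperSettings:
    """Hypothetical container mirroring the documented variables."""
    provider: str = "openai"
    api_key: Optional[str] = None
    base_url: Optional[str] = None  # None means "use the provider default"
    model: str = "whisper-1"
    language: str = "auto"
    temperature: float = 0.0
    timeout: float = 60.0


def load_settings(env: Mapping[str, str] = os.environ) -> WhisperSettings:
    """Build settings from an environment mapping, applying the documented defaults."""
    temperature = float(env.get("WHISPER_TEMPERATURE", "0.0"))
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("WHISPER_TEMPERATURE must be between 0.0 and 1.0")
    return WhisperSettings(
        # Lowercasing the provider name is an assumption of this sketch.
        provider=env.get("WHISPER_PROVIDER", "openai").lower(),
        api_key=env.get("WHISPER_API_KEY"),
        base_url=env.get("WHISPER_BASE_URL"),
        model=env.get("WHISPER_MODEL", "whisper-1"),
        language=env.get("WHISPER_LANGUAGE", "auto"),
        temperature=temperature,
        timeout=float(env.get("WHISPER_TIMEOUT", "60")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.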
Supported providers and their default models:

| Provider | WHISPER_PROVIDER | Default Model | Notes |
|---|---|---|---|
| OpenAI | openai | whisper-1 | Default provider |
| Groq | groq | whisper-large-v3-turbo | Fast inference |
| Together AI | together | whisper-large-v3 | |
| DeepInfra | deepinfra | Whisper/whisper-large-v3 | |
| whisper.cpp | custom | whisper-large-v3 | Set WHISPER_BASE_URL to your server |
| faster-whisper-server | custom | large-v3 | Set WHISPER_BASE_URL to your server |
If WHISPER_API_KEY is not set, the system falls back to provider-specific env vars:
- OPENAI_API_KEY (for openai)
- GROQ_API_KEY (for groq)
- TOGETHER_API_KEY (for together)
- DEEPINFRA_API_KEY (for deepinfra)
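The fallback order described above can be sketched as follows. `resolve_api_key` and `FALLBACK_KEY_VARS` are hypothetical names for illustration; treating an empty value as "not set" is also an assumption of this sketch rather than documented behavior.

```python
import os
from typing import Mapping, Optional

# Provider-specific fallback variables, as listed above.
FALLBACK_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "together": "TOGETHER_API_KEY",
    "deepinfra": "DEEPINFRA_API_KEY",
}


def resolve_api_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return WHISPER_API_KEY if set, else the provider-specific key, else None."""
    key = env.get("WHISPER_API_KEY")
    if key:
        return key
    fallback_var = FALLBACK_KEY_VARS.get(provider.lower())
    if fallback_var is None:
        # Providers without a listed fallback (e.g. custom) have nothing to fall back to.
        return None
    return env.get(fallback_var) or None
```

For example, `resolve_api_key("groq", {"GROQ_API_KEY": "abc"})` would return `"abc"`, while a set `WHISPER_API_KEY` always takes precedence.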
Logging, storage, and audio settings:

| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
| LOG_RETENTION_DAYS | Days to keep logs | 30 |
| MIN_FREE_SPACE_MB | Minimum free disk space for recording | 100 |
| MP3_ENABLED | Enable/disable MP3 conversion | true |
| MP3_BITRATE | MP3 encoding bitrate: 64k, 128k, 192k | 128k |
| KEEP_WAV | Keep original WAV after MP3 conversion | false |
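To illustrate what a MIN_FREE_SPACE_MB check might look like, here is a minimal sketch using only the standard library. The function name `has_enough_free_space` is hypothetical, and interpreting "MB" as 1024 * 1024 bytes is an assumption; the project may count differently.

```python
import shutil
from pathlib import Path


def has_enough_free_space(directory: Path, min_free_mb: int = 100) -> bool:
    """Return True if the filesystem holding `directory` has at least min_free_mb free."""
    free_bytes = shutil.disk_usage(directory).free
    # Assumption: one MB is 1024 * 1024 bytes.
    return free_bytes >= min_free_mb * 1024 * 1024
```

A recorder could call this against its data directory before starting and refuse to record when it returns False.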
The CLI uses a SQLite database stored at:
~/.local/share/whisper-dictate/whisper-dictate.db
Configuration options (in .env):
- LOG_RETENTION_DAYS - Days to keep logs (default: 30)
- MIN_FREE_SPACE_MB - Minimum free disk space required for recording (default: 100)
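A retention cleanup driven by LOG_RETENTION_DAYS could look roughly like the sketch below. The `logs` table, its `timestamp` column (UTC text in YYYY-MM-DD HH:MM:SS form), and the function name `cleanup_old_logs` are all assumptions made for illustration; the real schema is not documented here.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def cleanup_old_logs(conn: sqlite3.Connection, retention_days: int = 30) -> int:
    """Delete log rows older than the retention window and return how many were removed."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    # Text timestamps in this fixed format compare correctly as strings.
    cur = conn.execute(
        "DELETE FROM logs WHERE timestamp < ?",
        (cutoff.strftime("%Y-%m-%d %H:%M:%S"),),
    )
    conn.commit()
    return cur.rowcount
```

Using a parameterized query keeps the cutoff out of the SQL string, and returning the row count gives the caller something to report.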
The CLI uses sensible defaults:
- Sample rate: 16kHz (optimal for Whisper)
- Channels: 1 (mono)
- Format: 16-bit WAV (converted to MP3 before upload)
WAV files are large (~10MB per minute at 44.1kHz stereo, ~1.9MB per minute at the 16kHz mono default), but the Whisper API supports MP3 natively. By default, audio is automatically converted to MP3 before upload. Relative to 44.1kHz stereo WAV this is roughly an 80-90% file size reduction, with no noticeable impact on transcription quality for speech.
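The size figures follow from simple arithmetic on uncompressed PCM audio. The sketch below works it through; the helper names are hypothetical, the WAV header is ignored, and the MP3 figure assumes constant-bitrate encoding.

```python
def wav_bytes_per_minute(sample_rate_hz: int, channels: int, bits_per_sample: int = 16) -> int:
    """Size of one minute of uncompressed PCM audio, ignoring the small WAV header."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * 60


def mp3_bytes_per_minute(bitrate_kbps: int) -> int:
    """Size of one minute of constant-bitrate MP3 (kbps means 1000 bits per second)."""
    return bitrate_kbps * 1000 // 8 * 60


print(wav_bytes_per_minute(44_100, 2))  # 10584000 bytes, about 10.6 MB
print(wav_bytes_per_minute(16_000, 1))  # 1920000 bytes, about 1.9 MB
print(mp3_bytes_per_minute(128))        # 960000 bytes, about 0.96 MB
```

Under these assumptions, a 128k MP3 is roughly 91% smaller than 44.1kHz stereo WAV and about 50% smaller than 16kHz mono WAV, so the savings depend heavily on the recording format.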
Configuration options (in .env):
- MP3_ENABLED - Enable/disable MP3 conversion (default: true)
- MP3_BITRATE - MP3 encoding bitrate: '64k', '128k', '192k' (default: '128k')
- KEEP_WAV - Keep original WAV after MP3 conversion (default: false)
Example .env configuration:
OPENAI_API_KEY=your-api-key-here
MP3_ENABLED=true
MP3_BITRATE=128k
KEEP_WAV=false
Bitrate Guide:
| Bitrate | Quality | Size | Use Case |
|---|---|---|---|
| 64k | Good | Smallest | Low bandwidth, voice notes |
| 128k | Excellent | Moderate | Recommended for most users |
| 192k | Best | Largest | High quality requirements |
See .env.example in the project root for ready-to-use configurations for all supported providers. Simply copy the block for your provider into your .env file.
When recording, the notification displays a "Stop Recording" action button. Using it requires the following setup:
- Install dunst and dmenu (required for action buttons):
# Arch Linux
sudo pacman -S dunst dmenu
# Debian/Ubuntu
sudo apt-get install dunst dmenu
- Start the dunst notification daemon if it is not already running:
dunst &
To use the notification action button, you need to configure a keybinding for dunst's context menu in your i3 config (~/.config/i3/config):
# Dunst context menu keybinding (required for notification actions)
# i3 expects the keysym name "period" for the "." key
bindsym Ctrl+Shift+period exec dunstctl context
Reload i3 after adding:
i3-msg reload
Method 1: Click the notification (if supported)
- Some dunst configurations support clicking action buttons directly on the notification
Method 2: Use the context menu
- While recording, press Ctrl+Shift+. to open dunst's context menu
- Select "Stop Recording" from the menu
- The recording will stop and transcription will begin
Note: The context menu keybinding (Ctrl+Shift+.) is configurable in your dunst configuration. Check your dunstrc for the shortcut setting under [global] or [keybind] sections.
# Install missing Python packages
uv sync
# Or: pip install -e .
# Install system packages
sudo pacman -S python-pip ffmpeg portaudio
FFmpeg is required for MP3 audio conversion. If you see warnings about FFmpeg not being found, follow these steps:
# Check if FFmpeg is installed and its version
ffmpeg -version
# Should output something like:
# ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
# Arch Linux
sudo pacman -S ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Fedora
sudo dnf install ffmpeg
# macOS
brew install ffmpeg
# Check if FFmpeg is in your PATH
which ffmpeg
# Verify FFmpeg can be found by Python
python -c "import subprocess; subprocess.run(['ffmpeg', '-version'])"
If FFmpeg cannot be installed or is not working, you can disable MP3 conversion and continue using WAV files (note: this results in larger uploads):
# Set in your .env file
MP3_ENABLED=false
The system will continue to function with WAV files, but uploads will be larger (~1.9MB per minute at 16kHz mono, 16-bit).
If FFmpeg is missing during runtime:
- The system logs a warning message
- Returns the original WAV path
- Continues to function normally with larger file sizes
- Transcription quality remains unchanged
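The fallback behavior listed above can be sketched as follows. This is an illustration, not the project's implementation: `convert_to_mp3` and its `ffmpeg_bin` parameter are hypothetical, and also falling back to WAV when the conversion itself fails is an extra assumption of this sketch.

```python
import logging
import shutil
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)


def convert_to_mp3(wav_path: Path, bitrate: str = "128k", ffmpeg_bin: str = "ffmpeg") -> Path:
    """Convert wav_path to MP3, or return wav_path unchanged if FFmpeg is unavailable."""
    if shutil.which(ffmpeg_bin) is None:
        logger.warning("FFmpeg not found; uploading WAV instead: %s", wav_path)
        return wav_path
    mp3_path = wav_path.with_suffix(".mp3")
    try:
        # -y overwrites an existing output file; -b:a sets the audio bitrate.
        subprocess.run(
            [ffmpeg_bin, "-y", "-i", str(wav_path), "-b:a", bitrate, str(mp3_path)],
            check=True,
            capture_output=True,
        )
    except (subprocess.CalledProcessError, OSError) as exc:
        logger.warning("MP3 conversion failed (%s); uploading WAV instead", exc)
        return wav_path
    return mp3_path
```

Because the function always returns a usable path, the caller can upload whatever comes back without branching on whether FFmpeg is installed.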
# List available audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"
# Install clipboard tools
sudo pacman -S xclip xsel wl-clipboard
# Run with debug logging
whisper-dictate --log-level DEBUG dictate
# Tail the log file
tail -f ~/.local/share/whisper-dictate/whisper-dictate.log
# Or use the CLI to query logs
whisper-dictate logs list --level DEBUG
# Check database location
ls -la ~/.local/share/whisper-dictate/
# Check migration status
whisper-dictate migrate --status
# Re-run migration if needed
whisper-dictate migrate --force
- whisper_dictate/ - Main Python package
  - cli.py - Click-based CLI interface
  - dictation.py - Core dictation service
  - database.py - SQLite database operations
  - transcription.py - Whisper API abstraction and provider implementations
  - notifications.py - System notifications
  - clipboard.py - Clipboard integration
- setup.py - Package installation configuration
- main.py - CLI entry point script
- .env - Environment configuration
- Speak clearly and at normal pace
- Use in quiet environment for best results
- Test with short recordings first
- Check system notifications for status updates
- Use history commands to review past transcriptions
- Use logs commands for debugging issues
The CLI is designed to be fast and responsive, with minimal latency between key press and recording start/stop.