seszele64/i3-arch-whisper-dictate

Whisper Dictate - Global Key Binding

A Python CLI for voice dictation that uses the Whisper API (via OpenAI-compatible providers), with global key binding support on Arch Linux.

Features

  • 🎤 Toggle Recording: Single key press to start/stop recording
  • 🧠 AI Transcription: Uses the Whisper API (OpenAI, Groq, Together AI, DeepInfra, local whisper.cpp, and more) for accurate speech-to-text
  • 📋 Clipboard Integration: Automatically copies transcription to clipboard
  • 🔔 System Notifications: Visual feedback via notify-send
  • ⚡ Fast Response: Minimal latency for real-time usage
  • 📊 Persistent History: SQLite database stores all transcriptions and logs
  • 🔍 CLI Management: Full command-line interface for managing transcriptions and logs

Prerequisites

FFmpeg (Required for MP3 audio support)

FFmpeg is required for MP3 encoding and conversion. Install it via your package manager:

# Arch Linux
sudo pacman -S ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

Quick Start

Package Manager

This project uses uv for fast Python package management. Install it if you haven't already:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

If you prefer pip, you can still use pip install -e . instead of uv sync.

1. Install the Package

# Using uv (recommended)
uv sync

# Or using pip
pip install -e .

2. Configure Your Whisper Provider

Create a .env file in the project directory (copy from .env.example):

cp .env.example .env

Then edit .env with your provider settings. By default, it uses OpenAI:

WHISPER_PROVIDER=openai
WHISPER_API_KEY="your-api-key-here"
WHISPER_MODEL=whisper-1

Or use any OpenAI-compatible provider — just change the environment variables, no code changes needed:

# Example: Groq
WHISPER_PROVIDER=groq
WHISPER_API_KEY="your-groq-api-key"
WHISPER_MODEL=whisper-large-v3-turbo

# Example: DeepInfra
WHISPER_PROVIDER=deepinfra
WHISPER_API_KEY="your-deepinfra-api-key"
WHISPER_MODEL=Whisper/whisper-large-v3
WHISPER_BASE_URL=https://api.deepinfra.com/v1/openai

See .env.example for all supported providers and configuration options.

3. Add a Global Key Binding for the CLI

Run the setup script to configure i3:

# This will modify your i3 config to add the key binding
./setup_i3.sh

Or manually add to your i3 config (~/.config/i3/config):

# Bind whisper dictate (using mod+z)
bindsym $mod+z exec whisper-dictate dictate

4. Test the CLI

# Check system info
whisper-dictate info

# Run a quick dictation test
whisper-dictate dictate

CLI Usage

The whisper-dictate CLI provides multiple subcommands for dictation, logs management, history management, and audio maintenance.

Global Options

| Option | Description |
|---|---|
| --log-level LEVEL | Set logging level (DEBUG, INFO, WARNING, ERROR). Default: INFO |

Main Commands

Dictation

whisper-dictate dictate [--duration SECONDS]

Record audio and transcribe it to text.

Options:

  • --duration SECONDS - Optional recording duration (default: unlimited, stop with Ctrl+C)

Example:

# Record until Ctrl+C
whisper-dictate dictate

# Record for 30 seconds
whisper-dictate dictate --duration 30

# With debug logging
whisper-dictate --log-level DEBUG dictate

System Information

whisper-dictate info

Display system information including audio devices, clipboard tools, and configuration.

Example:

whisper-dictate info
# Output:
# 🔍 System Information:
# ========================================
# 
# 🎤 Audio Devices:
#   • Built-in Microphone
#   • USB Microphone
# 
# 📋 Clipboard Tools:
#   • xclip
# 
# ⚙️  Configuration:
#   • model: base
#   • language: auto
# 
# 📊 Logging:
#   • Log file: /home/user/.local/share/whisper-dictate/whisper-dictate.log
#   • View logs: tail -f /home/user/.local/share/whisper-dictate/whisper-dictate.log

Logs Management

The CLI provides comprehensive log management with database-backed logging and configurable retention.

List Logs

whisper-dictate logs list [OPTIONS]

Query application logs with filters.

Options:

  • --level LEVEL - Filter by log level (DEBUG, INFO, WARNING, ERROR)
  • --source SOURCE - Filter by source module (e.g., whisper_dictate.audio)
  • --from-time TIME - Filter from timestamp (ISO format: YYYY-MM-DD HH:MM:SS)
  • --to-time TIME - Filter to timestamp (ISO format: YYYY-MM-DD HH:MM:SS)
  • --limit N - Maximum number of logs to display (default: 100)

Examples:

# List recent logs
whisper-dictate logs list

# Show only errors
whisper-dictate logs list --level ERROR

# Filter by source module
whisper-dictate logs list --source whisper_dictate.audio --limit 50

# Filter by date range
whisper-dictate logs list --from-time "2024-01-01 00:00:00" --to-time "2024-01-31 23:59:59"

Export Logs

whisper-dictate logs export FILENAME [OPTIONS]

Export logs to a file in text or JSON format.

Options:

  • --format FORMAT - Export format: text or json (default: text)
  • All filter options from logs list are available

Examples:

# Export to text file
whisper-dictate logs export error_logs.txt --level ERROR

# Export to JSON
whisper-dictate logs export logs.json --format json

Cleanup Logs

whisper-dictate logs cleanup [OPTIONS]

Clean up old logs based on retention policy.

Options:

  • --days N - Delete logs older than N days (default: use configured retention)

Examples:

# Use default retention (configured in database)
whisper-dictate logs cleanup

# Delete logs older than 7 days
whisper-dictate logs cleanup --days 7
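
A retention cleanup of this kind can be sketched with Python's standard sqlite3 module. This is only an illustration: the logs table, its created_at column, and the cleanup_logs helper are hypothetical names, and the project's actual schema may differ.

```python
import sqlite3
from datetime import datetime, timedelta


def cleanup_logs(conn: sqlite3.Connection, days: int, now: datetime) -> int:
    """Delete rows older than `days` days and return how many were removed."""
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    # Timestamps stored as "YYYY-MM-DD HH:MM:SS" compare correctly as text.
    cur = conn.execute("DELETE FROM logs WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Hypothetical schema, built in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, created_at TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO logs (created_at, message) VALUES (?, ?)",
    [("2024-01-01 09:00:00", "old entry"), ("2024-01-30 09:00:00", "recent entry")],
)

removed = cleanup_logs(conn, days=7, now=datetime(2024, 1, 31, 12, 0, 0))
remaining = [row[0] for row in conn.execute("SELECT message FROM logs")]
print(removed, remaining)
```

Passing the current time in as an argument keeps the helper deterministic, which makes the retention rule easy to check.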

History Management

Search, view, and manage your transcription history stored in the SQLite database.

List History

whisper-dictate history list [OPTIONS]

List recent transcriptions with pagination.

Options:

  • --limit N - Maximum number to display (default: 20)
  • --date YYYY-MM-DD - Filter by specific date

Examples:

# List recent transcriptions
whisper-dictate history list

# Show last 10
whisper-dictate history list --limit 10

# Filter by date
whisper-dictate history list --date 2024-03-15

Show Transcription Details

whisper-dictate history show ID [OPTIONS]

Show full details of a specific transcription.

Options:

  • --audio - Show the audio file path

Examples:

# Show transcription details
whisper-dictate history show 42

# Include audio file path
whisper-dictate history show 42 --audio

Search Transcriptions

whisper-dictate history search QUERY [OPTIONS]

Search transcriptions by text (case-insensitive).

Options:

  • --limit N - Maximum number of results (default: 20)

Examples:

# Search for "meeting"
whisper-dictate history search "meeting"

# Search with more results
whisper-dictate history search "project" --limit 50
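
A case-insensitive text search over a SQLite table can be sketched as follows. The transcriptions table, its columns, and the search_transcriptions helper are assumptions made for illustration, not the project's actual schema. The sketch relies on SQLite's LIKE operator, which ignores case for ASCII characters by default.

```python
import sqlite3


def search_transcriptions(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Return (id, text) rows whose text contains `query`, newest first."""
    # Escape LIKE wildcards so the user's query is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return conn.execute(
        "SELECT id, text FROM transcriptions "
        "WHERE text LIKE ? ESCAPE '\\' ORDER BY id DESC LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()


# Hypothetical schema, built in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcriptions (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO transcriptions (text) VALUES (?)",
    [("Team Meeting at noon",), ("grocery list",), ("meeting notes for Friday",)],
)

results = search_transcriptions(conn, "meeting")
print(results)
```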

Delete Transcription

whisper-dictate history delete ID [OPTIONS]

Delete a transcription and its associated audio file.

Options:

  • --yes - Skip confirmation prompt

Examples:

# Delete with confirmation
whisper-dictate history delete 42

# Delete without confirmation
whisper-dictate history delete 42 --yes

Update Transcription

whisper-dictate history update ID [OPTIONS]

Update a transcription's text and, optionally, its language.

Options:

  • --text "NEW TEXT" - New transcript text (required)
  • --language CODE - New language code (optional, e.g., "en", "es")

Examples:

# Update text only
whisper-dictate history update 123 --text "corrected transcription"

# Update text and language
whisper-dictate history update 123 --text "new text" --language en

Audio Management

Cleanup Orphaned Files

whisper-dictate audio cleanup [OPTIONS]

Clean up orphaned audio files not referenced in the database.

Options:

  • --dry-run - Show what would be deleted without actually deleting (default: True)
  • --confirm - Actually delete the orphaned files (default: False)

Examples:

# Preview what would be deleted (default)
whisper-dictate audio cleanup

# Actually delete orphaned files
whisper-dictate audio cleanup --confirm
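
The dry-run-by-default behaviour can be sketched like this. The helper names, the file extensions checked, and the idea of passing in a set of referenced file names are assumptions for illustration; the real command reads its references from the database.

```python
import tempfile
from pathlib import Path


def find_orphans(audio_dir: Path, referenced: set[str]) -> list[Path]:
    """Return audio files in `audio_dir` whose names are not in `referenced`."""
    return sorted(
        p for p in audio_dir.iterdir()
        if p.is_file() and p.suffix in {".wav", ".mp3"} and p.name not in referenced
    )


def cleanup_orphans(audio_dir: Path, referenced: set[str], confirm: bool = False) -> list[Path]:
    """List orphans; delete them only when `confirm` is True (dry run otherwise)."""
    orphans = find_orphans(audio_dir, referenced)
    if confirm:
        for path in orphans:
            path.unlink()
    return orphans


with tempfile.TemporaryDirectory() as tmp:
    audio_dir = Path(tmp)
    for name in ("kept.mp3", "orphan.wav"):
        (audio_dir / name).write_bytes(b"")

    preview = cleanup_orphans(audio_dir, {"kept.mp3"})  # dry run: nothing deleted
    still_there = (audio_dir / "orphan.wav").exists()

    deleted = cleanup_orphans(audio_dir, {"kept.mp3"}, confirm=True)
    gone = not (audio_dir / "orphan.wav").exists()
    kept = (audio_dir / "kept.mp3").exists()
```

Making deletion opt-in means a mistaken invocation only prints a list, which is the safer default for a command that removes files.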

Migration

Migrate Legacy State Files

whisper-dictate migrate [OPTIONS]

Migrate legacy state files to the database.

Options:

  • --force - Force re-migration even if already completed
  • --status - Check migration status only

Examples:

# Check migration status
whisper-dictate migrate --status

# Run migration
whisper-dictate migrate

# Force re-migration
whisper-dictate migrate --force

Configuration

Environment Variables

Whisper Provider Configuration

All Whisper settings are configurable via environment variables. Switch providers without touching code:

| Variable | Description | Default |
|---|---|---|
| WHISPER_PROVIDER | Provider: openai, groq, together, deepinfra, custom | openai |
| WHISPER_API_KEY | API key for the selected provider | (required) |
| WHISPER_BASE_URL | Custom API base URL (for custom provider) | Provider default |
| WHISPER_MODEL | Model name to use | whisper-1 |
| WHISPER_LANGUAGE | Language code (e.g., en, es, auto) | auto |
| WHISPER_TEMPERATURE | Sampling temperature (0.0-1.0) | 0.0 |
| WHISPER_TIMEOUT | API request timeout in seconds | 60 |
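
As a rough sketch of how these variables and their documented defaults could be read in Python, the following uses a small dataclass. WhisperSettings and load_settings are hypothetical names, not the project's API, and the range check on the temperature is an assumption based on the documented 0.0-1.0 range.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WhisperSettings:
    provider: str
    api_key: Optional[str]
    base_url: Optional[str]  # None means "use the provider's default URL"
    model: str
    language: str
    temperature: float
    timeout: float


def load_settings(env: Mapping[str, str]) -> WhisperSettings:
    """Build settings from a mapping such as os.environ, applying documented defaults."""
    temperature = float(env.get("WHISPER_TEMPERATURE", "0.0"))
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("WHISPER_TEMPERATURE must be between 0.0 and 1.0")
    return WhisperSettings(
        provider=env.get("WHISPER_PROVIDER", "openai"),
        api_key=env.get("WHISPER_API_KEY"),
        base_url=env.get("WHISPER_BASE_URL"),
        model=env.get("WHISPER_MODEL", "whisper-1"),
        language=env.get("WHISPER_LANGUAGE", "auto"),
        temperature=temperature,
        timeout=float(env.get("WHISPER_TIMEOUT", "60")),
    )


defaults = load_settings({})
groq = load_settings({
    "WHISPER_PROVIDER": "groq",
    "WHISPER_API_KEY": "your-groq-api-key",
    "WHISPER_MODEL": "whisper-large-v3-turbo",
})
print(defaults)
print(groq)
```

In real use the mapping would be os.environ after the .env file has been loaded; taking a plain mapping keeps the function easy to test.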

Supported Providers

| Provider | WHISPER_PROVIDER | Default Model | Notes |
|---|---|---|---|
| OpenAI | openai | whisper-1 | Default provider |
| Groq | groq | whisper-large-v3-turbo | Fast inference |
| Together AI | together | whisper-large-v3 | |
| DeepInfra | deepinfra | Whisper/whisper-large-v3 | |
| whisper.cpp | custom | whisper-large-v3 | Set WHISPER_BASE_URL to your server |
| faster-whisper-server | custom | large-v3 | Set WHISPER_BASE_URL to your server |

Provider-Specific API Key Fallbacks

If WHISPER_API_KEY is not set, the system falls back to provider-specific env vars:

  • OPENAI_API_KEY (for openai)
  • GROQ_API_KEY (for groq)
  • TOGETHER_API_KEY (for together)
  • DEEPINFRA_API_KEY (for deepinfra)
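
The fallback order described above can be expressed as a small lookup. This is a sketch, not the project's code: resolve_api_key and FALLBACK_KEY_VARS are hypothetical names, and treating an empty value as "not set" is an assumption.

```python
from typing import Mapping, Optional

# Provider-specific variables consulted when WHISPER_API_KEY is not set.
FALLBACK_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "together": "TOGETHER_API_KEY",
    "deepinfra": "DEEPINFRA_API_KEY",
}


def resolve_api_key(env: Mapping[str, str], provider: str) -> Optional[str]:
    """Prefer WHISPER_API_KEY, then the provider-specific variable, else None."""
    key = env.get("WHISPER_API_KEY")
    if key:
        return key
    fallback_var = FALLBACK_KEY_VARS.get(provider)
    if fallback_var is None:
        return None  # e.g. the "custom" provider has no fallback variable
    return env.get(fallback_var) or None


explicit = resolve_api_key({"WHISPER_API_KEY": "a", "GROQ_API_KEY": "b"}, "groq")
fallback = resolve_api_key({"GROQ_API_KEY": "b"}, "groq")
missing = resolve_api_key({"GROQ_API_KEY": "b"}, "openai")
print(explicit, fallback, missing)
```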

Other Configuration

| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
| LOG_RETENTION_DAYS | Days to keep logs | 30 |
| MIN_FREE_SPACE_MB | Minimum free disk space for recording | 100 |
| MP3_ENABLED | Enable/disable MP3 conversion | true |
| MP3_BITRATE | MP3 encoding bitrate: 64k, 128k, 192k | 128k |
| KEEP_WAV | Keep original WAV after MP3 conversion | false |

Database Configuration

The CLI uses a SQLite database stored at:

~/.local/share/whisper-dictate/whisper-dictate.db

Configuration options (in .env):

  • LOG_RETENTION_DAYS - Days to keep logs (default: 30)
  • MIN_FREE_SPACE_MB - Minimum free disk space required for recording (default: 100)

Audio Settings

The CLI uses sensible defaults:

  • Sample rate: 16kHz (optimal for Whisper)
  • Channels: 1 (mono)
  • Format: 16-bit WAV (converted to MP3 before upload)
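
For reference, these three settings map directly onto the parameters of Python's standard wave module. The sketch below writes one second of silence with them and reads the header back; write_wav is a hypothetical helper and is not how the project itself records audio.

```python
import io
import wave

SAMPLE_RATE = 16_000  # 16kHz
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit


def write_wav(target, pcm_frames: bytes) -> None:
    """Write raw 16-bit PCM frames to `target` (a path or a binary file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_frames)


buffer = io.BytesIO()
write_wav(buffer, b"\x00\x00" * SAMPLE_RATE)  # one second of silence

buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    params = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
print(params)
```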

MP3 Conversion

WAV files are large (~10MB per minute at 44.1kHz stereo), but the Whisper API accepts MP3 natively. By default, audio is automatically converted to MP3 before upload. Measured against a WAV of that size, this achieves an 80-90% file size reduction with no impact on transcription quality for speech.

Configuration options (in .env):

  • MP3_ENABLED - Enable/disable MP3 conversion (default: true)
  • MP3_BITRATE - MP3 encoding bitrate: '64k', '128k', '192k' (default: '128k')
  • KEEP_WAV - Keep original WAV after MP3 conversion (default: false)

Example .env configuration:

OPENAI_API_KEY=your-api-key-here
MP3_ENABLED=true
MP3_BITRATE=128k
KEEP_WAV=false

Bitrate Guide:

| Bitrate | Quality | Size | Use Case |
|---|---|---|---|
| 64k | Good | Smallest | Low bandwidth, voice notes |
| 128k | Excellent | Moderate | Recommended for most users |
| 192k | Best | Largest | High quality requirements |
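
The sizes behind these figures follow from simple arithmetic. The sketch below assumes uncompressed PCM for WAV and constant-bitrate MP3, and it ignores file headers; the function names are illustrative only.

```python
def wav_mb_per_minute(sample_rate_hz: int, channels: int, bits_per_sample: int = 16) -> float:
    """Size of one minute of uncompressed PCM audio, in MB (10^6 bytes)."""
    bytes_per_second = sample_rate_hz * channels * bits_per_sample // 8
    return bytes_per_second * 60 / 1_000_000


def mp3_mb_per_minute(bitrate_kbps: int) -> float:
    """Size of one minute of constant-bitrate MP3 audio, in MB (10^6 bytes)."""
    return bitrate_kbps * 1000 / 8 * 60 / 1_000_000


cd_quality = wav_mb_per_minute(44_100, channels=2)  # 44.1kHz stereo
recorded = wav_mb_per_minute(16_000, channels=1)    # 16kHz mono
for kbps in (64, 128, 192):
    size = mp3_mb_per_minute(kbps)
    print(f"{kbps}k: {size:.2f} MB/min, {1 - size / cd_quality:.0%} smaller than 44.1kHz stereo WAV")
```

With these assumptions, 44.1kHz stereo WAV comes to about 10.58 MB per minute and 128k MP3 to 0.96 MB per minute. At 16kHz mono the same arithmetic gives about 1.92 MB per minute of WAV, so the relative saving depends on the source format and the encoder settings.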

Provider Examples

See .env.example in the project root for ready-to-use configurations for all supported providers. Simply copy the block for your provider into your .env file.


Notification Action Buttons

When recording, the notification displays a "Stop Recording" action button. There are two ways to use it, described under "How to Use Action Buttons" below, once the prerequisites are in place.

Prerequisites

  1. Install dunst and dmenu (required for action buttons):

    # Arch Linux
    sudo pacman -S dunst dmenu
    
    # Debian/Ubuntu
    sudo apt-get install dunst dmenu
  2. Start dunst notification daemon if not already running:

    dunst &

i3 Configuration for Context Menu

To use the notification action button, you need to configure a keybinding for dunst's context menu in your i3 config (~/.config/i3/config):

# Dunst context menu keybinding (required for notification actions)
bindsym Ctrl+Shift+period exec dunstctl context

Reload i3 after adding:

i3-msg reload

How to Use Action Buttons

Method 1: Click the notification (if supported)

  • Some dunst configurations support clicking action buttons directly on the notification

Method 2: Use the context menu

  1. While recording, press Ctrl+Shift+. to open dunst's context menu
  2. Select "Stop Recording" from the menu
  3. The recording will stop and transcription will begin

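Note: The context menu keybinding (Ctrl+Shift+.) is configurable. To use a different key, change the bindsym line in your i3 config. If your dunstrc defines its own context shortcut (older dunst versions used a [shortcuts] section for this), check that the two do not conflict.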


Troubleshooting

Dependencies Missing

# Install missing Python packages
uv sync
# Or: pip install -e .

# Install system packages
sudo pacman -S python-pip ffmpeg portaudio

FFmpeg Installation Issues

FFmpeg is required for MP3 audio conversion. If you see warnings about FFmpeg not being found, follow these steps:

Verify FFmpeg is Installed

# Check if FFmpeg is installed and its version
ffmpeg -version

# Should output something like:
# ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers

Install FFmpeg

# Arch Linux
sudo pacman -S ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Fedora
sudo dnf install ffmpeg

# macOS
brew install ffmpeg

If FFmpeg is Installed but Not Working

# Check if FFmpeg is in your PATH
which ffmpeg

# Verify FFmpeg can be found by Python
python -c "import subprocess; subprocess.run(['ffmpeg', '-version'])"

Disable MP3 Conversion if FFmpeg is Unavailable

If FFmpeg cannot be installed or is not working, you can disable MP3 conversion to continue using WAV files (note: this will result in larger file sizes):

# Set in your .env file
MP3_ENABLED=false

The system will continue to function with WAV files, but uploads will be larger (~1.9MB per minute at 16kHz mono, 16-bit).

Graceful Degradation

If FFmpeg is missing during runtime:

  • The system logs a warning message
  • Returns the original WAV path
  • Continues to function normally with larger file sizes
  • Transcription quality remains unchanged
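
The fallback behaviour listed above can be sketched as follows. This is not the project's implementation: convert_to_mp3 is a hypothetical helper, and returning the WAV path when the FFmpeg run itself fails is an extra assumption beyond what the list states.

```python
import logging
import shutil
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)


def convert_to_mp3(wav_path: Path, bitrate: str = "128k", ffmpeg: str = "ffmpeg") -> Path:
    """Convert `wav_path` to MP3, or return it unchanged if FFmpeg is unavailable."""
    if shutil.which(ffmpeg) is None:
        logger.warning("FFmpeg not found; keeping WAV file: %s", wav_path)
        return wav_path

    mp3_path = wav_path.with_suffix(".mp3")
    # -y overwrites an existing output file; -b:a sets the audio bitrate.
    result = subprocess.run(
        [ffmpeg, "-y", "-i", str(wav_path), "-b:a", bitrate, str(mp3_path)],
        capture_output=True,
    )
    if result.returncode != 0:
        # Assumption: a failed conversion also falls back to the original WAV.
        logger.warning("FFmpeg conversion failed; keeping WAV file: %s", wav_path)
        return wav_path
    return mp3_path


# Simulate a missing FFmpeg by pointing at a binary name that does not exist.
fallback = convert_to_mp3(Path("recording.wav"), ffmpeg="ffmpeg-binary-that-does-not-exist")
print(fallback)
```

Because the caller always receives a usable path, the upload step does not need to know whether conversion happened.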

No Audio Devices

# List available audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"

Clipboard Not Working

# Install clipboard tools
sudo pacman -S xclip xsel wl-clipboard
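
To check which of these tools are actually on your PATH, a short Python sketch such as the one below can help. The helper names are hypothetical, and the command lines are the usual ways to write stdin to the clipboard with each tool (wl-copy is the binary shipped by wl-clipboard); whisper-dictate may invoke them differently.

```python
import shutil
import subprocess

# Commands that read text from stdin and place it on the clipboard.
CLIPBOARD_COMMANDS = {
    "xclip": ["xclip", "-selection", "clipboard"],
    "xsel": ["xsel", "--clipboard", "--input"],
    "wl-copy": ["wl-copy"],
}


def available_clipboard_tools(candidates=CLIPBOARD_COMMANDS) -> list:
    """Return the names of candidate tools found on PATH, in preference order."""
    return [name for name in candidates if shutil.which(name)]


def copy_text(text: str) -> bool:
    """Copy `text` with the first available tool; return False if none is installed."""
    tools = available_clipboard_tools()
    if not tools:
        return False
    subprocess.run(CLIPBOARD_COMMANDS[tools[0]], input=text.encode(), check=True)
    return True


found = available_clipboard_tools()
print("Clipboard tools found:", found or "none")
```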

Debug Mode

# Run with debug logging
whisper-dictate --log-level DEBUG dictate

View Application Logs

# Tail the log file
tail -f ~/.local/share/whisper-dictate/whisper-dictate.log

# Or use the CLI to query logs
whisper-dictate logs list --level DEBUG

Database Issues

# Check database location
ls -la ~/.local/share/whisper-dictate/

# Check migration status
whisper-dictate migrate --status

# Re-run migration if needed
whisper-dictate migrate --force

Key Files

  • whisper_dictate/ - Main Python package
    • cli.py - Click-based CLI interface
    • dictation.py - Core dictation service
    • database.py - SQLite database operations
    • transcription.py - Whisper API abstraction and provider implementations
    • notifications.py - System notifications
    • clipboard.py - Clipboard integration
  • setup.py - Package installation configuration
  • main.py - CLI entry point script
  • .env - Environment configuration

Usage Tips

  • Speak clearly and at a normal pace
  • Use in a quiet environment for best results
  • Test with short recordings first
  • Check system notifications for status updates
  • Use history commands to review past transcriptions
  • Use logs commands for debugging issues

The CLI is designed to be fast and responsive, with minimal latency between key press and recording start/stop.

About

Voice-to-text dictation tool for i3wm using OpenAI Whisper, with transcription history and persistent storage
