Hands-free voice interface for Claude Code (and any terminal).
Talk to your terminal. It hears you.
mic → voice activity detection → local Whisper STT → terminal injection
Full duplex with Claude Code: Pair with voxtral-mcp for two-way voice conversations — you talk, Claude talks back. No cloud APIs. Fully local.
git clone https://github.com/ibliminse/autotalk.git
cd autotalk
# Set up environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
# Grant macOS permissions:
# System Settings > Privacy & Security > Microphone: allow Terminal/iTerm2
# System Settings > Privacy & Security > Accessibility: allow Terminal/iTerm2
# Open Mic: always listening, speak naturally
./run.sh
# Target a specific app
./run.sh --target iTerm2
# Dry run — transcribe only, no injection
./run.sh --mode dry-run
# Better accuracy (slower)
./run.sh --model small.en
# List available microphones
./run.sh --list-devices
Always listening. Speak naturally: 2 seconds of silence triggers transcription and injection. Designed for continuous conversation with Claude Code.
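The 2-second silence rule can be pictured as a counter over per-frame speech decisions. The sketch below is illustrative only: the `SilenceEndpointer` class is a hypothetical name, not autotalk's actual code, and it assumes the 30 ms frame length used by the capture step.

```python
import math


class SilenceEndpointer:
    """Signals end of utterance after a run of silent frames (hypothetical helper)."""

    def __init__(self, silence_seconds: float = 2.0, frame_ms: int = 30) -> None:
        # Number of consecutive silent frames that make up the timeout.
        self.frames_needed = math.ceil(silence_seconds * 1000 / frame_ms)
        self.silent_run = 0
        self.heard_speech = False

    def update(self, is_speech: bool) -> bool:
        """Feed one frame decision; return True when the utterance has ended."""
        if is_speech:
            self.heard_speech = True
            self.silent_run = 0
            return False
        if not self.heard_speech:
            return False  # Still waiting for the first speech frame.
        self.silent_run += 1
        if self.silent_run >= self.frames_needed:
            # Reset so the next utterance starts cleanly.
            self.heard_speech = False
            self.silent_run = 0
            return True
        return False
```

Silence before any speech is ignored, so an idle microphone never triggers a transcription on its own in this sketch.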
Background noise filtering combines three layers:
- WebRTC VAD at maximum aggressiveness filters non-speech
- Energy thresholding ignores quiet ambient noise
- Whisper hallucination filter catches phantom transcriptions ("Thank you for watching", etc.)
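As a rough illustration of the energy and hallucination layers, here is a minimal sketch. The threshold value, the phrase set, and the function names are assumptions made for the example; autotalk's real values and code may differ.

```python
import array
import math
import string

# Hypothetical values for illustration only.
ENERGY_THRESHOLD = 300.0
PHANTOM_PHRASES = {"you", "thank you", "thanks for watching", "thank you for watching"}


def rms_energy(pcm16: bytes) -> float:
    """Root-mean-square amplitude of 16-bit native-endian mono PCM."""
    samples = array.array("h", pcm16)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_loud_enough(pcm16: bytes) -> bool:
    """Energy check, meant to run before a segment is sent to Whisper."""
    return rms_energy(pcm16) >= ENERGY_THRESHOLD


def is_phantom(text: str) -> bool:
    """True if a transcript is empty or matches a known phantom phrase."""
    normalized = text.strip().lower().strip(string.punctuation + " ")
    return normalized == "" or normalized in PHANTOM_PHRASES
```

A segment would be dropped if `is_loud_enough` returns False, and a transcript would be dropped if `is_phantom` returns True. Matching on a normalized string keeps "Thank you." and "thank you" equivalent.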
Hold a hotkey to record, release to transcribe. For noisy environments or when you want explicit control.
You ──mic──> autotalk ──text──> Claude Code ──text──> voxtral-mcp ──audio──> You
(speak) (STT) (paste) (thinks) (speak) (TTS) (listen)
Terminal A (start autotalk):
cd autotalk && ./run.sh
Terminal B (start Claude Code):
claude
- You speak into your mic
- autotalk transcribes and pastes into Claude Code
- Claude Code processes your request
- Claude Code uses voxtral-mcp's speak tool to read its response aloud
- You hear Claude's response through your speakers
Install voxtral-mcp for the TTS half.
| Flag | Default | Description |
|---|---|---|
| --mode | paste | Delivery method: paste (clipboard), keystroke (type), dry-run (print only) |
| --device | system default | Audio input device index (see --list-devices) |
| --model | base.en | Whisper model: tiny.en, base.en, small.en, medium.en |
| --target | frontmost app | Target app for injection (e.g., Terminal, iTerm2) |
| --vad | 3 | VAD aggressiveness: 0 (least) to 3 (most) |
| --list-devices | | List available audio devices and exit |
| --version | | Show version and exit |
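For readers who want to see how these flags and defaults fit together, here is a minimal `argparse` sketch that mirrors the table. It is not autotalk's actual parser: the `build_parser` name, the `prog` value, and the version string are placeholders chosen for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(prog="autotalk")
    parser.add_argument("--mode", choices=["paste", "keystroke", "dry-run"], default="paste")
    parser.add_argument("--device", type=int, default=None)  # None = system default
    parser.add_argument("--model", default="base.en")
    parser.add_argument("--target", default=None)  # None = frontmost app
    parser.add_argument("--vad", type=int, choices=range(4), default=3)
    parser.add_argument("--list-devices", action="store_true")
    # Placeholder version string; the real version is not given in this README.
    parser.add_argument("--version", action="version", version="autotalk (version unknown)")
    return parser
```

Parsing `["--mode", "dry-run", "--target", "iTerm2"]` with this sketch yields `mode="dry-run"` and `target="iTerm2"`, with every other option left at its documented default.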
- Mic capture: sounddevice (PortAudio) captures 16 kHz mono audio in 30 ms frames
- Voice Activity Detection: webrtcvad (Google WebRTC) classifies each frame as speech or silence. A ring buffer triggers recording when 80% of recent frames contain speech
- Energy gating: Per-frame RMS energy check prevents background noise from resetting the silence timer. Overall segment energy check skips quiet captures before they reach Whisper
- Speech-to-text: faster-whisper (CTranslate2) runs Whisper locally on CPU with int8 quantization. No cloud API, no API key
- Hallucination filter: Common Whisper phantom outputs ("You", "Thank you", "Thanks for watching") are caught and discarded
- Terminal injection: AppleScript pastes transcribed text into the target terminal via clipboard (or keystroke simulation)
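The ring-buffer trigger from the Voice Activity Detection step can be sketched as follows. The 80% ratio, 16 kHz rate, and 30 ms frames come from the list above; the window of 10 frames and the `SpeechTrigger` name are assumptions for illustration, and the per-frame speech decision is taken as an input rather than computed here.

```python
from collections import deque

SAMPLE_RATE = 16_000  # Hz, from the capture step
FRAME_MS = 30         # frame length, from the capture step
# 480 samples per frame, 2 bytes per 16-bit sample.
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2


class SpeechTrigger:
    """Fires when enough of the most recent frames are speech (hypothetical helper)."""

    def __init__(self, window: int = 10, ratio: float = 0.8) -> None:
        self.ratio = ratio
        self.recent = deque(maxlen=window)  # Oldest decisions fall off automatically.

    def update(self, is_speech: bool) -> bool:
        """Record one frame decision; return True once the window is mostly speech."""
        self.recent.append(is_speech)
        if len(self.recent) < self.recent.maxlen:
            return False  # Not enough history yet.
        return sum(self.recent) >= self.ratio * self.recent.maxlen
```

Requiring a full window before firing avoids triggering on a single noisy frame, at the cost of a short delay (about 300 ms with the assumed 10-frame window).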
| Feature | autotalk | hns | whis | speech2type |
|---|---|---|---|---|
| Full duplex (input + output) | Yes (with voxtral-mcp) | No | No | No |
| Local STT | Yes (Whisper) | Yes (Whisper) | Optional | No |
| Always-on mode | Yes | No | No | No |
| Terminal injection | Paste + keystroke | Clipboard | Clipboard | Clipboard |
| Claude Code integration | Native | No | No | No |
| Platform | macOS | macOS/Linux | macOS/Linux/Win | macOS |
- macOS (AppleScript dependency — Linux support planned)
- Python 3.11+
- Microphone access (grant in System Settings > Privacy & Security > Microphone)
- Accessibility permission (grant in System Settings > Privacy & Security > Accessibility)
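To show why the macOS and Accessibility requirements matter, here is a minimal sketch of clipboard-based injection. autotalk's actual AppleScript is not shown in this README, so this assumes the common approach of copying with `pbcopy` and sending Cmd+V through System Events; the function names are hypothetical.

```python
import subprocess
from typing import Optional


def build_paste_commands(target: Optional[str] = None) -> list[list[str]]:
    """Return argv lists: copy stdin to the clipboard, then paste with Cmd+V."""
    script_lines = []
    if target:
        # Bring the target app to the front; assumes a plain app name without quotes.
        script_lines.append(f'tell application "{target}" to activate')
    # Sending keystrokes through System Events is what needs Accessibility permission.
    script_lines.append('tell application "System Events" to keystroke "v" using command down')
    osascript = ["osascript"]
    for line in script_lines:
        osascript += ["-e", line]
    return [["pbcopy"], osascript]


def paste_text(text: str, target: Optional[str] = None) -> None:
    """Run the commands on macOS; the text is passed to pbcopy via stdin."""
    copy_cmd, paste_cmd = build_paste_commands(target)
    subprocess.run(copy_cmd, input=text.encode("utf-8"), check=True)
    subprocess.run(paste_cmd, check=True)
```

Separating command construction from execution makes the sketch easy to inspect on any platform, while `paste_text` itself only works on macOS.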