
ibliminse/autotalk


autotalk

Hands-free voice interface for Claude Code (and any terminal).

Talk to your terminal. It hears you.

mic → voice activity detection → local Whisper STT → terminal injection

Full duplex with Claude Code: pair with voxtral-mcp for two-way voice conversations. You talk, Claude talks back. Speech recognition and speech synthesis both run locally, with no cloud speech APIs.


Install

git clone https://github.com/ibliminse/autotalk.git
cd autotalk

# Set up environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Grant macOS permissions:
# System Settings > Privacy & Security > Microphone: allow Terminal/iTerm2
# System Settings > Privacy & Security > Accessibility: allow Terminal/iTerm2

Quick Start

# Open Mic — always listening, speak naturally
./run.sh

# Target a specific app
./run.sh --target iTerm2

# Dry run — transcribe only, no injection
./run.sh --mode dry-run

# Better accuracy (slower)
./run.sh --model small.en

# List available microphones
./run.sh --list-devices

Modes

Open Mic (default)

Always listening. Speak naturally — 2 seconds of silence triggers transcription and injection. Designed for continuous conversation with Claude Code.

Background noise is filtered in three layers:

  • WebRTC VAD at maximum aggressiveness filters non-speech
  • Energy thresholding ignores quiet ambient noise
  • Whisper hallucination filter catches phantom transcriptions ("Thank you for watching", etc.)
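A minimal sketch of the hallucination filter, assuming it compares the whole normalized transcript against a fixed list of known phantom phrases. The phrase set below is illustrative and contains only the examples named in this README; the names `PHANTOM_PHRASES`, `normalize`, and `filter_transcript` are hypothetical, not autotalk's actual code.

```python
import re
from typing import Optional

# Hypothetical, illustrative phrase list built only from the examples in this README.
PHANTOM_PHRASES = {"you", "thank you", "thanks for watching", "thank you for watching"}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def filter_transcript(text: str) -> Optional[str]:
    """Return the stripped transcript, or None if it looks like a phantom output."""
    normalized = normalize(text)
    if not normalized or normalized in PHANTOM_PHRASES:
        return None
    return text.strip()
```

Matching the whole transcript rather than searching for substrings means a genuine request such as "thank you for the help" still gets through.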

Push (coming in v0.2)

Hold a hotkey to record, release to transcribe. For noisy environments or when you want explicit control.

Full Duplex Setup

You  ──mic──>  autotalk  ──text──>  Claude Code  ──text──>  voxtral-mcp  ──audio──>  You
     (speak)   (STT)      (paste)    (thinks)     (speak)    (TTS)         (listen)

Terminal A — start autotalk:

cd autotalk && ./run.sh

Terminal B — start Claude Code:

claude
  1. You speak into your mic
  2. autotalk transcribes and pastes into Claude Code
  3. Claude Code processes your request
  4. Claude Code uses voxtral-mcp's speak tool to read its response aloud
  5. You hear Claude's response through your speakers

Install voxtral-mcp for the TTS half.
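Step 2 above depends on AppleScript. A rough Python sketch of how paste and keystroke delivery could be driven through macOS's `pbcopy` and `osascript` tools is shown here. It is an illustration under assumptions, not autotalk's implementation: the helper names, the 0.1-second activation delay, and the `submit` option that presses Return afterward are all hypothetical.

```python
import subprocess
from typing import List, Optional


def applescript_quote(text: str) -> str:
    """Return text as an AppleScript string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_script(text: str, mode: str = "paste", target: Optional[str] = None,
                 submit: bool = True) -> List[str]:
    """Build AppleScript lines that deliver text to the target app (hypothetical helper)."""
    lines: List[str] = []
    if target:
        lines.append(f"tell application {applescript_quote(target)} to activate")
        lines.append("delay 0.1")  # assumed: give the app a moment to come to the front
    if mode == "paste":
        # The clipboard is filled separately via pbcopy; Cmd+V pastes it.
        lines.append('tell application "System Events" to keystroke "v" using command down')
    elif mode == "keystroke":
        # Types the text into whichever window has focus.
        lines.append(f'tell application "System Events" to keystroke {applescript_quote(text)}')
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    if submit:
        lines.append('tell application "System Events" to key code 36')  # 36 = Return
    return lines


def inject(text: str, mode: str = "paste", target: Optional[str] = None,
           submit: bool = True) -> None:
    """Deliver text to the target app on macOS; dry-run only prints it."""
    if mode == "dry-run":
        print(text)
        return
    if mode == "paste":
        subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)
    args = ["osascript"]
    for line in build_script(text, mode, target, submit):
        args += ["-e", line]  # each -e contributes one line of the script
    subprocess.run(args, check=True)
```

Keeping script construction separate from execution makes the generated AppleScript easy to inspect on any platform; only `inject` needs macOS and the Accessibility permission.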

CLI Reference

| Flag | Default | Description |
|---|---|---|
| --mode | paste | Delivery method: paste (via clipboard), keystroke (simulated typing), dry-run (print only) |
| --device | system default | Audio input device index (see --list-devices) |
| --model | base.en | Whisper model: tiny.en, base.en, small.en, medium.en |
| --target | frontmost app | Target app for injection (e.g., Terminal, iTerm2) |
| --vad | 3 | VAD aggressiveness: 0 (least) to 3 (most) |
| --list-devices | | List available audio devices and exit |
| --version | | Show version and exit |
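The table maps naturally onto Python's standard `argparse`. The sketch below is reconstructed from the table alone and is not autotalk's actual parser; `build_parser` is a hypothetical name and `VERSION` is a placeholder.

```python
import argparse

VERSION = "0.0.0"  # placeholder; not read from the real package


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the CLI reference table (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="autotalk", description="Hands-free voice input for the terminal.")
    parser.add_argument("--mode", choices=["paste", "keystroke", "dry-run"], default="paste",
                        help="delivery method: paste (clipboard), keystroke (type), dry-run (print only)")
    parser.add_argument("--device", type=int, default=None,
                        help="audio input device index; omit for the system default")
    parser.add_argument("--model", choices=["tiny.en", "base.en", "small.en", "medium.en"],
                        default="base.en", help="Whisper model size")
    parser.add_argument("--target", default=None,
                        help="app to inject into, e.g. Terminal or iTerm2; omit for the frontmost app")
    parser.add_argument("--vad", type=int, choices=range(4), default=3,
                        help="VAD aggressiveness from 0 (least) to 3 (most)")
    parser.add_argument("--list-devices", action="store_true",
                        help="list audio devices and exit")
    parser.add_argument("--version", action="version", version=f"%(prog)s {VERSION}")
    return parser
```

Using `choices` for `--mode`, `--model`, and `--vad` lets argparse reject typos such as `--vad 5` before any audio is opened.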

How It Works

  1. Mic capture: sounddevice (PortAudio) captures 16 kHz mono audio in 30 ms frames.
  2. Voice activity detection: webrtcvad (Google WebRTC) classifies each frame as speech or silence. A ring buffer triggers recording when 80% of recent frames contain speech.
  3. Energy gating: a per-frame RMS energy check prevents background noise from resetting the silence timer, and an overall segment energy check skips quiet captures before they reach Whisper.
  4. Speech-to-text: faster-whisper (CTranslate2) runs Whisper locally on CPU with int8 quantization. No cloud API, no API key.
  5. Hallucination filter: common Whisper phantom outputs ("You", "Thank you", "Thanks for watching") are caught and discarded.
  6. Terminal injection: AppleScript pastes the transcribed text into the target terminal via the clipboard (or via keystroke simulation).
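Steps 1 to 3 can be sketched as a small segmenter. This is an illustration under stated assumptions rather than autotalk's code: the 16 kHz / 30 ms framing, the 80% trigger ratio, and the 2-second silence window come from this README, while the 10-frame (300 ms) ring buffer, the RMS thresholds of 300 and 200, and the names `rms` and `segments` are hypothetical. `is_speech` stands in for any callable shaped like `webrtcvad.Vad(3).is_speech(frame, sample_rate)`, so the sketch runs without the library installed.

```python
import math
from array import array
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

SAMPLE_RATE = 16_000                            # 16 kHz mono
FRAME_MS = 30                                   # one VAD frame
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per frame
FRAME_BYTES = FRAME_SAMPLES * 2                 # 960 bytes of 16-bit PCM


def rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit PCM (native byte order)."""
    samples = array("h", pcm)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def segments(frames: Iterable[bytes],
             is_speech: Callable[[bytes, int], bool],
             ring_size: int = 10,            # assumed: 10 frames = 300 ms
             trigger_ratio: float = 0.8,     # README: 80% of recent frames are speech
             silence_s: float = 2.0,         # README: 2 s of silence ends a segment
             frame_energy: float = 300.0,    # assumed per-frame RMS gate
             segment_energy: float = 200.0,  # assumed whole-segment RMS gate
             ) -> Iterator[bytes]:
    """Yield speech segments as raw PCM, one per utterance (hypothetical sketch)."""
    silence_limit = math.ceil(silence_s * 1000 / FRAME_MS)  # 67 frames for 2 s
    ring: Deque[Tuple[bytes, bool]] = deque(maxlen=ring_size)
    voiced: List[bytes] = []
    triggered = False
    silent_run = 0

    for frame in frames:
        vad_says_speech = is_speech(frame, SAMPLE_RATE)
        if not triggered:
            # Idle: keep a short history and start recording once most of it is speech.
            ring.append((frame, vad_says_speech))
            speech_count = sum(1 for _, s in ring if s)
            if len(ring) == ring_size and speech_count >= trigger_ratio * ring_size:
                triggered = True
                voiced = [f for f, _ in ring]  # keep the buffered onset
                ring.clear()
                silent_run = 0
            continue

        voiced.append(frame)
        # Only frames that are both VAD speech and loud enough reset the silence timer,
        # so steady background noise cannot hold a segment open.
        if vad_says_speech and rms(frame) >= frame_energy:
            silent_run = 0
        else:
            silent_run += 1
        if silent_run >= silence_limit:
            segment = b"".join(voiced)
            if rms(segment) >= segment_energy:  # skip captures too quiet to be speech
                yield segment
            triggered, voiced, silent_run = False, [], 0

    if triggered:  # flush a segment still open when the stream ends
        segment = b"".join(voiced)
        if rms(segment) >= segment_energy:
            yield segment
```

With webrtcvad installed, `segments(frames, webrtcvad.Vad(3).is_speech)` would consume real 960-byte frames, and each yielded `bytes` object is 16-bit PCM ready for the transcriber. `array("h", ...)` reads samples in the machine's native byte order, which is little-endian on both Apple silicon and Intel Macs.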

Compared To

| Feature | autotalk | hns | whis | speech2type |
|---|---|---|---|---|
| Full duplex (input + output) | Yes (with voxtral-mcp) | No | No | No |
| Local STT | Yes (Whisper) | Yes (Whisper) | Optional | No |
| Always-on mode | Yes | No | No | No |
| Terminal injection | Paste + keystroke | Clipboard | Clipboard | Clipboard |
| Claude Code integration | Native | No | No | No |
| Platform | macOS | macOS/Linux | macOS/Linux/Win | macOS |

Requirements

  • macOS (AppleScript dependency — Linux support planned)
  • Python 3.11+
  • Microphone access (grant in System Settings > Privacy & Security > Microphone)
  • Accessibility permission (grant in System Settings > Privacy & Security > Accessibility)

License

MIT
