Conversational Agent Capability #36

Am64r · 2025-02-01T21:50:09Z

Task Description:

Build a latency-optimized conversational AI agent that can interact with all current tools and accurately discuss podcast topics.

Proposal:

Voice System Architecture
- Integrated ElevenLabs API for voice streaming and Google Cloud Speech-to-Text API for user input transcription
- Developed VoiceIO class with async/await pattern for I/O operations
- Implemented ThreadPoolExecutor-based ResponseQueue with parallel audio generation
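A minimal sketch of how a ThreadPoolExecutor-based response queue with parallel audio generation could look. It is not the code from this PR: `fake_synthesize`, `speak`, and the `submit`/`drain`/`close` method names are hypothetical, and the stand-in synthesizer only encodes text to bytes where a real blocking text-to-speech call would go.

```python
import asyncio
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Deque, List


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a blocking text-to-speech request."""
    return text.encode("utf-8")


class ResponseQueue:
    """Synthesizes sentences in parallel, returns audio in submission order."""

    def __init__(self, synthesize: Callable[[str], bytes], max_workers: int = 4):
        self._synthesize = synthesize
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Deque[asyncio.Future] = deque()

    def submit(self, sentence: str) -> None:
        # Run the blocking call on a worker thread; keep the future in FIFO order.
        loop = asyncio.get_running_loop()
        future = loop.run_in_executor(self._executor, self._synthesize, sentence)
        self._pending.append(future)

    async def drain(self) -> List[bytes]:
        # Await futures oldest-first so playback order matches submission order,
        # even if a later sentence finishes synthesizing earlier.
        chunks: List[bytes] = []
        while self._pending:
            chunks.append(await self._pending.popleft())
        return chunks

    def close(self) -> None:
        self._executor.shutdown(wait=True)


async def speak(sentences: List[str]) -> List[bytes]:
    queue = ResponseQueue(fake_synthesize)
    try:
        for sentence in sentences:
            queue.submit(sentence)
        return await queue.drain()
    finally:
        queue.close()
```

Calling `asyncio.run(speak(["Hello.", "World."]))` returns the two audio chunks in the order the sentences were submitted. Awaiting the futures in FIFO order is what keeps playback sequential while synthesis overlaps.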

Voice Processing Pipeline
- Built generatevoice.py utilizing ElevenLabs API for voice cloning with ~60s of audio input
- Implemented dual-LLM prompting system (Claude Haiku/Sonnet) for latency optimization
- Added concurrent audio stream buffering with asyncio tasks
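The PR does not spell out how the two models divide the work, so the following is only one plausible reading of a dual-LLM setup for latency: a fast model produces a short first reply while a stronger model works on the full answer concurrently. `fast_model`, `strong_model`, `respond`, and `collect` are hypothetical stand-ins that sleep instead of calling any real API.

```python
import asyncio
from typing import AsyncIterator, List


async def fast_model(prompt: str) -> str:
    """Hypothetical low-latency model: returns a short acknowledgement."""
    await asyncio.sleep(0.01)
    return "Good question, let me check."


async def strong_model(prompt: str) -> str:
    """Hypothetical slower model: returns the full answer."""
    await asyncio.sleep(0.05)
    return f"Detailed answer to: {prompt}"


async def respond(prompt: str) -> AsyncIterator[str]:
    # Start the slow request first so it runs while the fast one is awaited.
    detailed = asyncio.create_task(strong_model(prompt))
    try:
        yield await fast_model(prompt)  # can be spoken right away
        yield await detailed            # follows once the full answer is ready
    finally:
        detailed.cancel()  # no effect if the task already finished


async def collect(prompt: str) -> List[str]:
    return [part async for part in respond(prompt)]
```

With this arrangement the user hears something after roughly the fast model's latency, and the total time is bounded by the slower call instead of the sum of both.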

Podcast Vector Database Integration
- Implemented Milvus with sentence-transformers embeddings
- Optimized vector search with a vector index, using cosine similarity as the distance metric
- Added async batch processing for transcript ingestion
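To illustrate what ranking by cosine similarity computes, here is a small pure-Python sketch. It does not use Milvus or sentence-transformers; the two-dimensional vectors, the `cosine_similarity` and `top_k` helpers, and the sample texts are made up for illustration, and a real vector database would answer the same query through its index rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Brute-force search: score every entry, return the k best matches."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, not real model output).
corpus = [
    ("gpu rental", [1.0, 0.0]),
    ("podcast intro", [0.0, 1.0]),
    ("gpu pricing", [0.8, 0.6]),
]
results = top_k([1.0, 0.0], corpus, k=2)
```

For the query above, the two best matches are "gpu rental" followed by "gpu pricing", since their vectors point closest to the query direction.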

Test Plan:

Cloned a user's voice using generatevoice.py, passed the ElevenLabs voice ID to the new chatbot.py function run_voice_mode, and successfully conversed with the agent about podcast topics, as well as asking the agent to rent a GPU.

Current response latency is ~2-3 seconds (will continue to optimize in coming PRs).
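The PR does not say how the latency figure was measured. One simple way to track it across future optimizations is to time a single agent turn with `time.perf_counter`, as sketched below; `timed` and `fake_agent_turn` are hypothetical names, and the stand-in turn just sleeps briefly.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple


async def timed(call: Callable[[], Awaitable[str]]) -> Tuple[str, float]:
    """Await one call and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = await call()
    return result, time.perf_counter() - start


async def fake_agent_turn() -> str:
    """Hypothetical stand-in for one request/response round trip."""
    await asyncio.sleep(0.02)
    return "ok"


reply, seconds = asyncio.run(timed(fake_agent_turn))
```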

Am64r added 3 commits January 31, 2025 17:13

refinements

d10db07

plain use voice file

e1c02d3

code cleanups

ccd4b79

Kaihuang724 approved these changes Feb 1, 2025

Kaihuang724 merged commit d3be4d0 into master Feb 1, 2025

