Conversational Agent Capability #36

Merged: 3 commits, Feb 1, 2025

@Am64r (Collaborator) commented Feb 1, 2025

Task Description:

Build a latency-optimized conversational AI agent that can interact with all current tools and accurately discuss podcast topics.

Proposal:

  1. Voice System Architecture

    • Integrated ElevenLabs API for voice streaming and Google Cloud Speech-to-Text API for user input transcription
    • Developed VoiceIO class with async/await pattern for I/O operations
    • Implemented ThreadPoolExecutor-based ResponseQueue with parallel audio generation
  2. Voice Processing Pipeline

    • Built generatevoice.py utilizing ElevenLabs API for voice cloning with ~60s of audio input
    • Implemented dual-LLM prompting system (Claude Haiku/Sonnet) for latency optimization
    • Added concurrent audio stream buffering with asyncio tasks
  3. Podcast Vector Database Integration

    • Implemented Milvus with sentence-transformers embeddings
    • Optimized vector search with an index and cosine similarity
    • Added async batch processing for transcript ingestion
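
To illustrate the ThreadPoolExecutor-based ResponseQueue idea from item 1, here is a minimal sketch, not the code in this PR. It assumes the queue splits a reply into text chunks, synthesizes them in parallel, and plays the audio back in the original order. The method names `submit`, `drain`, and `close` and the `fake_synthesize` helper are hypothetical; `fake_synthesize` stands in for the real ElevenLabs call.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterator, List


class ResponseQueue:
    """Synthesize text chunks in parallel, yield audio in submission order."""

    def __init__(self, synthesize: Callable[[str], bytes], max_workers: int = 4) -> None:
        # `synthesize` is any blocking text-to-audio function (assumption:
        # in the real system this would wrap the voice API call).
        self._synthesize = synthesize
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: List[Future] = []

    def submit(self, text: str) -> None:
        # Start generating audio immediately; keep the future in FIFO order.
        self._pending.append(self._pool.submit(self._synthesize, text))

    def drain(self) -> Iterator[bytes]:
        # Yield results in the order chunks were submitted, even if later
        # chunks finished first. Blocks only on the chunk that is next.
        pending, self._pending = self._pending, []
        for future in pending:
            yield future.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def fake_synthesize(text: str) -> bytes:
    # Hypothetical placeholder: simulates variable synthesis latency and
    # returns the text as bytes instead of real audio.
    time.sleep(0.01 * (len(text) % 3))
    return text.encode("utf-8")


if __name__ == "__main__":
    queue = ResponseQueue(fake_synthesize, max_workers=3)
    for sentence in ["Sure, I can help.", "Renting a GPU now.", "Done."]:
        queue.submit(sentence)
    for audio in queue.drain():
        print(audio)  # a real agent would hand this to the audio player
    queue.close()
```

Keeping the futures in a list and resolving them in order means playback of the first chunk can begin as soon as it is ready while later chunks are still being generated, which is one plausible way parallel generation reduces perceived latency.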

Test Plan:

Cloned a user's voice using generatevoice.py, passed the ElevenLabs voice ID to the new chatbot.py function run_voice_mode, and successfully conversed with the agent about podcast topics, as well as asking the agent to rent a GPU.

Current response latency is ~2-3 seconds; optimization will continue in coming PRs.

@Kaihuang724 merged commit d3be4d0 into master on Feb 1, 2025