Demo Link --> https://youtu.be/hl1az0yU5tc
AI-powered accounts receivable automation with natural voice interaction
A voice-controlled agent for managing overdue invoices, built with production-grade error handling, input validation, and enterprise-ready architecture patterns.
This project demonstrates an end-to-end implementation of a conversational AI agent that:
- Accepts voice commands through natural speech
- Understands user intent using LLM-powered classification
- Queries invoice data and generates personalized payment reminders
- Responds with voice output optimized for clarity (spelling out currency, emails, dates)
Current Status: Fully functional proof-of-concept with mock data layer, ready for production integration.
- Speech-to-Text: OpenAI Whisper running locally (no API costs)
- Text-to-Speech: Google TTS with automatic cleanup
- Audio Processing: 5-second recording with 16 kHz sampling
- Resource Management: Automatic temp file cleanup and error recovery
- Intent Classification: Groq LLaMA 3.1 8B model for understanding commands
- Entity Extraction: Automatically identifies company names from natural language
- Email Generation: Creates polite, contextual payment reminders
- Conversation Memory: Maintains dialogue history for context-aware responses
- Input Validation: Protection against injection attacks and malformed data
- Error Handling: Graceful degradation with custom exception hierarchy
- Retry Logic: Exponential backoff for LLM API failures
- Logging: Structured logging with rotation and colorized console output
- Type Safety: Pydantic models for data validation
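
The retry behaviour listed above could look roughly like the following. This is a minimal sketch, not the project's actual `llm_provider.py`: the `retry_with_backoff` helper, its parameters, and the `LLMError` stand-in defined here are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LLMError(Exception):
    """Stand-in for the project's LLM error type (assumed, not the real class)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on LLMError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except LLMError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt (1s, 2s, 4s, ...) plus a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; the jitter term is a common addition that spreads out retries from concurrent clients.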
- Currency: "500.50" → "500 dollars and 50 cents"
- Email: "john@acme.com" → "john at acme dot com"
- Dates: "2024-02-16" → "February 16, 2024"
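
The three conversions above can be sketched as small pure functions. This is an illustrative sketch only; the function names are hypothetical and are not taken from `formatters.py`, and singular forms such as "1 dollar" are not handled.

```python
from datetime import date
from decimal import Decimal

# Month names are spelled out explicitly so the output does not depend on locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def speak_currency(amount: str) -> str:
    """Turn '500.50' into '500 dollars and 50 cents'."""
    value = Decimal(amount).quantize(Decimal("0.01"))
    dollars = int(value)
    cents = int((value - dollars) * 100)
    if cents == 0:
        return f"{dollars} dollars"
    return f"{dollars} dollars and {cents} cents"


def speak_email(address: str) -> str:
    """Turn 'john@acme.com' into 'john at acme dot com'."""
    return address.replace("@", " at ").replace(".", " dot ")


def speak_date(iso_date: str) -> str:
    """Turn '2024-02-16' into 'February 16, 2024'."""
    parsed = date.fromisoformat(iso_date)
    return f"{MONTHS[parsed.month - 1]} {parsed.day}, {parsed.year}"
```

Using `Decimal` rather than `float` avoids rounding surprises when splitting an amount into dollars and cents.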
┌─────────────┐
│    User     │
│    Voice    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────────────────┐
│         VoiceHandler (voice_handler.py)         │
│  ┌─────────────────┐     ┌─────────────────┐    │
│  │   Whisper STT   │     │    gTTS TTS     │    │
│  └─────────────────┘     └─────────────────┘    │
└──────┬──────────────────────────────▲───────────┘
       │                              │
       ▼                              │
┌─────────────────────────────────────────────────┐
│             InvoiceAgent (agent.py)             │
│  ┌─────────────────────────────────────────┐    │
│  │  • Intent Classification                │    │
│  │  • Entity Extraction                    │    │
│  │  • Conversation Management              │    │
│  │  • Response Generation                  │    │
│  └─────────────────────────────────────────┘    │
└──────┬──────────────────┬───────────────────────┘
       │                  │
       ▼                  ▼
┌──────────────┐ ┌──────────────────┐
│ GroqProvider │ │  MockMCPClient   │
│ (LLaMA 3.1)  │ │  (Data Layer)    │
└──────────────┘ └──────────────────┘
Flow: Voice Input → Transcription → Intent Classification → Data Retrieval → Response Generation → Voice Output
- Python 3.8+ (tested on 3.10)
- ffmpeg - required for audio processing
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Windows - Download from https://ffmpeg.org/download.html
- Microphone and speakers for voice interaction
# 1. Clone repository
git clone https://github.com/Hustple/Voice_Agent.git
cd Voice_Agent
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
- Visit console.groq.com
- Sign up for a free account
- Navigate to API Keys section
- Create new API key
- Add to `.env` file:
GROQ_API_KEY=gsk_your_key_here
Voice Mode (recommended):
python src/main.py
Text Mode (for testing without microphone):
python src/main_text.py
Check Invoices:
- "Check overdue invoices"
- "Show me past due invoices"
- "What invoices are overdue?"
Send Reminders:
- "Send reminder to Acme Corp"
- "Email Beta Industries about their invoice"
- "Remind XYZ Company about payment"
Exit:
- "Exit" / "Quit" / "Goodbye"
Listening...
You: Check overdue invoices
Agent: You have 2 overdue invoices, totaling 1100 dollars.
Acme Corp, 500 dollars, due February 6, 2024.
Beta Industries, 600 dollars, due February 1, 2024.
Listening...
You: Send reminder to Acme Corp
Agent: Email sent to Acme Corp at john at acme dot com.
Voice_Agent/
├── src/
│   ├── agent.py                  # Core agent logic with invoice handling
│   ├── voice_handler.py          # Voice I/O with Whisper & gTTS
│   ├── llm_provider.py           # Groq API client with retry logic
│   ├── mcp_client_mock.py        # Mock data layer (Stripe/Gmail simulation)
│   ├── main.py                   # Voice mode entry point
│   ├── main_text.py              # Text-only mode entry point
│   ├── constants.py              # Application constants
│   ├── exceptions.py             # Custom exception hierarchy
│   │
│   ├── prompts/
│   │   ├── system_prompts.py     # LLM system prompts
│   │   └── templates.py          # Email templates
│   │
│   └── utils/
│       ├── config.py             # Configuration management
│       ├── validators.py         # Input validation & sanitization
│       ├── formatters.py         # Voice-optimized formatting
│       └── logger.py             # Structured logging setup
│
├── scripts/                      # Setup and deployment scripts
├── requirements.txt              # Python dependencies
├── .env                          # Environment variables template
└── README.md                     # This file
# Required
GROQ_API_KEY=gsk_xxxxx # Get from console.groq.com
# Optional
WHISPER_MODEL=base # Options: tiny, base, small, medium, large
LOG_LEVEL=INFO # Options: DEBUG, INFO, WARNING, ERROR
PFMCP_BASE_URL=http://localhost # For future MCP integration

| Model | Parameters | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39 M | Fastest | Good | Testing, low-resource |
| base | 74 M | Fast | Better | Recommended default |
| small | 244 M | Moderate | Great | Production quality |
| medium | 769 M | Slow | Excellent | High accuracy requirement |
| large | 1550 M | Slowest | Best | Maximum quality |
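
One way to read and check these settings at startup is sketched below. It is an assumption-laden example rather than the contents of `config.py`: the `Settings` class, `load_settings` function, and the `ConfigurationError` stand-in are hypothetical, the defaults mirror the example values above, and the variables are assumed to be present in the process environment already (loading the `.env` file itself is not shown).

```python
import os
from dataclasses import dataclass
from typing import Mapping

WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


class ConfigurationError(Exception):
    """Stand-in for the project's configuration error (assumed)."""


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    whisper_model: str = "base"
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, rejecting unknown options."""
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise ConfigurationError("GROQ_API_KEY is required")

    model = env.get("WHISPER_MODEL", "base").strip().lower()
    if model not in WHISPER_MODELS:
        raise ConfigurationError(f"WHISPER_MODEL must be one of {WHISPER_MODELS}")

    level = env.get("LOG_LEVEL", "INFO").strip().upper()
    if level not in LOG_LEVELS:
        raise ConfigurationError(f"LOG_LEVEL must be one of {LOG_LEVELS}")

    return Settings(groq_api_key=api_key, whisper_model=model, log_level=level)
```

Failing fast on an unknown model name avoids discovering the typo only when the speech model is first loaded.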
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test file
pytest tests/test_agent.py -v
The project uses Groq with the LLaMA 3.1 8B model for:
- Intent Classification (50 tokens, temp=0.1): classifies into `check_invoices`, `send_reminder`, `help`, `other`
- Entity Extraction (50 tokens, temp=0.0): extracts company names from natural language
- Email Generation (400 tokens, temp=0.7): creates contextual payment reminders
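
The per-task limits above can be kept in one table so each call site looks up its own settings. The sketch below is illustrative: `TaskSettings`, `TASK_SETTINGS`, and `normalize_intent` are hypothetical names, and falling back to `other` for an unrecognized reply is an assumption about how the classifier output might be handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSettings:
    max_tokens: int
    temperature: float


# Values taken from the list above; the dictionary keys are assumed names.
TASK_SETTINGS = {
    "intent_classification": TaskSettings(max_tokens=50, temperature=0.1),
    "entity_extraction": TaskSettings(max_tokens=50, temperature=0.0),
    "email_generation": TaskSettings(max_tokens=400, temperature=0.7),
}

INTENTS = ("check_invoices", "send_reminder", "help", "other")


def normalize_intent(raw_reply: str) -> str:
    """Map a raw model reply onto one of the four labels, defaulting to 'other'."""
    cleaned = raw_reply.strip().strip("\"'.").lower()
    return cleaned if cleaned in INTENTS else "other"
```

Low temperatures suit the two classification-style tasks, where the same input should yield the same label, while the higher value for email generation allows more varied wording.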
AgentException
├── LLMError             # LLM API failures
├── MCPError             # Data layer issues
├── VoiceInputError      # Audio processing problems
├── ValidationError      # Input sanitization failures
└── ConfigurationError   # Missing/invalid config
Each layer catches specific exceptions and provides user-friendly error messages while logging technical details.
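
A hierarchy like the one above might be expressed as follows. This is a sketch under assumptions, not the contents of `exceptions.py`: the `user_message` attribute, the message wording, the logger name, and the `run_safely` helper are all hypothetical.

```python
import logging

logger = logging.getLogger("voice_agent")  # logger name is an assumption


class AgentException(Exception):
    """Base class for all agent errors."""
    user_message = "Something went wrong. Please try again."  # hypothetical wording


class LLMError(AgentException):
    user_message = "I could not reach the language model. Please try again."


class MCPError(AgentException):
    user_message = "I could not load the invoice data."


class VoiceInputError(AgentException):
    user_message = "I could not hear that. Please repeat."


class ValidationError(AgentException):
    user_message = "That input does not look valid."


class ConfigurationError(AgentException):
    user_message = "The agent is not configured correctly."


def run_safely(action):
    """Run `action`; on a known agent error, log details and return a friendly message."""
    try:
        return action()
    except AgentException as exc:
        # Technical detail goes to the log, not to the user.
        logger.error("%s: %s", type(exc).__name__, exc)
        return exc.user_message
```

Catching the shared base class in one place keeps the spoken reply short while the log still records which subclass was raised and why.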
- Injection attacks: Regex patterns block `<script>`, `javascript:`, etc.
- Length validation: Max 500 chars for input, 100 for company names
- Character whitelist: Only alphanumeric + safe punctuation for company names
- Email safety: Content scanning before sending
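
The checks above could be combined along these lines. This is a minimal sketch, not the project's `validators.py`: the function names are hypothetical, the blocked patterns cover only the two examples named above, and the exact set of "safe punctuation" is an assumption.

```python
import re

MAX_INPUT_LENGTH = 500
MAX_COMPANY_LENGTH = 100

# Only the two patterns named in the list; a real blocklist would be longer.
_BLOCKED = re.compile(r"<\s*script|javascript\s*:", re.IGNORECASE)
# Assumed whitelist: letters, digits, spaces, and a few punctuation marks.
_COMPANY_ALLOWED = re.compile(r"[A-Za-z0-9 .,&'\-]+")


class ValidationError(Exception):
    """Stand-in for the project's validation error (assumed)."""


def validate_user_input(text: str) -> str:
    """Return trimmed input, or raise if it is empty, too long, or suspicious."""
    cleaned = text.strip()
    if not cleaned:
        raise ValidationError("input is empty")
    if len(cleaned) > MAX_INPUT_LENGTH:
        raise ValidationError("input exceeds 500 characters")
    if _BLOCKED.search(cleaned):
        raise ValidationError("input contains a blocked pattern")
    return cleaned


def validate_company_name(name: str) -> str:
    """Return a trimmed company name, or raise if it fails the whitelist."""
    cleaned = name.strip()
    if not cleaned or len(cleaned) > MAX_COMPANY_LENGTH:
        raise ValidationError("company name must be 1 to 100 characters")
    if not _COMPANY_ALLOWED.fullmatch(cleaned):
        raise ValidationError("company name contains unsupported characters")
    return cleaned
```

A whitelist is used for company names because the set of legitimate characters is small, whereas free-form commands are only screened against known-bad patterns.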
Phase 1: Production Data Integration
- Real Stripe API integration
- Gmail OAuth implementation
- Database for conversation history
Phase 2: Enhanced Features
- Multi-language support
- Batch reminder sending
- Scheduled reminder campaigns
Phase 3: Deployment
- Docker containerization
- CI/CD pipeline
- Cloud deployment (AWS/GCP)
- GitHub: @Hustple
- Project: Voice_Agent