- Integration of Google's local Gemma3n models via LMStudio & LiteLLM
- Optimised Gemma3n inference on Mac via a community-tuned MLX build
- Crisis response loop
A sophisticated real-time AI commentary system that provides live sports-style audio commentary on multi-agent AI workflows using Google's Agent Development Kit (ADK) and Gemini Live API.
Ever wondered what your AI agents are actually doing? Yeah, me too. Turns out watching AI systems work is like trying to follow a chess match through a telescope - technically impressive, but you have no idea what's happening or why.
So I built an AI commentator that watches other AI agents and explains what they're doing in real-time. Like having a sports announcer for your code, except instead of "He shoots, he scores!" it's more "The search agent is querying the database... and it's found something interesting!"
Picture this: You've got AI agents running around doing important stuff, but you're sitting there like a parent watching their kid's soccer game through thick fog. You know something is happening, but good luck explaining it to anyone.
This system gives your AI agents their own play-by-play commentator. It watches what they're doing and translates the technical gibberish into something humans can actually understand. And it does it in real-time with actual audio commentary.
What's more, with the integration of local Gemma3n models, agents can communicate privately, on-device, where it matters, while commentary is broadcast globally.
Is it necessary? Probably not. Is it weirdly entertaining? Absolutely.
- Gemini Live Integration: Utilizes Google's Gemini Live API for low-latency, high-quality audio generation
- Smooth Audio Playback: Advanced buffering system using PyAudio for uninterrupted audio streaming
- Text Transcription: Simultaneous text output alongside audio for accessibility and debugging
- Agent Orchestration: Supervisor coordinates multiple specialized agents (Searcher, Summarizer)
- Event-Driven Design: Real-time capture of agent activities via ADK callbacks
- Parallel Execution: Commentator runs alongside main workflow without interference
- Contextual Awareness: Commentary adapts based on agent activities and workflow progression
- Memory System: Avoids repetitive commentary through session state and history tracking
- Dynamic Styles: Rotates between different commentary personas (sports announcer, technical analyst, investigative reporter, etc.)
- Pattern Recognition: Identifies and comments on agent behavior patterns and efficiency
- Asynchronous Processing: Non-blocking event handling with proper timeout management
- Resource Management: Automatic cleanup of audio resources and graceful termination
- Error Handling: Robust fallback systems and comprehensive error management
- Modular Design: Clean separation of concerns following ADK best practices
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Main Runner │ │ Supervisor │ │ Live │
│ │ │ Agent │ │ Commentator │
│ │ │ │ │ │
│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │
│ │ Parallel │ │ │ │Sequential │ │ │ │ Event │ │
│ │ Agent │ │───▶│ │ Agent │ │ │ │ Monitor │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Tool Events │ │
│ │ Callbacks │ │
│ │ │ │
│ │ ┌─────────────┐ │ │
│ │ │tool_1 │ │ │
│ │ │tool 2 │ │ │
│ │ └─────────────┘ │ │
│ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
└─────────────▶│ Commentator │◀─────────────┘
│ Queue │
│ │
│ ┌─────────────┐ │
│ │asyncio.Queue│ │
│ └─────────────┘ │
└─────────────────┘
│
▼
┌─────────────────┐
│ Gemini Live │
│ Commentary │
│ Generation │
│ │
│ ┌─────────────┐ │
│ │Audio Stream │ │
│ │Transcription│ │
│ └─────────────┘ │
└─────────────────┘
│
▼
┌─────────────────┐
│ Audio │
│ Playback │
│ │
│ ┌─────────────┐ │
│ │ PyAudio │ │
│ │ Buffering │ │
│ └─────────────┘ │
└─────────────────┘
Basically, your agents do stuff, the event system catches it, the commentator translates it into human-speak, and you get to listen to AI agents being explained by another AI agent. It's AI all the way down.
- Python 3.13+
- Google API Key (for Gemini Live)
- Audio output device (speakers/headphones)
- LMStudio
# Clone the repository
git clone https://github.com/datawranglerai/talk-data-to-me.git
cd talk-data-to-me
# Install dependencies
uv sync
- Download LMStudio
- Download the Gemma3n model appropriate for your setup (on a MacBook Air M2 with 8GB RAM, I found `gemma-3n-e2b-it-mlx` worked really well, as it is 4-bit quantized and optimised for Apple Silicon with MLX)
- Load the model
- Start the API server
- Integrate the model with your agents, like so
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
# Open LMStudio > Load quantized Gemma3n MLX optimised model > start server
# Can use `curl -X GET http://localhost:1234/v1/models` if not sure of model ID
local_model = LiteLlm(
model="openai/gemma-3n-e2b-it-mlx", # lmstudio-community/gemma-3n-E2B-it-MLX-4bit optimised for Mac M2
api_base="http://localhost:1234/v1", # usually runs on http://localhost:1234 by default
api_key="not-needed" # doesn't require real API key
)
root_agent = LlmAgent(
name="Local_Gemma3n_Search_Agent",
model=local_model,
instruction="Say hello and ask how the user is but brag about how you keep everything private"
)
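Before wiring the model into an agent, it can help to sanity-check the LMStudio server directly. A minimal sketch against its OpenAI-compatible endpoint, assuming the default port, the model ID above, and `requests` installed:

import requests

# Assumes LMStudio's server is running on the default port with the
# MLX Gemma3n model loaded (see the curl tip above for the exact ID).
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "gemma-3n-e2b-it-mlx",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])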
# Set your Google API key
export GOOGLE_API_KEY="your-google-api-key-here"
Or create the `*/.env` files by copying the corresponding `*/.env.example` files and adding your own credentials.
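If you go the `.env` route, the values can be loaded at startup; a minimal sketch, assuming `python-dotenv` is installed (the repo may already handle this for you):

import os

from dotenv import load_dotenv

# Reads .env from the current directory into the process environment.
load_dotenv()
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY missing; check your .env"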
# Fire it up
python demo.py
Now you'll hear an AI commentator explaining what the crisis response AI agents are doing. Welcome to the future, I guess.
The system hooks into ADK's callback mechanism to catch agent activities:
from typing import Any, Dict, Optional

from google.adk.tools import BaseTool, ToolContext

def broadcast_tool_event(
    tool: BaseTool,
    args: Dict[str, Any],
    tool_context: ToolContext
) -> Optional[Dict]:
"""Capture tool calls and send to commentator."""
event_data = {
"agent": tool_context.agent_name,
"tool": tool.name,
"args": args,
"timestamp": "now"
}
commentator_queue.put_nowait(event_data)
return None
# Attach to agents
searcher = LlmAgent(
name="Searcher",
before_tool_callback=broadcast_tool_event,
# ... other config
)
Events flow from agents to commentator via asyncio queue (because threading is for people who like debugging race conditions):
import asyncio

# Global queue for cross-agent communication
commentator_queue = asyncio.Queue()

# Commentator consumes events one at a time
while True:
    event = await commentator_queue.get()
    await generate_commentary(event)
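For the "proper timeout management" mentioned in the feature list, here's a slightly fuller sketch of the consumer loop (with `generate_commentary` as above; the stop event and the five-second timeout are illustrative choices, not the repo's exact values):

import asyncio

commentator_queue: asyncio.Queue = asyncio.Queue()

async def consume_events(stop: asyncio.Event) -> None:
    """Drain agent events until asked to shut down."""
    while not stop.is_set():
        try:
            # Wake up periodically so a shutdown request is noticed promptly.
            event = await asyncio.wait_for(commentator_queue.get(), timeout=5.0)
        except asyncio.TimeoutError:
            continue
        await generate_commentary(event)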
The system rotates between different personas to keep things interesting:
def _get_commentary_style(self) -> str:
styles = [
"sports announcer with high energy and play-by-play details",
"technical analyst focusing on efficiency and patterns",
"strategic commentator analyzing decision-making",
"investigative reporter uncovering the story behind the actions",
"data scientist explaining the technical implications"
]
return styles[self._event_count % len(styles)]
The commentator remembers what it said before (unlike most AI systems):
class LiveCommentator(BaseAgent):
_commentary_history: Deque[str] = PrivateAttr(default_factory=lambda: deque(maxlen=10))
def _generate_commentary_prompt(self, narration: str) -> str:
recent_commentary = list(self._commentary_history)[-3:]
prompt = f"""Previous Commentary (avoid repeating):
{chr(10).join(recent_commentary) if recent_commentary else "None"}
Current Activities: {narration}
Provide fresh, varied commentary..."""
return prompt
Smooth audio playback is handled by a callback-based audio player:
import queue

import pyaudio

class CallbackAudioPlayer:
    def _audio_callback(self, in_data, frame_count, time_info, status):
        """Continuous audio playback callback."""
        try:
            # Hand the next buffered PCM chunk to PortAudio.
            data = self.audio_queue.get_nowait()
            return (data, pyaudio.paContinue)
        except queue.Empty:
            # Nothing buffered yet: play 16-bit mono silence to keep the stream alive.
            silence = b'\x00' * (frame_count * 2)
            return (silence, pyaudio.paContinue)
Want different personalities? Edit `commentator_agent/commentator.py`:
def _get_commentary_style(self) -> str:
styles = [
"sarcastic developer who's seen too many standup meetings",
"overly enthusiastic startup founder",
"tired sys admin who just wants to go home",
# ... add whatever personality disorders you prefer
]
return styles[self._event_count % len(styles)]
Tweak audio parameters in `utils/audio_player.py`:
self.stream = self.p.open(
format=pyaudio.paInt16,
channels=1,
rate=24000, # Gemini Live sample rate
output=True,
frames_per_buffer=1024, # Smaller = lower latency, higher CPU usage
stream_callback=self._audio_callback
)
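At these settings each callback delivers 1024 / 24000 ≈ 43 ms of audio, and the silence fallback in `_audio_callback` is `frame_count * 2` bytes because 16-bit mono audio occupies two bytes per frame.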
Mix and match AI models because why not:
# Use different models for different jobs
supervisor = SequentialAgent(
sub_agents=[
LlmAgent(model="gemini-2.0-flash-live-001"), # Fast for tools
LlmAgent(model=LiteLlm(model="openai/gpt-4o")), # Powerful for analysis
]
)
commentator = LiveCommentator(
model="gemini-2.0-flash-live-001" # Optimized for real-time
)
Hook this into your existing ADK workflows:
# Your existing workflow
your_workflow = SequentialAgent(
name="YourWorkflow",
sub_agents=[
# ... your agents with callbacks
]
)
# Add commentator to the mix
main_system = ParallelAgent(
name="MainSystem",
sub_agents=[
your_workflow,
LiveCommentator(name="Commentator")
]
)
Only comment on the interesting stuff:
def broadcast_tool_event(tool, args, tool_context):
# Only comment on certain tools
if tool.name in ['important_tool', 'critical_operation']:
event_data = {
"agent": tool_context.agent_name,
"tool": tool.name,
"args": args,
"priority": "high"
}
commentator_queue.put_nowait(event_data)
Create domain-specific commentary:
def _generate_commentary_prompt(self, narration: str) -> str:
return f"""You are an expert {self.domain} commentator.
Current system activities: {narration}
Focus on:
- {self.focus_area_1}
- {self.focus_area_2}
- Why this matters
Keep it under 30 words and make it interesting!"""
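One hypothetical way to supply those fields; `DomainCommentator` and its attributes are illustrative, not part of the shipped `LiveCommentator` class:

class DomainCommentator(LiveCommentator):
    """Hypothetical subclass carrying the domain fields used in the prompt above."""
    domain: str = "crisis response"
    focus_area_1: str = "how quickly agents converge on a plan"
    focus_area_2: str = "hand-offs between sub-agents"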
- `demo.py`: Entry point and main orchestration
- `commentator_agent/supervisor.py`: Main workflow coordinator with callbacks
- `commentator_agent/commentator.py`: Live commentary generation and audio streaming
- `crisis_response_agent/agent.py`: Main supervisory agent coordinator for the crisis response team
- `crisis_response_agent/sub_agents.py`: Individual sub-agents for the crisis response team
- `crisis_response_agent/tools.py`: Tools for generating random crisis situations and signals
- `utils/audio_player.py`: Audio buffering and playback management
- `tools/`: Tools for use across all agentic systems
- Memory Usage: Commentary history is bounded to the last 10 entries (adjustable via the deque's `maxlen`)
- Audio Latency: ~200-500ms from event to audio (not bad for real-time AI)
- CPU Usage: Moderate due to audio processing
- Network: Depends on how chatty Gemini Live gets
- Gemma3n: Local inference speed largely depends on available RAM and the parameter count of the local model
- Multiple commentary tracks for different audiences
- Audio effects and background music (because why not make it even more extra)
- Voice cloning for personalized commentators
- Performance metrics dashboard
- Commentary quality analysis
- Agent efficiency reporting (so you can judge your AI agents)
- Web dashboard with live visualization
- Slack/Discord bot integration
- REST API for remote commentary triggering
- Commentator debates (let AI argue about AI)
- Specialized domain commentators
- Interactive Q&A with commentators
Want to make this better? Here's how:
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-amazing-idea
- Make your changes (and try not to break everything)
- Add tests (yes, really)
- Update docs as needed
- Submit a pull request
- Follow PEP 8 (it's not optional)
- Use type hints throughout
- Write docstrings that humans can understand
- Add tests for new features
- Follow ADK best practices
MIT License - see the LICENSE file for details. Do whatever you want with this code, just don't blame me if it achieves sentience.
- Google ADK Team for building the framework that makes this possible
- Google AI for the Gemini Live API
- OpenAI & Anthropic for additional LLM support via LiteLLM
- ADK Community for patterns and best practices
- Everyone who's ever wished AI would just explain itself - this one's for you
- Google ADK Documentation
- Gemini Live API Guide
- Multi-Agent Systems with ADK
- ADK Custom Agents Tutorial
Built with Google's Agent Development Kit (and a healthy dose of curiosity about what AI agents actually do all day)
Making AI workflows less mysterious, one commentary at a time.