

Azure Real-Time (ART) Agent Accelerator

📖 Documentation · 🚀 Quick Start · 🏗️ Architecture · 🎨 Community

TL;DR: Build real-time, multimodal and omnichannel agents on Azure in minutes, not months. Our approach is code-first, modular, ops-friendly & extensible.


You own the agentic design; this repo handles the end-to-end voice plumbing. We keep a clean separation of concerns—telephony (ACS), app middleware, AI inference loop (STT → LLM → TTS), and orchestration—so you can swap parts without starting from zero. Shipping voice agents is more than "voice-to-voice." You need predictable latency budgets, media handoffs, error paths, channel fan-out, barge-in, noise cancellation, and more. This framework gives you the e2e working spine so you can focus on what differentiates you—your tools, agentic design, and orchestration logic (multi-agent ready).


See it in Action

📺 Full Overview · 🎬 Demo Walkthrough

💡 What you get

  • Omnichannel, including first-class telephony. Azure Communication Services (ACS) integration for PSTN, SIP transfer, IVR/DTMF routing, and number provisioning—extendable for contact centers and custom IVR trees.

  • Transport that scales. FastAPI + WebSockets for true bidirectional streaming; runs locally and scales out in Kubernetes. Leverages ACS bidirectional media streaming for low-latency ingest/playback (barge-in ready), with helper classes to wire your UI WebSocket client or loop back into ACS; the plumbing is done for you (see the transport sketch after this list).

  • Model freedom. Use GPT-family or your provider of choice behind a slim adapter; swap models without touching the transport.

  • Clear seams for customization. Replace code, switch STT/TTS providers, add tool routers, or inject domain policies—without tearing down the whole app.
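The transport bullet above boils down to a standard FastAPI WebSocket loop. Below is a minimal sketch of that pattern; the route name and frame handling are illustrative assumptions, not the repository's actual handler (which lives under apps/artagent/backend):

# Minimal sketch of a bidirectional streaming endpoint (illustrative only).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def process_audio(frame: bytes) -> bytes:
    # Placeholder for the inference loop; returns synthesized audio bytes.
    return frame

@app.websocket("/ws/voice")  # hypothetical route
async def voice_stream(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            frame = await ws.receive_bytes()   # audio in (browser client or ACS media)
            reply = process_audio(frame)       # STT -> LLM -> TTS happens here
            await ws.send_bytes(reply)         # audio out for playback
    except WebSocketDisconnect:
        pass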

Choose your voice inference pipeline (voice‑to‑voice):

  • Build from scratch (maximum control). Use our AI inference layer and patterns to wire STT → LLM → TTS with your preferred Azure services and assessments. Own the event loop, intercept any step, and tailor latency/quality trade-offs for your use case (a structural sketch follows this list). Ideal for on‑prem/hybrid, strict compliance, or deep customization.

  • Managed path (ship fast, enterprise‑ready). Leverage the latest addition to the Azure AI family—Azure Voice Live API (preview)—for voice-to-voice media, and connect to Azure AI Foundry Agents for built-in tool/function calling. Keep your hooks; let Azure AI Foundry handle the media layer, scaling, noise suppression, and barge-in.

  • Bring your own voice‑to‑voice model. Drop in your model behind an adapter (e.g., the latest gpt‑realtime or equivalent). Transport/orchestration (including ACS telephony) stays the same—no app changes.
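If you take the build-from-scratch path, the event loop you own is conceptually three awaitable stages with hooks between them. The sketch below uses stub functions to show the shape only; the real implementations would call Azure Speech and your chosen LLM:

# Structural sketch of the cascade loop (stubs, not the repo's code).
import asyncio

async def transcribe(audio: bytes) -> str:        # e.g., Azure Speech STT
    return "hello"

async def generate(text: str) -> str:             # e.g., an Azure OpenAI chat call
    return f"You said: {text}"

async def synthesize(text: str) -> bytes:         # e.g., Azure neural TTS
    return text.encode()

async def handle_turn(audio_in: bytes) -> bytes:
    text = await transcribe(audio_in)      # hook: phrase lists, custom VAD
    reply = await generate(text)           # hook: tools, guardrails, routing
    return await synthesize(reply)         # hook: voice, style, sentence streaming

print(asyncio.run(handle_turn(b"...")))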

The question of the century: Is it production-ready?

“Production” means different things, but our intent is clear: this is an accelerator—it gets you ~80% of the way with battle-tested plumbing. You bring the last mile: hardening, infrastructure policies, security posture, SRE/DevOps, and your enterprise release process.

We ship the scaffolding to make that last mile fast: structured logging, metrics/tracing hooks, and a load-testing harness so you can profile end-to-end latency and concurrency, then tune or harden as needed to reach your target volume.
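The load-testing harness lives in tests/. As a rough, standalone illustration of the kind of end-to-end probe it enables, the snippet below opens a few concurrent WebSocket sessions and times one round trip each; the URL, message format, and use of the websockets package are assumptions for the sketch, not the harness itself:

# Hypothetical latency/concurrency probe -- adapt to the backend's actual WebSocket contract.
import asyncio, statistics, time
import websockets  # pip install websockets

URL = "ws://localhost:8000/ws/voice"  # hypothetical endpoint

async def one_round_trip() -> float:
    async with websockets.connect(URL) as ws:
        start = time.perf_counter()
        await ws.send(b"ping")       # stand-in for an audio frame
        await ws.recv()              # stand-in for the first response frame
        return (time.perf_counter() - start) * 1000

async def main(concurrency: int = 10) -> None:
    latencies = await asyncio.gather(*(one_round_trip() for _ in range(concurrency)))
    print(f"p50={statistics.median(latencies):.1f} ms  max={max(latencies):.1f} ms")

asyncio.run(main())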

The How (Architecture)

Two orchestration modes—same agent framework, different audio paths:

| Mode | Path | Latency | Best For |
| --- | --- | --- | --- |
| SpeechCascade | Azure Speech STT → LLM → TTS | ~400ms | Custom VAD, phrase lists, Azure voices |
| VoiceLive | Azure VoiceLive SDK (gpt-4o-realtime) | ~200ms | Fastest setup, lowest latency |
# Select mode via environment variable
export ACS_STREAMING_MODE=MEDIA       # SpeechCascade (default)
export ACS_STREAMING_MODE=VOICE_LIVE  # VoiceLive
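At startup the backend branches on that variable. A minimal sketch of the switch is shown below; the orchestrator class names are hypothetical stand-ins, not the actual classes under apps/artagent/backend/voice:

# Illustrative mode switch; class names are placeholders.
import os

class SpeechCascadeOrchestrator:      # STT -> LLM -> TTS path (~400ms)
    pass

class VoiceLiveOrchestrator:          # managed voice-to-voice path (~200ms)
    pass

mode = os.getenv("ACS_STREAMING_MODE", "MEDIA")
orchestrator = VoiceLiveOrchestrator() if mode == "VOICE_LIVE" else SpeechCascadeOrchestrator()
print(type(orchestrator).__name__)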
🔧 SpeechCascade — Full Control
SpeechCascade Architecture

You own each step: STT → LLM → TTS with granular hooks.

| Feature | Description |
| --- | --- |
| Custom VAD | Control silence detection, barge-in thresholds |
| Azure Speech Voices | Full neural TTS catalog, styles, prosody |
| Phrase Lists | Boost domain-specific recognition |
| Sentence Streaming | Natural pacing with per-sentence TTS |
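For instance, phrase lists are a one-liner with the Azure Speech SDK. A minimal sketch, assuming the azure-cognitiveservices-speech package; the key, region, and phrases are placeholders:

# Attach a phrase list to boost domain-specific terms during recognition.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

phrase_list = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
phrase_list.addPhrase("ARTAgent")
phrase_list.addPhrase("Voice Live API")

result = recognizer.recognize_once()   # single-shot recognition from the default microphone
print(result.text)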

Best for: On-prem/hybrid, compliance requirements, deep customization.

📖 Cascade Orchestrator Docs

⚡ VoiceLive — Ship Fast

Note

Uses the Azure VoiceLive SDK with gpt-realtime in the backend.

VoiceLive Architecture

Managed voice-to-voice: Azure-hosted GPT-4o Realtime handles audio in one hop.

| Feature | Description |
| --- | --- |
| ~200ms latency | Direct audio streaming, no separate STT/TTS |
| Server-side VAD | Automatic turn detection, noise reduction |
| Native tools | Built-in function calling via Realtime API |
| Azure Neural Voices | HD voices like en-US-Ava:DragonHDLatestNeural |

Best for: Speed to production, lowest latency requirements.

📖 VoiceLive Orchestrator Docs · VoiceLive SDK Samples

Getting Started

📋 Prerequisites

| Requirement | Quick Check |
| --- | --- |
| Azure CLI | az --version |
| Azure Developer CLI | azd version |
| Docker | docker --version |
| Azure Subscription | az account show |
| Contributor Access | Required for resource creation |

⚡ Fastest Path (15 minutes)

# 1. Clone the repository
git clone https://github.com/Azure-Samples/art-voice-agent-accelerator.git
cd art-voice-agent-accelerator

# 2. Login to Azure
azd auth login

# 3. Deploy everything
azd up   # ~15 min for complete infra and code deployment

Note

If you encounter any issues, please refer to TROUBLESHOOTING.md

Done! Your voice agent is running. Open the frontend URL shown in the output.

🗺️ Repository Structure

📁 apps/artagent/              # Main application
  ├── 🔧 backend/             # FastAPI + WebSockets voice pipeline
  │   ├── registries/         # Agent & scenario definitions
  │   │   ├── agentstore/     # YAML agent configs + Jinja2 prompts
  │   │   ├── scenariostore/  # Multi-agent orchestration flows
  │   │   └── toolstore/      # Pluggable business tools
  │   └── voice/              # Orchestrators (SpeechCascade, VoiceLive)
  └── 🌐 frontend/            # Vite + React demo client
📁 src/                       # Core libraries (ACS, Speech, AOAI, Redis, Cosmos, VAD)
📁 samples/                   # Tutorials (hello_world, voice_live_sdk, labs)
📁 infra/                     # Infrastructure as Code (Terraform + Bicep)
📁 docs/                      # Guides and references
📁 tests/                     # Pytest suite and load testing
📁 utils/                     # Logging/telemetry helpers
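The agentstore pairing of YAML configs with Jinja2 prompts is the main customization surface. The snippet below is purely illustrative of how such a pairing renders into a system prompt; the field names are hypothetical, not the actual agentstore schema:

# Illustrative only: YAML agent config + Jinja2 prompt template (hypothetical fields).
import yaml                      # pip install pyyaml
from jinja2 import Template      # pip install jinja2

agent_yaml = """
name: concierge
voice: en-US-AvaNeural
prompt: |
  You are {{ name }}, a voice agent for {{ company }}.
  Keep answers to two sentences or fewer.
"""

config = yaml.safe_load(agent_yaml)
system_prompt = Template(config["prompt"]).render(name=config["name"], company="Contoso")
print(system_prompt)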

📚 Documentation Guides

Community & ARTist Certification

ARTist = Artist + ART (Azure Real-Time Voice Agent Framework)


Join the community of practitioners building real-time voice AI agents! The ARTist Certification Program recognizes builders at three levels:

  • Level 1: Apprentice — Run the UI, demonstrate the framework, and understand the architecture
  • Level 2: Creator — Build custom agents with YAML config and tool integrations
  • Level 3: Maestro — Lead production deployments, optimize performance, and mentor others

Earn your badge, join the Hall of Fame, and connect with fellow ARTists!

👉 Learn about ARTist Certification →

Contributing

PRs & issues welcome—see CONTRIBUTING.md before pushing.

License & Disclaimer

Released under MIT. This sample is not an official Microsoft product—validate compliance (HIPAA, PCI, GDPR, etc.) before production use.


Important

This software is provided for demonstration purposes only. It is not intended to be relied upon for any production workload. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability of the software or related content. Any reliance placed on such information is strictly at your own risk.

