Heavily based on this excellent repo. Check out their blog: https://techcommunity.microsoft.com/blog/machinelearningblog/automating-real-time-multi-modal-customer-service-with-ai/4354892
Watch this video for a demonstration of this pattern's core feature: using intent detection to seamlessly transition between domain-specific agents, in addition to the agent-specific features for knowledge retrieval and source-system interaction.
voice_agent_demo.mp4
We'll follow four steps to get this example running in your own environment: prerequisites, creating an index, setting up the environment, and running the app.
You'll need instances of the following Azure services. You can re-use service instances you have already or create new ones.
- Azure OpenAI, with two model deployments: one of the gpt-4o-realtime-preview model and one of the regular gpt-4o-mini model.
- [Optional] Train an intent detection model with an SLM using Azure AI Studio. Check the training data.
The app needs to know which service endpoints to use for Azure OpenAI and Azure AI Search. The following variables can be set as environment variables, or you can create a `.env` file in the `app/backend/` directory with this content:

The voice agent can use a fine-tuned SLM deployment to classify intent, to minimize latency. If you do not have this deployment available, you can use the Azure OpenAI gpt-4o-mini deployment instead, which is fast enough to classify intent with minimal impact on latency. To use gpt-4o-mini, leave the `INTENT_SHIFT_API_*` env variables empty and supply `AZURE_OPENAI_4O_MINI_DEPLOYMENT`.
```
AZURE_OPENAI_ENDPOINT="https://.openai.azure.com/"
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4o-mini
AZURE_OPENAI_EMB_DEPLOYMENT="text-embedding-ada-002"
AZURE_OPENAI_EMB_ENDPOINT= # [Optional] if different from your realtime endpoint
AZURE_OPENAI_EMB_API_KEY= # [Optional] if providing an embedding endpoint
AZURE_OPENAI_4O_MINI_DEPLOYMENT=YOUR_AZURE_OPENAI_4O_MINI_DEPLOYMENT_NAME
INTENT_SHIFT_API_KEY=
INTENT_SHIFT_API_URL=https://YOUR_ML_DEPLOYMENT.westus2.inference.ml.azure.com/score
INTENT_SHIFT_API_DEPLOYMENT=YOUR_ML_DEPLOYMENT_NAME
AZURE_OPENAI_API_VERSION=2024-10-01-preview
AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME=gpt-4o-realtime-preview
```
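To illustrate the fallback described above, here is a minimal sketch of how a backend might choose the intent classifier from these variables. The function name `choose_intent_backend` and the return values are hypothetical, not taken from the repo:

```python
import os

def choose_intent_backend() -> str:
    """Pick the intent classifier: the fine-tuned SLM endpoint if the
    INTENT_SHIFT_API_* variables are set, otherwise the gpt-4o-mini deployment."""
    api_url = os.getenv("INTENT_SHIFT_API_URL", "").strip()
    api_key = os.getenv("INTENT_SHIFT_API_KEY", "").strip()
    if api_url and api_key:
        return "slm"  # call the Azure ML scoring endpoint
    if os.getenv("AZURE_OPENAI_4O_MINI_DEPLOYMENT"):
        return "gpt-4o-mini"  # fall back to the Azure OpenAI deployment
    raise RuntimeError(
        "Set INTENT_SHIFT_API_* or AZURE_OPENAI_4O_MINI_DEPLOYMENT"
    )
```

With only `AZURE_OPENAI_4O_MINI_DEPLOYMENT` set, this selects gpt-4o-mini; once both `INTENT_SHIFT_API_URL` and `INTENT_SHIFT_API_KEY` are provided, the SLM endpoint wins.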
You can run this repo virtually by using GitHub Codespaces, which will open a web-based VS Code in your browser:
Once the codespace opens (this may take several minutes), open a new terminal.
You can run the project in your local VS Code Dev Container using the Dev Containers extension:
- Start Docker Desktop (install it if not already installed)
- Open the project:
- In the VS Code window that opens, once the project files show up (this may take several minutes), open a new terminal.
- Install the required tools:
  - Node.js
  - Python >=3.11
    - Important: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
    - Important: Ensure you can run `python --version` from console. On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.
  - PowerShell
- Clone the repo (`git clone https://github.com/microsoft/multi-modal-customer-service-agent`)
- Ensure env variables are set per Setting up the environment
- Run this command to start the app:

  Windows:

  ```
  cd voice_agent\app
  pwsh .\start.ps1
  ```

  Linux/Mac:

  ```
  cd voice_agent/app
  ./start.sh
  ```

- The app is available at http://localhost:8765
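To confirm the server is actually listening, you can probe the URL from Python. This is a generic reachability check, not part of the repo; the helper name `is_app_up` is ours:

```python
import urllib.error
import urllib.request

def is_app_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (any status code)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timeout: app not running

if __name__ == "__main__":
    print("app up:", is_app_up("http://localhost:8765"))
```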
