Heavily based on this excellent repo. Check out their blog: https://techcommunity.microsoft.com/blog/machinelearningblog/automating-real-time-multi-modal-customer-service-with-ai/4354892
Watch this video for a demonstration of this pattern's core feature: using intent detection to seamlessly transition between domain-specific agents, in addition to the agent-specific features for knowledge retrieval and source-system interaction.
voice_agent_demo.mp4
We'll follow four steps to get this example running in your own environment: prerequisites, creating an index, setting up the environment, and running the app.
You'll need instances of the following Azure services. You can re-use service instances you have already or create new ones.
- Azure OpenAI, with two model deployments: one of the gpt-4o-realtime-preview model and one of the regular gpt-4o-mini model.
- [Optional] Train an intent detection model with an SLM using Azure AI Studio. Check the training data.
The app needs to know which service endpoints to use for Azure OpenAI and Azure AI Search. The following variables can be set as environment variables, or you can create a `.env` file in the `app/backend/` directory with this content:

The voice agent can use a fine-tuned SLM deployment to classify intent, to minimize latency. If you do not have this deployment available, you can use the Azure OpenAI gpt-4o-mini deployment instead, which is fast enough to classify intent with minimal impact on latency. To use gpt-4o-mini, leave the `INTENT_SHIFT_API_*` env variables empty and supply `AZURE_OPENAI_4O_MINI_DEPLOYMENT`.
```
AZURE_OPENAI_ENDPOINT="https://.openai.azure.com/"
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4o-mini
AZURE_OPENAI_EMB_DEPLOYMENT="text-embedding-ada-002"
AZURE_OPENAI_EMB_ENDPOINT= # [Optional] if different from your realtime endpoint
AZURE_OPENAI_EMB_API_KEY= # [Optional] if providing an embedding endpoint
AZURE_OPENAI_4O_MINI_DEPLOYMENT=YOUR_AZURE_OPENAI_4O_MINI_DEPLOYMENT_NAME
INTENT_SHIFT_API_KEY=
INTENT_SHIFT_API_URL=https://YOUR_ML_DEPLOYMENT.westus2.inference.ml.azure.com/score
INTENT_SHIFT_API_DEPLOYMENT=YOUR_ML_DEPLOYMENT_NAME
AZURE_OPENAI_API_VERSION=2024-10-01-preview
AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME=gpt-4o-realtime-preview
```
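To illustrate the fallback described above, here is a minimal sketch of how a backend might choose the intent classifier from these variables. The function name `choose_intent_backend` and the return values are hypothetical, not taken from the repo:

```python
import os

def choose_intent_backend() -> str:
    """Pick the intent classifier: the fine-tuned SLM endpoint if the
    INTENT_SHIFT_API_* variables are set, otherwise the gpt-4o-mini deployment."""
    api_url = os.getenv("INTENT_SHIFT_API_URL", "").strip()
    api_key = os.getenv("INTENT_SHIFT_API_KEY", "").strip()
    if api_url and api_key:
        return "slm"  # call the Azure ML scoring endpoint
    if os.getenv("AZURE_OPENAI_4O_MINI_DEPLOYMENT"):
        return "gpt-4o-mini"  # fall back to the Azure OpenAI deployment
    raise RuntimeError(
        "Set INTENT_SHIFT_API_* or AZURE_OPENAI_4O_MINI_DEPLOYMENT"
    )
```

With only `AZURE_OPENAI_4O_MINI_DEPLOYMENT` set, this selects gpt-4o-mini; once both `INTENT_SHIFT_API_URL` and `INTENT_SHIFT_API_KEY` are provided, the SLM endpoint wins.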
You can run this repo virtually by using GitHub Codespaces, which will open a web-based VS Code in your browser:
Once the codespace opens (this may take several minutes), open a new terminal.
You can run the project in your local VS Code Dev Container using the Dev Containers extension:
- Start Docker Desktop (install it if not already installed)
- Open the project:
- In the VS Code window that opens, once the project files show up (this may take several minutes), open a new terminal.
- Install the required tools:
  - Node.js
  - Python >=3.11
    - Important: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
    - Important: Ensure you can run `python --version` from console. On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.
  - PowerShell
- Clone the repo (`git clone https://github.com/microsoft/multi-modal-customer-service-agent`)
- Ensure env variables are set per Setting up the environment
- Run this command to start the app:

  Windows:

  ```
  cd voice_agent\app
  pwsh .\start.ps1
  ```

  Linux/Mac:

  ```
  cd voice_agent/app
  ./start.sh
  ```

- The app is available at http://localhost:8765
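To confirm the server is actually listening, you can probe the URL from Python. This is a generic reachability check, not part of the repo; the helper name `is_app_up` is ours:

```python
import urllib.error
import urllib.request

def is_app_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (any status code)."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timeout: app not running

if __name__ == "__main__":
    print("app up:", is_app_up("http://localhost:8765"))
```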
