Azure AI Speech Application is an interactive application for Speech-to-Text (STT) and Text-to-Speech (TTS), built on Azure Cognitive Services and Gradio. It pairs Azure's speech APIs with a Gradio web interface, so developers and end users can transcribe audio and synthesize speech from a browser.
- Speech-to-Text (STT): Convert audio inputs (e.g., microphone recordings) into accurate text transcriptions using Azure's Speech-to-Text API.
- Text-to-Speech (TTS): Generate natural-sounding audio from text input using Azure's Text-to-Speech API with support for customizable voices.
- Interactive Gradio Interface: A user-friendly interface for real-time testing and interaction.
- Demo Examples: A screenshot (demo.png) and a video walkthrough (demo.mp4) show the application in use.
project/
├── app.py # Main application script
├── requirements.txt # List of dependencies
├── README.md # Project documentation
├── .env # Environment variables (not shared in version control)
├── demo/ # Demo files showcasing platform capabilities
│   ├── demo.png # Screenshot of the interface
│   └── demo.mp4 # Video demonstration
└── utils/ # Utility functions for Azure APIs
    └── azure_speech.py # Helper functions for STT and TTS
Clone this repository to your local system:
git clone <repository-url>
cd project
Install the required Python libraries from the requirements.txt file:
pip install -r requirements.txt
Create a .env file in the project root with the following content:
AZURE_SPEECH_REGION=<your-region>
AZURE_SPEECH_KEY=<your-api-key>
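The README does not say how app.py reads these values; a common choice is `load_dotenv()` from the `python-dotenv` package. The sketch below is a stdlib-only stand-in under that assumption: the helpers `load_env` and `get_speech_settings` are hypothetical names, and real environment variables take precedence over the file.

```python
# Hypothetical .env reader; the project may use python-dotenv instead.
import os


def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip surrounding quote characters, e.g. KEY="abc" -> abc.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_speech_settings(env: dict) -> tuple:
    """Return (key, region); variables already in os.environ override the file."""
    key = os.environ.get("AZURE_SPEECH_KEY") or env.get("AZURE_SPEECH_KEY")
    region = os.environ.get("AZURE_SPEECH_REGION") or env.get("AZURE_SPEECH_REGION")
    missing = [name for name, val in (("AZURE_SPEECH_KEY", key),
                                      ("AZURE_SPEECH_REGION", region)) if not val]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return key, region
```

Failing early with the names of the missing variables is friendlier than letting the SDK reject an empty key at the first request.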
Start the application with:
python app.py
- Speech-to-Text (STT):
  - Record audio with the microphone or upload an audio file.
  - View the transcribed text in real time.
- Text-to-Speech (TTS):
- Enter the text you want to convert to speech.
- Download or play the generated audio file directly from the interface.
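The workflow above maps onto two Gradio tabs. The following is a minimal sketch of how app.py might wire them, not the project's actual code: `make_stt_handler`, `make_tts_handler`, and `build_demo` are hypothetical names, and the STT/TTS functions are passed in so any backend (such as the project's Azure helpers) can be plugged in. It assumes Gradio 4.x, where `gr.Audio` takes a `sources` list; Gradio 3.x used a `source` argument instead.

```python
# Hypothetical app.py wiring; STT/TTS backends are injected as plain functions.
import os
import tempfile


def make_stt_handler(stt_fn):
    """Wrap a transcription function so the UI shows a message instead of a traceback."""
    def handle(audio_path):
        if not audio_path:
            return "Please record or upload audio first."
        try:
            return stt_fn(audio_path) or "(no speech recognized)"
        except Exception as exc:  # surface SDK or network errors in the textbox
            return f"Error: {exc}"
    return handle


def make_tts_handler(tts_fn):
    """Wrap a synthesis function; returns a .wav path for gr.Audio, or None for empty text."""
    def handle(text):
        if not text or not text.strip():
            return None
        fd, out_path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)  # the backend writes to the path itself
        return tts_fn(text, out_path)
    return handle


def build_demo(stt_fn, tts_fn):
    import gradio as gr  # lazy import; assumes Gradio 4.x

    with gr.Blocks(title="Azure AI Speech") as demo:
        with gr.Tab("Speech-to-Text"):
            audio_in = gr.Audio(sources=["microphone", "upload"], type="filepath")
            text_out = gr.Textbox(label="Transcript")
            gr.Button("Transcribe").click(make_stt_handler(stt_fn), audio_in, text_out)
        with gr.Tab("Text-to-Speech"):
            text_in = gr.Textbox(label="Text")
            audio_out = gr.Audio(type="filepath", label="Generated audio")
            gr.Button("Synthesize").click(make_tts_handler(tts_fn), text_in, audio_out)
    return demo
```

To run it, pass in the project's transcription and synthesis helpers (bound to the key and region from .env) and call `demo.launch()` on the returned object. Keeping the handlers free of Gradio imports makes them testable with simple fakes.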
This project is licensed under the MIT License.
- Powered by Azure Cognitive Services.
- Interface built with Gradio.