See how the Gradio app work. Watch the demo video.
A language conversation assistant is a tool that allows users to interact with a language model. Users can not only chat with the assistant but also record their voice, which will be transcribed and processed by the assistant to generate responses. Moreover, the program leverages a text-to-speech service to audibly read out the responses.
- Interaction with OpenAI's GPT language models.
- Real-time voice recording and transcription.
- Text-to-speech functionality to audibly present responses.
- Gradio interface for ease of use
Before you begin, ensure you have the following installed:
-
ffmpeg
: This tool is crucial for audio processing.-
Windows: You can download it from the official FFmpeg site or use a package manager like Chocolatey.
choco install ffmpeg
-
macOS: Using Homebrew:
brew install ffmpeg
-
Linux: Depending on your distribution, you can use
apt
,yum
, or another package manager:sudo apt update sudo apt install ffmpeg
-
Ensure ffmpeg
is correctly installed by running:
ffmpeg -version
- Python 3.9 or higher.
- Python Libraries:
numpy
gradio
openai
requests
pydub
- Clone this repository.
- Install the required packages with
pip install -r requirements.txt
- Place your OpenAI API key in a file named
openaiapikey.txt
. - Place your Elevenlabs API key (for the text-to-speech feature) in a file named
elabapikey.txt
. - (Optional) Modify
chatbot_{language}.txt
files if you wish to customize the initial behavior of the chatbot.
- Run the script with
python main.py
. - Once executed, the Gradio interface will be accessible via your local browser at: http://127.0.0.1:7860/
- A Gradio interface will pop up, presenting a user-friendly way to interact with the chatbot.
- Choose your preferred language for the conversation.
- Select a response voice from the available options.
- Start your interaction by pressing the "Move to Chat button"
- Record your voice by using the "Record from microphone" button and send it to the chatbot with "Send Audio"
- The chatbot will respond both in text and audibly using the selected voice type.
- To end the session, simply close the Gradio interface.
If you wish to contribute to this project, please fork the repository and submit a Pull Request.
This project is under the MIT License. Please refer to the LICENSE file for more details.