JarvisChatBot - A Voice-Activated Chatbot using OpenAI and Raspberry Pi

(Screenshot: Flask web UI)

Introduction

This project is a voice-activated, Raspberry Pi-based system that listens for a wake word, "hey jarvis" or "jarvis". Upon hearing the wake word, the system records audio input until it detects silence, sends the recording to SpeechRecognition (the Google recognizer) for transcription, and passes the transcription to OpenAI for a response. The response from OpenAI is then converted to speech using ElevenLabs, a text-to-speech service, and played back to the user.
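
For orientation, the sketch below shows that pipeline in miniature. It is not the repository's main.py: the detection threshold, timing values, and model name are placeholders (the real values come from config.json), and the ElevenLabs playback step is stubbed out with a print.

import numpy as np
import pyaudio
import speech_recognition as sr
from openai import OpenAI
from openwakeword.model import Model

oww = Model(wakeword_models=["jarvis_v2.tflite"])      # wake word model shipped with this repo
recognizer = sr.Recognizer()
client = OpenAI(api_key="YOUR_OPENAI_KEY")             # placeholder key

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=1280)

while True:
    frame = np.frombuffer(stream.read(1280), dtype=np.int16)
    if max(oww.predict(frame).values()) > 0.5:          # wake word heard
        stream.stop_stream()                            # free the mic for SpeechRecognition
        with sr.Microphone(sample_rate=16000) as source:
            audio = recognizer.listen(source, timeout=5, phrase_time_limit=15)  # stops on silence
        text = recognizer.recognize_google(audio)       # Google recognizer transcription
        reply = client.chat.completions.create(
            model="gpt-4o-mini",                        # placeholder; real name comes from config.json
            messages=[{"role": "user", "content": text}],
        ).choices[0].message.content
        print(reply)                                    # the real project speaks this via ElevenLabs
        stream.start_stream()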

There is also a button service: a quick press starts or stops the JarvisChatBot service, and holding the button shuts down the Pi.

Flask is used to create a web interface that communicates with the chatbot over WebSockets. The web interface can be accessed by navigating to the IP address of the Raspberry Pi on port 5000 (e.g., http://localhost:5000). From there you can view the chat log and send the chatbot text prompts along with images. Imgur is the default host for the images, and the link is sent to OpenAI for analysis.
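
A stripped-down sketch of that web layer is shown below. It is illustrative only: the event names and the respond_to() helper are placeholders, and the real app also handles image uploads to Imgur and richer chat-log updates.

from flask import Flask, render_template
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def respond_to(prompt):
    # placeholder: the real app forwards the prompt (and any image links) to OpenAI
    return f"You said: {prompt}"

@app.route("/")
def index():
    return render_template("index.html")            # page with the chat log

@socketio.on("user_message")                        # event name is an assumption
def handle_user_message(text):
    socketio.emit("bot_message", respond_to(text))  # push the reply to connected clients

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)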

This was a quick hodge-podge project that still has lots of room for improvement.

Project Structure

The project is divided into the following main files:

  1. main.py: The main script that integrates all the other modules; it listens for the wake word, manages the audio recording, and handles the interaction with OpenAI and ElevenLabs.

  2. tts_service.py: Handles the text-to-speech conversion using ElevenLabs.

  3. input_listener.py: Handles the audio recording and silence detection.

  4. chat_gpt_service.py: Manages the interaction with OpenAI's GPT models.

  5. sound_effect_service.py: Handles playing local sound effects.

  6. led_service.py: Handles controlling the LED lights on the ReSpeaker 2-Mics Pi HAT.

  7. apa102.py: A library for controlling the LED lights on the ReSpeaker 2-Mics Pi HAT.

  8. jarvis_v2.tflite and jarvis_v2.onnx: The wake word model(s) used by OpenWakeWord.

There is also a configuration file, config.json, which stores important parameters and keys.

Dependencies

This project also uses:

  • OpenWakeWord: The wake word engine used to listen for the wake word.
  • SpeechRecognition: The speech-to-text library, used with Google's legacy speech recognition engine.
  • OpenAI: This is used to respond to requests. You will need an API key from OpenAI to use this service.
  • ElevenLabs: This is used to convert the response from OpenAI into speech. You will need ElevenLabs credentials to use this service.
  • mpv: This is used to play the audio response from ElevenLabs.

Configuration

All the keys and important parameters are stored in the config.json file. This includes:

  • OpenAI settings (openai_key, openai_model, system_prompt): The API key, model name, and system prompt used when interacting with OpenAI.
  • ElevenLabs credentials (elevenlabs_key, elevenlabs_voice_id): Used for text-to-speech conversion with ElevenLabs.
  • SpeechRecognition settings (language, dynamic_energy_threshold, timeout, phrase_time_limit): The language code and listening parameters for the speech recognition engine.
  • OpenWakeWord settings (oww_model_path, oww_inference_framework): The wake word model and inference framework to use. For more models, see the Home Assistant Wake-Word Collection.
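
Below is an illustrative config.json skeleton built from the parameters listed above. All values are placeholders; the shipped config.json.example is the authoritative reference.

{
    "openai_key": "YOUR_OPENAI_KEY",
    "openai_model": "gpt-4o",
    "system_prompt": "You are Jarvis, a helpful voice assistant.",
    "elevenlabs_key": "YOUR_ELEVENLABS_KEY",
    "elevenlabs_voice_id": "YOUR_VOICE_ID",
    "language": "en-US",
    "dynamic_energy_threshold": true,
    "timeout": 5,
    "phrase_time_limit": 15,
    "oww_model_path": "jarvis_v2.tflite",
    "oww_inference_framework": "tflite"
}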

Running the Project

To run the project, execute the main.py script:

python main.py

Obtaining Required Keys

This project requires keys from Imgur, OpenAI, and ElevenLabs. Here is how to obtain them:

  1. OpenAI Key: Sign up at openai.com and follow the instructions here to obtain your secret API key.

  2. ElevenLabs Key: Sign up at ElevenLabs.

  3. Imgur Key: Sign up at imgur.com. Then go to this page to create a new application and obtain your client ID.

After obtaining these keys, add them to your config.json file.

Tested Environment and Installation Instructions

This project has been tested on a Raspberry Pi 3B+ and a Windows 11 desktop. The following sections provide instructions on how to set up the project on these systems.

For the Raspberry Pi, I used the ReSpeaker 2-Mics Pi HAT as the sound card. More information about this sound card can be found here.

For Windows, I used a USB webcam microphone and standard desktop speakers. (I had difficulty running the OpenWakeWord engine with tflite on Windows, so ONNX was used instead.)

To convert tflite models to ONNX, you can use tf2onnx:

pip install tf2onnx
python -m tf2onnx.convert --opset 13 --tflite path/to/your/model.tflite --output path/to/your/model.onnx

Raspberry Pi Setup

Below are the commands to set up the project on your Raspberry Pi:

  1. Install the ReSpeaker 2-Mics Pi HAT sound card
sudo apt update -y
sudo apt upgrade -y
sudo apt install portaudio19-dev libatlas-base-dev git python3-venv python3-pip ffmpeg flac espeak mpv build-essential libpython3-dev libdbus-1-dev libglib2.0-dev vlc -y
sudo apt remove wireplumber -y
KERNEL_VERSION=$(uname -r | cut -d'.' -f1,2)
git clone --branch v$KERNEL_VERSION https://github.com/HinTak/seeed-voicecard
cd seeed-voicecard
sudo ./install.sh
sudo reboot now
  2. Clone the JarvisChatBot repository
git clone https://github.com/MrSco/JarvisChatBot.git
cd JarvisChatBot
  3. Upgrade pip and install the required Python packages in a virtual environment
bash ./install.sh

Now, your Raspberry Pi is set up to run the project. Remember to add your API keys to the config.json file before running the main.py script.

Windows Setup

For Windows, run the included install script:

install.bat

Edit the config.json file

Copy the config.json.example file to config.json and edit the file to include your OpenAI and ElevenLabs keys. Also copy the assistants.json.example file to assistants.json.

cp config.json.example config.json
cp assistants.json.example assistants.json
nano config.json

Setting Up the Default Audio Output Device

In some instances, you may need to manually select the default audio output device. Here is how you can do it:

  1. Open the Raspberry Pi configuration settings:
sudo raspi-config
  2. Navigate through the menu options as follows: System Options, then S2 Audio.
  3. Select the desired audio output option. In our case, it was the seeed-2mic-voicecard option corresponding to the ReSpeaker sound card.

After you've selected the appropriate option, the system should use this device as the default for audio output.

  4. Run alsamixer, set the volume levels, and store them:
alsamixer
mkdir -p ~/.config
alsactl -f ~/.config/asound.state store
sudo cp ~/.config/asound.state /var/lib/alsa/asound.state
  5. (Optional) Adjust alsa-restore.service and add the -U parameter to the ExecStart restore command:
sudo nano /usr/lib/systemd/system/alsa-restore.service
ExecStart=-/usr/sbin/alsactl -E HOME=/run/alsa -E XDG_RUNTIME_DIR=/run/alsa/runtime -U restore

Using as an AirPlay speaker with ShairPort-Sync

If you want to use the ReSpeaker 2-Mics Pi HAT as an AirPlay speaker, you can use ShairPort-Sync. Here's how to set it up:

  1. Install ShairPort-Sync
sudo apt install shairport-sync -y
  2. Edit the ShairPort-Sync configuration file
sudo nano /etc/shairport-sync.conf
  3. Change the port field to something other than 5000, since the Flask web interface already uses that port (I use 5555); see the example snippet after these steps.

  4. Start the ShairPort-Sync service

sudo systemctl start shairport-sync
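
For reference, the relevant stanza in /etc/shairport-sync.conf looks roughly like this once the port line is uncommented and changed (the exact surrounding contents vary by version):

general = {
    port = 5555;
};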

Running as a Service

  1. Create a jarvischatbot service file. Use the included jarvischatbot.service file to create a service that runs the JarvisChatBot script on startup. Modify the file to include the correct path to the JarvisChatBot directory.
sudo cp jarvischatbot.service /etc/systemd/system/jarvischatbot.service
sudo nano /etc/systemd/system/jarvischatbot.service
  2. Enable the Service
sudo systemctl enable jarvischatbot.service
  3. Start the Service
sudo systemctl start jarvischatbot.service

Now, the JarvisChatBot script will run as a service on startup.

Running Button Service

  1. Create a button service file. Use the included 2mic_button.service file to create a service that runs the button script on startup. A quick press of the button starts or stops the JarvisChatBot service, and holding it shuts down the Pi (a minimal sketch of this press/hold logic appears at the end of this section). Modify the file to include the correct path to the JarvisChatBot directory.
sudo cp 2mic_button.service /etc/systemd/system/2mic_button.service
sudo nano /etc/systemd/system/2mic_button.service
  2. Enable the Service
sudo systemctl enable 2mic_button.service
  3. Start the Service
sudo systemctl start 2mic_button.service

Now, the button script will run as a service on startup.
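
The sketch below illustrates the press/hold behaviour with gpiozero. The GPIO pin and hold time are assumptions, and the real logic lives in the repository's button script.

from signal import pause
from subprocess import run
from gpiozero import Button

button = Button(17, hold_time=3)   # the HAT's user button is assumed to be on GPIO 17
was_held = False

def on_held():                      # long hold: shut the Pi down
    global was_held
    was_held = True
    run(["sudo", "shutdown", "-h", "now"])

def on_released():                  # short press: toggle the chatbot service
    global was_held
    if not was_held:
        state = run(["systemctl", "is-active", "--quiet", "jarvischatbot.service"])
        action = "stop" if state.returncode == 0 else "start"
        run(["sudo", "systemctl", action, "jarvischatbot.service"])
    was_held = False

button.when_held = on_held
button.when_released = on_released
pause()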

Running the startup_shutdown_sounds Service

  1. Create a startup_shutdown_sounds service file. Use the included startup_shutdown_sounds.service file to create a service that plays a sound on startup and another on shutdown. Modify the file to include the correct path to the JarvisChatBot directory.
sudo cp startup_shutdown_sounds.service /etc/systemd/system/startup_shutdown_sounds.service
sudo nano /etc/systemd/system/startup_shutdown_sounds.service
  2. Enable the Service
sudo systemctl enable startup_shutdown_sounds.service
  3. Start the Service
sudo systemctl start startup_shutdown_sounds.service

Now, the startup_shutdown_sounds script will run as a service on startup.

LED Lights

The ReSpeaker 2-Mics Pi HAT has LED lights that can be controlled. The led_service.py script controls them using the apa102.py library.
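
A minimal sketch of driving those LEDs with the bundled apa102.py is shown below. The method names follow the common APA102 driver and, together with the LED power-enable pin, are assumptions here rather than the repository's exact code.

import time
import apa102                           # the apa102.py driver included in this repo
from gpiozero import LED

power = LED(5)                          # LED power-enable pin used in Seeed's examples (assumption)
power.on()

strip = apa102.APA102(num_led=3)        # the ReSpeaker 2-Mics Pi HAT has three APA102 LEDs
try:
    for i in range(3):
        strip.set_pixel(i, 255, 0, 0)   # red: "ready and listening for wake word"
    strip.show()
    time.sleep(2)
finally:
    strip.clear_strip()                 # off: "not running"
    strip.cleanup()
    power.off()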

LED Color Legend:

  • Purple: Starting up
  • Green: Started (or reinitializing wake word)
  • Red: Ready and Listening for wake word
  • Blue: No Internet Connection
  • White: Listening for user input
  • Cyan: Transcribing user input
  • Pink: Processing user input
  • Yellow: Responding to user input
  • Orange: Shutting down
  • Flashing Orange: AirPlay connected to ShairPort-Sync
  • Off: Not running

AirPlay Speaker support via ShairPort-Sync

The Raspberry Pi can be used as an AirPlay speaker using ShairPort-Sync. Whether the chatbot is running or not, you can stream audio to the Raspberry Pi using AirPlay. If running, the microphone will be muted and the LED lights will flash orange when connected to ShairPort-Sync. After disconnecting, the microphone will be unmuted and the LED lights will return to normal.

Alarms and Timers

The JarvisChatBot can be used to set alarms and timers. systemd (on the Raspberry Pi) and scheduled tasks (on Windows) are used to store the alarms and timers. You can use phrases such as "set an alarm for 6:30 am" or "set a timer for 10 minutes" to set an alarm or timer.
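
The repository's exact scheduling code isn't reproduced here, but a transient systemd timer is one way such requests can be implemented on the Pi. The sketch below assumes a systemd system and uses a placeholder alarm sound; Windows would use a scheduled task instead.

import subprocess
from datetime import datetime, timedelta

def schedule(delay_seconds, command=("mpv", "alarm.mp3")):   # alarm.mp3 is a placeholder
    """Run a one-shot command after a delay via a transient systemd timer."""
    subprocess.run(
        ["systemd-run", "--user", f"--on-active={int(delay_seconds)}s", *command],
        check=True,
    )

def set_timer(minutes):                      # "set a timer for 10 minutes"
    schedule(minutes * 60)

def set_alarm(hour, minute):                 # "set an alarm for 6:30 am"
    now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:                        # roll over to tomorrow if the time has passed
        target += timedelta(days=1)
    schedule((target - now).total_seconds())

set_timer(10)
set_alarm(6, 30)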

Limitations

Please note that this project was forked from a project developed for hackathon and demo purposes, so there is no guarantee of its performance or functionality.

Acknowledgements

This documentation was written by ChatGPT with some supervision from the author.

Disclaimer

This project is a fan-made, non-commercial project and is not affiliated with, endorsed by, or associated with Marvel, Disney, or any of their subsidiaries. All characters, images, and other related materials are the property of their respective owners. No copyright infringement is intended. This project is for entertainment purposes only, and no monetary gain is being made from its distribution. If any copyright holder feels that their intellectual property has been used inappropriately, please contact me and the content will be removed immediately.
