
A Raspberry Pi-based Voice Interaction System. Press the GPIO button to record audio, transcribe it, and receive AI-generated responses. The OLED display provides real-time feedback, creating an interactive and engaging voice experience.


Audio Interaction System Documentation

Overview

This project combines three components for voice interaction: Automatic Speech Recognition (ASR) with OpenAI's Whisper, Text-to-Speech (TTS) with Coqui AI, and response generation with an AI model served by Ollama.

Table of Contents

  1. Prerequisites
  2. Installation
  3. Configuration
  4. Usage
  5. Components
  6. Dependencies

1. Prerequisites

Based on the components described below, you will need at least:

  • A Raspberry Pi with a GPIO push button and an OLED display
  • A microphone and a speaker (or headphones) for audio input and output
  • Python 3 with pip
  • Network access to the machines running the ASR, TTS, and AI model servers

2. Installation

ASR Server (Whisper)

  1. Clone the Whisper repository:

    git clone https://github.com/openai/whisper.git
  2. Follow the installation instructions provided in the Whisper repository to set up the ASR server.
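
Alternatively, Whisper is published on PyPI and can be installed directly with pip (it also requires ffmpeg for audio decoding):

    pip install -U openai-whisper
    sudo apt install ffmpeg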

TTS Server (Coqui AI)

  1. Clone the TTS repository (Coqui AI):

    git clone https://github.com/coqui-ai/TTS.git
  2. Follow the installation instructions provided in the Coqui AI repository to set up the TTS server.
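
Coqui TTS is also available on PyPI if you only need the Python package:

    pip install TTS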

AI Model Server (Ollama)

  1. Clone the Ollama repository:

    git clone https://github.com/ollama/ollama.git
  2. Follow the installation instructions provided in the Ollama repository to set up the AI model server.
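
Ollama also provides an official install script for Linux, which may be simpler than building from source; after installing, pull the model you plan to use (the model name below is a placeholder):

    curl -fsSL https://ollama.com/install.sh | sh
    ollama pull your-ollama-model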

3. Configuration

ASR Server (Whisper)

  • Modify the ASR script (asr_server.py) to set the Whisper ASR server URL and the model path.

    # ASR Server Configuration
    transcription_server_url = "http://your-whisper-server-address:5000/recognize"
    model = whisper.load_model("path/to/whisper-model")
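
A client would then send a recording to this URL. The following is a minimal sketch; the multipart field name ("file") and the response key ("text") are assumptions that must match your asr_server.py:

    # Example client call (field and key names are assumptions)
    import requests

    with open("recording.wav", "rb") as f:
        response = requests.post(transcription_server_url, files={"file": f})
    print(response.json()["text"])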

TTS Server (Coqui AI)

  • Modify the TTS script (tts_server.py) to include the correct Coqui AI TTS server URL and API key.

    # TTS Server Configuration
    tts_server_url = "http://coqui-ai-server-address:5001/synthesize"
    api_key = "your-coqui-ai-api-key"
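
A client call might then look like the sketch below; the JSON payload shape and the bearer-token header are assumptions, so adapt them to your tts_server.py:

    # Example client call (payload shape and auth header are assumptions)
    import requests

    payload = {"text": "Hello from the Raspberry Pi"}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(tts_server_url, json=payload, headers=headers)
    with open("reply.wav", "wb") as f:
        f.write(response.content)  # the server is assumed to return WAV audio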

AI Model Server (Ollama)

  • Modify the AI model script (ollama_server.py) to include the correct Ollama server URL.

    # AI Model Server Configuration
    ai_url = "http://ollama-ai-server-address:5002/api/generate"
    headers = {'Content-Type': 'application/json'}

4. Usage

  1. Start the ASR server (Whisper).

    python asr_server.py
  2. Start the TTS server (Coqui AI).

    python tts_server.py
  3. Start the AI model server (Ollama).

    python ollama_server.py
  4. Run the main script for continuous audio processing (a sketch of its main loop is shown below).

    python continuous_audio_processing.py
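
For orientation, here is a minimal sketch of what the main loop of continuous_audio_processing.py could look like. The endpoint URLs come from the configuration sections above; the JSON field names and the wait_for_button_press, record_audio, and play_audio helpers are hypothetical placeholders:

# Sketch of the main loop; helper functions are hypothetical placeholders
import requests

ASR_URL = "http://your-whisper-server-address:5000/recognize"
AI_URL = "http://ollama-ai-server-address:5002/api/generate"
TTS_URL = "http://coqui-ai-server-address:5001/synthesize"

while True:
    wait_for_button_press()            # block until the GPIO button is pressed
    filename = record_audio()          # capture microphone input to a WAV file

    # 1. Transcribe the recording via the Whisper ASR server
    with open(filename, "rb") as f:
        text = requests.post(ASR_URL, files={"file": f}).json()["text"]

    # 2. Generate a reply via the Ollama server ("response" is Ollama's field name)
    reply = requests.post(AI_URL, json={
        "model": "your-ollama-model",
        "stream": False,
        "prompt": text,
    }).json()["response"]

    # 3. Synthesize speech via the Coqui TTS server and play it back
    audio = requests.post(TTS_URL, json={"text": reply}).content
    with open("reply.wav", "wb") as f:
        f.write(audio)
    play_audio("reply.wav")            # playback, e.g. through the speaker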

5. Components

ASR Server (Whisper)

The ASR server uses the Whisper library by OpenAI for Automatic Speech Recognition. Ensure that the Whisper model is correctly loaded in the transcribe_audio function.

# ASR Server Logic
import whisper

def transcribe_audio(filename):
    # Note: loading the model inside the handler reloads it on every call;
    # for faster responses, load it once at startup instead
    model = whisper.load_model("path/to/whisper-model")
    result = model.transcribe(filename)
    return result["text"]
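
Since Flask is listed as a dependency, this function is presumably exposed over HTTP. A minimal sketch of such an endpoint wrapping transcribe_audio, assuming the client uploads audio as a multipart field named "file":

# Sketch of a Flask endpoint around transcribe_audio (field name is an assumption)
import tempfile

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/recognize", methods=["POST"])
def recognize():
    # Write the uploaded audio to a temporary file that Whisper can read
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        request.files["file"].save(tmp.name)
        text = transcribe_audio(tmp.name)
    return jsonify({"text": text})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)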

TTS Server (Coqui AI)

The TTS server uses Coqui AI for Text-to-Speech synthesis. Modify the synthesize function in the TTS script based on your TTS server implementation.

# TTS Server Logic
def synthesize():
    # ...
    # text_to_speak is taken from the incoming request; temp_output_file is a
    # temporary WAV file whose contents are returned to the caller
    tts.tts_to_file(text=text_to_speak, file_path=temp_output_file.name)
    # ...
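
As a reference point, a fuller version using Coqui's Python API might look like the sketch below; the model name, the request shape, and the Flask wiring are assumptions:

# Sketch of a complete /synthesize endpoint (model name and request shape are assumptions)
import tempfile

from flask import Flask, request, send_file
from TTS.api import TTS

app = Flask(__name__)
# Load the model once at startup; any installed Coqui model name works here
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

@app.route("/synthesize", methods=["POST"])
def synthesize():
    text_to_speak = request.json["text"]
    # Write synthesized speech to a temporary WAV file and return it
    temp_output_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tts.tts_to_file(text=text_to_speak, file_path=temp_output_file.name)
    return send_file(temp_output_file.name, mimetype="audio/wav")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)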

AI Model Server (Ollama)

The AI model server communicates with Ollama for generating responses. Configure the AI model URL and headers in the generate_response function.

# AI Model Server Logic
def generate_response(prompt):
    data = {
        "model": "your-ollama-model",  # name of the model pulled into Ollama
        "stream": False,               # request one JSON object, not a token stream
        "prompt": prompt,
    }
    # ...
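
A completed version might look like the following, using the ai_url and headers from the configuration section; the request and response fields ("prompt", "stream", "response") follow Ollama's documented /api/generate API, while the model name is a placeholder:

# Sketch of a complete generate_response using Ollama's /api/generate API
import requests

def generate_response(prompt):
    data = {
        "model": "your-ollama-model",  # placeholder: any model pulled into Ollama
        "stream": False,
        "prompt": prompt,
    }
    response = requests.post(ai_url, json=data, headers=headers)
    response.raise_for_status()
    # With stream=False, Ollama returns the full reply in the "response" field
    return response.json()["response"]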

6. Dependencies

  • Flask: Web framework for serving APIs.
  • Requests: Library for making HTTP requests.
  • Torch: PyTorch library for deep learning.
  • TTS: Coqui AI's library for Text-to-Speech synthesis.

Ensure all dependencies are installed and the configurations above are set before starting the servers.
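
Assuming a Python 3 environment, the listed packages can be installed from PyPI:

    pip install flask requests torch TTS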

