This project combines Automatic Speech Recognition (ASR) using OpenAI's Whisper, Text-to-Speech (TTS) through Coqui AI, and response generation via an Ollama-hosted AI model into a single audio interaction pipeline. You will need the following prerequisites:
- Python (version 3.x)
- Flask
- Requests
- Torch
- Whisper ASR (configured and running) - [Whisper repository](https://github.com/openai/whisper)
- Coqui AI credentials (API key, etc.) - [Coqui AI TTS repository](https://github.com/coqui-ai/TTS)
- TTS library (Coqui AI or other)
- Ollama AI model (configured and running) - [Ollama repository](https://github.com/ollama/ollama)
Set up the individual components as follows:

- Clone the Whisper repository:

  ```bash
  git clone https://github.com/openai/whisper.git
  ```

- Follow the installation instructions provided in the Whisper repository to set up the ASR server.
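  Once Whisper is installed, you can optionally verify it before wiring it into the server. This is a minimal check, assuming a short local test file such as `sample.wav` (the file name and the `base` model size are placeholders):

  ```python
  import whisper

  # Load a small built-in model and transcribe a local test file
  model = whisper.load_model("base")
  print(model.transcribe("sample.wav")["text"])
  ```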
- Clone the Coqui AI TTS repository:

  ```bash
  git clone https://github.com/coqui-ai/TTS.git
  ```

- Follow the installation instructions provided in the Coqui AI repository to set up the TTS server.
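  To confirm the TTS installation on its own, a quick local synthesis test can help. This sketch assumes Coqui's Python API and uses an example model name (`tts_models/en/ljspeech/tacotron2-DDC`) that you may want to swap for the model you actually use:

  ```python
  from TTS.api import TTS

  # Load a pretrained Coqui model and write a short test clip to disk
  tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
  tts.tts_to_file(text="Hello from Coqui TTS.", file_path="test.wav")
  ```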
- Clone the Ollama repository:

  ```bash
  git clone https://github.com/ollama/ollama.git
  ```

- Follow the installation instructions provided in the Ollama repository to set up the AI model server.
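  After installing and starting Ollama, you can check that its HTTP API is reachable. The sketch below assumes Ollama's default port (11434); adjust the address if your setup differs:

  ```python
  import requests

  # The root endpoint returns a short status message when Ollama is running
  response = requests.get("http://localhost:11434")
  print(response.status_code, response.text)
  ```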
- Modify the ASR script (`asr_server.py`) to configure the Whisper ASR server URL and model:

  ```python
  # ASR Server Configuration
  transcription_server_url = "http://your-whisper-server-address:5000/recognize"
  model = whisper.load_model("path/to/whisper-model")
  ```
- Modify the TTS script (`tts_server.py`) to include the correct Coqui AI TTS server URL and API key:

  ```python
  # TTS Server Configuration
  tts_server_url = "http://coqui-ai-server-address:5001/synthesize"
  api_key = "your-coqui-ai-api-key"
  ```
- Modify the AI model script (`ollama_server.py`) to include the correct Ollama AI model server URL:

  ```python
  # AI Model Server Configuration
  ai_url = "http://ollama-ai-server-address:5002/api/generate"
  headers = {'Content-Type': 'application/json'}
  ```
- Start the ASR server (Whisper):

  ```bash
  python asr_server.py
  ```

- Start the TTS server (Coqui AI):

  ```bash
  python tts_server.py
  ```

- Start the AI model server (Ollama):

  ```bash
  python ollama_server.py
  ```

- Run the main script for continuous audio processing:

  ```bash
  python continuous_audio_processing.py
  ```
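The main script ties the three servers together: it captures audio, sends it to the ASR server, feeds the transcript to the AI model server, and saves the synthesized reply for playback. The exact implementation of `continuous_audio_processing.py` is not shown here; the sketch below only illustrates the flow, and the `record_audio()` helper, the request payload field names, and the endpoint behavior are assumptions rather than part of the provided scripts:

```python
import requests

# URLs from the configuration steps above
transcription_server_url = "http://your-whisper-server-address:5000/recognize"
tts_server_url = "http://coqui-ai-server-address:5001/synthesize"
ai_url = "http://ollama-ai-server-address:5002/api/generate"

def record_audio() -> str:
    """Capture microphone input and return the path to a WAV file (not shown here)."""
    raise NotImplementedError

def process_once():
    # 1. Record a snippet of audio and send it to the ASR server
    wav_path = record_audio()
    with open(wav_path, "rb") as audio_file:
        asr_response = requests.post(transcription_server_url, files={"file": audio_file})
    transcript = asr_response.json().get("text", "")

    # 2. Ask the AI model server for a reply to the transcript
    ai_response = requests.post(ai_url, json={"prompt": transcript})
    reply = ai_response.json().get("response", "")

    # 3. Have the TTS server synthesize the reply and save it for playback
    tts_response = requests.post(tts_server_url, json={"text": reply})
    with open("reply.wav", "wb") as out_file:
        out_file.write(tts_response.content)

if __name__ == "__main__":
    while True:
        process_once()
```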
The ASR server uses the Whisper library by OpenAI for Automatic Speech Recognition. Ensure that the Whisper model is correctly loaded in the `transcribe_audio` function.
```python
# ASR Server Logic
import whisper

def transcribe_audio(filename):
    # Load the Whisper model and transcribe the given audio file
    model = whisper.load_model("path/to/whisper-model")
    result = model.transcribe(filename)
    return result["text"]
```
The TTS server uses Coqui AI for Text-to-Speech synthesis. Modify the `synthesize` function in the TTS script based on your TTS server implementation.

```python
# TTS Server Logic
def synthesize():
    # ...
    tts.tts_to_file(text=text_to_speak, file_path=temp_output_file.name)
    # ...
```
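Similarly, a fuller sketch of the `synthesize` route, assuming Coqui's Python API, an example model name, and a JSON request body with a `text` field (all assumptions about your implementation), could look like this:

```python
import tempfile

from flask import Flask, request, send_file
from TTS.api import TTS

app = Flask(__name__)

# Example model name; replace with the Coqui model you actually use
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

@app.route("/synthesize", methods=["POST"])
def synthesize():
    # Read the text to speak from the JSON request body
    text_to_speak = request.get_json()["text"]

    # Write the synthesized speech to a temporary WAV file and return it
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_output_file:
        tts.tts_to_file(text=text_to_speak, file_path=temp_output_file.name)
        return send_file(temp_output_file.name, mimetype="audio/wav")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)
```

A check of the request against the `api_key` value from the configuration step could be added at the top of the route.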
The AI model server communicates with Ollama for generating responses. Configure the AI model URL and headers in the `generate_response` function.

```python
# AI Model Server Logic
def generate_response(prompt):
    data = {
        "model": "your-ollama-model",
        "stream": False,
        "prompt": prompt,
    }
    # ...
```
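A fuller sketch of `generate_response`, showing how the request might be sent to the configured `ai_url` and how the reply text is read back (the `response` field follows Ollama's generate API; treat the error handling as illustrative), is:

```python
import requests

def generate_response(prompt):
    data = {
        "model": "your-ollama-model",
        "stream": False,
        "prompt": prompt,
    }
    # Send the prompt to the configured AI model server and return the reply text
    response = requests.post(ai_url, headers=headers, json=data)
    response.raise_for_status()
    return response.json().get("response", "")
```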
The project relies on the following dependencies:

- Flask: Web framework for serving APIs.
- Requests: Library for making HTTP requests.
- Torch: PyTorch library for deep learning.
- TTS: Library for Text-to-Speech synthesis.
Ensure that all dependencies are installed and that the configuration values above are set correctly.