🤖 VoiceForge AI

Intelligent Emotion Detection for Text-to-Speech

VoiceForge AI pairs Mistral 7B (served through Ollama) with OpenAudio S1 Mini for emotion-aware speech synthesis. The AI analyzes your text for context, sarcasm, implied emotions, and nuances, adds the matching emotion markers, and then generates natural-sounding speech from the marked-up text.

✨ Features

🧠 Intelligent AI Analysis: Mistral 7B understands context, sarcasm, and subtle emotions
🎭 50+ Emotion Markers: Complete OpenAudio S1 Mini emotion set
🎵 6 Tone Variations: Whispering, shouting, soft tone, etc.
🎪 11 Special Effects: Laughing, crying, sighing, and more
📝 Two-Step Process: Review and edit AI analysis before audio generation
🍎 Apple Silicon Optimized: Native MPS support for M1/M2/M3 Macs
🔄 Fallback Analysis: Keyword-based analysis when the AI is unavailable

🚀 Quick Start

Prerequisites

Python 3.10+ with conda/miniconda
Ollama installed and running
Mistral 7B model (or any other Ollama model)

Installation

Clone the repository:

```
git clone https://github.com/yourusername/voiceforge-ai.git
cd voiceforge-ai
```

Install dependencies:

```
pip install gradio soundfile loguru torchaudio librosa transformers
pip install descript-audiotools descript-audio-codec hydra-core omegaconf einops
pip install tiktoken loralib pyrootutils resampy zstandard pyaudio
pip install modelscope opencc-python-reimplemented silero-vad ormsgpack cachetools
```

Download OpenAudio S1 Mini models:

```
# Place the model files in: models/openaudio-s1-mini/
# Download from: https://huggingface.co/fishaudio/openaudio-s1-mini
```

Install and start Ollama:

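```
# Install Ollama from: https://ollama.ai
# Start the server first (skip this if the Ollama desktop app is already running)
ollama serve
# Then, in a second terminal, pull the model
ollama pull mistral:7b-instruct
```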

Run VoiceForge AI:
```
python openaudio_mistral_ai.py
```
Open your browser: http://localhost:7866
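
If the interface cannot reach the AI, it can help to confirm that Ollama is running and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The function names are made up for this example, and `http://localhost:11434` is assumed because it is Ollama's default address.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if yours differs


def model_in_tags(tags: dict, model: str) -> bool:
    """Check a parsed /api/tags response body for an exact model name."""
    return any(entry.get("name") == model for entry in tags.get("models") or [])


def ollama_has_model(model: str, base_url: str = OLLAMA_URL) -> bool:
    """Return True if the server answers and lists the model, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            return model_in_tags(json.load(resp), model)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, connection refused, or an unexpected response body.
        return False
```

Calling `ollama_has_model("mistral:7b-instruct")` returns `False` both when the server is down and when the model is missing, so run `ollama list` to tell the two cases apart.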

🔄 How to Use

Two-Step Process

Step 1: AI Analysis
- Enter your natural language text
- Click "Analyze with Mistral AI"
- The AI will analyze context, emotions, and nuances

Step 2: Review & Edit
- Review the AI-enhanced text with emotion markers
- Edit the text if needed to adjust emotions/effects
- Click "Generate Audio" to create speech

Result: Generated Speech
- Download your emotionally intelligent audio file

Example Inputs

Sarcasm: "Oh great, another meeting." → AI detects sarcasm
Nuanced Emotion: "I'm cautiously optimistic." → AI detects hesitation + hope
Complex Context: "Well, isn't that just perfect." → AI detects sarcasm + frustration

🤖 Customizing the AI Model

VoiceForge AI uses Mistral 7B as an example, but you can easily switch to any Ollama-compatible model:

Using Different Models

Install your preferred model:

```
ollama pull llama3.1:8b
# or
ollama pull codellama:13b
# or any other model
```

Edit the model name in the code:

```
# In openaudio_mistral_ai.py, line ~150
response = requests.post(
    f"{self.ollama_url}/api/generate",
    json={
        "model": "llama3.1:8b",  # Change this line
        "prompt": prompt,
        # ... rest of config
    }
)
```

Restart the interface:
```
python openaudio_mistral_ai.py
```

Recommended Models

Mistral 7B: Excellent balance of speed and intelligence
Llama 3.1 8B: Great for nuanced understanding
CodeLlama 13B: Good for technical content
Phi-3: Lightweight and fast

📁 Project Structure

```
voiceforge-ai/
├── openaudio_mistral_ai.py    # Main intelligent interface
├── check_status.py            # System status checker
├── models/                    # OpenAudio S1 Mini models
│   └── openaudio-s1-mini/
├── fish-speech/               # Fish Speech library
└── README.md                  # This file
```

🎭 Available Emotion Markers

Emotions (50+)

angry, sad, excited, surprised, satisfied, unhappy, anxious, delighted, scared, worried, confident, curious, joyful, sarcastic, empathetic, frustrated, proud, grateful, and many more...

Tones (6)

whispering, shouting, screaming, soft tone, in a hurry tone

Special Effects (11)

laughing, chuckling, sobbing, crying loudly, sighing, panting, groaning, crowd laughing, background laughter, audience laughing
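
When editing the AI-enhanced text by hand in Step 2, it is easy to mistype a marker. The sketch below flags markers that do not appear in the lists above. It assumes markers are written in parentheses, for example `(sarcastic)`, and it knows only the markers named in this README, so a valid marker from the full OpenAudio S1 Mini set may be reported as unknown.

```python
import re

# Only the markers named in this README; the full OpenAudio S1 Mini set is larger.
KNOWN_MARKERS = {
    "angry", "sad", "excited", "surprised", "satisfied", "unhappy", "anxious",
    "delighted", "scared", "worried", "confident", "curious", "joyful",
    "sarcastic", "empathetic", "frustrated", "proud", "grateful",
    "whispering", "shouting", "screaming", "soft tone", "in a hurry tone",
    "laughing", "chuckling", "sobbing", "crying loudly", "sighing", "panting",
    "groaning", "crowd laughing", "background laughter", "audience laughing",
}


def unknown_markers(text: str) -> list[str]:
    """List parenthesised phrases in the text that are not in KNOWN_MARKERS."""
    found = re.findall(r"\(([^()]+)\)", text)
    return [m for m in found if m.strip().lower() not in KNOWN_MARKERS]
```

Ordinary parenthetical asides in the text are reported too, so treat the result as a hint to review rather than as an error list.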

🛠️ Troubleshooting

Common Issues

"Ollama not running"
```
ollama serve
```
"Mistral not found"
```
ollama pull mistral:7b-instruct
```
"Model files not found"
- Download OpenAudio S1 Mini models to models/openaudio-s1-mini/

Import errors

```
pip install -r requirements.txt  # If available
# Or install dependencies manually as shown above
```
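
To see which imports are failing before reinstalling everything, a short check like the following can help. It is a sketch, not part of `check_status.py`: the module list covers only some of the packages from the installation step, and a package's import name can differ from its pip name.

```python
import importlib.util

# Import names for some of the packages installed above (assumed to match
# their pip names); extend this list with care.
REQUIRED_MODULES = ["gradio", "soundfile", "loguru", "torchaudio", "librosa", "transformers"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when every listed module can be located in the active environment.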

Status Check

Run the status checker to verify all components:

```
python check_status.py
```

🔧 Advanced Configuration

Adjusting AI Analysis

Edit the prompt in openaudio_mistral_ai.py (around line 140) to customize how the AI analyzes text:

```
# Customize this prompt for your specific needs
prompt = f"""You are an expert emotion analyst. Analyze the following text...
"""
```

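One way to customize the analysis is to list the allowed markers in the prompt so the model has less room to invent new ones. The sketch below shows the idea only: `build_prompt`, its wording, and the parenthesised marker format are assumptions for illustration and do not reproduce the prompt used in `openaudio_mistral_ai.py`.

```python
def build_prompt(text: str, markers: list[str]) -> str:
    """Assemble an analysis prompt that restricts the model to the given markers."""
    allowed = ", ".join(f"({m})" for m in markers)
    return (
        "You are an expert emotion analyst. Rewrite the text below, inserting "
        f"emotion markers where they fit. Use only these markers: {allowed}.\n\n"
        f"Text: {text}\n"
    )
```

For example, `build_prompt("Oh great, another meeting.", ["sarcastic", "sighing"])` produces a prompt that names only those two markers.
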
Performance Tuning

Apple Silicon: Automatically uses MPS acceleration
NVIDIA GPU: Set device to "cuda" in the code
CPU Only: Set device to "cpu" for compatibility
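
The device choice above can be sketched as a small helper. This is an illustration under the assumption that the project runs on PyTorch. `pick_device` is a hypothetical name, and the actual selection logic in the code may differ.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed; nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

Preferring CUDA, then MPS, then CPU mirrors the three cases listed above. Pass the returned string wherever the code sets its device.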

📄 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

🙏 Acknowledgments

Fish Speech Team: For the excellent OpenAudio S1 Mini model
Mistral AI: For the powerful Mistral 7B language model
Ollama Team: For making local AI models accessible
Gradio Team: For the beautiful web interface framework

📞 Support

If you encounter any issues or have questions:

Check the troubleshooting section above
Run python check_status.py to verify your setup
Open an issue on GitHub with detailed error information

VoiceForge AI - Where artificial intelligence meets emotional expression! 🎭🤖
