# VoiceForge AI: Intelligent Emotion Detection for Text-to-Speech
VoiceForge AI combines the power of Mistral 7B (via Ollama) with OpenAudio S1 Mini to create emotionally intelligent speech synthesis. The AI automatically analyzes your text for context, sarcasm, implied emotions, and nuances, then generates natural-sounding speech with appropriate emotional markers.
## Features

- **Intelligent AI Analysis**: Mistral 7B understands context, sarcasm, and subtle emotions
- **50+ Emotion Markers**: Complete OpenAudio S1 Mini emotion set
- **6 Tone Variations**: Whispering, shouting, soft tone, and more
- **11 Special Effects**: Laughing, crying, sighing, and more
- **Two-Step Process**: Review and edit the AI analysis before audio generation
- **Apple Silicon Optimized**: Native MPS support for M1/M2/M3 Macs
- **Fallback Analysis**: Keyword-based analysis when the AI is unavailable
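The keyword-based fallback could look something like this minimal sketch (the keyword lists, function name, and marker format are illustrative assumptions, not the actual implementation in `openaudio_mistral_ai.py`):

```python
# Hypothetical keyword-to-emotion table; the real fallback may differ.
KEYWORD_EMOTIONS = {
    "angry": ["furious", "annoyed", "hate"],
    "excited": ["amazing", "can't wait", "awesome"],
    "sad": ["unfortunately", "miss you", "sorry"],
}

def fallback_analyze(text: str) -> str:
    """Prepend a marker like '(excited)' when a keyword matches."""
    lowered = text.lower()
    for emotion, keywords in KEYWORD_EMOTIONS.items():
        if any(k in lowered for k in keywords):
            return f"({emotion}) {text}"
    return text  # no match: leave the text unmarked

print(fallback_analyze("This is awesome news!"))  # → (excited) This is awesome news!
```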
## Prerequisites

- Python 3.10+ with conda/miniconda
- Ollama installed and running
- Mistral 7B model (or any other Ollama model)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/voiceforge-ai.git
   cd voiceforge-ai
   ```
2. Install the dependencies:

   ```bash
   pip install gradio soundfile loguru torchaudio librosa transformers
   pip install descript-audiotools descript-audio-codec hydra-core omegaconf einops
   pip install tiktoken loralib pyrootutils resampy zstandard pyaudio
   pip install modelscope opencc-python-reimplemented silero-vad ormsgpack cachetools
   ```
3. Download the OpenAudio S1 Mini models:

   ```bash
   # The models should be placed in: models/openaudio-s1-mini/
   # Download from: https://huggingface.co/fishaudio/fish-speech-1.5
   ```
4. Install and start Ollama:

   ```bash
   # Install Ollama from: https://ollama.ai
   ollama pull mistral:7b-instruct
   ollama serve
   ```
5. Run VoiceForge AI:

   ```bash
   python openaudio_mistral_ai.py
   ```
6. Open your browser: http://localhost:7866
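Before launching, you can confirm that Ollama is reachable. This standard-library sketch (the helper name is an assumption; 11434 is Ollama's default port) queries the `/api/tags` endpoint, which lists locally installed models:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address

def ollama_available(url: str = OLLAMA_URL) -> bool:
    """Return True if an Ollama server answers on /api/tags."""
    try:
        with urllib.request.urlopen(f"{url}/api/tags", timeout=2) as resp:
            json.load(resp)  # endpoint returns the installed-model list
            return True
    except (OSError, ValueError):
        return False  # connection refused, timeout, or bad response
```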
## Usage

### Step 1: AI Analysis
- Enter your natural language text
- Click "Analyze with Mistral AI"
- The AI will analyze context, emotions, and nuances
### Step 2: Review & Edit
- Review the AI-enhanced text with emotion markers
- Edit the text if needed to adjust emotions/effects
- Click "Generate Audio" to create speech
### Step 3: Generated Speech
- Download your emotionally intelligent audio file
## Examples

- **Sarcasm**: "Oh great, another meeting." → AI detects sarcasm
- **Nuanced emotion**: "I'm cautiously optimistic." → AI detects hesitation + hope
- **Complex context**: "Well, isn't that just perfect." → AI detects sarcasm + frustration
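Under the hood, the analysis step rewrites the input with emotion markers. A rough sketch of what that rewriting might look like (the exact marker syntax OpenAudio S1 Mini expects is assumed here, and the helper is hypothetical):

```python
def add_markers(text: str, emotions: list[str]) -> str:
    """Prefix text with parenthesized emotion markers (assumed syntax)."""
    prefix = " ".join(f"({e})" for e in emotions)
    return f"{prefix} {text}" if prefix else text

print(add_markers("Oh great, another meeting.", ["sarcastic"]))
# → (sarcastic) Oh great, another meeting.
```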
## Using Other Ollama Models

VoiceForge AI uses Mistral 7B as an example, but you can easily switch to any Ollama-compatible model:
1. Install your preferred model:

   ```bash
   ollama pull llama3.1:8b
   # or
   ollama pull codellama:13b
   # or any other model
   ```
2. Edit the model name in the code:

   ```python
   # In openaudio_mistral_ai.py, line ~150
   response = requests.post(
       f"{self.ollama_url}/api/generate",
       json={
           "model": "llama3.1:8b",  # Change this line
           "prompt": prompt,
           # ... rest of config
       }
   )
   ```
3. Restart the interface:

   ```bash
   python openaudio_mistral_ai.py
   ```
Recommended models:

- **Mistral 7B**: Excellent balance of speed and intelligence
- **Llama 3.1 8B**: Great for nuanced understanding
- **CodeLlama 13B**: Good for technical content
- **Phi-3**: Lightweight and fast
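Whichever model you pick, the request body sent to Ollama's `/api/generate` endpoint has the same shape. A standard-library sketch (the app itself uses `requests`; `stream: False` asks Ollama for one JSON object rather than a token stream):

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "mistral:7b-instruct") -> dict:
    """Request body for /api/generate; stream=False yields a single JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, model: str = "mistral:7b-instruct",
                    url: str = "http://localhost:11434") -> str:
    data = json.dumps(build_generate_payload(prompt, model)).encode()
    req = urllib.request.Request(f"{url}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]  # the model's full reply text
```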
## Project Structure

```
voiceforge-ai/
├── openaudio_mistral_ai.py   # Main intelligent interface
├── check_status.py           # System status checker
├── models/                   # OpenAudio S1 Mini models
│   └── openaudio-s1-mini/
├── fish-speech/              # Fish Speech library
└── README.md                 # This file
```
## Emotion Markers

**Emotions:** angry, sad, excited, surprised, satisfied, unhappy, anxious, delighted, scared, worried, confident, curious, joyful, sarcastic, empathetic, frustrated, proud, grateful, and many more...

**Tones:** whispering, shouting, screaming, soft tone, in a hurry tone

**Special effects:** laughing, chuckling, sobbing, crying loudly, sighing, panting, groaning, crowd laughing, background laughter, audience laughing
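If you hand-edit markers during the review step, a small validator can catch typos before generation. This sketch (a hypothetical helper showing only a partial marker set) flags any parenthesized marker outside the known set:

```python
import re

# Partial set of OpenAudio S1 Mini markers from the lists above.
KNOWN_MARKERS = {
    "angry", "sad", "excited", "sarcastic", "whispering", "shouting",
    "soft tone", "laughing", "sighing", "crying loudly",
}

def unknown_markers(text: str) -> set[str]:
    """Return any (parenthesized) markers not in the known set."""
    found = set(re.findall(r"\(([^)]+)\)", text))
    return found - KNOWN_MARKERS

print(unknown_markers("(sarcastic) Oh great. (grinning) Sure."))  # → {'grinning'}
```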
## Troubleshooting

**"Ollama not running"**

```bash
ollama serve
```

**"Mistral not found"**

```bash
ollama pull mistral:7b-instruct
```

**"Model files not found"**

- Download the OpenAudio S1 Mini models to `models/openaudio-s1-mini/`

**Import errors**

```bash
pip install -r requirements.txt  # if available
# Or install the dependencies manually as shown above
```
## Status Check

Run the status checker to verify all components:

```bash
python check_status.py
```

## Customizing the AI Prompt

Edit the prompt in openaudio_mistral_ai.py (around line 140) to customize how the AI analyzes text:
```python
prompt = f"""You are an expert emotion analyst. Analyze the following text...
# Customize this prompt for your specific needs
"""
```

## Hardware Acceleration

- **Apple Silicon**: Automatically uses MPS acceleration
- **NVIDIA GPU**: Set device to "cuda" in the code
- **CPU Only**: Set device to "cpu" for compatibility
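A sketch of how such device selection might be written (the helper is hypothetical; the actual code may pick the device differently), preferring MPS, then CUDA, then CPU:

```python
import importlib.util

def pick_device() -> str:
    """Prefer MPS on Apple Silicon, then CUDA, then fall back to CPU."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch not installed: nothing to accelerate with
    import torch
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(pick_device())
```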
## License

This project is open source and available under the MIT License.
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## Acknowledgments

- **Fish Speech Team**: For the excellent OpenAudio S1 Mini model
- **Mistral AI**: For the powerful Mistral 7B language model
- **Ollama Team**: For making local AI models accessible
- **Gradio Team**: For the beautiful web interface framework
## Support

If you encounter any issues or have questions:

- Check the troubleshooting section above
- Run `python check_status.py` to verify your setup
- Open an issue on GitHub with detailed error information
VoiceForge AI - Where artificial intelligence meets emotional expression!