jlov7/voiceforge-AI

🤖 VoiceForge AI

Intelligent Emotion Detection for Text-to-Speech

VoiceForge AI pairs Mistral 7B (via Ollama) with OpenAudio S1 Mini for emotionally aware speech synthesis. Mistral analyzes your text for context, sarcasm, and implied emotion and inserts matching emotion markers; OpenAudio S1 Mini then renders the marked-up text as natural-sounding speech.

✨ Features

  • 🧠 Intelligent AI Analysis: Mistral 7B understands context, sarcasm, and subtle emotions
  • 🎭 50+ Emotion Markers: Complete OpenAudio S1 Mini emotion set
  • 🎡 6 Tone Variations: Whispering, shouting, soft tone, etc.
  • 🎪 11 Special Effects: Laughing, crying, sighing, and more
  • 📝 Two-Step Process: Review and edit AI analysis before audio generation
  • 🍎 Apple Silicon Optimized: Native MPS support for M1/M2/M3 Macs
  • 🔄 Fallback Analysis: Keyword-based analysis when AI unavailable
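
The keyword-based fallback mentioned in the last feature can be pictured roughly as follows. This is a minimal sketch, not the project's actual rules: the keyword table, the `fallback_analyze` name, and the parenthesized `(emotion)` marker syntax are all assumptions made for illustration.

```python
import re

# Hypothetical keyword table; the project's real fallback rules may differ.
KEYWORD_MARKERS = {
    "angry": ("furious", "outraged", "hate"),
    "sad": ("sorry", "miss", "lost"),
    "excited": ("amazing", "awesome", "wow"),
    "worried": ("afraid", "concerned", "nervous"),
}


def fallback_analyze(text: str) -> str:
    """Prefix text with the first emotion whose keywords appear in it."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for emotion, keywords in KEYWORD_MARKERS.items():
        if words.intersection(keywords):
            # Assumed marker syntax: the emotion name in parentheses.
            return f"({emotion}) {text}"
    return text  # no keyword matched: leave the text unmarked


print(fallback_analyze("Wow, this is amazing!"))  # (excited) Wow, this is amazing!
```

A table like this cannot detect sarcasm or context, which is why it only makes sense as a fallback when the language model is unreachable.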

🚀 Quick Start

Prerequisites

  1. Python 3.10+ with conda/miniconda
  2. Ollama installed and running
  3. Mistral 7B model (or any other Ollama model)

Installation

  1. Clone the repository:

    git clone https://github.com/jlov7/voiceforge-AI.git
    cd voiceforge-AI
  2. Install dependencies:

    pip install gradio soundfile loguru torchaudio librosa transformers
    pip install descript-audiotools descript-audio-codec hydra-core omegaconf einops
    pip install tiktoken loralib pyrootutils resampy zstandard pyaudio
    pip install modelscope opencc-python-reimplemented silero-vad ormsgpack cachetools
  3. Download OpenAudio S1 Mini models:

    # The models should be placed in: models/openaudio-s1-mini/
    # Download from: https://huggingface.co/fishaudio/fish-speech-1.5
  4. Install and start Ollama:

    # Install Ollama from: https://ollama.ai
    ollama pull mistral:7b-instruct
    ollama serve
  5. Run VoiceForge AI:

    python openaudio_mistral_ai.py
  6. Open your browser: http://localhost:7866

🔄 How to Use

Two-Step Process

  1. Step 1: AI Analysis

    • Enter your natural language text
    • Click "Analyze with Mistral AI"
    • The AI will analyze context, emotions, and nuances
  2. Step 2: Review & Edit

    • Review the AI-enhanced text with emotion markers
    • Edit the text if needed to adjust emotions/effects
    • Click "Generate Audio" to create speech
  3. Result: Generated Speech

    • Download your emotionally intelligent audio file

Example Inputs

  • Sarcasm: "Oh great, another meeting." → AI detects sarcasm
  • Nuanced Emotion: "I'm cautiously optimistic." → AI detects hesitation + hope
  • Complex Context: "Well, isn't that just perfect." → AI detects sarcasm + frustration

🤖 Customizing the AI Model

VoiceForge AI uses Mistral 7B as an example, but you can easily switch to any Ollama-compatible model:

Using Different Models

  1. Install your preferred model:

    ollama pull llama3.1:8b
    # or
    ollama pull codellama:13b
    # or any other model
  2. Edit the model name in the code:

    # In openaudio_mistral_ai.py, line ~150
    response = requests.post(
        f"{self.ollama_url}/api/generate",
        json={
            "model": "llama3.1:8b",  # Change this line
            "prompt": prompt,
            # ... rest of config
        }
    )
  3. Restart the interface:

    python openaudio_mistral_ai.py
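
If you switch models often, one option is to read the model name from the environment instead of editing the source each time. The sketch below only builds the request body for Ollama's `/api/generate` endpoint; the `VOICEFORGE_MODEL` variable and the `build_generate_payload` helper are hypothetical names, not part of the project.

```python
import os
from typing import Optional

DEFAULT_MODEL = "mistral:7b-instruct"


def build_generate_payload(prompt: str, model: Optional[str] = None) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # VOICEFORGE_MODEL is a hypothetical environment variable for this sketch.
    chosen = model or os.environ.get("VOICEFORGE_MODEL", DEFAULT_MODEL)
    # "stream": False asks Ollama for one complete JSON response.
    return {"model": chosen, "prompt": prompt, "stream": False}


payload = build_generate_payload("Oh great, another meeting.", model="llama3.1:8b")
print(payload["model"])  # llama3.1:8b

# The payload could then be sent the same way the snippet above does:
# requests.post(f"{ollama_url}/api/generate", json=payload)
```

An explicit `model` argument wins over the environment variable, which in turn wins over the default, so existing behavior is unchanged when nothing is set.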

Recommended Models

  • Mistral 7B: Excellent balance of speed and intelligence
  • Llama 3.1 8B: Great for nuanced understanding
  • CodeLlama 13B: Good for technical content
  • Phi-3: Lightweight and fast

📁 Project Structure

voiceforge-ai/
├── openaudio_mistral_ai.py    # Main intelligent interface
├── check_status.py            # System status checker
├── models/                    # OpenAudio S1 Mini models
│   └── openaudio-s1-mini/
├── fish-speech/               # Fish Speech library
└── README.md                  # This file

🎭 Available Emotion Markers

Emotions (50+)

angry, sad, excited, surprised, satisfied, unhappy, anxious, delighted, scared, worried, confident, curious, joyful, sarcastic, empathetic, frustrated, proud, grateful, and many more...

Tones (6)

whispering, shouting, screaming, soft tone, in a hurry tone

Special Effects (11)

laughing, chuckling, sobbing, crying loudly, sighing, panting, groaning, crowd laughing, background laughter, audience laughing
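
When editing the AI-enhanced text by hand, it is easy to type a marker the model does not know. The sketch below flags such markers before audio generation. It assumes markers are written in parentheses, for example `(sarcastic)`, and checks against a small subset of the names listed above; `KNOWN_MARKERS` and `unknown_markers` are illustrative names only.

```python
import re

# Subset of the markers listed above; extend it to match the full set you use.
KNOWN_MARKERS = {
    "angry", "sad", "excited", "sarcastic", "frustrated",
    "whispering", "shouting", "soft tone", "in a hurry tone",
    "laughing", "sighing", "crying loudly",
}

# Matches any non-nested parenthesized span and captures its contents.
MARKER_PATTERN = re.compile(r"\(([^()]+)\)")


def unknown_markers(annotated: str) -> list:
    """Return parenthesized markers that are not in KNOWN_MARKERS."""
    found = MARKER_PATTERN.findall(annotated)
    return [m for m in found if m.strip().lower() not in KNOWN_MARKERS]


print(unknown_markers("(sarcastic) Oh great, (sighing) another (bored) meeting."))
# ['bored']
```

Ordinary parenthetical remarks in the text are flagged as well, so treat the result as a prompt for review rather than a hard error.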

🛠️ Troubleshooting

Common Issues

  1. "Ollama not running"

    ollama serve
  2. "Mistral not found"

    ollama pull mistral:7b-instruct
  3. "Model files not found"

    • Download OpenAudio S1 Mini models to models/openaudio-s1-mini/
  4. Import errors

    pip install -r requirements.txt  # If available
    # Or install dependencies manually as shown above

Status Check

Run the status checker to verify all components:

python check_status.py
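
If you want to run similar checks from your own code, the sketch below shows one way to do it with the standard library. It is not the contents of `check_status.py`; the function names are hypothetical. It assumes Ollama listens on its default port 11434 and uses Ollama's `/api/tags` endpoint, which lists locally installed models.

```python
import json
import urllib.request
from pathlib import Path


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return names of locally installed Ollama models, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None
    return [m.get("name", "") for m in data.get("models", [])]


def model_dir_ready(path: str = "models/openaudio-s1-mini") -> bool:
    """True if the model directory exists and contains at least one entry."""
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())


if __name__ == "__main__":
    models = ollama_models()
    print("Ollama:", "not reachable" if models is None else models)
    print("Model files:", "found" if model_dir_ready() else "missing")
```

The directory check only confirms that something is present; it does not verify that the downloaded files are complete.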

🔧 Advanced Configuration

Adjusting AI Analysis

Edit the prompt in openaudio_mistral_ai.py (around line 140) to customize how the AI analyzes text:

# Customize this prompt for your specific needs
prompt = f"""You are an expert emotion analyst. Analyze the following text...
"""

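One way to customize the prompt is to restrict the model to a fixed list of markers, so that it does not invent names the speech model will ignore. The sketch below is an illustration only: the instruction wording, `ALLOWED_MARKERS`, and `build_prompt` are assumptions, not the prompt shipped with the project.

```python
# Hypothetical allow-list; use the markers your speech model supports.
ALLOWED_MARKERS = ("angry", "sad", "excited", "sarcastic", "whispering", "sighing")


def build_prompt(text: str, markers=ALLOWED_MARKERS) -> str:
    """Build an analysis prompt that limits the model to known markers."""
    marker_list = ", ".join(f"({m})" for m in markers)
    return (
        "You are an expert emotion analyst. Analyze the following text "
        "and insert emotion markers where they fit.\n"
        f"Use only these markers: {marker_list}\n"
        "Return only the annotated text.\n\n"
        f"Text: {text}"
    )


print(build_prompt("Oh great, another meeting."))
```

Keeping the allow-list in one place also makes it easy to reuse when checking the model's output.
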
Performance Tuning

  • Apple Silicon: Automatically uses MPS acceleration
  • NVIDIA GPU: Set device to "cuda" in the code
  • CPU Only: Set device to "cpu" for compatibility
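
The device choice described above can be automated. The sketch below prefers CUDA, then Apple's MPS backend, then CPU, and falls back to CPU when PyTorch is not installed. The function names are hypothetical, and where the resulting string is passed in `openaudio_mistral_ai.py` depends on that file.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(torch.cuda.is_available(), bool(mps and mps.is_available()))


print(detect_device())
```

Splitting the decision (`pick_device`) from the hardware query (`detect_device`) keeps the preference order easy to test on any machine.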

📄 License

This project is open source and available under the MIT License.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

🙏 Acknowledgments

  • Fish Speech Team: For the excellent OpenAudio S1 Mini model
  • Mistral AI: For the powerful Mistral 7B language model
  • Ollama Team: For making local AI models accessible
  • Gradio Team: For the beautiful web interface framework

📞 Support

If you encounter any issues or have questions:

  1. Check the troubleshooting section above
  2. Run python check_status.py to verify your setup
  3. Open an issue on GitHub with detailed error information

VoiceForge AI - Where artificial intelligence meets emotional expression! 🎭🤖
