Beautiful voice app: record or upload to train a voice, generate speech from text or files, save & download voices.

shamspias/vibevoice-studio

🎙️ VibeVoice Studio

A beautiful, modern web application for AI-powered voice synthesis using Microsoft's VibeVoice model. Generate natural-sounding speech from text with custom voice profiles.


✨ Features

  • 🎤 Voice Training: Upload audio files or record your voice directly
  • 📝 Text-to-Speech: Convert text or text files to natural speech
  • 🎭 Multiple Speakers: Support for up to 4 distinct speakers
  • 💾 Voice Library: Save and manage custom voice profiles
  • 🎨 Beautiful UI: Modern, responsive design with dark/light themes
  • ⚡ Real-time Processing: Fast speech generation with streaming support
  • 📊 Audio Visualization: Live waveform display during recording
  • 💾 Download & Save: Export generated audio files

🎬 Demo

vibevoice-demo.mp4
VibeVoice Studio end-to-end TTS, voice library, and multi-speaker demo.

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • CUDA-capable GPU (recommended)
  • 8GB+ RAM
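
A short script can check these prerequisites before installing. This is only a sketch: `check_environment` is a name made up for this example, and it assumes PyTorch is the library that exposes CUDA, which this README does not state explicitly.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9)):
    """Report whether the interpreter is new enough and whether CUDA is visible."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            # Imported lazily so the check still works without PyTorch.
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except ImportError:
            # A broken install is treated the same as "no CUDA".
            report["torch_installed"] = False
    return report
```

Calling `check_environment()` returns a dictionary of three booleans. If `cuda_available` is false, the app would presumably fall back to (slower) CPU inference.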

Installation

  1. Clone the repository
git clone https://github.com/shamspias/vibevoice-studio.git
cd vibevoice-studio
  2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install VibeVoice
git clone https://github.com/shamspias/VibeVoice
cd VibeVoice
pip install -e .
cd ..
  4. Install dependencies
pip install -r requirements.txt
  5. Configure environment
cp .env.example .env
# Edit .env with your settings
  6. Run the application
python -m app.main
  7. Open in browser
http://localhost:8000

🎨 Features Overview

Voice Management

  • Upload or record voices
  • Support for WAV, MP3, M4A, FLAC
  • Organized voice library
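
A client-side check against the listed formats might look like the sketch below. `SUPPORTED_EXTENSIONS` and `is_supported_voice_file` are hypothetical names, and the server may validate uploads differently (for example, by inspecting file contents rather than the extension).

```python
from pathlib import Path

# Formats listed above; the constant name is an assumption for this sketch.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def is_supported_voice_file(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```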

Text Processing

  • Manual input or upload .txt files
  • Multi-speaker support for conversations

Generation Settings

  • Voice strength (CFG scale 1.0–2.0)
  • Up to 4 speakers
  • Adjustable inference steps
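
The limits above can be expressed as a small validated settings object. This is an illustrative sketch, not the app's own code: the class name, field names, and the default of 10 inference steps are assumptions, while the CFG range and speaker limit come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    cfg_scale: float = 1.3      # matches CFG_SCALE in the sample .env
    num_speakers: int = 1
    inference_steps: int = 10   # hypothetical default; the README gives none

    def __post_init__(self):
        # Reject values outside the ranges the README documents.
        if not 1.0 <= self.cfg_scale <= 2.0:
            raise ValueError("cfg_scale must be between 1.0 and 2.0")
        if not 1 <= self.num_speakers <= 4:
            raise ValueError("num_speakers must be between 1 and 4")
        if self.inference_steps < 1:
            raise ValueError("inference_steps must be at least 1")
```

Validating up front gives a clear error before a long generation run starts, rather than a failure somewhere inside the model.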

Output Options

  • Play in browser
  • Download WAV file
  • Save to library

🔧 Configuration

Edit .env:

HOST=0.0.0.0
PORT=8000
DEBUG=False
MODEL_PATH=microsoft/VibeVoice-1.5B
DEVICE=cuda
CFG_SCALE=1.3
SAMPLE_RATE=24000
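
To show how these keys could map onto typed settings, here is a minimal standard-library sketch. The `Settings` class and `parse_env` function are hypothetical; the app itself may load `.env` another way (for instance through a dotenv or settings library), and this parser does not handle quoting or inline comments.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults mirror the sample values shown above.
    host: str = "0.0.0.0"
    port: int = 8000
    debug: bool = False
    model_path: str = "microsoft/VibeVoice-1.5B"
    device: str = "cuda"
    cfg_scale: float = 1.3
    sample_rate: int = 24000


def parse_env(text: str) -> Settings:
    """Parse simple KEY=VALUE lines into a typed Settings object."""
    raw = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        raw[key.strip().upper()] = value.strip()

    d = Settings()
    return Settings(
        host=raw.get("HOST", d.host),
        port=int(raw.get("PORT", d.port)),
        debug=raw.get("DEBUG", str(d.debug)).lower() in {"1", "true", "yes"},
        model_path=raw.get("MODEL_PATH", d.model_path),
        device=raw.get("DEVICE", d.device),
        cfg_scale=float(raw.get("CFG_SCALE", d.cfg_scale)),
        sample_rate=int(raw.get("SAMPLE_RATE", d.sample_rate)),
    )
```

For a machine without a GPU, setting `DEVICE=cpu` in `.env` would be the natural change, assuming the app passes this value straight to the model loader.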

🎯 Usage Examples

Basic TTS

  1. Select/upload a voice
  2. Enter text
  3. Click "Generate Speech"

Multi-Speaker

Speaker 1: Hello, welcome!
Speaker 2: Thanks, glad to be here.
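
A script in this format is straightforward to split into speaker turns. The parser below is a sketch for pre-checking a script before submitting it. `parse_script` is a hypothetical helper, and treating the labels as the integers 1 through 4 is an assumption based on the four-speaker limit.

```python
import re

# Matches lines such as "Speaker 2: Thanks, glad to be here."
SPEAKER_LINE = re.compile(r"^Speaker\s+(\d+)\s*:\s*(.*)$")
MAX_SPEAKERS = 4  # limit stated in this README


def parse_script(script: str):
    """Return a list of (speaker_number, text) tuples, one per non-empty line."""
    turns = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match is None:
            raise ValueError(f"Expected 'Speaker N: text', got: {line!r}")
        speaker = int(match.group(1))
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"Speaker number out of range: {speaker}")
        turns.append((speaker, match.group(2)))
    return turns
```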

Voice Cloning

  1. Record 10–30s of clear speech
  2. Save with name
  3. Use for TTS generation
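
Before uploading a recording, its length can be checked against the suggested 10–30 second window. The sketch below uses Python's standard `wave` module, so it only covers uncompressed WAV files; the function names are made up for this example.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_sample_length(wav_bytes: bytes, min_s: float = 10.0, max_s: float = 30.0) -> bool:
    """Check the recording against the 10-30 s range suggested above."""
    return min_s <= wav_duration_seconds(wav_bytes) <= max_s
```

For a file on disk, pass `Path("sample.wav").read_bytes()` (with `from pathlib import Path`).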

🛠️ API Documentation

Endpoints

  • GET /api/voices: list voices
  • POST /api/voices/upload: upload voice
  • POST /api/voices/record: record voice
  • POST /api/generate: generate speech
  • GET /api/audio/{filename}: download audio
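
The README lists the routes but not their request or response schemas, so the client sketch below is hedged accordingly. The paths come from the list above; the JSON field names (`text`, `voice`, `cfg_scale`) and the helper names are assumptions, so check the running server's API docs for the real schema. The helpers only build requests and do not send them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default host/port from the Quick Start


def list_voices_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request for the voice list."""
    return urllib.request.Request(f"{base_url}/api/voices", method="GET")


def generate_request(text: str, voice: str, cfg_scale: float = 1.3,
                     base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for speech generation (payload fields are assumed)."""
    payload = json.dumps(
        {"text": text, "voice": voice, "cfg_scale": cfg_scale}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def audio_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the download URL for a generated file, escaping the filename."""
    return f"{base_url}/api/audio/{urllib.parse.quote(filename)}"
```

With the server running, a request can be sent with `urllib.request.urlopen(list_voices_request())`.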

🚦 System Requirements

Minimum: Python 3.9+, 8GB RAM, CPU with AVX
Recommended: Python 3.10+, 16GB RAM, NVIDIA GPU (8GB+ VRAM)

🐛 Troubleshooting

  • Out of memory (OOM): Use a smaller model or reduce the batch size
  • Low quality: Use cleaner voice samples and adjust the CFG scale
  • Slow generation: Enable the GPU or shorten the text
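
One way to shorten the text is to split a long input at sentence boundaries and generate it piece by piece. The helper below is a sketch of that idea: `split_into_chunks` and the 500-character default are assumptions, and the README does not say whether the app already chunks input internally.

```python
import re


def split_into_chunks(text: str, max_chars: int = 500):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```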

📈 Performance Tips

  • Use a GPU for a 10–20× speedup
  • Batch texts
  • Cache voices
  • Try quantized models

🤝 Contributing

  1. Fork repo
  2. Create feature branch
  3. Commit & push
  4. Open PR

📄 License

MIT License

🙏 Acknowledgments

  • Microsoft VibeVoice team
  • FastAPI community
  • Contributors & users

📞 Support

🔗 Links