🎙️ VibeVoice Studio

A beautiful, modern web application for AI-powered voice synthesis using Microsoft's VibeVoice model. Generate natural-sounding speech from text with custom voice profiles.

✨ Features

🎤 Voice Training: Upload audio files or record your voice directly
📝 Text-to-Speech: Convert text or text files to natural speech
🎭 Multiple Speakers: Support for up to 4 distinct speakers
💾 Voice Library: Save and manage custom voice profiles
🎨 Beautiful UI: Modern, responsive design with dark/light themes
⚡ Real-time Processing: Fast speech generation with streaming support
📊 Audio Visualization: Live waveform display during recording
💾 Download & Save: Export generated audio files

🎬 Demo

vibevoice-demo.mp4

VibeVoice Studio end-to-end TTS, voice library, and multi-speaker demo.

🚀 Quick Start

Prerequisites

Python 3.9 or higher
CUDA-capable GPU (recommended)
8GB+ RAM

Installation

Clone the repository

git clone https://github.com/shamspias/vibevoice-studio.git
cd vibevoice-studio

Create virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install VibeVoice

git clone https://github.com/shamspias/VibeVoice
cd VibeVoice
pip install -e .
cd ..

Install dependencies

pip install -r requirements.txt

Configure environment

cp .env.example .env
# Edit .env with your settings

Run the application

python -m app.main

Open in browser

http://localhost:8000

🎨 Features Overview

Voice Management

Upload or record voices
Support for WAV, MP3, M4A, FLAC
Organized voice library
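
As an illustration of the format list above, an upload handler could check the file extension before accepting a voice sample. This is a minimal sketch and not the application's actual code; `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names.

```python
from pathlib import Path

# Formats listed in this README. The constant name is a hypothetical helper.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive, so `voice.WAV` passes. It only looks at the name, so a real upload path would likely also want to validate the file contents.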

Text Processing

Manual input or upload .txt files
Multi-speaker support for conversations

Generation Settings

Voice strength (CFG scale 1.0–2.0)
Up to 4 speakers
Adjustable inference steps

Output Options

Play in browser
Download WAV file
Save to library

🔧 Configuration

Edit .env:

HOST=0.0.0.0
PORT=8000
DEBUG=False
MODEL_PATH=microsoft/VibeVoice-1.5B
DEVICE=cuda
CFG_SCALE=1.3
SAMPLE_RATE=24000
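
To show how these keys might be consumed, here is a minimal sketch that reads them from environment variables into a typed object. It assumes the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv); the `Settings` class and `load_settings` function are hypothetical and may not match how the app reads its configuration. The 1.0 to 2.0 range check mirrors the CFG scale range given under Generation Settings.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the .env keys shown above (hypothetical class)."""
    host: str
    port: int
    debug: bool
    model_path: str
    device: str
    cfg_scale: float
    sample_rate: int


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    cfg_scale = float(env.get("CFG_SCALE", "1.3"))
    # Assumption: reject values outside the documented 1.0-2.0 range.
    if not 1.0 <= cfg_scale <= 2.0:
        raise ValueError(f"CFG_SCALE must be in [1.0, 2.0], got {cfg_scale}")
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        model_path=env.get("MODEL_PATH", "microsoft/VibeVoice-1.5B"),
        device=env.get("DEVICE", "cuda"),
        cfg_scale=cfg_scale,
        sample_rate=int(env.get("SAMPLE_RATE", "24000")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to try out, for example `load_settings({"DEVICE": "cpu"})`.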

🎯 Usage Examples

Basic TTS

Select/upload a voice
Enter text
Click "Generate Speech"

Multi-Speaker

Speaker 1: Hello, welcome!
Speaker 2: Thanks, glad to be here.
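
The script format above can be split into per-speaker turns with a few lines of Python. This is a sketch under stated assumptions: it treats every non-empty line as `Speaker N: text` and enforces the 4-speaker limit mentioned in this README. `parse_script` and `MAX_SPEAKERS` are hypothetical names, and the app's own parser may behave differently.

```python
import re

MAX_SPEAKERS = 4  # limit stated in this README
_TURN = re.compile(r"^Speaker\s+(\d+)\s*:\s*(.+)$")


def parse_script(script: str) -> list[tuple[int, str]]:
    """Split a multi-speaker script into (speaker_number, text) turns."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        match = _TURN.match(line)
        if match is None:
            raise ValueError(f"Not a 'Speaker N: text' line: {line!r}")
        turns.append((int(match.group(1)), match.group(2).strip()))
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(f"At most {MAX_SPEAKERS} speakers are supported")
    return turns
```

For the two-line example above, this returns `[(1, "Hello, welcome!"), (2, "Thanks, glad to be here.")]`.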

Voice Cloning

Record 10–30s of clear speech
Save with name
Use for TTS generation
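
Before saving a sample, it can help to confirm that the recording falls in the suggested 10 to 30 second range. The sketch below uses Python's standard `wave` module, so it covers WAV files only. The function names are hypothetical, and `make_silent_wav` exists only to make the example easy to try.

```python
import io
import wave

MIN_SECONDS = 10.0  # lower bound suggested above
MAX_SECONDS = 30.0  # upper bound suggested above


def wav_duration_seconds(source) -> float:
    """Duration of a WAV given a path string or a binary file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(source) -> bool:
    """True if the WAV duration is within the suggested 10-30 s range."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, for trying the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer
```

For example, `is_recommended_length(make_silent_wav(12))` is true, while a 5-second clip is rejected. Duration alone says nothing about clarity, so the sample should still be checked by ear.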

🛠️ API Documentation

Endpoints

GET /api/voices — list voices
POST /api/voices/upload — upload voice
POST /api/voices/record — record voice
POST /api/generate — generate speech
GET /api/audio/{filename} — download audio
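
The endpoints above can be called from any HTTP client. The following sketch builds requests with Python's standard library against the default `http://localhost:8000` address. The request and response schemas are not documented here, so treat the JSON body for `/api/generate` as an assumption: the endpoint may expect form data or different fields. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # default address from Quick Start


def list_voices_request(base_url: str = BASE_URL) -> Request:
    """Build the request for GET /api/voices."""
    return Request(f"{base_url}/api/voices", method="GET")


def audio_request(filename: str, base_url: str = BASE_URL) -> Request:
    """Build the request for GET /api/audio/{filename}."""
    # Percent-encode the filename so spaces and slashes are safe in the path.
    return Request(f"{base_url}/api/audio/{quote(filename, safe='')}", method="GET")


def generate_request(payload: dict, base_url: str = BASE_URL) -> Request:
    """Build a POST /api/generate request.

    Assumption: the endpoint accepts a JSON body. The payload fields are
    left to the caller because the schema is not documented in this README.
    """
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, a request can be sent with `urllib.request.urlopen(list_voices_request())`. Building the request separately from sending it keeps the URL logic testable without a live server.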

🚦 System Requirements

Minimum: Python 3.9+, 8GB RAM, CPU with AVX
Recommended: Python 3.10+, 16GB RAM, NVIDIA GPU (8GB+ VRAM)

🐛 Troubleshooting

Out of memory (OOM): Use a smaller model or reduce the batch size
Low quality: Use cleaner voice samples and adjust the CFG scale
Slow generation: Run on a GPU (DEVICE=cuda) or shorten the input text

📈 Performance Tips

Use a GPU for a 10–20× speedup
Batch texts where possible
Cache loaded voices for reuse
Try quantized models to reduce memory use

🤝 Contributing

Fork the repository
Create a feature branch
Commit and push your changes
Open a pull request

📄 License

MIT License

🙏 Acknowledgments

Microsoft VibeVoice team
FastAPI community
Contributors & users

📞 Support

Issues: GitHub Issues
