AI-Powered Voice Dubbing for Short Videos (Reels/TikTok/YouTube Shorts)
Transform your videos with professional AI voiceovers in multiple languages with advanced lip-sync technology. Perfect for content creators who want to scale their video production without spending hours on manual voiceover recording.
- AI Video Analysis - Gemini 2.5 Flash automatically analyzes your video content
- Smart Script Generation - Auto-generates engaging scripts based on video content
- 5 Voice Personalities - Professional, Friendly, Humorous, Dynamic, Soothing
- Multi-Language Support - 9+ languages with automatic translation
- Advanced Lip-Sync - Natural-looking lip synchronization
- Social Media Optimization - Auto-format for Reels, TikTok, YouTube Shorts
- Real-time Processing - Live progress tracking with ETA
- Audio Enhancement - Professional audio normalization and compression
- Custom Script Editing - Override AI-generated scripts
- Multi-format Downloads - Separate video and audio file downloads
- Video Analytics - Duration, resolution, and format analysis
- Batch Processing Ready - Architecture supports processing multiple videos
- React 18 - Modern UI framework
- Vite - Lightning-fast build tool
- Tailwind CSS - Utility-first styling
- Lucide React - Beautiful icons
- React Dropzone - Drag & drop file uploads
- Axios - HTTP client for API calls
- FastAPI - High-performance Python web framework
- Python 3.8+ - Core backend language
- Pydantic - Data validation and serialization
- AsyncIO - Asynchronous processing
- Uvicorn - ASGI server
- Google Gemini 2.5 Flash - Video analysis and script generation
- Murf AI - Professional text-to-speech and dubbing
- MoviePy - Video processing and manipulation
- OpenCV - Computer vision for lip-sync
- Pydub - Audio processing and enhancement
- Node.js 16 or higher
- Python 3.8 or higher
- Murf AI API Key
- Google Gemini API Key
git clone https://github.com/yourusername/voicecraft-studio.git
cd voicecraft-studio

cd backend
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

cd frontend
npm install

Backend Environment (.env file in backend folder):
MURF_API_KEY=your_murf_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
UPLOAD_DIR=uploads
MAX_FILE_SIZE=100000000
ALLOWED_EXTENSIONS=mp4,avi,mov,mkv

Frontend Environment (optional .env file in frontend folder):
VITE_API_URL=http://localhost:8000

mkdir -p uploads/videos uploads/audio uploads/processed uploads/temp

cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

cd frontend
npm run dev

The application will be available at http://localhost:5173
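As a rough illustration of how the backend might read these settings, the sketch below parses the backend variables shown above using only the standard library. It assumes the .env file has already been loaded into the process environment (for example by a dotenv loader). The `Settings` class and `load_settings` function are hypothetical names, not the project's actual `config.py`, and the defaults simply mirror the example values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the backend .env values."""
    murf_api_key: str
    gemini_api_key: str
    upload_dir: str
    max_file_size: int
    allowed_extensions: Tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables."""
    # Both API keys are required; fail early with a clear message if absent.
    missing = [k for k in ("MURF_API_KEY", "GEMINI_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    # ALLOWED_EXTENSIONS is a comma-separated list such as "mp4,avi,mov,mkv".
    extensions = tuple(
        ext.strip().lower()
        for ext in env.get("ALLOWED_EXTENSIONS", "mp4,avi,mov,mkv").split(",")
        if ext.strip()
    )
    return Settings(
        murf_api_key=env["MURF_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
        max_file_size=int(env.get("MAX_FILE_SIZE", "100000000")),
        allowed_extensions=extensions,
    )
```

Passing the mapping as an argument keeps the function easy to test without touching the real environment.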
- Visit Murf.ai
- Create an account and subscribe to a plan
- Navigate to API section in your dashboard
- Generate your API key
- Add to backend/.env file
- Visit Google AI Studio
- Create new project or use existing one
- Generate API key
- Enable the Gemini API for your project
- Add to backend/.env file
- Drag and drop your video file or click to browse
- Supported formats: MP4, AVI, MOV, MKV
- Maximum file size: 100MB
- Automatic format validation and video analysis
- Gemini 2.5 Flash analyzes your video frame by frame
- Identifies key visual elements and actions
- Generates contextually appropriate script
- Estimates optimal voice timing and duration
- Choose from 5 distinct voice personalities
- Select target language (supports 9+ languages)
- Edit the AI-generated script if needed
- Configure lip-sync and audio enhancement options
- Murf AI generates professional-quality voiceover
- Advanced audio processing and normalization
- Lip-sync generation with facial landmark detection
- Social media format optimization
- Download high-quality dubbed video
- Download separate audio file
- Files optimized for social media platforms
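The upload constraints from the workflow above (MP4, AVI, MOV, or MKV; at most 100MB) could be checked along these lines. This is a minimal sketch, not the project's actual validation code: `validate_upload` is a hypothetical helper, and 100MB is taken to mean the 100000000 bytes used for MAX_FILE_SIZE in the .env example.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {"mp4", "avi", "mov", "mkv"}
MAX_FILE_SIZE = 100_000_000  # bytes, matching MAX_FILE_SIZE in the .env example


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file name or size violates the upload limits."""
    # Compare the extension case-insensitively and without the leading dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {extension or 'none'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_FILE_SIZE:
        raise ValueError(f"File exceeds {MAX_FILE_SIZE} bytes")
```

An extension check alone is easy to bypass, so it is best treated as a first filter before any content-based inspection.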
voicecraft-studio/
├── frontend/ # React + Vite Frontend
│ ├── src/
│ │ ├── components/ # Reusable UI components
│ │ │ ├── VideoUploader.jsx
│ │ │ ├── VoiceSelector.jsx
│ │ │ ├── ProgressTracker.jsx
│ │ │ ├── VideoPlayer.jsx
│ │ │ └── LanguageSelector.jsx
│ │ ├── services/ # API communication layer
│ │ │ └── api.js
│ │ ├── styles/ # CSS and styling
│ │ │ └── index.css
│ │ ├── App.jsx # Main application component
│ │ └── main.jsx # Application entry point
│ ├── index.html
│ ├── package.json
│ ├── vite.config.js
│ └── tailwind.config.js
│
├── backend/ # FastAPI Backend
│ ├── app/
│ │ ├── models/ # Pydantic data models
│ │ │ └── schemas.py
│ │ ├── services/ # Business logic services
│ │ │ ├── gemini_client.py # Google Gemini integration
│ │ │ ├── murf_client.py # Murf AI integration
│ │ │ ├── video_processor.py # Video processing logic
│ │ │ └── audio_processor.py # Audio enhancement
│ │ ├── utils/ # Helper utilities
│ │ │ └── file_handler.py
│ │ ├── main.py # FastAPI application
│ │ └── config.py # Configuration management
│ ├── requirements.txt
│ └── .env.example
│
├── uploads/ # File storage directory
│ ├── videos/ # Uploaded video files
│ ├── audio/ # Generated audio files
│ ├── processed/ # Final processed videos
│ └── temp/ # Temporary processing files
│
├── README.md
├── .gitignore
└── docker-compose.yml
- POST /api/upload-video - Upload and analyze video file
- GET /api/video/{video_id} - Retrieve video information
- POST /api/generate-dubbing - Start dubbing generation process
- GET /api/status/{task_id} - Check processing status
- GET /api/download/{task_id}/{file_type} - Download processed files
- GET /api/languages - Get list of supported languages
- GET /api/voices - Get available voice personalities
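A client would typically call POST /api/generate-dubbing and then poll GET /api/status/{task_id} until the task finishes. The sketch below shows one way to structure that loop. The response shape is an assumption, since the schema is not documented here: a JSON object with a `status` key whose terminal values are `completed` and `failed`. `wait_for_task` and the injected `fetch_status` callable are hypothetical.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the task reports a terminal status or the timeout elapses.

    fetch_status is expected to wrap GET /api/status/{task_id} and return the
    decoded JSON body. The "status" key and its values are assumed.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(task_id)
        state = body.get("status")
        if state == "completed":
            return body
        if state == "failed":
            raise RuntimeError(f"Task {task_id} failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish in {timeout}s")
        sleep(interval)
```

In practice `fetch_status` could be a small function around any HTTP client; injecting it keeps the polling logic independent of the transport.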
curl -X POST -F "[email protected]" http://localhost:8000/api/upload-video

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "uuid-here",
    "voice_personality": "casual",
    "target_language": "es",
    "lip_sync": true
  }' \
  http://localhost:8000/api/generate-dubbing

| Personality | Description | Best Use Cases |
|---|---|---|
| Professional | Clear, authoritative, trustworthy tone | Business content, tutorials, educational videos |
| Friendly | Warm, approachable, conversational style | Lifestyle vlogs, personal content, casual tutorials |
| Humorous | Light-hearted, entertaining, engaging | Comedy content, memes, fun educational content |
| Dynamic | High-energy, motivational, exciting | Sports content, fitness videos, motivational content |
| Soothing | Gentle, peaceful, calming tone | Meditation videos, ASMR content, relaxation content |
- English (en) - Native support with all voice personalities
- Spanish (es) - Auto-translation with native voice synthesis
- French (fr) - Auto-translation with native voice synthesis
- German (de) - Auto-translation with native voice synthesis
- Italian (it) - Auto-translation with native voice synthesis
- Portuguese (pt) - Auto-translation with native voice synthesis
- Hindi (hi) - Auto-translation with native voice synthesis
- Japanese (ja) - Auto-translation with native voice synthesis
- Korean (ko) - Auto-translation with native voice synthesis
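The language codes above map directly onto the `target_language` field of the dubbing request. A minimal sketch of building that request body is shown below. `build_dubbing_request` is a hypothetical helper, the field names are taken from the curl example, and the voice personality is passed through unchecked because its exact identifiers should be confirmed via GET /api/voices.

```python
import json

# Codes listed in this README; the server may support more ("9+ languages").
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "hi": "Hindi",
    "ja": "Japanese", "ko": "Korean",
}


def build_dubbing_request(video_id: str, voice_personality: str,
                          target_language: str, lip_sync: bool = True) -> str:
    """Return the JSON body for POST /api/generate-dubbing."""
    code = target_language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {target_language!r}")
    return json.dumps({
        "video_id": video_id,
        "voice_personality": voice_personality,
        "target_language": code,
        "lip_sync": lip_sync,
    })
```

Validating the code on the client side gives a faster error than waiting for the server to reject the request.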
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

cd frontend
npm install
npm run dev

# Backend tests (if implemented)
cd backend
python -m pytest
# Frontend tests
cd ../frontend
npm test

cd frontend
npm run build

cd backend
uvicorn app.main:app --host 0.0.0.0 --port 8000

# Build and start all services
docker-compose up --build
# Run in background
docker-compose up -d
# Stop services
docker-compose down

# Build backend
docker build -t voicecraft-backend ./backend
# Build frontend
docker build -t voicecraft-frontend ./frontend
# Run backend
docker run -p 8000:8000 voicecraft-backend
# Run frontend
docker run -p 5173:5173 voicecraft-frontend

- Maximum Video Size: 100MB per upload
- Supported Formats: MP4, AVI, MOV, MKV
- Processing Time: 30-60 seconds for a typical short video (15-60 seconds long)
- Audio Quality: 192kbps MP3 with professional enhancement
- Concurrent Processing: Multiple users supported
- Output Optimization: Automatic social media format optimization
- Temporary Storage: Files are processed and cleaned up automatically
- No Permanent Storage: Videos are not permanently stored on servers
- API Security: Rate limiting and input validation implemented
- File Validation: Strict file type and size checking
- CORS Protection: Secure cross-origin resource sharing
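- Local Processing: Video and audio processing (MoviePy, OpenCV, Pydub) runs on your server; video analysis and voice synthesis call the Gemini and Murf AI APIs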
- No Data Mining: Content is not used for training or analytics
- Secure Uploads: Files are processed in isolated environment
- Auto Cleanup: Temporary files are automatically removed
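The automatic cleanup described above could be implemented as a periodic sweep of the temporary upload directory. The following is a sketch under assumptions: `remove_stale_files` is a hypothetical helper, and the one-hour retention period is illustrative rather than the project's actual setting.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_files(directory: str, max_age_seconds: float = 3600.0,
                       now: Optional[float] = None) -> List[str]:
    """Delete regular files older than max_age_seconds; return their paths."""
    current = time.time() if now is None else now
    removed: List[str] = []
    root = Path(directory)
    if not root.is_dir():
        return removed
    for path in root.iterdir():
        # Only top-level regular files are considered; subdirectories are left alone.
        if path.is_file() and current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(str(path))
    return removed
```

A sweep like this could be run against uploads/temp from a scheduled task or a background job after each processing run.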
# Check Node.js version
node --version # Should be 16+
# Check Python version
python --version # Should be 3.8+
# Reinstall dependencies
cd frontend && npm install
cd ../backend && pip install -r requirements.txt

# Check browser console for errors
# Verify dev server is running on correct port
# Check if index.html exists in frontend root directory

# Test backend connectivity
curl http://localhost:8000/api/languages
# Check CORS configuration
# Verify proxy settings in vite.config.js

# Verify file format is supported
# Check file size is under 100MB
# Ensure upload directory exists and has write permissions

# Verify API keys are correctly set in .env file
# Check API key quotas and limits
# Review backend logs for detailed error messages

# Test API endpoints
curl http://localhost:8000/docs # FastAPI documentation
curl http://localhost:8000/api/languages # Language support check
# Check file permissions
ls -la uploads/
ls -la backend/.env
# View backend logs
cd backend
uvicorn app.main:app --reload --log-level debug

- Fork the repository
- Create a feature branch: git checkout -b feature/new-feature
- Make your changes and commit: git commit -m 'Add new feature'
- Push to your branch: git push origin feature/new-feature
- Create a Pull Request
- Follow PEP 8 style guide for Python code
- Use ESLint configuration for JavaScript/React code
- Add unit tests for new features
- Update documentation for API changes
- Test across different video formats and sizes
- All changes require review before merging
- Automated tests must pass
- Documentation must be updated for new features
- Performance impact should be considered
- Basic video upload and processing
- AI script generation with Gemini
- Multi-personality voice synthesis
- Language translation capabilities
- Basic lip-sync implementation
- Advanced lip-sync with facial landmark detection
- Emotion-based voice modulation
- Background music integration
- Batch video processing capabilities
- Cloud storage integration (AWS S3, Google Cloud)
- Real-time collaboration tools
- Built-in video editing capabilities
- Custom voice cloning functionality
- Comprehensive analytics dashboard
- Mobile application (React Native)
- API rate limiting and user management
- White-label solutions
- Custom voice training
- Enterprise SSO integration
- Advanced analytics and reporting
- Custom deployment options
- Murf AI: Approximately $0.10-0.50 per minute of generated audio
- Google Gemini: Approximately $0.001 per 1,000 tokens for analysis
- Storage: Minimal cost for temporary file storage
- Compute: Variable based on video processing requirements
- Implement caching for frequently generated scripts
- Use audio compression to reduce API costs
- Implement background job queues for efficient batch processing
- Monitor and set API usage limits
- Consider bulk API pricing for high-volume usage
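To make the figures above concrete, the sketch below estimates a cost range for one video from the approximate rates quoted in this section ($0.10-0.50 per audio minute for Murf AI, $0.001 per 1,000 tokens for Gemini). `estimate_cost` is a hypothetical helper, and the token count in the usage note is an illustrative assumption, not a measured value.

```python
from typing import Tuple

MURF_RATE_PER_MINUTE = (0.10, 0.50)   # USD, approximate range quoted above
GEMINI_RATE_PER_1K_TOKENS = 0.001     # USD, approximate rate quoted above


def estimate_cost(audio_seconds: float, gemini_tokens: int) -> Tuple[float, float]:
    """Return (low, high) estimated API cost in USD for one video."""
    minutes = audio_seconds / 60.0
    gemini = gemini_tokens / 1000.0 * GEMINI_RATE_PER_1K_TOKENS
    low = minutes * MURF_RATE_PER_MINUTE[0] + gemini
    high = minutes * MURF_RATE_PER_MINUTE[1] + gemini
    return round(low, 4), round(high, 4)
```

For a 30-second clip and an assumed 2,000 tokens of analysis, `estimate_cost(30, 2000)` gives roughly $0.05 to $0.25, which shows that voice generation dominates the per-video cost at these rates.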
MURF_API_KEY=production_murf_key
GEMINI_API_KEY=production_gemini_key
UPLOAD_DIR=/app/uploads
MAX_FILE_SIZE=100000000
ALLOWED_EXTENSIONS=mp4,avi,mov,mkv
CORS_ORIGINS=https://yourdomain.com
DATABASE_URL=postgresql://user:pass@localhost/dbname
REDIS_URL=redis://localhost:6379

VITE_API_URL=https://api.yourdomain.com
VITE_APP_NAME=VoiceCraft Studio
VITE_MAX_FILE_SIZE=100000000

# Build production images
docker-compose -f docker-compose.prod.yml build
# Deploy to production
docker-compose -f docker-compose.prod.yml up -d

# Build frontend
cd frontend
npm run build
# Deploy backend
cd ../backend
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000
# Serve frontend with nginx/apache
# Point web server to frontend/dist directory

- Video upload with various formats (MP4, AVI, MOV, MKV)
- AI script generation accuracy
- Voice personality selection and audio generation
- Language translation functionality
- Dubbing process completion
- Lip-sync quality assessment
- File download functionality
- Error handling for invalid inputs
- Performance with large video files
- Cross-browser compatibility
# Run backend tests
cd backend
python -m pytest tests/ -v
# Run frontend tests
cd ../frontend
npm test
# Run integration tests
npm run test:integration

- File Upload: Test with 5s, 30s, and 60s videos
- Format Support: Test MP4, AVI, MOV, MKV files
- Size Limits: Test files near 100MB limit
- Error Handling: Test invalid formats and oversized files
- API Integration: Test all voice personalities and languages
- Upload Success Rate - Percentage of successful video uploads
- Processing Time - Average time from upload to completion
- API Usage - Murf and Gemini API call volumes and costs
- Error Rates - Failed processing attempts and reasons
- User Engagement - Most popular voice personalities and languages
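The first two metrics can be derived from simple per-job records. This sketch assumes each job is recorded as a dict with `succeeded` and `seconds` keys; that record shape and the `summarize_jobs` helper are hypothetical, not part of the project.

```python
from typing import Dict, Iterable


def summarize_jobs(jobs: Iterable[Dict]) -> Dict[str, float]:
    """Compute success rate (%) and mean processing time of successful jobs."""
    records = list(jobs)
    if not records:
        return {"success_rate": 0.0, "avg_seconds": 0.0}
    # Only successful jobs contribute to the average processing time.
    durations = [r["seconds"] for r in records if r["succeeded"]]
    success_rate = 100.0 * len(durations) / len(records)
    avg_seconds = sum(durations) / len(durations) if durations else 0.0
    return {"success_rate": success_rate, "avg_seconds": avg_seconds}
```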
# Backend logging setup
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('voicecraft.log'),  # persist logs to a file
        logging.StreamHandler(),                # also echo logs to the console
    ],
)

- File type validation using magic numbers
- File size limits strictly enforced
- Virus scanning for uploaded files (recommended)
- Temporary file cleanup after processing
- Rate limiting to prevent abuse
- Input validation and sanitization
- CORS properly configured
- API key rotation strategy
- No permanent storage of user videos
- Automatic cleanup of temporary files
- No logging of video content
- GDPR compliance considerations
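File type validation using magic numbers means inspecting the first bytes of an upload instead of trusting its extension. A minimal sketch for the four supported containers follows. `sniff_container` is a hypothetical helper, and the checks cover only the common signatures: some older MOV files do not begin with an `ftyp` box and would need extra handling.

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "mkv"  # EBML header, used by Matroska (and WebM)
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[4:8] == b"ftyp":
        # ISO base media file: the brand at bytes 8-12 is "qt  " for QuickTime.
        # Any other brand is treated as MP4 in this sketch.
        return "mov" if header[8:12] == b"qt  " else "mp4"
    return None
```

A caller would read the first 12 bytes of the uploaded file, pass them to `sniff_container`, and reject the upload when the result is `None` or disagrees with the file extension.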
- GitHub Issues: Report bugs and request features
- Documentation: Comprehensive API and setup guides
- Community: Join discussions and share tips
When reporting bugs, please include:
- Operating system and version
- Node.js and Python versions
- Complete error messages and stack traces
- Steps to reproduce the issue
- Sample video files (if applicable and appropriate)
- Use GitHub Issues with "enhancement" label
- Provide detailed description of proposed feature
- Include use cases and potential implementation approach
- Consider contributing the feature yourself
This project is licensed under the MIT License. See the LICENSE file for full details.
- Murf AI for professional voice synthesis capabilities
- Google Gemini for advanced AI video analysis
- OpenAI for natural language processing research
- FFmpeg for video processing foundations
- React and Vite for modern frontend development
- FastAPI for high-performance backend framework
- MoviePy for Python video processing
- Tailwind CSS for utility-first styling
- Lucide React for beautiful iconography
- All contributors who help improve this project
- Beta testers who provide valuable feedback
- Content creators who inspire new features
Ready to transform your video content? Get started by uploading your first video and experience the power of AI voice dubbing.