A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Sep 11, 2025 - Python
A ComfyUI custom node integration for multi-engine, multi-language, high-quality text-to-speech and voice conversion. Supports RVC, Chatterbox (classic and multilingual, 23 languages), F5-TTS, Higgs Audio 2, and Microsoft VibeVoice, with unlimited text length, SRT timing, character support, Audio Analyzer, Silent Speech Analyzer, audio editing, and more.
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
Music Generation Using Deep Learning🎶🎵
Real-Time Deepfake Pipeline
AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧
AudioInsight is a web application that processes audio, generates transcriptions, and allows users to ask questions about the related audio.
An approach to Andrej Karpathy's LLM challenge, as outlined here: https://twitter.com/karpathy/status/1760740503614836917
Professional Yocto BSP Layer for Dynamic Devices Edge Computing Platforms - AI Audio Processing, E-Ink Displays, Power Management, Wireless Connectivity, i.MX8MM/i.MX93 Support
AI Audio Framework 🎵
A project attempting to generate and extract features from music to make comparisons with popular artists, and examine where and with what demographics those artists are popular in order to craft a DIY marketing solution for aspiring artists.
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
Open source AI speech generation solution
Acoustic Space Analyzer AI Pro is a professional acoustic analysis tool that leverages artificial intelligence to generate optimized DSP processing chains for any acoustic environment. This innovative application combines real-time spectral analysis, 3D spatial scanning, and AI-powered audio processing to deliver precise acoustic corrections.
🎤 Create high-quality, longform conversations with VibeVoice, an advanced, open-source text-to-speech model for the speech synthesis community.
🎙️ Transform long texts into natural-sounding speech with VibeVoice, the advanced conversational text-to-speech model designed for engaging interactions.
This repository implements Unsupervised Domain Adaptation using Gradient Reversal Layer with PaSST feature extractors for cross-device acoustic scene classification on DCASE TAU 2020 dataset.