TU Darmstadt
Darmstadt
Stars
Expressive Anechoic Recordings of Speech (EARS)
Fast and accurate automatic speech recognition (ASR) for edge devices
HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement (ICASSP 2023)
Variational Bayes HMM over x-vectors diarization
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Implementation of the paper "Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement", accepted at IJCAI-ECAI 2022 (long oral)
Implementation of "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" in PyTorch
A New Padding Scheme: Partial Convolution based Padding
Unofficial PyTorch implementation of "Image Inpainting for Irregular Holes Using Partial Convolutions" [Liu+, ECCV2018]
Re-Implementation of "Image Inpainting for Irregular Holes using Partial Convolution"
Image Inpainting for Irregular Holes Using Partial Convolutions
Official implementation of Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals
Application of MB-iSTFT-VITS components to vits2_pytorch
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compa…
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
This document contains the functions that are currently available in the RobustSP toolbox: a Matlab toolbox for robust signal processing. The toolbox can be freely used for non-commercial use only.…
Moodle - the world's open source learning platform
A feature-rich command-line audio/video downloader
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).
Apply Score diffusion to improve speech signals recorded under various adverse conditions and distortions, including noise, reverberation, clipping, equalization (EQ) distortion, packet loss, codec…
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
NeMo text processing for ASR and TTS
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
QwenLM / vllm-gptq
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs