an open-source framework for detecting audio generated from generative systems
Updated Feb 22, 2024 - Python
voXify is a Streamlit-powered speech-to-text web application that generates transcripts from various audio sources and lets you download them in PDF or Word format.
Code from the ASR tutorial https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5
This project uses Python, computer vision, and deep learning techniques, relying on pre-trained models such as RetinaNet_ResNet-50 for image-based object detection. It is designed with a primary focus on enhancing security across various sectors. The RetinaNet_ResNet-50 model enables both image- and video-based detection.
In this notebook, we aim to recognize speech commands using classification. For this purpose, we used the SPEECHCOMMANDS dataset and the deep convolutional model M5. The code is written in Python and designed for the PyTorch platform.
Signal Separation API
Classifying music genre with the Urban Sound dataset; preprocessing with Librosa and torchaudio; models built in TensorFlow and PyTorch
The core of my graduation project, which uses convolutional neural networks to extract the vocal part from a song by removing the sound of musical instruments. The project is rather academic; it did not achieve particularly strong results in practice, but this was expected. I'm not going to develop it further.
The unmix model trained to separate guitar playing from audio samples using a custom-built dataset.
Test assignment for a graduation project at Huawei
Find how similar your voice is to Taylor Swift (WIP) ✨
I have recently gained knowledge on how to utilize PyTorch, an open-source machine learning framework that is known for its simplicity, performance, and APIs.
Convolutional Neural Net trained on over two hours of audio data, capable of differentiating between guitarists playing solos.
Generating unique one-shot audio samples with Stable Diffusion.
A Speech Recognition Framework for Banking Interactions using Convolutional Recurrent Dense Neural Networks and Language Models
Automatic Speech Recognition using torchaudio
CNN-LSTM model for audio emotion detection in children with adverse childhood events.
CNN-based model for audio, trained on CPU using PyTorch
A recognition system for road signs of the Russian Federation that uses a pre-trained model for real-time object detection and image segmentation to improve road safety