DNN-based source separation

A PyTorch implementation of DNN-based source separation.

Model

| Model | Reference | Done |
|---|---|---|
| WaveNet | WaveNet: A Generative Model for Raw Audio | |
| Wave-U-Net | Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation | |
| Deep clustering | Single-Channel Multi-Speaker Separation using Deep Clustering | |
| Chimera++ | Alternative Objective Functions for Deep Clustering | |
| DANet | Deep Attractor Network for Single-microphone Speaker Separation | |
| ADANet | Speaker-independent Speech Separation with Deep Attractor Network | |
| TasNet | TasNet: Time-domain Audio Separation Network for Real-time, Single-channel Speech Separation | |
| Conv-TasNet | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation | |
| DPRNN-TasNet | Dual-path RNN: Efficient Long Sequence Modeling for Time-domain Single-channel Speech Separation | |
| Gated DPRNN-TasNet | Voice Separation with an Unknown Number of Multiple Speakers | |
| FurcaNet | FurcaNet: An End-to-End Deep Gated Convolutional, Long Short-term Memory, Deep Neural Networks for Single Channel Speech Separation | |
| FurcaNeXt | FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks | |
| DeepCASA | Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation | |
| Wavesplit | Wavesplit: End-to-End Speech Separation by Speaker Clustering | |
| DPTNet | Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation | |
| D3Net | D3Net: Densely connected multidilated DenseNet for music source separation | |
| GALR | Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks | |

Methods related to training

| Method | Reference | Done |
|---|---|---|
| Permutation invariant training (PIT) | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | |
| One-and-rest PIT | Recursive Speech Separation for Unknown Number of Speakers | |
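
For readers unfamiliar with PIT, the idea can be sketched in a few lines: the loss is computed for every assignment of estimated sources to reference sources, and the smallest one is used. The code below is an illustrative, dependency-free sketch and not this repository's implementation. The names `mse` and `pit_loss` are hypothetical, plain Python lists stand in for tensors, and mean squared error stands in for whatever criterion a given model is actually trained with.

```python
from itertools import permutations


def mse(estimate, target):
    """Mean squared error between two equal-length sequences of samples."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def pit_loss(estimates, targets, pair_loss=mse):
    """Utterance-level PIT (hypothetical helper, not the repository's API).

    Tries every ordering of the estimated sources and returns
    (lowest mean loss, best permutation), where best_perm[i] is the index
    of the estimate assigned to targets[i].
    """
    n_sources = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # Average the pairwise loss over sources for this assignment.
        total = sum(pair_loss(estimates[p], targets[i]) for i, p in enumerate(perm))
        loss = total / n_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two reference sources and two estimates that come out in swapped order.
targets = [[1.0, 0.0, -1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
estimates = [[0.5, 0.5, 0.5, 0.4], [1.0, 0.1, -1.0, 0.0]]

loss, perm = pit_loss(estimates, targets)
print(perm, loss)
```

Here the search picks the swapped assignment, since estimate 1 matches target 0 and estimate 0 matches target 1. The exhaustive search grows factorially with the number of sources, which is acceptable for two or three speakers but is one reason alternatives are used when the speaker count is large or unknown.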

Example

LibriSpeech example using Conv-TasNet

cd <REPOSITORY_ROOT>/egs/tutorials/

0. Preparation

cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare.sh <DATASET_DIR> <#SPEAKERS>

1. Training

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh --exp_dir <OUTPUT_DIR>

To resume training from a saved model, pass its path with --continue_from:

. ./train.sh --exp_dir <OUTPUT_DIR> --continue_from <MODEL_PATH>

2. Evaluation

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh --exp_dir <OUTPUT_DIR>

3. Demo

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh

Version

  • v0.0.0: Initial version. LibriSpeech Conv-TasNet & DPRNN-TasNet examples are included.
  • v0.0.1: Dataset is renamed.
  • v0.1.0: Dataset structure is changed.
  • v0.1.1: DANet is included.
  • v0.1.2: Layer name is changed. Input feature for DANet is replaced by log-magnitude.
  • v0.1.3: Add scripts for Wall Street Journal 0 (WSJ0) dataset.
  • v0.1.4: Add non-negative matrix factorization (NMF).
  • v0.2.0: Change the representation of short time Fourier transform (STFT).
  • v0.2.1: conv_tasnet directory is renamed to conv-tasnet. Add one-and-rest PIT (ORPIT).
  • v0.3.0: wsj0 is renamed to wsj0-mix. The result is updated.
  • v0.3.1: Implement Linear encoder for TasNet.
  • v0.3.2: Change the definition of hidden_channels in dual-path RNN.
  • v0.3.3: Fix trained models to match the change in v0.3.2.
  • v0.4.0: Fix the network architecture of DPRNN-TasNet.
  • v0.4.1: Add DPTNet and GALRNet. Re-fix DPRNN-TasNet.
  • v0.4.2: Add training script for GALRNet.
  • v0.4.3: Re-fix DPRNN-TasNet.
  • v0.5.0: Add parse_options.sh.
  • v0.5.1: Multichannel support.
  • v0.5.2: Add metric learning tutorials.