DNN-based source separation

A PyTorch implementation of DNN-based source separation.

Model

Model	Reference	Done
WaveNet	WaveNet: A Generative Model for Raw Audio	✔
Wave-U-Net	Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation
Deep clustering	Single-Channel Multi-Speaker Separation using Deep Clustering
Chimera++	Alternative Objective Functions for Deep Clustering
DANet	Deep Attractor Network for Single-microphone Speaker Separation	✔
ADANet	Speaker-independent Speech Separation with Deep Attractor Network
TasNet	TasNet: Time-domain Audio Separation Network for Real-time, Single-channel Speech Separation	✔
Conv-TasNet	Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation	✔
DPRNN-TasNet	Dual-path RNN: Efficient Long Sequence Modeling for Time-domain Single-channel Speech Separation	✔
Gated DPRNN-TasNet	Voice Separation with an Unknown Number of Multiple Speakers
FurcaNet	FurcaNet: An End-to-End Deep Gated Convolutional, Long Short-term Memory, Deep Neural Networks for Single Channel Speech Separation
FurcaNeXt	FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks
DeepCASA	Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation
Wavesplit	Wavesplit: End-to-End Speech Separation by Speaker Clustering
DPTNet	Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation	✔
D3Net	D3Net: Densely connected multidilated DenseNet for music source separation
GALR	Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks	✔

Methods related to training

Method	Reference	Done
Permutation invariant training (PIT)	Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks	✔
One-and-rest PIT	Recursive Speech Separation for Unknown Number of Speakers	✔
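
To illustrate the idea behind PIT, here is a minimal, framework-free sketch. It is not this repository's implementation: the function names (`mse`, `pit_loss`) and the use of mean squared error as the pairwise loss are assumptions made for illustration only. The loss is evaluated for every assignment of estimated sources to target sources over the whole utterance, and the smallest value is kept, so the model is not penalized for emitting the speakers in a different order.

```python
from itertools import permutations


def mse(estimate, target):
    """Mean squared error between two equal-length sequences of samples."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def pit_loss(estimates, targets, pair_loss=mse):
    """Return (minimum mean loss, best permutation) over all source orderings.

    best_perm[i] is the index of the estimate assigned to target i.
    The pairwise loss (MSE here) is an illustrative assumption.
    """
    n_sources = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # Average the pairwise loss over sources for this assignment.
        loss = sum(
            pair_loss(estimates[p], targets[i]) for i, p in enumerate(perm)
        ) / n_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy check: the estimates are the targets in swapped order.
targets = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
estimates = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]
loss, perm = pit_loss(estimates, targets)
```

Because every permutation is enumerated, the cost grows factorially with the number of sources, which is acceptable for two or three speakers but not for many.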

Example

LibriSpeech example using Conv-TasNet

cd <REPOSITORY_ROOT>/egs/tutorials/

0. Preparation

cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare.sh <DATASET_DIR> <#SPEAKERS>

1. Training

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh --exp_dir <OUTPUT_DIR>

To resume training from a saved model, pass its path with --continue_from:

. ./train.sh --exp_dir <OUTPUT_DIR> --continue_from <MODEL_PATH>

2. Evaluation

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh --exp_dir <OUTPUT_DIR>

3. Demo

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh

Version

v0.0.0: Initial version. LibriSpeech Conv-TasNet & DPRNN-TasNet examples are included.
v0.0.1: Dataset is renamed.
v0.1.0: Dataset structure is changed.
v0.1.1: DANet is included.
v0.1.2: Layer name is changed. Input feature for DANet is replaced by log-magnitude.
v0.1.3: Add scripts for Wall Street Journal 0 (WSJ0) dataset.
v0.1.4: Add non-negative matrix factorization (NMF).
v0.2.0: Change the representation of short time Fourier transform (STFT).
v0.2.1: conv_tasnet directory is renamed to conv-tasnet. Add one-and-rest PIT (ORPIT).
v0.3.0: wsj0 is renamed to wsj0-mix. The result is updated.
v0.3.1: Implement Linear encoder for TasNet.
v0.3.2: Change the definition of hidden_channels in dual-path RNN.
v0.3.3: Fix trained models following the v0.3.2 update.
v0.4.0: Fix the network architecture of DPRNN-TasNet.
v0.4.1: Add DPTNet and GALRNet. Re-fix DPRNN-TasNet.
v0.4.2: Add training script for GALRNet.
v0.4.3: Re-fix DPRNN-TasNet.
v0.5.0: Add parse_options.sh.
v0.5.1: Multichannel support.
v0.5.2: Add metric learning tutorials.
