This repository contains the source code for the experiments in our work on a differentiable model for unsupervised singing voice separation.
We propose to extend the work of Schulze-Forster et al. and to build a complete, fully differentiable model by integrating a multipitch estimator and a novel differentiable voice assignment module into the core model.
Note 1: This project builds upon the model of Schulze-Forster et al., and parts of the code are taken or adapted from their repository.
Note 2: The trained models of Cuesta et al. (multiple-f0 estimation) and Cuesta and Gómez (voice assignment) were used in our experiments.
📄 Schulze-Forster et al. paper
📄 Multiple-f0 estimation paper | Voice assignment paper
📁 CSD Database | Cantoría Database
The following packages are required:
pytorch=1.6.0
matplotlib=3.3.1
python-sounddevice=0.4.0
scipy=1.5.2
torchaudio=0.6.0
tqdm=4.49.0
pysoundfile=0.10.3
librosa=0.8.0
scikit-learn=0.23.2
tensorboard=2.3.0
resampy=0.2.2
pandas=1.2.3
These packages are available from the conda-forge and pytorch channels. Python 3.7 or 3.8 is recommended. From a new conda environment:
conda update conda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --add channels pytorch
conda install pytorch=1.6.0
conda install numpy=1.23.5 matplotlib=3.3.1 python-sounddevice=0.4.0 scipy=1.5.2 torchaudio=0.6.0 tqdm=4.49.0 pysoundfile=0.10.3 librosa=0.8.0 scikit-learn=0.23.2 tensorboard=2.3.0 resampy=0.2.2 pandas=1.2.3 configargparse=0.13.0
pip install pumpp==0.6.0 nnAudio==0.3.2
Alternatively, you can use the provided environment.yml file:
conda env create -f environment.yml
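After installation (by either route), a quick sanity check can confirm that the core packages import and report the expected versions. This is only an optional sketch and not part of the repository scripts:

```python
# Optional environment sanity check (not part of the repository scripts).
import torch
import torchaudio
import librosa

print("torch:", torch.__version__)            # expected 1.6.0
print("torchaudio:", torchaudio.__version__)  # expected 0.6.0
print("librosa:", librosa.__version__)        # expected 0.8.0
print("CUDA available:", torch.cuda.is_available())
```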
To train the proposed model and the U-Net models:

python train.py -c config.txt
python train_u_nets.py -c unet_config.txt
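Both training scripts take their options from a text configuration file passed with -c, using the configargparse package listed above. The snippet below is a generic sketch of that pattern only; the option names (--tag, --batch-size) are hypothetical illustrations, not the options actually defined in train.py:

```python
# Generic configargparse pattern; the option names below are hypothetical
# examples, not the actual options defined by the training scripts.
import configargparse

parser = configargparse.ArgumentParser()
parser.add_argument('-c', '--config', is_config_file=True,
                    help='path to a config file such as config.txt')
parser.add_argument('--tag', type=str, default='experiment',
                    help='name used for checkpoints and logs (hypothetical)')
parser.add_argument('--batch-size', type=int, default=16,
                    help='training batch size (hypothetical)')

args = parser.parse_args()
print(args.tag, args.batch_size)
```

With this pattern, a config file simply lists `key = value` lines (for example, `batch-size = 16`), and any value can still be overridden on the command line.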
To evaluate a trained model:

python eval.py --tag 'TAG' --f0-from-mix --test-set 'CSD'

Note: 'TAG' is the name of the evaluated model (for example, UMSS_4s_bcbq).
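To evaluate several trained models in one go, the same eval.py call can be wrapped in a loop. The sketch below assumes the command-line flags shown above; the tag list is only an example:

```python
# Run eval.py for a list of model tags; extend the list as needed.
import subprocess

tags = ["UMSS_4s_bcbq"]  # placeholder list; add further model tags here
for tag in tags:
    subprocess.run(
        ["python", "eval.py", "--tag", tag, "--f0-from-mix", "--test-set", "CSD"],
        check=True,
    )
```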
The trained models used in our experiments are available here.