MDL1

Stylianos Ioannis Mimilakis¹ and Konstantinos Drossos²

¹Fraunhofer-IDMT, Ilmenau, Germany

²Tampere University of Technology, Tampere, Finland

Contact: mis [at] idmt.fraunhofer.de

Additional Info

  • is_blind: no
  • additional_training_data: no

Supplemental Material

Method

Task: Singing voice separation.

We used the Masker and Denoiser (MaD) architecture presented in the references below. Our method operates on single-channel mixture magnitude spectrograms and yields single-channel estimates of the singing voice; the accompaniment source is then estimated by time-domain subtraction. To avoid the computational cost of the recurrent inference, we added to the overall cost a unit matrix norm penalty on the latent representation of the target-source time-frequency mask (denoted "H_j_dec" in our paper). In MDL1 this matrix norm is scaled by 2e-7. For training we used only the training subset of MUSDB18, without any augmentation, normalisation, or dropout. At test time, we applied the method to each available mixture channel independently.
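As an illustrative sketch only (not the authors' code), the cost described above combines a reconstruction term with the scaled norm penalty on the latent mask representation, and the accompaniment is obtained by subtracting the voice estimate from the mixture. The function names and the choice of an entrywise L1 norm for the matrix penalty are assumptions:

```python
import numpy as np

def mad_cost(v_true, v_est, h_j_dec, penalty_scale=2e-7):
    """Sketch of the training objective: magnitude-spectrogram
    reconstruction error plus a scaled matrix-norm penalty on the
    latent representation of the target-source mask (h_j_dec).
    The entrywise L1 norm is an assumption; the 2e-7 scalar is the
    value reported for MDL1."""
    reconstruction = np.mean((v_true - v_est) ** 2)
    penalty = penalty_scale * np.sum(np.abs(h_j_dec))
    return reconstruction + penalty

def estimate_accompaniment(mixture, voice_estimate):
    """Accompaniment via time-domain subtraction of the estimated voice."""
    return mixture - voice_estimate
```

With a perfect voice estimate the reconstruction term vanishes and only the penalty remains, which makes the role of the 2e-7 scalar easy to inspect in isolation.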

More details can be found here: https://js-mim.github.io/mss_pytorch/

References

[1] S.I. Mimilakis, K. Drossos, T. Virtanen, and G. Schuller: "A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation," in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), September 2017.

[2] S.I. Mimilakis, K. Drossos, J.F. Santos, G. Schuller, T. Virtanen, and Y. Bengio: "Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018.