Stylianos Ioannis Mimilakis¹ and Konstantinos Drossos²
¹Fraunhofer-IDMT, Ilmenau, Germany
²Tampere University of Technology, Tampere, Finland
Contact: mis [at] idmt.fraunhofer.de
- is_blind: no
- additional_training_data: no
Task: Singing voice separation.
We used the Masker and Denoiser (MaD) architecture presented in the references below. Our method operates on single-channel
mixture magnitude spectrograms and yields single-channel estimates for the singing voice. The main difference in this submission (MDL1) is that a thresholding algorithm is applied to the latent space that controls the time-frequency mask generation (denoted as "H-j-dec" in our paper [2]). Values less than or equal to the threshold are set to zero.
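As a rough illustration of the thresholding step described above, here is a minimal NumPy sketch. The function name, the default threshold value, and the use of hard (zeroing) thresholding are assumptions for illustration only; they are not taken from the released implementation.

```python
import numpy as np

def threshold_latent(h_j_dec, threshold=0.1):
    """Hypothetical sketch: zero out latent-space values that are
    less than or equal to the threshold, as described in the text.

    h_j_dec : array-like, latent representation controlling the
              time-frequency mask generation (name assumed).
    """
    h = np.array(h_j_dec, dtype=float)
    h[h <= threshold] = 0.0  # hard thresholding: values <= threshold become zero
    return h

# Example: with threshold 0.1, the values 0.05 and 0.1 are zeroed.
print(threshold_latent([0.05, 0.2, 0.1, 0.5], threshold=0.1))
```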
More details can be found here: https://js-mim.github.io/mss_pytorch/
[1] S.I. Mimilakis, K. Drossos, T. Virtanen, and G. Schuller: A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation, in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), September 2017.
[2] S.I. Mimilakis, K. Drossos, J.F. Santos, G. Schuller, T. Virtanen, and Y. Bengio: Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018.