# TAU1

Naoya Takahashi¹, Stefan Uhlich², Franck Giron², Michael Enenkl², Thomas Kemp², Nabarun Goswami³, Yuki Mitsufuji¹

¹Sony Corporation, Audio Technology Development Department, Tokyo, Japan
²Sony European Technology Center (EuTEC), Stuttgart, Germany
³Sony India Software Center, Bangalore, India

Naoya.Takahashi [at] sony.com

## Additional Info

- is_blind: no
- additional_training_data: yes

## Supplemental Material

- Code: not available
- Demos: not available

## Method

This submission blends two systems as described in [1]. The first system is TAK3 (MMDenseNet [2]) and the second system is UHL3 (an LSTM-based model). We linearly blend the raw outputs of the two systems using

$$\hat{s}_i(n) = 0.5 \left( \hat{s}_{i,1}(n) + \hat{s}_{i,2}(n) \right)$$

where $i \in \{B, D, O, V\}$ indexes the bass, drums, other, and vocals targets. From these blended estimates, we then compute the power spectral densities and spatial covariance matrices for the multichannel Wiener filter (MWF) [3].
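For concreteness, here is a minimal NumPy sketch of the blending step and of the per-source statistics that feed the MWF post-filter. It assumes each system's raw estimates are stored as dictionaries of time-domain arrays and that the STFT is computed elsewhere; all names (`blend_estimates`, `mwf_statistics`, `est_tak3`, `est_uhl3`) are hypothetical, since the original code is not available.

```python
import numpy as np

SOURCES = ("bass", "drums", "other", "vocals")  # i in {B, D, O, V}


def blend_estimates(est_tak3, est_uhl3):
    """Blend the raw time-domain outputs of the two systems with a
    fixed weight of 0.5, as in the equation above.

    Each argument maps a source name to an array of shape
    (num_samples, num_channels).
    """
    return {src: 0.5 * (est_tak3[src] + est_uhl3[src]) for src in SOURCES}


def mwf_statistics(stft_est):
    """Compute MWF statistics for one source from the STFT of its
    blended estimate, shaped (freq_bins, frames, channels).

    Returns the power spectral density v (freq_bins, frames) and the
    spatial covariance matrix R (freq_bins, channels, channels),
    following the formulation in [3].
    """
    # PSD: channel-averaged power per time-frequency bin
    v = np.mean(np.abs(stft_est) ** 2, axis=2)
    # R(f) = sum_n s(f, n) s(f, n)^H, normalized by sum_n v(f, n)
    R = np.einsum("fnc,fnd->fcd", stft_est, stft_est.conj())
    R /= v.sum(axis=1)[:, None, None]
    return v, R
```

Given `v` and `R` for each source, the Wiener filter is then assembled and applied per time-frequency bin as described in [3].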

## References

1. S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and Y. Mitsufuji, "Improving music source separation based on deep neural networks through data augmentation and network blending," Proc. ICASSP, 2017.
2. N. Takahashi and Y. Mitsufuji, "Multi-scale multi-band DenseNets for audio source separation," Proc. WASPAA, 2017.
3. A. A. Nugraha, A. Liutkus, and E. Vincent, "Multichannel music separation with deep neural networks," Proc. EUSIPCO, 2016.