Gerard Roma, Owen Green, Pierre Alexandre Tremblay, University of Huddersfield
- is_blind: no
- additional_training_data: no
This system uses a convolutional neural network (CNN) with fully connected output layers. The network input is a slice of 11 STFT frames (about 200 ms), and the output is a binary mask for a single spectral frame. We trained the network by minimizing the negative log likelihood loss of a 2D softmax output layer. The targets were encoded as class labels that indicate, for each time-frequency bin, the source with the highest magnitude.
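The following is a minimal plain-Python sketch of the data handling described above. It covers building 11-frame input slices, encoding per-bin class targets, computing the per-bin softmax negative log likelihood, and turning predictions into binary masks. It is not the authors' implementation. The centred window, zero padding at the edges, the `logits[f][s]` layout for the 2D softmax (frequency bins by sources), and all function names (`context_slices`, `target_labels`, `frame_nll`, `binary_masks`) are assumptions made for illustration. The convolutional and fully connected layers themselves are omitted.

```python
import math
from typing import List, Sequence

Frame = List[float]  # magnitudes of one STFT frame, one value per frequency bin


def context_slices(spec: Sequence[Frame], width: int = 11) -> List[List[Frame]]:
    """Hypothetical helper: for every frame t, return `width` frames centred on t.

    Assumption: frames outside the spectrogram are zero-padded and the mask
    predicted from a slice belongs to its central frame.
    """
    half = width // 2
    zero = [0.0] * len(spec[0])
    return [[list(spec[i]) if 0 <= i < len(spec) else list(zero)
             for i in range(t - half, t + half + 1)]
            for t in range(len(spec))]


def target_labels(sources: Sequence[Sequence[Frame]]) -> List[List[int]]:
    """Label each time-frequency bin with the index of the loudest source.

    sources[s][t][f] is the magnitude of source s at frame t, bin f.
    """
    n_frames, n_bins = len(sources[0]), len(sources[0][0])
    return [[max(range(len(sources)), key=lambda s: sources[s][t][f])
             for f in range(n_bins)]
            for t in range(n_frames)]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one bin's source scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def frame_nll(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log likelihood over the bins of one output frame.

    Assumed layout: logits[f][s] is the raw score for source s at bin f, and the
    softmax is taken over sources separately for every bin.
    """
    total = sum(log_softmax(row)[lab] for row, lab in zip(logits, labels))
    return -total / len(labels)


def binary_masks(logits: Sequence[Sequence[float]], n_sources: int) -> List[List[int]]:
    """Assign each bin to its highest-scoring source: one binary mask per source."""
    winners = [max(range(n_sources), key=lambda s: row[s]) for row in logits]
    return [[1 if w == s else 0 for w in winners] for s in range(n_sources)]


if __name__ == "__main__":
    # Two toy sources, 3 frames x 2 frequency bins.
    vocals = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
    drums = [[0.2, 0.5], [0.1, 0.6], [0.3, 0.4]]
    mix = [[v + d for v, d in zip(fv, fd)] for fv, fd in zip(vocals, drums)]
    slices = context_slices(mix)
    print(len(slices), len(slices[0]))  # 3 11
    print(target_labels([vocals, drums]))  # [[0, 1], [0, 1], [1, 0]]
```

Taking the argmax over the per-bin softmax, as `binary_masks` does, gives each time-frequency bin to exactly one source. This mirrors the "loudest source wins" encoding of the targets. Each mask would then be multiplied element-wise with the mixture magnitudes for the corresponding frame.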
- G. Roma, O. Green, and P. A. Tremblay, "Improving single-network single-channel separation of musical audio with convolutional layers," in Proceedings of LVA/ICA, 2018.