Blind Speech Separation with GCC-NMF: Demo

Sean UN Wood, September 2016

To play the audio examples in this notebook, please view it with the Jupyter Notebook Viewer.

Abstract

This IPython Notebook is a demo of the GCC-NMF blind source separation algorithm, which combines:

Non-negative matrix factorization (NMF): an unsupervised dictionary learning algorithm
Generalized cross-correlation (GCC): a source localization method

Separation is performed directly on the stereo mixture signal using no additional data:

An NMF dictionary is first learned from the left and right mixture spectrograms, concatenated in time.
Dictionary atoms are then attributed to individual sources over time according to their time difference of arrival (TDOA) estimates, defining a binary coefficient mask for each source.
Masked coefficients are then used to reconstruct each source independently.

This demo separates the speech sources from the data/dev1_female3_liverec_130ms_1m_mix.wav mixture, taken from the SiSEC 2016 Underdetermined speech mixtures "dev1" dataset, and saves results to the data directory.

Overview

Preliminary setup
Input mixture signal
Complex mixture spectrogram
GCC-PHAT source localization
NMF decomposition
GCC-NMF coefficient mask generation
Source spectrogram estimates reconstruction
Source signal estimates reconstruction

1. Preliminary setup

Python imports

from gccNMF.gccNMFFunctions import *
from gccNMF.gccNMFPlotting import *

from IPython import display

%matplotlib inline

Parameter definitions

# Preprocessing params
windowSize = 1024
fftSize = windowSize
hopSize = 128
windowFunction = hanning

# TDOA params
numTDOAs = 128

# NMF params
dictionarySize = 128
numIterations = 100
sparsityAlpha = 0

# Input params    
mixtureFileNamePrefix = 'data/dev1_female3_liverec_130ms_1m'
microphoneSeparationInMetres = 1.0
numSources = 3

2. Input mixture signal

Load mixture signal from the wav file

mixtureFileName = getMixtureFileName(mixtureFileNamePrefix)
stereoSamples, sampleRate = loadMixtureSignal(mixtureFileName)

numChannels, numSamples = stereoSamples.shape
durationInSeconds = numSamples / float(sampleRate)

Plot mixture signal

describeMixtureSignal(stereoSamples, sampleRate)

figure(figsize=(14, 6))
plotMixtureSignal(stereoSamples, sampleRate)

Input mixture signal:
	sampleRate: 16000 samples/sec
	numChannels: 2
	numSamples: 160000
	dtype: float32
	duration: 10.00 seconds

3. Complex mixture spectrogram

Compute complex mixture spectrograms from input signal with an STFT

complexMixtureSpectrogram = computeComplexMixtureSpectrogram( stereoSamples, windowSize,
                                                              hopSize, windowFunction ) 
numChannels, numFrequencies, numTime = complexMixtureSpectrogram.shape
frequenciesInHz = getFrequenciesInHz(sampleRate, numFrequencies)
frequenciesInkHz = frequenciesInHz / 1000.0
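
For readers who want to see what the STFT step involves, here is a minimal NumPy sketch for a single channel, assuming Python 3. It is not the code behind computeComplexMixtureSpectrogram: the name stftSketch is hypothetical, and it does no padding, so its frame count can differ from the notebook's output shown further down.

```python
import numpy as np

def stftSketch(samples, windowSize, hopSize, windowFunction=np.hanning):
    """Hypothetical single-channel STFT, returning (numFrequencies, numFrames)."""
    window = windowFunction(windowSize)
    # Only keep frames that fit entirely inside the signal (no padding)
    numFrames = 1 + (len(samples) - windowSize) // hopSize
    frames = np.stack([samples[i * hopSize : i * hopSize + windowSize] * window
                       for i in range(numFrames)])
    # rfft keeps the windowSize // 2 + 1 non-negative frequency bins
    return np.fft.rfft(frames, axis=-1).T

# Example: a one second 1 kHz tone sampled at 16 kHz
sampleRate = 16000
t = np.arange(sampleRate) / float(sampleRate)
tone = np.sin(2 * np.pi * 1000 * t)
spectrogram = stftSketch(tone, 1024, 128)
peakBin = np.abs(spectrogram).mean(axis=-1).argmax()
peakFrequencyInHz = peakBin * sampleRate / 1024.0
```

With a window of 1024 samples, each spectrogram column has 1024 / 2 + 1 = 513 frequency bins, which matches the numFreq value reported in the output of this section.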

Plot magnitude mixture spectrograms

describeMixtureSpectrograms(windowSize, hopSize, windowFunction, complexMixtureSpectrogram)

figure(figsize=(12, 8))
plotMixtureSpectrograms(complexMixtureSpectrogram, frequenciesInkHz, durationInSeconds)

STFT:
	windowSize: 1024
	hopSize: 128
	windowFunction: <function hanning at 0x1075e8140>
	complexMixtureSpectrogram.shape = (numChannels, numFreq, numWindows): (2, 513, 1243)
	complexMixtureSpectrogram.dtype = complex64

4. Perform source localization with GCC-PHAT

Localize target TDOA indexes

spectralCoherenceV = complexMixtureSpectrogram[0] * complexMixtureSpectrogram[1].conj() \
                     / abs(complexMixtureSpectrogram[0]) / abs(complexMixtureSpectrogram[1])
angularSpectrogram = getAngularSpectrogram( spectralCoherenceV, frequenciesInHz,
                                            microphoneSeparationInMetres, numTDOAs )
meanAngularSpectrum = mean(angularSpectrogram, axis=-1) 
targetTDOAIndexes = estimateTargetTDOAIndexesFromAngularSpectrum( meanAngularSpectrum,
                                                                  microphoneSeparationInMetres,
                                                                  numTDOAs, numSources)
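
The idea behind the angular spectrogram can be sketched as follows: for each candidate TDOA, the phase-normalized cross-spectrum is "steered" by the corresponding delay and summed over frequency, so that the sum peaks when the candidate matches the true delay. This is an illustrative sketch, not the implementation of getAngularSpectrogram. The function name, the speed of sound value, the linear TDOA grid, and the sign convention are all assumptions. It assumes Python 3 and NumPy.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second (assumed value)

def angularSpectrogramSketch(coherence, frequenciesInHz, micSeparation, numTDOAs):
    """Hypothetical GCC-PHAT angular spectrogram, shape (numTDOAs, numFrames).

    coherence: phase-normalized cross-spectrum, shape (numFrequencies, numFrames).
    """
    # The largest physically possible delay is set by the microphone spacing
    maxTDOA = micSeparation / SPEED_OF_SOUND
    tdoas = np.linspace(-maxTDOA, maxTDOA, numTDOAs)
    # steering[tdoa, frequency] undoes the phase shift caused by each candidate delay
    steering = np.exp(-2j * np.pi * np.outer(tdoas, frequenciesInHz))
    # Sum over frequency, keeping the real part
    return np.real(steering @ coherence), tdoas

# Example: synthetic coherence with a known 1 ms delay, constant over 10 frames
frequenciesInHz = np.linspace(0, 8000, 513)
trueTDOA = 1e-3
coherence = np.exp(2j * np.pi * frequenciesInHz * trueTDOA)[:, None] * np.ones((1, 10))
angularSpectrogram, tdoas = angularSpectrogramSketch(coherence, frequenciesInHz, 1.0, 128)
meanAngularSpectrum = angularSpectrogram.mean(axis=-1)
estimatedTDOA = tdoas[meanAngularSpectrum.argmax()]
```

In this synthetic case the time-averaged spectrum peaks at the grid point closest to the true delay. With several sources, as in the notebook, one peak per source is expected.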

Plot target localization

figure(figsize=(14, 6))
plotGCCPHATLocalization( spectralCoherenceV, angularSpectrogram, meanAngularSpectrum,
                         targetTDOAIndexes, microphoneSeparationInMetres, numTDOAs,
                         durationInSeconds )

5. Perform NMF decomposition of mixture spectrograms

Compute NMF decomposition, with left and right magnitude spectrograms concatenated in time

V = concatenate( abs(complexMixtureSpectrogram), axis=-1 )
W, H = performKLNMF(V, dictionarySize, numIterations, sparsityAlpha)

numChannels = stereoSamples.shape[0]
stereoH = array( hsplit(H, numChannels) )
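
As a rough illustration of what an NMF with the Kullback-Leibler (KL) divergence does, the sketch below uses the standard multiplicative updates. It is not the body of performKLNMF: the name klNMFSketch, the random initialization, and the epsilon guard are assumptions, and it omits any sparsity term (the notebook sets sparsityAlpha to 0). It assumes Python 3 and a recent NumPy.

```python
import numpy as np

def klNMFSketch(V, dictionarySize, numIterations, epsilon=1e-12, seed=0):
    """Hypothetical KL-NMF: approximates non-negative V by W @ H."""
    rng = np.random.default_rng(seed)
    numFrequencies, numFrames = V.shape
    W = rng.random((numFrequencies, dictionarySize)) + epsilon
    H = rng.random((dictionarySize, numFrames)) + epsilon
    for _ in range(numIterations):
        # Multiplicative update of the coefficients H
        H *= (W.T @ (V / (W @ H + epsilon))) / (W.sum(axis=0)[:, None] + epsilon)
        # Multiplicative update of the dictionary W
        W *= ((V / (W @ H + epsilon)) @ H.T) / (H.sum(axis=1)[None, :] + epsilon)
    return W, H

def klDivergence(V, approximation, epsilon=1e-12):
    """Generalized KL divergence between V and its approximation."""
    return np.sum(V * np.log((V + epsilon) / (approximation + epsilon)) - V + approximation)

# Example: factorize a small synthetic non-negative matrix
rng = np.random.default_rng(1)
V = rng.random((20, 8)) @ rng.random((8, 60))
W1, H1 = klNMFSketch(V, 8, 1)
W, H = klNMFSketch(V, 8, 200)
divergenceAfter1 = klDivergence(V, W1 @ H1)
divergenceAfter200 = klDivergence(V, W @ H)
```

Because the updates are multiplicative, W and H stay non-negative as long as they start non-negative, and the divergence decreases as the iterations proceed.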

Plot NMF decomposition

describeNMFDecomposition(V, W, H)

figure(figsize=(12, 12))
plotNMFDecomposition(V, W, H, frequenciesInkHz, durationInSeconds, numAtomsToPlot=15)

Input V:
    V.shape = (numFreq, numWindows): (513, 2486)
    V.dtype = float32
Dictionary W:
    W.shape = (numFreq, numAtoms): (513, 128)
    W.dtype = float32
Coefficients H:
    H.shape = (numAtoms, numWindows): (128, 2486)
    H.dtype = float32

6. Compute GCC-NMF target coefficient masks

Compute NMF coefficient masks for each target

targetTDOAGCCNMFs = getTargetTDOAGCCNMFs( spectralCoherenceV, microphoneSeparationInMetres,
                                          numTDOAs, frequenciesInHz, targetTDOAIndexes, W,
                                          stereoH )
targetCoefficientMasks = getTargetCoefficientMasks(targetTDOAGCCNMFs, numSources)
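
One way to picture the mask generation is as a winner-take-all assignment: at each time frame, each dictionary atom is scored against every target TDOA by weighting the steered coherence with the atom's spectrum, and the atom is given to the best-scoring target. The sketch below illustrates that idea under stated assumptions. It is not the code of getTargetTDOAGCCNMFs or getTargetCoefficientMasks, and the function name and sign convention are hypothetical. It assumes Python 3 and NumPy.

```python
import numpy as np

def coefficientMasksSketch(coherence, frequenciesInHz, targetTDOAs, W):
    """Hypothetical binary masks, shape (numTargets, numAtoms, numFrames)."""
    # steering[target, frequency] undoes the phase shift of each target delay
    steering = np.exp(-2j * np.pi * np.outer(targetTDOAs, frequenciesInHz))
    # gcc[target, frequency, frame]: near 1 where the coherence agrees with the target
    gcc = np.real(steering[:, :, None] * coherence[None, :, :])
    # Weight each frequency by the atom's spectrum: scores[target, atom, frame]
    scores = np.einsum('kft,fa->kat', gcc, W)
    # Each atom goes to the best-scoring target at every frame
    winners = scores.argmax(axis=0)
    targets = np.arange(len(targetTDOAs))[:, None, None]
    return (winners[None, :, :] == targets).astype(np.float32)

# Example: a low-band atom driven by target 0 and a high-band atom driven by target 1
frequenciesInHz = np.linspace(0, 8000, 65)
targetTDOAs = np.array([-1e-3, 1e-3])
tdoaPerFrequency = np.where(np.arange(65) < 32, targetTDOAs[0], targetTDOAs[1])
coherence = np.exp(2j * np.pi * frequenciesInHz * tdoaPerFrequency)[:, None] * np.ones((1, 5))
W = np.zeros((65, 2))
W[:32, 0] = 1.0
W[32:, 1] = 1.0
masks = coefficientMasksSketch(coherence, frequenciesInHz, targetTDOAs, W)
```

By construction, exactly one target owns each atom at each frame, so the masks sum to one across targets.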

Plot NMF coefficient masks for each target, and resulting masked coefficients for each channel

figure(figsize=(12, 12))
plotCoefficientMasks(targetCoefficientMasks, stereoH, durationInSeconds)

7. Reconstruct source spectrogram estimates

Reconstruct source spectrogram estimates using masked NMF coefficients for each target, and each channel

targetSpectrogramEstimates = getTargetSpectrogramEstimates( targetCoefficientMasks,
                                                            complexMixtureSpectrogram, W,
                                                            stereoH )
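
The masked reconstruction can be sketched for one channel as a matrix product between the dictionary and the masked coefficients. The sketch below is illustrative only: maskedMagnitudeSketch is a hypothetical name, it covers the magnitude estimate for a single channel, and it is not the body of getTargetSpectrogramEstimates. It assumes Python 3 and a recent NumPy.

```python
import numpy as np

def maskedMagnitudeSketch(masks, W, H):
    """Hypothetical per-target magnitude estimates, shape (numTargets, numFreq, numFrames).

    masks: (numTargets, numAtoms, numFrames), W: (numFreq, numAtoms), H: (numAtoms, numFrames).
    """
    # Zero out the coefficients not assigned to each target, then apply the dictionary
    return np.einsum('fa,kat->kft', W, masks * H[None, :, :])

# Example: random dictionary and coefficients, with a random binary assignment
rng = np.random.default_rng(0)
numTargets, numFrequencies, numAtoms, numFrames = 3, 16, 5, 7
W = rng.random((numFrequencies, numAtoms))
H = rng.random((numAtoms, numFrames))
winners = rng.integers(0, numTargets, size=(numAtoms, numFrames))
masks = (winners[None, :, :] == np.arange(numTargets)[:, None, None]).astype(float)
targetMagnitudes = maskedMagnitudeSketch(masks, W, H)
```

Since every coefficient is assigned to exactly one target, the target estimates add up to the full NMF approximation W @ H.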

Plot reconstructed source estimate spectrograms

figure(figsize=(12, 12))
plotTargetSpectrogramEstimates(targetSpectrogramEstimates, durationInSeconds, frequenciesInkHz)

8. Reconstruct time domain target signal estimates

Combine source estimate spectrograms with the input mixture spectrogram's phase, and perform the inverse STFT

targetSignalEstimates = getTargetSignalEstimates( targetSpectrogramEstimates, windowSize,
                                                  hopSize, windowFunction )
saveTargetSignalEstimates(targetSignalEstimates, sampleRate, mixtureFileNamePrefix)
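
To illustrate the final step, the sketch below attaches the mixture phase to a magnitude estimate and inverts the STFT by weighted overlap-add. It is not the implementation of getTargetSignalEstimates: stftSketch and istftSketch are hypothetical helpers, written for a single channel without padding. It assumes Python 3 and a recent NumPy.

```python
import numpy as np

def stftSketch(samples, windowSize, hopSize):
    """Hypothetical single-channel STFT, returning (numFrequencies, numFrames)."""
    window = np.hanning(windowSize)
    numFrames = 1 + (len(samples) - windowSize) // hopSize
    frames = np.stack([samples[i * hopSize : i * hopSize + windowSize] * window
                       for i in range(numFrames)])
    return np.fft.rfft(frames, axis=-1).T

def istftSketch(spectrogram, windowSize, hopSize):
    """Hypothetical inverse STFT using weighted overlap-add."""
    window = np.hanning(windowSize)
    frames = np.fft.irfft(spectrogram.T, n=windowSize, axis=-1)
    numFrames = frames.shape[0]
    numSamples = windowSize + (numFrames - 1) * hopSize
    signal = np.zeros(numSamples)
    windowSum = np.zeros(numSamples)
    for i in range(numFrames):
        start = i * hopSize
        signal[start:start + windowSize] += frames[i] * window
        windowSum[start:start + windowSize] += window ** 2
    # Normalize by the accumulated squared window, guarding against division by zero
    return signal / np.maximum(windowSum, 1e-8)

# Example: the mixture's own magnitude stands in for a source magnitude estimate
rng = np.random.default_rng(0)
mixture = rng.standard_normal(4096)
mixtureSpectrogram = stftSketch(mixture, 1024, 128)
magnitudeEstimate = np.abs(mixtureSpectrogram)
complexEstimate = magnitudeEstimate * np.exp(1j * np.angle(mixtureSpectrogram))
reconstruction = istftSketch(complexEstimate, 1024, 128)
```

When the magnitude estimate equals the mixture magnitude, as in this example, the reconstruction recovers the mixture away from the signal edges. With a masked source estimate, the same procedure yields that source's time domain signal.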

Plot time domain source signal estimates

# range works in both Python 2 and Python 3 (xrange is Python 2 only)
for sourceIndex in range(numSources):
    figure(figsize=(14, 2))
    fileName = getSourceEstimateFileName(mixtureFileNamePrefix, sourceIndex)
    plotTargetSignalEstimate( targetSignalEstimates[sourceIndex], sampleRate,
                              'Source %d' % (sourceIndex+1) )
