Does ViSQOL use the GPU? Best settings for evaluating noise suppression? #80

opooladz · 2022-12-10T16:09:24Z

Hi thanks for the repo.

Quick question: when I run ViSQOL, I don't see any GPU usage. Should I? Perhaps Bazel did not install it correctly, or the version of TF being used does not use the GPU. I am running over thousands of files and it's taking quite some time...

Also, I wanted to check the best settings for evaluating noise suppression with ViSQOL. I see the two flags `--use_speech_mode` and `--use_unscaled_speech_mos_mapping`. If I use these, might ViSQOL ignore some bands of noise present in the file (I see it's only sensitive up to 8 kHz)? Should I run ViSQOL in both audio mode and speech mode and average the two (perhaps as a weighted average)?

Thanks for your guidance in advance.

mchinen · 2022-12-12T23:16:46Z

Hi, thanks for the question! ViSQOL does have a TFLite model, but it runs on CPU and is not the main bottleneck. Even in batch mode, it evaluates the list of files serially. This could be improved.
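Since batch mode evaluates files serially, one workaround is to split the batch CSV into shards and run several ViSQOL processes at once. Below is a minimal Python sketch, assuming the binary was built at `bazel-bin/visqol` and that the batch CSV uses a `reference,degraded` header. The helpers `shard_rows`, `write_shard`, `build_command`, and `run_sharded` are hypothetical and not part of ViSQOL; only the `--batch_input_csv`, `--results_csv`, and `--use_speech_mode` flags come from the ViSQOL CLI.

```python
import csv
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumption: path to the Bazel-built ViSQOL binary; adjust for your setup.
VISQOL_BIN = "bazel-bin/visqol"


def shard_rows(rows, n_shards):
    """Split (reference, degraded) pairs round-robin into at most n_shards non-empty lists."""
    shards = [[] for _ in range(n_shards)]
    for i, row in enumerate(rows):
        shards[i % n_shards].append(row)
    return [s for s in shards if s]


def write_shard(rows, path):
    """Write one shard as a batch CSV (assumed 'reference,degraded' header)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["reference", "degraded"])
        writer.writerows(rows)


def build_command(batch_csv, results_csv, speech_mode=False, visqol_bin=VISQOL_BIN):
    """Command line for one ViSQOL batch run over a single shard."""
    cmd = [visqol_bin, "--batch_input_csv", str(batch_csv),
           "--results_csv", str(results_csv)]
    if speech_mode:
        cmd.append("--use_speech_mode")
    return cmd


def run_sharded(rows, workdir, n_shards=8, speech_mode=False):
    """Run one ViSQOL process per shard concurrently; return the results CSV paths."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    commands, outputs = [], []
    for i, shard in enumerate(shard_rows(rows, n_shards)):
        batch_csv = workdir / f"shard_{i}.csv"
        results_csv = workdir / f"results_{i}.csv"
        write_shard(shard, batch_csv)
        commands.append(build_command(batch_csv, results_csv, speech_mode))
        outputs.append(results_csv)
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        list(pool.map(lambda c: subprocess.run(c, check=True), commands))
    return outputs
```

The per-shard `results_*.csv` files can be concatenated afterwards. Setting `n_shards` near the number of CPU cores is a reasonable starting point, though disk I/O may limit the speedup.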

I don't recommend averaging the two modes, because their scores are on quite different scales. We don't yet support speech beyond wideband (content above 8 kHz), and that's a limitation. For noise suppression, ViSQOL requires the clean reference, which isn't always available. If you're looking for a no-reference model specifically for noise suppression, I'd recommend DNSMOS.

opooladz · 2022-12-16T20:34:15Z

Hi, thanks for the quick response. I actually have access to the clean speech as well as the noisy speech, so I can use a reference metric. I will look into DNSMOS as well. Right now, I am using PESQ (referenced per sample) and Fréchet Audio Distance (reference-free, or referenced against a dataset).

Assume a model $X = S + N$, where $S$ is speech and $N$ is noise.

I feed $X$ into a noise suppressor and get $\hat{S}$. So we have $X$ and $\hat{S}$. If we compute ViSQOL($S, X$) with speech settings, might it actually ignore frequencies where noise occurs in $X$ (since it's only sensitive up to 8 kHz)? The same question applies to ViSQOL($S, \hat{S}$).
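One way to check whether the 8 kHz limit matters for a given file is to estimate the residual noise as $N = X - S$ (assuming $X$ and $S$ are sample-aligned, mono, and at the same sample rate) and measure how much of its energy lies above 8 kHz. The same check works on $\hat{S} - S$. A minimal NumPy sketch; `high_band_energy_fraction` and its 8 kHz default are illustrative and not part of ViSQOL:

```python
import numpy as np


def high_band_energy_fraction(signal, sample_rate, cutoff_hz=8000.0):
    """Approximate fraction of the signal's energy above cutoff_hz (one-sided spectrum)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0  # silent input: no energy in any band
    return float(spectrum[freqs > cutoff_hz].sum() / total)


# Usage (hypothetical arrays): residual = noisy - clean
# frac = high_band_energy_fraction(residual, sample_rate=48000)
```

If the fraction is large, a band-limited speech-mode score would largely miss that noise, while audio mode would still account for it.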

Right now, I am getting the following results averaged over 10k samples.

| Settings | ViSQOL($S, X$) | ViSQOL($S, \hat{S}$) |
|---|---|---|
| Audio | 3.1 | 3.7 |
| Speech | 1.2 | 1.9 |

I'm just wondering what the recommended settings are for using ViSQOL in my task. Perhaps both modes are insightful in different ways. If so, could you help me understand the intuition behind these results?
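Given the advice not to average across modes, one way to read these numbers is to compare scores only within a mode, for example the gain of $\hat{S}$ over $X$ against the same clean reference $S$. A small sketch using the averages reported above; the dictionary layout and `improvement_by_mode` are illustrative:

```python
# Averages reported in this thread (10k samples), keyed by ViSQOL mode.
reported = {
    "audio": {"noisy": 3.1, "enhanced": 3.7},   # ViSQOL(S, X), ViSQOL(S, S_hat)
    "speech": {"noisy": 1.2, "enhanced": 1.9},
}


def improvement_by_mode(scores):
    """Score gain of the suppressed output over the noisy input, computed within each mode."""
    return {mode: s["enhanced"] - s["noisy"] for mode, s in scores.items()}


gains = improvement_by_mode(reported)
for mode, gain in gains.items():
    print(f"{mode}: +{gain:.1f}")
```

This prints `audio: +0.6` and `speech: +0.7`. Both modes agree that suppression helps, but the two gains are still on different scales, so they indicate direction within each mode rather than which mode sees more improvement.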
