Quick question: when I run ViSQOL I am not seeing any GPU usage. Should I be? Perhaps my Bazel build did not install correctly, or the version of TF being used is not utilizing the GPU. I am running over thousands of files and it's taking quite some time...
I also wanted to check what the best settings are for evaluating noise suppression with ViSQOL. I see the two flags
--use_speech_mode --use_unscaled_speech_mos_mapping. If I use these, might ViSQOL ignore some bands of noise that are present in the file (I see it's only sensitive up to 8 kHz)? Should I run ViSQOL in both audio mode and speech mode and average the two (perhaps a weighted average)?
Thanks for your guidance in advance.
Hi, thanks for the question! ViSQOL does have a TFLite model, but it runs on CPU and is not the main bottleneck. Even in batch mode, it evaluates the list of files serially. This could be improved.
I don't recommend averaging the two modes, because they are quite different in scale. We don't yet have support for greater than wideband speech, and it's a limitation. For noise suppression, ViSQOL will require the clean reference, which isn't always available. If you're looking for a no-reference model specifically for noise suppression, I'd recommend DNSMOS.
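On the speed point: since batch mode evaluates files serially and the work is CPU-bound, one possible workaround is to launch several ViSQOL processes side by side, one per file pair. The following is only a minimal sketch under assumptions, not something ViSQOL provides: the binary path `./bazel-bin/visqol` depends on your build, and `build_command`, `run_pair`, and `run_all` are hypothetical helper names. The raw stdout is returned unparsed, because the exact output format is not assumed here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed location of the Bazel-built binary; adjust to your checkout.
VISQOL_BIN = "./bazel-bin/visqol"


def build_command(reference, degraded, binary=VISQOL_BIN):
    """Build one speech-mode ViSQOL invocation for a (reference, degraded) pair."""
    return [
        binary,
        "--reference_file", reference,
        "--degraded_file", degraded,
        "--use_speech_mode",
    ]


def run_pair(pair, runner=subprocess.run):
    """Run ViSQOL on one pair and return (reference, degraded, raw stdout)."""
    reference, degraded = pair
    result = runner(build_command(reference, degraded),
                    capture_output=True, text=True)
    return reference, degraded, result.stdout


def run_all(pairs, workers=4, runner=subprocess.run):
    """Run many pairs concurrently; results come back in input order."""
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_pair(p, runner), pairs))
```

Calling `run_all([("clean_0001.wav", "enhanced_0001.wav"), ...], workers=8)` would then keep up to eight ViSQOL processes busy at once; the file names are placeholders, and a sensible `workers` value is roughly the number of CPU cores.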
Hi, thanks for the quick response. I actually have access to the clean speech as well as the noisy speech, so I can use a reference metric. I will look into DNSMOS as well. Right now, I am using PESQ (sample referential), as well as Fréchet Audio Distance (reference-free or dataset referential).
Assume a model $X = S + N$. $S$ is speech and $N$ is noise.
I feed $X$ into a noise suppressor and get $\hat{S}$.
So we have $X$ and $\hat{S}$. If we run ViSQOL( $S,X$ ) in speech mode, might it ignore the frequencies where the noise in $X$ occurs, since it is only sensitive up to 8 kHz? The same question applies to ViSQOL( $S,\hat{S}$ ).
Right now, I am getting the following results averaged over 10k samples.
I am just wondering what the recommended settings are for using ViSQOL in my task. Perhaps both modes are insightful in different ways. If so, maybe you can help me understand the intuition behind the results.