
MOS does not reach the maximum when comparing two identical audio files in speech mode #89

Open
Ximoo123 opened this issue Mar 13, 2023 · 2 comments

Comments

@Ximoo123

Hi, thanks for the great work!
When I run speech mode with two identical audio files sampled at 16 kHz, many of the MOS values are around 4.4 to 4.6 and never reach the maximum of 5.0. However, the NSIM score and the similarity of every audio segment are 1.0. Is this normal?
I got these results using the model you provided: `lattice_tcditugenmeetpackhref_ls2_nl60_lr12_bs2048_learn.005_ep2400_train1_7_raw.tflite`

@mchinen
Collaborator

mchinen commented Mar 31, 2023

Hi, thanks for the question. Yes, this is expected. If you run a subjective listening test where the degraded audio is identical to the reference, raters will not give it 5.0 either. It depends on the content and the raters, but the typical score for clean wideband ground-truth audio is 4.5 to 4.75. There is a flag called `--use_unscaled_speech_mos_mapping` which, when set to false, allowed the score to scale up to 5.0, but I think this has been deprecated with recent models (we should open a bug for that).
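To make the scaled vs. unscaled distinction concrete, here is a minimal Python sketch of a similarity-to-MOS mapping. The exponential curve, its steepness, and the 4.6 ceiling are hypothetical stand-ins chosen for illustration; they are not ViSQOL's fitted constants, and `hypothetical_nsim_to_mos` is not a ViSQOL function. It only shows why a mapping fitted to subjective scores tops out below 5.0 for a perfect match (NSIM = 1.0) unless it is explicitly rescaled.

```python
import math

# Hypothetical ceiling near what raters typically give clean wideband speech
# (the comment above cites 4.5 to 4.75). NOT a ViSQOL constant.
HYPOTHETICAL_UNSCALED_MAX = 4.6
# Hypothetical steepness of the curve. NOT a ViSQOL constant.
HYPOTHETICAL_GROWTH = 4.0


def hypothetical_nsim_to_mos(nsim: float, scale_to_five: bool = False) -> float:
    """Map a mean NSIM in [0, 1] to a MOS in [1, 5].

    Hypothetical exponential curve: nsim == 1.0 lands exactly on the ceiling.
    Unscaled, the ceiling mimics the subjective limit; scaled, it is 5.0.
    """
    ceiling = 5.0 if scale_to_five else HYPOTHETICAL_UNSCALED_MAX
    mos = 1.0 + (ceiling - 1.0) * math.exp(HYPOTHETICAL_GROWTH * (nsim - 1.0))
    # Clamp to the valid MOS range.
    return min(max(mos, 1.0), 5.0)


if __name__ == "__main__":
    for nsim in (0.8, 0.95, 1.0):
        unscaled = hypothetical_nsim_to_mos(nsim)
        scaled = hypothetical_nsim_to_mos(nsim, scale_to_five=True)
        print(f"NSIM={nsim:.2f}  unscaled MOS={unscaled:.2f}  scaled MOS={scaled:.2f}")
```

With identical inputs (NSIM = 1.0) the unscaled variant returns the ceiling of about 4.6, matching the behavior reported in the issue, while the rescaled variant returns 5.0. The rescaling only stretches the output range; it does not make the prediction any more faithful to what human raters would score.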

@rsanchezpizani

rsanchezpizani commented Mar 31, 2023 via email
