Do not get the maximum of MOS value using two same audio under speech mode #89

Ximoo123 · 2023-03-13T04:56:38Z

Hi, thanks for the great work!
When I run speech mode on two identical audio files sampled at 16 kHz, the MOS values of many results are around 4.4-4.6 and never reach the maximum value of 5.0. However, the NSIM score and the similarity of all audio segments are 1.0. Is this normal?
I got these results using the SVR model you provided: "lattice_tcditugenmeetpackhref_ls2_nl60_lr12_bs2048_learn.005_ep2400_train1_7_raw.tflite"

mchinen · 2023-03-31T00:07:58Z

Hi, thanks for the question. Yes, this is expected. If you run a subjective test with the same audio, you will not see 5.0. It depends on the content and raters, but the typical score for ground truth clean wideband audio is 4.5 to 4.75. There is a flag called --use_unscaled_speech_mos_mapping which allowed scaling to 5.0 when set to false, but I think this has been deprecated with recent models (we should open a bug for that).

rsanchezpizani · 2023-03-31T17:02:51Z

I would say 4.6-4.7 is actually in agreement with the ITU standard. Getting a 5 as a single score from one person is possible; getting a MOS of 5 is not normal. Am I right in thinking that the reported score is meant to be a MOS, that is, equivalent to running the test with many people? If so, a score of 4.6-4.8 is the practical maximum. If you interview 1000 people and get a MOS of 5, then the comparison or experiment is likely wrong. So I think a value lower than 5 is correct.
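To make the arithmetic behind that point concrete, here is a minimal sketch of how a mean opinion score behaves, assuming the usual 1-5 absolute category rating scale. The function name `mean_opinion_score` and the panel of 20 ratings are made up for illustration; they are not taken from ViSQOL or from any real listening test.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of integer ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


# Hypothetical panel: 20 listeners rate a clean reference clip.
# Most give 5 ("excellent"), a few give 4 ("good").
panel = [5] * 13 + [4] * 7
mos = mean_opinion_score(panel)
print(f"MOS for the hypothetical panel: {mos:.2f}")

# A MOS of exactly 5.0 requires every single rater to give a 5.
unanimous = mean_opinion_score([5] * 20)
print(f"MOS when all raters give 5: {unanimous:.2f}")
```

Because the mean is capped by the top of the scale, a single rating below 5 pulls the MOS under 5.0, which is why a model trained to predict subjective scores would be expected to top out below 5.0 even for identical audio.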

