Hi, thanks for the great work!
When I run in speech mode on two identical audio files sampled at 16 kHz, the MOS values of many results are around 4.4-4.6 and never reach the maximum value of 5.0. However, the NSIM score and the similarity of all audio segments are 1.0. Is this normal?
I got these results using the SVR model you provided: "lattice_tcditugenmeetpackhref_ls2_nl60_lr12_bs2048_learn.005_ep2400_train1_7_raw.tflite"
Hi, thanks for the question. Yes, this is expected. If you run a subjective test with the same audio, you will not see 5.0. It depends on the content and the raters, but the typical score for ground-truth clean wideband audio is 4.5 to 4.75. There is a flag called --use_unscaled_speech_mos_mapping which allowed scaling to 5.0 when set to false, but I think this has been deprecated with recent models (we should open a bug for that).
I would say 4.6-4.7 actually agrees with the ITU standard. A single rating of 5 from one person is possible, but a MOS of 5 is not normal. Am I right in thinking that the reported score is a MOS, i.e. equivalent to running the test with many people? If so, 4.6-4.8 is effectively the maximum. If you interview 1000 people and get a MOS of 5, the comparison or experiment is likely flawed.
So I think a value lower than 5 is correct.
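To make the averaging argument concrete, here is a minimal sketch in Python. The panel size and the split of ratings are made-up numbers for illustration only, and `mean_opinion_score` is a hypothetical helper, not part of any library; it simply averages individual ratings on the 1-5 scale.

```python
def mean_opinion_score(ratings):
    """Return the arithmetic mean of individual ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


# Hypothetical panel of 1000 listeners rating the same clean reference clip:
# most give the top score, but some give 4 (assumed split, not measured data).
ratings = [5] * 700 + [4] * 300
mos = mean_opinion_score(ratings)

# A MOS of exactly 5 requires every single listener to answer 5.
unanimous = mean_opinion_score([5] * 1000)

print(f"mixed panel MOS: {mos:.2f}")
print(f"unanimous panel MOS: {unanimous:.2f}")
```

Because 5 is the top of the scale, any listener who answers below 5 pulls the mean down and nothing can pull it back up, so a large panel landing on exactly 5.0 would be suspicious.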
Rodrigo Sanchez-Pizani
(In reply to Michael Chinen's comment above, sent Fri, 31 Mar 2023, 01:08.)