Confusion about the ABX error rate #9
Comments
Could you please specify the version of the xcodec model?
Thank you for your reply. I tested with the model named xcodec_hubert_librispeech.
Maybe you can try the continuous representation here: xcodec/models/soundstream_semantic.py, line 114 in 60cf204.
Thank you for your reply! I have tested the XCodec model with the o_semnatic representation and got ABX error rates of 4.4% and 5.5%, which still differ somewhat from the results reported in your paper (3.3% and 4.3%). When I extracted the o_semnatic representation with the SoundStream.forward method, I got the error "e_acoustic and e_semantic have different shape in dim2" at https://github.com/zhenye234/xcodec/blob/main/models/soundstream_semantic.py#L102. To work around it, I added the same pad operation as in the encode method. I don't think this is the cause of the inconsistent results, and I made no other changes to the source code. Do you have any other suggestions? Thanks again for your reply.
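For readers hitting the same shape error, here is a minimal sketch of one way to reconcile a length mismatch along dim 2 before concatenating the two feature streams. It uses NumPy arrays as stand-ins for the PyTorch tensors, and the helper name `pad_to_match` and the shapes are hypothetical. It is not the pad operation from the repository's encode method, which is not reproduced here and may work differently.

```python
import numpy as np

def pad_to_match(e_acoustic, e_semantic):
    """Zero-pad the shorter array at the end of axis 2 (time) so both lengths match.

    Hypothetical helper for illustration; both inputs are shaped
    (batch, channels, frames).
    """
    target = max(e_acoustic.shape[2], e_semantic.shape[2])

    def pad_time(x):
        missing = target - x.shape[2]
        # Pad only the last axis, and only on the right-hand side.
        return np.pad(x, ((0, 0), (0, 0), (0, missing)))

    return pad_time(e_acoustic), pad_time(e_semantic)

# Assumed shapes: the two streams differ by one frame.
e_acoustic = np.ones((1, 4, 49))
e_semantic = np.ones((1, 6, 50))

a, s = pad_to_match(e_acoustic, e_semantic)
e = np.concatenate([a, s], axis=1)  # concatenation along channels now succeeds
print(e.shape)
```

With PyTorch tensors, the same right-padding of the last dimension can be written with torch.nn.functional.pad(x, (0, missing)). A padded frame is filled with zeros, so it may slightly change the features near the end of each utterance.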
Thanks for your amazing work.
I evaluated the released xcodec model on the LibriSpeech test-clean set using the ABX error rate metric. I ran the evaluation with the continuous representations both before and after RVQ, but got 9.9% and 13.2% for within-speaker and cross-speaker ABX respectively, which are much higher than the values reported in the paper. However, evaluating SpeechTokenizer in the same way, I get consistent results of 3.6 and 4.7.
Could you please give me some suggestions? Thank you so much!
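For context on the metric, the following is a simplified sketch of how an ABX error rate can be computed from frame-level features. It is an illustration under stated assumptions, not the evaluation tool used in this thread: the item layout (features, phone, speaker), the cosine frame distance, the path-normalized DTW, and the plain averaging over triplets are all assumptions, and standard toolkits additionally group triplets by context and speaker before averaging.

```python
import math
from itertools import permutations

def cosine_distance(u, v):
    """Frame distance: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def dtw_distance(seq_a, seq_b):
    """Mean frame distance along the lowest-cost DTW alignment path."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = (accumulated distance, path length) for prefixes i and j
    cost = [[(inf, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            best = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = (best[0] + d, best[1] + 1)
    total, length = cost[n][m]
    return total / length

def abx_error_rates(items):
    """Return (within, across) ABX error rates, or None where no triplet exists.

    items: list of (features, phone, speaker) tuples (hypothetical layout).
    A and X share a phone, B has a different phone, A and B share a speaker.
    Within: X has the same speaker as A. Across: X has a different speaker.
    """
    stats = {"within": [0.0, 0], "across": [0.0, 0]}
    for ia, ib, ix in permutations(range(len(items)), 3):
        (fa, pa, sa), (fb, pb, sb), (fx, px, sx) = items[ia], items[ib], items[ix]
        if pa != px or pa == pb or sa != sb:
            continue
        mode = "within" if sx == sa else "across"
        d_ax, d_bx = dtw_distance(fa, fx), dtw_distance(fb, fx)
        # X should be closer to A than to B; a tie counts as half an error.
        stats[mode][0] += 1.0 if d_ax > d_bx else 0.5 if d_ax == d_bx else 0.0
        stats[mode][1] += 1
    return tuple(e / c if c else None for e, c in stats.values())

# Toy data: two phones, two speakers, well-separated features.
items = [
    ([[1.0, 0.0], [1.0, 0.1]], "a", "s1"),
    ([[1.0, 0.05]], "a", "s1"),
    ([[0.0, 1.0]], "b", "s1"),
    ([[0.9, 0.1]], "a", "s2"),
    ([[0.1, 1.0]], "b", "s2"),
]
within, across = abx_error_rates(items)
print(within, across)
```

Because the score depends on the frame distance, the alignment, and how triplets are averaged, differences in any of these settings, or in which layer's features are fed in, can shift the reported percentages.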