Infer_step selection #18

Open
yxlu-0102 opened this issue Oct 20, 2023 · 5 comments

@yxlu-0102

I have been using your open-source code to perform 16 kHz to 48 kHz speech reconstruction. I used the default 8-step inference process and tested it on the untrimmed test set with your provided checkpoint.

However, I've encountered some issues with the reconstructed speech quality. Specifically, there appears to be a significant amount of noise in the high-frequency components of the reconstructed speech. The SNR I obtained is 19.472, and the LSD is 1.212. In contrast, the results in the research paper show SNR as 24.0 and LSD as 0.92.
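For reference when comparing numbers like these, here is a minimal sketch of how SNR and LSD are commonly computed for bandwidth-extension results. It is not taken from the repository or the paper: the function names, the NumPy-only STFT, and the `n_fft=2048` / `hop=512` settings are assumptions, and the paper's evaluation script may use different parameters or trimming, which alone can shift the scores.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference waveform and its reconstruction."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    eps = 1e-12  # guards against division by zero for identical signals
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))


def power_spectrogram(signal, n_fft=2048, hop=512):
    """Hann-windowed power spectrogram, shape (frames, n_fft // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def lsd(reference, estimate, n_fft=2048, hop=512):
    """Log-spectral distance: RMS over frequency of the log10 power ratio, averaged over frames."""
    eps = 1e-10  # keeps the logarithm finite in silent bins
    ref = power_spectrogram(reference, n_fft, hop)
    est = power_spectrogram(estimate, n_fft, hop)
    log_diff = np.log10((ref + eps) / (est + eps))
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=1))))
```

Both functions expect time-aligned waveforms of equal length at the same sample rate. Higher SNR and lower LSD indicate a closer reconstruction.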

I suspect the number of inference steps is too low. Could you please explain how to configure infer_steps and infer_schedule to improve the reconstructed speech quality and get closer to the results reported in the paper?

@jjunak-yun

Hello, @yxlu-0102!

When I executed the 8-step inference process like you did, I noticed significant noise in the high-frequency range of the reconstructed speech. How many steps of the inference process did you perform to eliminate the noise or to achieve results similar to those in the paper?

I sincerely appreciate your help and hope that your advice will lead to better results.

Have a good day :)

@yxlu-0102

Hi,

I used the nu-wave2 checkpoint and inference code provided by the author of UDM+ in their repository.
With those, the performance was much better.

@jjunak-yun

Hello, @yxlu-0102!

Thank you so much for your quick and valuable response. I believe that by experimenting with the checkpoint you provided, I might achieve better results. 😊

Did you still set the inference steps to 8 with the new checkpoint?

In my case, when I set the inference steps to 8 with the new checkpoint, there is noise. When I increase the inference steps to over 50, the sound quality improves, but the LSD value exceeds 2, indicating a tendency towards excessive denoising. 🥲
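One way to explore this trade-off is to sweep the step count and score each reconstruction against the ground truth. The sketch below is only an illustration: `sweep_infer_steps`, `reconstruct`, and `metric` are hypothetical names, and `reconstruct(lowres, steps)` stands in for whatever inference call the checkpoint actually exposes.

```python
import numpy as np


def sweep_infer_steps(reconstruct, lowres, reference, step_counts, metric):
    """Run a (hypothetical) reconstruction callable at several step counts.

    reconstruct: callable (lowres, steps) -> waveform; a placeholder for the real inference call
    metric:      callable (reference, estimate) -> float, where lower is better (e.g. LSD)
    Returns the best step count and a dict mapping each step count to its score.
    """
    results = {}
    for steps in step_counts:
        estimate = np.asarray(reconstruct(lowres, steps))
        results[steps] = float(metric(reference, estimate))
    best = min(results, key=results.get)
    return best, results
```

Averaging the scores over several utterances rather than a single file gives a more reliable picture, since the best step count can vary between recordings.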

I'm curious to know the number of inference steps you used with the new checkpoint!

Thank you once again for your response. Have a great day. :)

@yxlu-0102

Hi @naknak-Yun,

I set the inference step to 50, and below are the results I reproduced:
[Image: Screenshot 2024-07-11 10 27 21]

@jjunak-yun

Hi @yxlu-0102,

Thank you so much for your kind and quick response. Your answer has been very helpful for my research. Have a great day. 👍👍
