Bad performance when using for speech enhancement #33

Open
jkzhang7 opened this issue Aug 11, 2020 · 9 comments

Comments

@jkzhang7

jkzhang7 commented Aug 11, 2020

Hi, very nice work. I noticed that some people use Conv-TasNet for speech enhancement and get good results, but I ran into a problem when using this code for that purpose. I am trying to separate clean speech and noise from noisy speech, using the VCTK dataset. The waveforms of the results look very strange:

[Image: output waveform]

When I changed the mask activation to sigmoid, the result was still not good.

[Image: output waveform with sigmoid mask activation]

I wonder if anyone has an idea how to solve this problem. Thanks in advance!

@Andong-Li-speech

It seems to be caused by the choice of loss function, i.e., SI-SDR. SI-SDR does not constrain the magnitude of the waveform, which may cause the chopping effect. I think you can replace the SI-SDR loss with another option, such as an SNR loss or a waveform-level L1 loss.
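The scale issue described above can be illustrated with a small NumPy sketch. This is not the repository's loss code; the function names `si_sdr`, `snr`, and `l1_loss` and the synthetic signals are assumptions made for illustration only. It shows that an estimate scaled by a factor of 10 keeps the same SI-SDR, while plain SNR and L1 both penalize the wrong amplitude.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB (illustrative, NumPy only)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; alpha absorbs any gain.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    residual = estimate - projection
    return 10 * np.log10((np.dot(projection, projection) + eps)
                         / (np.dot(residual, residual) + eps))

def snr(estimate, target, eps=1e-8):
    """Plain SNR in dB; the error is measured against the unscaled target."""
    error = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps)
                         / (np.dot(error, error) + eps))

def l1_loss(estimate, target):
    """Mean absolute error on the waveform."""
    return np.mean(np.abs(estimate - target))

# Synthetic stand-ins for a clean target and a network output (assumption).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
estimate = target + 0.1 * rng.standard_normal(16000)
loud = 10.0 * estimate  # same shape, wrong amplitude

print("SI-SDR:", si_sdr(estimate, target), si_sdr(loud, target))
print("SNR:   ", snr(estimate, target), snr(loud, target))
print("L1:    ", l1_loss(estimate, target), l1_loss(loud, target))
```

To use either alternative as a training loss, one would minimize the negative SNR or the L1 value; whether that fully removes the artifact in this setup is not confirmed by the thread.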

@jkzhang7
Author

jkzhang7 commented Aug 18, 2020

@Andong-Li-speech Hi, thanks for your suggestions! The result is still not very good after changing the loss function to an SNR loss, but it works much better. If you are also working on this, what kind of loss function are you using? Thanks a lot in advance!

@LittleFlyingSheep

@jkzhang7 Hi, did you get better performance? I am facing the same problem now. Best wishes to you!

@forestlee95

@LittleFlyingSheep Hi, have you solved this problem? I seem to have the same issue: the magnitude of the separated waveform is too large and it does not sound good. I would be grateful for any advice.

@LittleFlyingSheep

LittleFlyingSheep commented Jun 8, 2021

@forestlee95 One way I chose to work around it is to rescale the waveform manually: I take the maximum value of the noisy input and use it to normalize the output. This gives relatively good performance. It is only a stopgap, though. If you have any other methods, please let me know.
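One possible reading of this workaround, sketched in NumPy: scale the enhanced output so that its peak matches the peak of the noisy input. The helper name `match_peak` is hypothetical and the exact normalization used by the commenter is not given in the thread, so treat this as an assumption rather than their method.

```python
import numpy as np

def match_peak(output, noisy, eps=1e-8):
    """Rescale `output` so its peak amplitude equals that of `noisy`.

    Hypothetical post-processing step; it only corrects the overall gain
    and does not fix distortion inside the waveform.
    """
    peak_in = np.max(np.abs(noisy))
    peak_out = np.max(np.abs(output))
    return output * (peak_in / (peak_out + eps))

# Synthetic example (assumption): a model output that is 50x too loud.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
noisy = 0.5 * np.sin(2 * np.pi * 220 * t)
output = 50.0 * np.sin(2 * np.pi * 220 * t)

rescaled = match_peak(output, noisy)
print(np.max(np.abs(output)), np.max(np.abs(rescaled)))
```

Because the peak of the noisy input includes the noise, the rescaled speech may still be slightly louder than the clean reference; this only brings the output back into a plausible range.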

@sewichou

sewichou commented Mar 22, 2022

@LittleFlyingSheep @jkzhang7 Hi, I am looking for the speech enhancement performance of Conv-TasNet on the VCTK dataset. Do you have any performance numbers for it? Much appreciated.

@LittleFlyingSheep

LittleFlyingSheep commented Mar 22, 2022 via email

@yyd19948

(Quoting the original post by @jkzhang7.)

How did you solve it? I ran into the same problem while testing.

@LittleFlyingSheep

LittleFlyingSheep commented Oct 11, 2022 via email


6 participants