Is the cRM not compressed to the 0-1 range with a sigmoid? #28

Open
zehaoj opened this issue Jun 10, 2021 · 3 comments

Comments

@zehaoj

zehaoj commented Jun 10, 2021

Hello, and first of all thank you for reproducing the paper. While reading it, I noticed that the authors mention:

Real and imaginary parts of the complex mask will typically lie between -1 and 1, however, we use sigmoidal compression to bound these complex mask values between 0 and 1.

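However, I don't see this step in your code: there is tanh compression, but no sigmoid that compresses the cRM values into the 0-1 range. Did you find that it performs poorly, or is there another reason? Thanks.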

@JusperLee
Owner

I tested this and found that tanh and sigmoid perform about the same, and tanh converges more easily.
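
One way to see why the two compressions can behave similarly: tanh is an affine rescaling of the sigmoid, tanh(x) = 2·sigmoid(2x) - 1, so the two differ only in output range ((-1, 1) versus (0, 1)) and input slope. The sketch below checks this identity numerically with NumPy. The function names `compress_tanh` and `compress_sigmoid` are illustrative assumptions, not taken from the repository or the paper.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def compress_tanh(mask):
    """Hypothetical helper: bound mask values to (-1, 1)."""
    return np.tanh(mask)


def compress_sigmoid(mask):
    """Hypothetical helper: bound mask values to (0, 1)."""
    return sigmoid(mask)


# Example mask values (real or imaginary channel of a cRM).
mask = np.linspace(-3.0, 3.0, 7)

tanh_out = compress_tanh(mask)
sigmoid_out = compress_sigmoid(2.0 * mask)

# tanh(x) == 2 * sigmoid(2x) - 1, so one is a shifted, rescaled copy of the other.
assert np.allclose(tanh_out, 2.0 * sigmoid_out - 1.0)
```

Whichever compression is used when building the training target, the matching inverse has to be applied before the mask is multiplied with the mixture spectrogram.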

@zehaoj
Author

zehaoj commented Jun 11, 2021

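Thanks for the answer. But could the tanh operation make the imaginary part (i.e. the second channel of the cRM) more spread out and so hurt training? I inspected the trained cRM: the real part is learned well, but the imaginary part is much worse. Do you have a way to address this? I tried changing the loss to give the imaginary channel more weight, but that still doesn't work well.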

@JusperLee
Owner

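The phase spectrum is inherently hard to estimate. You could look at how complex-valued networks are designed for pure speech separation or for speech enhancement. One idea I have is to define the loss as the time-domain SI-SNR: since the STFT and iSTFT are differentiable, the time-domain speech can be estimated directly from the magnitude and phase spectra.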
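
As a rough illustration of the time-domain SI-SNR idea, here is a minimal NumPy sketch of the metric and its negation as a loss. It assumes 1-D waveforms of equal length; the names `si_snr` and `si_snr_loss` and the `eps` value are illustrative choices, not code from the repository. For actual training, the same arithmetic would be written in an autodiff framework and applied to the waveform produced by the iSTFT.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for 1-D waveforms."""
    # Remove the DC offset so the measure ignores constant shifts.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to get the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target

    # Everything not explained by the target counts as noise.
    e_noise = estimate - s_target

    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)


# Toy usage: a clean signal plus a small amount of noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(si_snr(noisy, clean))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the value unchanged, which is what makes the measure scale-invariant.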
