I saw that you calculate the gradient for log(sigma) * 2 / 2048.0; I guess that's for numerical stability, but I'm not sure. In my implementation I directly calculate the gradient of the variance, since that's what appears in the paper. I haven't tested my code thoroughly, so I'm not sure whether anything will break.
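For reference, here is a minimal sketch (assuming PyTorch, a toy variance penalty, and that 2048.0 plays the role of a dataset-size constant; none of these names come from this repo) of why optimising log(sigma) instead of the variance itself is the more stable parameterisation, and where an extra factor of 2 comes from via the chain rule:

```python
import torch

N = 2048.0  # assumed dataset-size constant, taken from this thread

# Option A: optimise the variance directly (what the paper writes down);
# nothing stops SGD from pushing it to zero or a negative value.
var = torch.tensor(1e-6, requires_grad=True)

# Option B: optimise log(sigma), so sigma^2 = exp(2 * log_sigma) stays positive.
log_sigma = torch.tensor(-6.9, requires_grad=True)

def penalty(variance):
    # stand-in for the variance-dependent part of the per-weight penalty,
    # scaled by 1/N as in the expression quoted above
    return 0.5 * (variance - torch.log(variance)) / N

penalty(var).backward()
penalty(torch.exp(2.0 * log_sigma)).backward()

# chain rule: d/d(log_sigma) f(exp(2*log_sigma)) = f'(var) * 2 * var,
# which is where the factor of 2 shows up
print(var.grad.item(), log_sigma.grad.item())
```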
I see the problem: the sum of squares of a vector does not equal the square of the sum, so either the trick solves the problem or the problem actually still persists. I hope to hear from you, and I'll do some math first.
OK, after applying the chain rule, I see that this trick doesn't solve the sum-of-squares problem. So did you experience underflow, and this trick fixed it?
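A tiny numeric check of the point above (purely illustrative, with a hypothetical toy loss): one backward pass over the summed batch loss gives the square of the summed per-sample gradients, not the sum of their squares, and the two really do differ:

```python
import torch

w = torch.tensor(0.5, requires_grad=True)
x = torch.tensor([1.0, -2.0, 3.0])
y = torch.tensor([0.0, 1.0, 2.0])

# exact quantity: sum over samples of the squared per-sample gradient
per_sample_sq = 0.0
for xi, yi in zip(x, y):
    li = (w * xi - yi) ** 2                   # per-sample loss
    gi, = torch.autograd.grad(li, w)
    per_sample_sq += gi.item() ** 2           # sum_i g_i^2

# what one backward pass over the batch loss gives: the summed gradient
batch_loss = ((w * x - y) ** 2).sum()
g_batch, = torch.autograd.grad(batch_loss, w)
square_of_sum = g_batch.item() ** 2           # (sum_i g_i)^2

print(per_sample_sq, square_of_sum)           # 74.0 vs 36.0 here -- not the same
```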
Okay, it's hard to get a gradient for each sample in an autograd framework. I saw that many other papers on HME recognition also mention that they use weight noise. Do you think they also use this version of "weight noise", which is in fact somewhat different from the original one? I don't feel that the square of the sum is a good approximation of the sum of squares.
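For comparison, the plain (non-adaptive) weight noise that other papers usually seem to mean is, as far as I understand it, just fixed-std Gaussian noise added to the weights before each forward pass. A hedged sketch with placeholder names (`model`, `inputs`, and `std` are not from this repo):

```python
import torch

def forward_with_weight_noise(model, inputs, std=0.05):
    # perturb every weight with fixed-variance Gaussian noise,
    # run the forward pass, then restore the clean weights
    noise = []
    with torch.no_grad():
        for p in model.parameters():
            n = torch.randn_like(p) * std
            p.add_(n)                 # in-place perturbation
            noise.append(n)
    out = model(inputs)               # gradients flow through the noisy weights
    with torch.no_grad():
        for p, n in zip(model.parameters(), noise):
            p.sub_(n)                 # undo the perturbation
    return out
```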