
a maybe forgotten log function #3


Open
diweiqiang opened this issue Aug 3, 2019 · 4 comments

Comments

@diweiqiang

diweiqiang commented Aug 3, 2019

In the function "learn_mine" of MINE.ipynb, the loss is "loss = -(torch.mean(t) - (1/ma_et.mean()).detach()*torch.mean(et))". Did you forget a torch.log around torch.mean(et), or is this intentional? Sorry if it is correct as it is.
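
One reading is that the torch.log is dropped on purpose rather than forgotten: the gradient of log(mean(et)) is mean(grad et) / mean(et), and the MINE paper replaces that denominator with a moving average to reduce the bias of the stochastic gradient, so a loss of the form -(mean(t) - mean(et)/ma_et) with ma_et detached already carries the corrected gradient, at the price of the loss value no longer being the lower bound itself. A minimal sketch of such an update step (my own reconstruction for illustration, not necessarily the notebook's exact code; mine_net, joint, marginal, and ma_rate are made-up names):

import torch

def learn_mine_sketch(mine_net, optimizer, joint, marginal, ma_et, ma_rate=0.01):
    t = mine_net(joint)                   # T on samples from p(x, z)
    et = torch.exp(mine_net(marginal))    # e^T on samples from p(x)p(z)
    ma_et = (1 - ma_rate) * ma_et + ma_rate * torch.mean(et).detach()  # moving average of E[e^T]
    mi_lb = torch.mean(t) - torch.log(torch.mean(et))  # the actual lower bound, kept for logging
    loss = -(torch.mean(t) - torch.mean(et) / ma_et)   # gradient-corrected loss, hence no torch.log
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mi_lb.item(), ma_et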

@yassouali

yassouali commented Aug 9, 2019

I have a similar question about the loss calculation. We have:

et = torch.mean(torch.exp(M(z_bar, x_tilde)))
M.ma_et += ma_rate * (et.detach().item() - M.ma_et)
mutual_information = torch.mean(M(z, x_tilde)) - torch.log(et) * et.detach() /M.ma_et

What I don't understand is the factor et.detach() / M.ma_et, where we divide the current estimate et by its moving average. Is this the same as the correction of the bias from the stochastic gradients step in the paper?

thanks.
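
For reference, the chain rule may explain that factor (my own reading, with et = torch.mean(torch.exp(M(z_bar, x_tilde))) as in the snippet above and grad denoting the gradient w.r.t. the parameters of M):

grad[ torch.log(et) * et.detach() / M.ma_et ] = (grad et / et) * (et / M.ma_et) = grad et / M.ma_et

That is, the minibatch denominator of grad log et is swapped for the moving average M.ma_et, which looks like exactly the moving-average bias correction of the stochastic gradient proposed in the MINE paper; the detached factor only rescales the gradient and receives none itself.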

@yrchen92

I have also noticed this. I am confused about the term et.detach() / M.ma_et, and it would be very kind of you to help me understand it.

@GarfieldF

Maybe I can answer the questions above. I think "loss = -(torch.mean(t) - (1/ma_et.mean()).detach()*torch.mean(et))" in MINE.ipynb is equivalent to
"loss = -(torch.mean(M(z, x_tilde)) - torch.log(et) * et.detach() / M.ma_et)" in GAN_MINE.ipynb, in the sense that its derivative acts as the gradient for updating MINE.
When MI is used as the measure of mutual information itself, the expression is "mi = torch.mean(M(z, x_tilde)) - torch.log(torch.mean(torch.exp(M(z_bar, x_tilde))) + 1e-8)", which does not need the unbiased-gradient correction.

@HobbitLong

Is anyone still looking at this? @diweiqiang @yassouali @yrchen92

I think, from the perspective of gradients:

loss = -(torch.mean(t) - torch.mean(et) / ma_et.detach())

and

loss = -(torch.mean(t) - torch.log(torch.mean(et)) * (et.mean().detach()) / ma_et.detach())

should deliver the same gradients. Both account for the correction of the biased gradients, but the loss values are different: the first one is not a lower bound of MI, while the second one is.

Please correct me if I am wrong.
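
One quick way to check that claim numerically is to compare the gradients of the two forms on toy tensors with autograd (a sketch with made-up shapes, treating ma_et as a detached constant):

import torch

torch.manual_seed(0)
net = torch.nn.Linear(4, 1)            # toy statistics network T
x_joint = torch.randn(64, 4)           # stand-in for samples from p(x, z)
x_marg = torch.randn(64, 4)            # stand-in for samples from p(x)p(z)
ma_et = torch.tensor(1.3)              # pretend moving average, constant w.r.t. the graph

def grads(loss_fn):
    net.zero_grad()
    t = net(x_joint)
    et = torch.exp(net(x_marg))
    loss_fn(t, et).backward()
    return [p.grad.clone() for p in net.parameters()]

g1 = grads(lambda t, et: -(torch.mean(t) - torch.mean(et) / ma_et))
g2 = grads(lambda t, et: -(torch.mean(t) - torch.log(torch.mean(et)) * torch.mean(et).detach() / ma_et))
print(all(torch.allclose(a, b, atol=1e-6) for a, b in zip(g1, g2)))   # expect True

If that prints True, the two losses indeed share gradients and differ only in the value they report.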
