
about the prior distribution #6

Open
yuffon opened this issue Jul 23, 2019 · 3 comments

Comments

@yuffon

yuffon commented Jul 23, 2019

In the top layer, the prior distribution uses h = conv(0) + embedding as the mean and std in the case of ycond=True.
It seems that the conv layer is unnecessary.
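For reference, a minimal TF1-style sketch of the construction being asked about (the names top_prior and y_emb are illustrative, not the repo's actual identifiers): the prior's (mean, logs) comes from a convolution applied to an all-zero input, with an optional class-embedding shift added when ycond=True.

```python
import tensorflow as tf

def top_prior(z_shape, y_onehot=None, ycond=False):
    """Sketch of a top prior; returns (mean, logs), each of shape (B, H, W, C)."""
    B, H, W, C = z_shape
    # conv applied to an all-zero input: the output is just the broadcast bias,
    # so the bias itself acts as the learned (mean, logs) parameters
    h = tf.layers.conv2d(tf.zeros([B, H, W, 2 * C]), filters=2 * C, kernel_size=3,
                         padding="same",
                         kernel_initializer=tf.zeros_initializer(),
                         name="p")
    if ycond and y_onehot is not None:
        # conditional case: add a label embedding as a per-channel shift of (mean, logs)
        emb = tf.layers.dense(y_onehot, 2 * C, use_bias=False, name="y_emb")
        h = h + tf.reshape(emb, [-1, 1, 1, 2 * C])
    mean, logs = tf.split(h, num_or_size_splits=2, axis=-1)
    return mean, logs
```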

@naturomics
Owner

naturomics commented Jul 23, 2019

In the top prior layer, the mean and logs are shared across the spatial dimensions in the non-conditioning case (ycond=False), i.e. (mean, logs) = tensor(1, 1, 1, 2n). In the implementation, we set (mean, logs) to be the bias of that conv layer and let conv(0) broadcast the bias to shape (batch_size, height, width, 2n). Nothing more than that. So you can replace it with (mean, logs) = tf.get_variable([1, 1, 1, 2*n]) and tf.tile() to get the right shape.

So, it's just a programming trick.
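Spelled out, the equivalent explicit version would look roughly like this (a hedged sketch; top_prior_params and prior_params are illustrative names, not the repo's):

```python
import tensorflow as tf

def top_prior_params(batch_size, height, width, n):
    # a single (1, 1, 1, 2n) parameter tensor: mean and logs shared across
    # all spatial positions, exactly like the bias of the conv(0) layer
    params = tf.get_variable("prior_params", [1, 1, 1, 2 * n],
                             initializer=tf.zeros_initializer())
    # tile to the full (batch_size, height, width, 2n) shape the flow expects
    params = tf.tile(params, [batch_size, height, width, 1])
    mean, logs = tf.split(params, num_or_size_splits=2, axis=-1)
    return mean, logs
```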

@yuffon
Author

yuffon commented Jul 24, 2019

I see that the code uses
rescale = tf.get_variable("rescale", [], initializer=tf.constant_initializer(1.))
scale_shift = tf.get_variable("scale_shift", [], initializer=tf.constant_initializer(0.))
logsd = tf.tanh(logsd) * rescale + scale_shift

for the mean and logstd.
Is that necessary?

@naturomics
Owner

It's for training stability, see the experiments section in our paper.
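For context, a rough illustration (my own sketch, not the paper's code) of why this rescaling helps: tanh squashes the raw logsd into a learnable range that starts out as roughly (-1, 1), so the prior's standard deviation cannot blow up early in training.

```python
import tensorflow as tf

raw_logsd = tf.random_normal([4]) * 10.0  # stand-in for possibly extreme activations
rescale = tf.get_variable("rescale", [], initializer=tf.constant_initializer(1.))
scale_shift = tf.get_variable("scale_shift", [], initializer=tf.constant_initializer(0.))
# bounded to (-rescale + scale_shift, rescale + scale_shift); about (-1, 1) at init
logsd = tf.tanh(raw_logsd) * rescale + scale_shift
std = tf.exp(logsd)  # stays in a numerically safe range
```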
