
Missing activation for the time embedding inside ResidualBlock for DDPM? #165

Open
EliasNehme opened this issue Jan 30, 2023 · 3 comments

@EliasNehme

In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying a linear layer only, with no preceding activation.

However, the positionally encoded time embedding is itself already the output of a linear layer.

Hence, these two layers collapse into a single linear layer, with no non-linear mapping of the time embedding per residual block.
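A quick numerical check of this collapse (a standalone sketch with illustrative shapes, not the repository's code): two affine layers composed with nothing in between are exactly one affine layer with merged weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Final linear layer of the time-embedding MLP (shapes are illustrative).
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
# Linear layer inside the residual block, applied directly on top of it.
W2, b2 = rng.standard_normal((6, 8)), rng.standard_normal(6)

t = rng.standard_normal(4)

# Two stacked linear layers with no activation in between...
stacked = W2 @ (W1 @ t + b1) + b2

# ...equal a single linear layer with a merged weight and bias.
W, b = W2 @ W1, W2 @ b1 + b2
merged = W @ t + b

print(np.allclose(stacked, merged))  # True
```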

In the original TensorFlow implementation by the author, the time embedding is first passed through a nonlinearity and only then through a linear layer:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
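A minimal sketch of the fix in the spirit of the TF code above (names and shapes are illustrative assumptions, not the repository's exact code): inserting the nonlinearity before the block's linear layer prevents the two linears from fusing. Swish is the nonlinearity used in the reference implementation.

```python
import numpy as np

def swish(x):
    # Swish / SiLU: x * sigmoid(x), the nonlinearity in the TF reference.
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W_emb, b_emb = rng.standard_normal((8, 8)), rng.standard_normal(8)  # embedding MLP's last linear
W_blk, b_blk = rng.standard_normal((6, 8)), rng.standard_normal(6)  # residual block's linear

temb = rng.standard_normal(8)

# Current (buggy) path: a linear layer directly on top of another linear layer.
buggy = W_blk @ (W_emb @ temb + b_emb) + b_blk

# Fixed path, mirroring nn.dense(nonlinearity(temb), ...) in the TF code.
fixed = W_blk @ swish(W_emb @ temb + b_emb) + b_blk

# The buggy path is reproduced exactly by one merged linear layer; the fixed path is not.
collapsed = (W_blk @ W_emb) @ temb + (W_blk @ b_emb + b_blk)
print(np.allclose(buggy, collapsed), np.allclose(fixed, collapsed))  # True False
```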

@vpj (Member) commented Feb 2, 2023

Yes, you are correct; we have missed the activation layer.

@vpj vpj self-assigned this Feb 2, 2023
@vpj vpj added the bug Something isn't working label Feb 2, 2023
