
Missing activation for the time embedding inside ResidualBlock for DDPM? #165

Open
EliasNehme opened this issue Jan 30, 2023 · 3 comments

@EliasNehme

In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying a linear layer only, with no preceding activation.

However, the positionally encoded time embedding is itself already the output of a linear layer.

Hence, these two layers collapse into a single linear layer, with no non-linear mapping of the time embedding per residual block.
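A quick numerical check of this collapse (a standalone sketch with illustrative shapes, not the repository's code): two affine layers composed with nothing in between are exactly one affine layer with merged weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Final linear layer of the time-embedding MLP (shapes are illustrative).
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
# Linear layer inside the residual block, applied directly on top of it.
W2, b2 = rng.standard_normal((6, 8)), rng.standard_normal(6)

t = rng.standard_normal(4)

# Two stacked linear layers with no activation in between...
stacked = W2 @ (W1 @ t + b1) + b2

# ...equal a single linear layer with a merged weight and bias.
W, b = W2 @ W1, W2 @ b1 + b2
merged = W @ t + b

print(np.allclose(stacked, merged))  # True
```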

In the original TensorFlow implementation by the author, the time embedding is first passed through a nonlinearity and only then through a linear layer:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
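A minimal sketch of the fix in the spirit of the TF code above (names and shapes are illustrative assumptions, not the repository's exact code): inserting the nonlinearity before the block's linear layer prevents the two linears from fusing. Swish is the nonlinearity used in the reference implementation.

```python
import numpy as np

def swish(x):
    # Swish / SiLU: x * sigmoid(x), the nonlinearity in the TF reference.
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W_emb, b_emb = rng.standard_normal((8, 8)), rng.standard_normal(8)  # embedding MLP's last linear
W_blk, b_blk = rng.standard_normal((6, 8)), rng.standard_normal(6)  # residual block's linear

temb = rng.standard_normal(8)

# Current (buggy) path: a linear layer directly on top of another linear layer.
buggy = W_blk @ (W_emb @ temb + b_emb) + b_blk

# Fixed path, mirroring nn.dense(nonlinearity(temb), ...) in the TF code.
fixed = W_blk @ swish(W_emb @ temb + b_emb) + b_blk

# The buggy path is reproduced exactly by one merged linear layer; the fixed path is not.
collapsed = (W_blk @ W_emb) @ temb + (W_blk @ b_emb + b_blk)
print(np.allclose(buggy, collapsed), np.allclose(fixed, collapsed))  # True False
```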

@vpj (Member) commented Feb 2, 2023

Yes, you are correct; we have missed the activation layer.

@vpj vpj self-assigned this Feb 2, 2023
@vpj vpj added the bug Something isn't working label Feb 2, 2023
