A short review and a barebones implementation of diffusion models in PyTorch, based on my understanding. Diffusion models are generative models, i.e. they try to imitate a distribution
Take
The diffusion process is a Markov chain that consists of a sequence of steps of the form
With
Taking it a step further, suppose
Then, given
Noting that
Using
A short detour into a relevant property that is not obvious at first sight. Given a Markov chain
The future is independent of the past given the present, but this also holds in the other direction: the past is independent of the future given the present.
Proof by induction on k
By the Markov property
And so
The Markov property says that the future is independent of the past given the present. However, it is not necessarily independent of a further future, even when the present is known.
This property might be easier to grasp with a simple example. Take the Markov chain
With dynamics given by
- $A_1 \sim N(0,I)$
- $A_2 = A_1$
- $A_3|A_2 \sim N(0,I)$
- $A_4 = A_3$

That is, this Markov chain copies the random variable from step 1 to step 2 and from step 3 to step 4. Step 3 is independent of what happened before.
Clearly,
The same reasoning, by symmetry, applies to the backward direction
So, in general, when random variables in a Markov chain are given "boundary conditions" for some past and some future, they are independent of what happens before or after that boundary.
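The toy chain above is easy to simulate. Here is a minimal sketch, assuming scalar variables instead of vectors (so $N(0,1)$ rather than $N(0,I)$); the helper names `simulate_chain` and `correlation` are made up for this illustration. It checks that $A_3$ is uncorrelated with the present $A_2$ but fully determined by the further future $A_4$.

```python
import random


def correlation(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_chain(num_samples, seed=0):
    """Draw samples of (A1, A2, A3, A4) from the toy chain (scalar case)."""
    rng = random.Random(seed)
    a1 = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    a2 = list(a1)  # A2 = A1
    a3 = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]  # A3 ignores A2
    a4 = list(a3)  # A4 = A3
    return a1, a2, a3, a4


a1, a2, a3, a4 = simulate_chain(10_000)
corr_present_future = correlation(a2, a3)  # close to 0
corr_future_further = correlation(a3, a4)  # close to 1
print("corr(A2, A3):", corr_present_future)
print("corr(A3, A4):", corr_future_further)
```

Knowing $A_2$ tells you nothing about $A_3$, but knowing $A_4$ pins it down completely, which is the point of the example.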
The Markov chain gives us a tuple of random variables
- $X_0 \sim u$
- $X_T \sim N(0,I)$
A cool way to see this is that the diffusion process gradually transforms the original distribution into the normal distribution.
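To make this concrete, here is a small simulation sketch in plain Python. It assumes the usual DDPM-style forward step $X_{t+1} = \sqrt{1-\beta}\,X_t + \sqrt{\beta}\,\epsilon$ with a constant $\beta$; the schedule, the scalar data distribution and the function name `forward_diffuse` are all assumptions made for the illustration, not the settings used in this repo.

```python
import math
import random


def forward_diffuse(x0, betas, rng):
    """Run the assumed forward chain on a scalar starting point x0."""
    x = x0
    for beta in betas:
        noise = rng.gauss(0.0, 1.0)
        # Assumed DDPM-style step: shrink the signal, add fresh noise.
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * noise
    return x


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)
betas = [0.1] * 100  # hypothetical constant schedule
start = [rng.choice([2.0, 4.0]) for _ in range(2000)]  # clearly not N(0, 1)
end = [forward_diffuse(x, betas, rng) for x in start]

print("start mean/var:", mean(start), variance(start))
print("end   mean/var:", mean(end), variance(end))
```

The starting samples have mean around 3, while the diffused samples end up with mean close to 0 and variance close to 1.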
Now, to get the distribution we are interested in, we could take advantage of the fact that
Sample X_T from Normal(0, I)
For t = T-1, ..., 0:
    Sample X_t from p(X_t | X_{t+1})
Output X_0
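The loop above can be sketched in a few lines of Python. This is only a scalar illustration: `ancestral_sample` and `toy_backward_step` are hypothetical names, and the toy step is a stand-in for the true backward transition $p(X_t|X_{t+1})$, not an actual model.

```python
import random


def ancestral_sample(backward_step, num_steps, rng):
    """Start from noise and walk the chain backward from t = T-1 down to 0."""
    x = rng.gauss(0.0, 1.0)  # X_T ~ N(0, 1) in the scalar case
    for t in range(num_steps - 1, -1, -1):
        # Draw X_t given X_{t+1} using whatever backward step is supplied.
        x = backward_step(x, t, rng)
    return x


visited = []


def toy_backward_step(x_next, t, rng):
    """Placeholder backward transition, used only to exercise the loop."""
    visited.append(t)
    return 0.9 * x_next + 0.1 * rng.gauss(0.0, 1.0)


sample = ancestral_sample(toy_backward_step, num_steps=5, rng=random.Random(0))
print("timesteps visited:", visited)
print("sample:", sample)
```

Passing the backward step in as a function keeps the loop independent of how the transition is obtained, whether exact or learned.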
However, while the forward distributions
The end goal is to have a model that can approximate the backward dynamics
The strategy can actually be seen as a VAE with a "dumb" (non-parametrized) encoder.
Let's forget about that for now. Suppose we had some backward Markov model with parameters
We'd like our backward Markov model to learn to generate the samples. That is
Our proxy for learning this, as usual, is going to be the log-likelihood. Using the ELBO we have
We can expand both the true Markov chain and our parametrized one using the (backward) Markov property
- $p_\theta(x_0, X_{1:T}) = p_\theta(x_0|X_1) ( \prod_{t=1:T-1} p_\theta(X_t|X_{t+1}) ) p_\theta(X_T)$
- $p(X_{1:T}|x_0)= ( \prod_{t=1:T-1} p(X_t|X_{t+1}, x_0) ) p(X_T|x_0)$
Note that the products start at
Then we use the log to change the product into a sum
Note that we always have that
Add
Let's redo the development of the last section but using
$$p(X_{1:T}|X_0)= \frac{p(X_{0:T})} { p(X_0)} = \prod_{t=0:T-1} p(X_t|X_{t+1}) \frac{p(X_T)}{ p(X_0)}$$
Which leads to
So we are truly learning the backward dynamics of the Markov model!
Suppose we have trained the model successfully. How do we actually sample from it?
Well, it's a Markov model, so once we have a trained model, as long as
One slightly odd thing to notice is that
Why are there square roots in there?
A helpful insight is that this preserves variance.
Assuming
If we were to remove the square root and simply do
We would get a variance
Which would vary as you advance in the Markov chain. It would have less variance between the extremes. How exactly? It depends on the specific schedule we use.
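A quick numeric check of this argument, as a sketch. It assumes a unit-variance input $X$, independent unit-variance noise $\epsilon$, and a mixing coefficient `a` in $[0, 1]$ (a stand-in name for whatever the schedule provides at a given step), and compares the variance of $\sqrt{a}\,X + \sqrt{1-a}\,\epsilon$ with that of $a\,X + (1-a)\,\epsilon$.

```python
def variance_with_sqrt(a, var_x=1.0):
    """Var[sqrt(a) * X + sqrt(1 - a) * eps] for independent X and eps ~ N(0, 1)."""
    return a * var_x + (1.0 - a)


def variance_without_sqrt(a, var_x=1.0):
    """Var[a * X + (1 - a) * eps] for independent X and eps ~ N(0, 1)."""
    return a ** 2 * var_x + (1.0 - a) ** 2


for a in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(a, variance_with_sqrt(a), variance_without_sqrt(a))
```

With the square roots the variance stays at 1 for every `a`. Without them it equals 1 only at `a = 0` and `a = 1` and drops to 0.5 at `a = 0.5`, which is the dip between the extremes.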
Samples generated with our trained model. They are OK!
Here is a video of a diffusion process
sample.mp4
You can also do conditional generation. For simple image-conditional generation you can add some extra information somewhere in the neural network. In this example, we add 3 extra channels to the input containing a low-resolution version of the denoised image. This trick is also a good idea when training at high resolution is not feasible due to compute limitations: we can train a large model at a lower resolution and then a smaller model at a higher resolution (64x64 and 128x128 respectively in this case).
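The channel trick can be sketched with nested lists standing in for image tensors (channels x height x width). This assumes the low-resolution image is upsampled with nearest-neighbour interpolation before being stacked; the repo may do this differently, and the names `upsample_nearest` and `add_conditioning` are made up for the example. With PyTorch tensors the stacking step would typically be a `torch.cat` along the channel dimension.

```python
def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2D channel by an integer factor."""
    return [
        [row[col // factor] for col in range(len(row) * factor)]
        for row in channel
        for _ in range(factor)  # repeat each source row `factor` times
    ]


def add_conditioning(noisy_image, low_res_image, factor):
    """Append the upsampled low-res channels to the noisy image channels."""
    upsampled = [upsample_nearest(ch, factor) for ch in low_res_image]
    return noisy_image + upsampled  # list concatenation along the channel axis


noisy = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]  # 3 x 4 x 4
low_res = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(3)]  # 3 x 2 x 2
model_input = add_conditioning(noisy, low_res, factor=2)
print(len(model_input), "channels of size",
      len(model_input[0]), "x", len(model_input[0][0]))
```

The network then sees 6 input channels instead of 3, with the last 3 carrying the conditioning image.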
Below is our current trained example (it's pretty bad right now).
Run the script with the command `python src/train.py`.
The script is structured to initialize the model with specified hyperparameters. You can define your model's hyperparameters in the `model_hyperparameters` dictionary. This includes settings such as the number of blocks, channels, and timesteps for the diffusion process.
The model consists of a simple U-Net with residual connections. Unlike most popular implementations, we feed the variance of the diffusion step directly to the model instead of the timestep itself.
Optional model loading. If you wish to continue training from a pretrained model, you can do so using `python src/train.py resume <path.pth>` with the appropriate unconditional model path.
TensorBoard logging of losses. This allows real-time monitoring of the model's performance. To view these logs, run `tensorboard --logdir=runs` from your terminal.
Image saving. At specified intervals, the script saves generated images to a designated directory. Set the `sample_every` variable to control how frequently images are saved during training.
We provide several pretrained models for use. See `python src/inference.py` and `python src/utils.py` for example usages.
Name | Task | Number of Parameters | Description | Model |
---|---|---|---|---|
Celeb1 | Unconditional generation | ~122 million | Trained on CelebA for unconditional generation. | celeb1 |
Celeb2 | Unconditional generation | ~122 million | Further training of Celeb1. | celeb2 |
PathMinst | Unconditional generation | ~122 million | Style transfer of Celeb2 into a medical dataset. | medmnist |
Super | Super-resolution | ~30 million | 64x64 to 128x128 CelebA super-resolution. | super |
- Denoising Diffusion Probabilistic Models. Jonathan Ho, Ajay Jain, Pieter Abbeel. Paper
- https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- Denoising Diffusion-based Generative Modeling: Foundations and Applications. Karsten Kreis, Ruiqi Gao, Arash Vahdat. Link. This is a really good one.
- Step by Step visual introduction to Diffusion Models. Kemal Erdem. Link. Our architecture is inspired by this one (but with important simplifications/differences).