Tutorial on generative diffusion models

A generative model is a type of machine learning model that can generate new data that is similar to the data it has been trained on ¹². This tutorial will give you a short introduction to diffusion models, which are a specific type of generative model.

A diffusion model is based on the idea of incrementally inverting a forward process that adds noise to data. In this forward process, the input data $x_0$ is gradually transformed into a sample $x_T$ from a standard Gaussian distribution by scaling it and adding Gaussian noise at every time step $t$ in a discrete subset of $[0, T]$. Specifically, the forward process can be defined as:

$$ q(x_t | x_0) = \mathcal{N}(x_t; \alpha_t x_0, \sigma_t^2 I) $$
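Here $\alpha_t$ scales the signal and $\sigma_t$ is the standard deviation of the added noise, so a sample can be drawn directly as $x_t = \alpha_t x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The following is a minimal NumPy sketch of that sampling step. The cosine schedule, the choice $T = 1$, and the function names `cosine_schedule` and `sample_forward` are assumptions made for illustration; they are not taken from the tutorial notebook or from denoising-diffusion-pytorch.

```python
import numpy as np

def cosine_schedule(t):
    """Assumed variance-preserving schedule for t in [0, 1]: alpha_t^2 + sigma_t^2 = 1."""
    alpha_t = np.cos(0.5 * np.pi * t)
    sigma_t = np.sin(0.5 * np.pi * t)
    return alpha_t, sigma_t

def sample_forward(x0, t, rng):
    """Draw x_t ~ N(alpha_t * x0, sigma_t^2 I) as x_t = alpha_t * x0 + sigma_t * eps."""
    alpha_t, sigma_t = cosine_schedule(t)
    eps = rng.standard_normal(x0.shape)  # eps ~ N(0, I)
    return alpha_t * x0 + sigma_t * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a normalized image

# The correlation with x0 drops as t grows: the signal fades into noise.
for t in (0.0, 0.5, 1.0):
    xt = sample_forward(x0, t, rng)
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]
    print(f"t={t:.1f}  correlation with x0: {corr:.3f}")
```

At $t = 0$ the sample equals $x_0$, and at $t = T$ it is essentially pure Gaussian noise. A diffusion model is trained to undo this corruption step by step.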

Overview

We will use denoising-diffusion-pytorch, a diffusion model library written by lucidrains. For training, we will use lightning.

Quickstart

First, clone and cd into the repository:

git clone https://github.com/weigertlab/diffusion_model_tutorial.git
cd diffusion_model_tutorial

If using BARD, simply open the notebook in VSCode as usual and choose the kernel diffusion.

Otherwise, all of the setup, including environment creation and data download, can be done by running this command:

source setup.sh

Example data

Generative models are typically trained on a dataset of (unlabeled) images. In this tutorial we will use two example datasets, showing images of

flywing membrane, or
dual color zebrafish retina.

Please choose one of the two datasets to train your own models. If you want to train a model on your own data, you can skip these steps. Note that the 2D images are provided as an npz file, which is a compressed archive of NumPy arrays. For your custom data, you can also use a folder of tiff files.
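As a rough illustration of working with the npz format, the sketch below loads an image stack from an npz archive and rescales it to the range [0, 1]. The helper name `load_npz_images`, the array key `images`, and the min-max normalization are assumptions for this example; the tutorial's actual files may use different keys, and the notebook may normalize differently. The demo writes a small synthetic file so that it runs without the real data.

```python
import os
import tempfile
import numpy as np

def load_npz_images(path, key=None):
    """Load an image stack from an npz archive and rescale it to [0, 1]."""
    with np.load(path) as archive:
        # archive.files lists the names of the arrays stored in the file
        key = archive.files[0] if key is None else key
        images = archive[key].astype(np.float32)
    lo, hi = images.min(), images.max()
    return (images - lo) / max(hi - lo, 1e-8)  # guard against constant images

# Demo with a synthetic stack of 4 images of 32 x 32 pixels
tmp_path = os.path.join(tempfile.mkdtemp(), "example.npz")
fake = np.random.default_rng(0).integers(0, 4096, size=(4, 32, 32)).astype(np.uint16)
np.savez_compressed(tmp_path, images=fake)  # "images" is a hypothetical key name

stack = load_npz_images(tmp_path)
print(stack.shape, stack.dtype, float(stack.min()), float(stack.max()))
```

To check which arrays a real file contains, inspect `np.load(path).files` before picking a key.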

1. Flywing membrane³

2. Dual color zebrafish retina⁴

Training your own diffusion model

See diffusion.ipynb for how to train a diffusion model on one of the two datasets.

References

1. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in neural information processing systems 33 (2020): 6840-6851.
2. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-Based Generative Modeling through Stochastic Differential Equations. arXiv preprint arXiv:2011.13456, 2020.
3. Prakash, M., Buchholz, T.-O., Schmidt, D., Krull, A., & Jug, F. (2020). Flywing (noise 0) dataset for microscopy image denoising and segmentation benchmark as used in DenoiSeg paper. Zenodo dataset.
4. Martin Weigert, Uwe Schmidt, Tobias Boothe, Andreas Müller, Alexandr Dibrov, Akanksha Jain, Benjamin Wilhelm, Deborah Schmidt, Coleman Broaddus, Sian Culley, Mauricio Rocha-Martins, Fabián Segovia-Miranda, Caren Norden, Ricardo Henriques, Marino Zerial, Michele Solimena, Jochen Rink, Pavel Tomancak, Loic Royer, Florian Jug, Eugene W. Myers. Content Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy. Dataset.
