This project implements a diffusion model based on the architecture proposed in the DDPM paper. The model is designed to generate 32x32 landscape images using a UNet architecture with self-attention, upsampling, and downsampling blocks.
Here are some landscape images generated by the model after training. Note that each individual generated image is 32x32 pixels; the samples are tiled together into the grids shown below.
Sampling 1 | Sampling 2 |
---|---|
*(generated sample grid)* | *(generated sample grid)* |
The model's progress in learning to generate realistic landscape images can be seen in the following table:
Epoch # | Generated Images |
---|---|
Epoch 1 | |
Epoch 50 | |
Epoch 100 | |
Epoch 150 | |
Epoch 300 | |
Epoch 500 | |
1- The `Diffusion` class: This class is a wrapper around the `UNet` class and implements the forward and reverse processes in diffusion models (adding noise and denoising), which ultimately generates new images.
- Noise Steps: 1000 steps of noise addition
- Noise Schedule: Linear schedule for beta values between `1e-4` and `0.02`
- Image Size: 32x32 input and output
- Sampling: Images are sampled from random Gaussian noise and denoised through reverse diffusion (see the sketch after this list)
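To make the forward and reverse processes concrete, here is a minimal PyTorch sketch of such a wrapper, using the hyperparameters listed above (1000 noise steps, linear betas from `1e-4` to `0.02`, 32x32 images). The method names and signatures are illustrative assumptions, not the repository's actual code:

```python
import torch

class Diffusion:
    """Illustrative DDPM wrapper: linear beta schedule, forward noising, reverse sampling."""

    def __init__(self, model, noise_steps=1000, beta_start=1e-4, beta_end=0.02,
                 img_size=32, device="cuda"):
        self.model = model                      # UNet that predicts the added noise
        self.noise_steps = noise_steps
        self.img_size = img_size
        self.device = device
        # Linear schedule for beta between 1e-4 and 0.02, as configured above
        self.beta = torch.linspace(beta_start, beta_end, noise_steps, device=device)
        self.alpha = 1.0 - self.beta
        self.alpha_hat = torch.cumprod(self.alpha, dim=0)

    def noise_images(self, x0, t):
        """Forward process: x_t = sqrt(alpha_hat_t) * x_0 + sqrt(1 - alpha_hat_t) * eps."""
        sqrt_ah = torch.sqrt(self.alpha_hat[t])[:, None, None, None]
        sqrt_one_minus_ah = torch.sqrt(1.0 - self.alpha_hat[t])[:, None, None, None]
        eps = torch.randn_like(x0)
        return sqrt_ah * x0 + sqrt_one_minus_ah * eps, eps

    @torch.no_grad()
    def sample(self, n):
        """Reverse process: start from pure Gaussian noise and denoise step by step."""
        x = torch.randn(n, 3, self.img_size, self.img_size, device=self.device)
        for i in reversed(range(1, self.noise_steps)):
            t = torch.full((n,), i, device=self.device, dtype=torch.long)
            pred_noise = self.model(x, t)
            alpha = self.alpha[t][:, None, None, None]
            alpha_hat = self.alpha_hat[t][:, None, None, None]
            beta = self.beta[t][:, None, None, None]
            noise = torch.randn_like(x) if i > 1 else torch.zeros_like(x)
            x = (1 / torch.sqrt(alpha)) * (
                x - ((1 - alpha) / torch.sqrt(1 - alpha_hat)) * pred_noise
            ) + torch.sqrt(beta) * noise
        return x.clamp(-1, 1)
```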
2- UNet Architecture: The `UNet` class is the concrete model that learns to predict the noise in images. It includes downsampling, upsampling, and self-attention blocks to capture both local and global features in the images.
- DoubleConv: Two convolutional layers with GroupNorm and GELU activation, used throughout the network.
- Down: Downsampling block that reduces the spatial resolution while increasing the feature maps.
- Up: Upsampling block to restore the spatial resolution.
- SelfAttention: Implements the attention mechanism in the model (these blocks are sketched below).
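As an illustration, here is a minimal PyTorch sketch of what these building blocks might look like. The exact channel arithmetic and layer choices are assumptions based on the descriptions above, not the repository's code:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by GroupNorm, with GELU activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(1, out_ch),
            nn.GELU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(1, out_ch),
        )

    def forward(self, x):
        return self.net(x)

class Down(nn.Module):
    """Halve the spatial resolution with max-pooling, then grow the feature maps."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(nn.MaxPool2d(2), DoubleConv(in_ch, out_ch))

    def forward(self, x):
        return self.net(x)

class Up(nn.Module):
    """Double the spatial resolution and merge the encoder's skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        # in_ch counts the upsampled features plus the concatenated skip channels
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([skip, x], dim=1))

class SelfAttention(nn.Module):
    """Multi-head self-attention over the flattened spatial grid (global features)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.ln = nn.LayerNorm(channels)
        self.mha = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        q = self.ln(seq)
        attn_out, _ = self.mha(q, q, q)
        seq = seq + attn_out                        # residual connection
        return seq.transpose(1, 2).view(b, c, h, w)
```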
The diffusion model was trained on a landscape image dataset using the following configuration (a training-loop sketch follows the list):
- Learning Rate: 3e-4
- Optimizer: AdamW
- Loss Function: Mean Squared Error (MSE)
- Number of Epochs: 500
- Batch Size: 24
- Input Shape: 32 x 32 (RGB)
- Output Shape: 32 x 32 (RGB)
- Beta Noise Schedule: Linear schedule between `1e-4` and `0.02` for the forward process of adding noise during diffusion
- Model Architecture: UNet + self-attention layers
- Hardware: The model was trained on a Kaggle notebook with a P100 GPU.
- Training Time: Around 13 hours
- Dataset Used: The dataset consisted of around 4200 landscape images, resized to 32x32 pixels due to limited resources and training time. The dataset can be found here.
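Putting the pieces together, a training loop matching this configuration might look like the following sketch. It assumes the `Diffusion` wrapper sketched earlier and a dataset that yields (image, label) pairs; the `train` function and its signature are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, diffusion, dataset, epochs=500, batch_size=24, lr=3e-4, device="cuda"):
    """Illustrative loop matching the configuration above: AdamW, MSE, 500 epochs."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    model.to(device)
    for epoch in range(epochs):
        for images, _ in loader:                # dataset assumed to yield (image, label)
            images = images.to(device)
            # Sample a random timestep per image and apply the forward noising process
            t = torch.randint(1, diffusion.noise_steps, (images.size(0),), device=device)
            x_t, noise = diffusion.noise_images(images, t)
            # The UNet predicts the added noise; MSE compares it to the true noise
            pred = model(x_t, t)
            loss = mse(pred, noise)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```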
The training loss over around 90,000 iterations can be seen in the following plot: