Landscape Image Generation Using Diffusion Model

This project implements a diffusion model based on the architecture proposed in the DDPM paper. The model generates 32x32 landscape images using a UNet backbone with self-attention, downsampling, and upsampling blocks.

Sample Generated Images

Final Results

Here are some landscape images generated by the model after training. Note that each individual generated image is 32x32; the samples are tiled together into the grids below.

(Sampling 1 and Sampling 2: two grids of generated 32x32 samples)

Generated Images in Different Epochs

The progression of the model learning to generate realistic landscape images can be seen in the following table:

Epoch #     Generated Images
Epoch 1     (sample grid)
Epoch 50    (sample grid)
Epoch 100   (sample grid)
Epoch 150   (sample grid)
Epoch 300   (sample grid)
Epoch 500   (sample grid)

Implementation Details

1- The Diffusion class: This class wraps the UNet model and implements the forward and reverse processes of diffusion models (adding noise and denoising), which together are used to generate new images; a minimal sketch follows the bullet list below.

  • Noise Steps: 1000 steps of noise addition
  • Noise Schedule: Linear schedule for beta values between 1e-4 and 0.02
  • Image Size: 32x32 input and output
  • Sampling: Images are generated starting from Gaussian noise and denoised step by step through the reverse diffusion process
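Below is a minimal sketch of what such a wrapper can look like, following the DDPM equations and the hyperparameters listed above. The class and method names (`Diffusion`, `noise_images`, `sample`) are illustrative rather than the repo's exact API, and the UNet is assumed to take `(x, t)` as input:

```python
import torch

class Diffusion:
    def __init__(self, noise_steps=1000, beta_start=1e-4, beta_end=0.02,
                 img_size=32, device="cuda"):
        self.noise_steps = noise_steps
        self.img_size = img_size
        self.device = device
        # Linear beta schedule between 1e-4 and 0.02
        self.beta = torch.linspace(beta_start, beta_end, noise_steps).to(device)
        self.alpha = 1.0 - self.beta
        self.alpha_hat = torch.cumprod(self.alpha, dim=0)

    def noise_images(self, x, t):
        # Forward process: x_t = sqrt(alpha_hat_t) * x_0 + sqrt(1 - alpha_hat_t) * eps
        sqrt_alpha_hat = torch.sqrt(self.alpha_hat[t])[:, None, None, None]
        sqrt_one_minus = torch.sqrt(1.0 - self.alpha_hat[t])[:, None, None, None]
        eps = torch.randn_like(x)
        return sqrt_alpha_hat * x + sqrt_one_minus * eps, eps

    @torch.no_grad()
    def sample(self, model, n):
        # Reverse process: start from pure noise and denoise step by step
        model.eval()
        x = torch.randn(n, 3, self.img_size, self.img_size, device=self.device)
        for i in reversed(range(1, self.noise_steps)):
            t = torch.full((n,), i, device=self.device, dtype=torch.long)
            predicted_noise = model(x, t)
            alpha = self.alpha[t][:, None, None, None]
            alpha_hat = self.alpha_hat[t][:, None, None, None]
            beta = self.beta[t][:, None, None, None]
            # No noise is added at the final step
            noise = torch.randn_like(x) if i > 1 else torch.zeros_like(x)
            x = (1 / torch.sqrt(alpha)) * (
                x - ((1 - alpha) / torch.sqrt(1 - alpha_hat)) * predicted_noise
            ) + torch.sqrt(beta) * noise
        model.train()
        # Map from [-1, 1] back to displayable [0, 255] pixel values
        return ((x.clamp(-1, 1) + 1) / 2 * 255).to(torch.uint8)
```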

2- UNet Architecture: The UNet class is the concrete model that learns to predict the noise in images. It includes downsampling, upsampling, and self-attention blocks to capture both local and global features in the images; illustrative sketches of these blocks appear after the list.

  • DoubleConv: Two convolutional layers with GroupNorm and GELU activation, used throughout the network.
  • Down: Downsampling block that reduces the spatial resolution while increasing the number of feature maps.
  • Up: Upsampling block to restore the spatial resolution.
  • SelfAttention: Implements the self-attention mechanism, letting the model relate distant spatial locations.
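As a rough sketch, the two most distinctive blocks can look like the following (illustrative PyTorch in the style of common DDPM UNet implementations; the repo's exact layer choices may differ). Down and Up typically pair DoubleConv with pooling and upsampling respectively:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by GroupNorm and GELU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.GroupNorm(1, out_ch),
            nn.GELU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.GroupNorm(1, out_ch),
            nn.GELU(),
        )

    def forward(self, x):
        return self.net(x)

class SelfAttention(nn.Module):
    """Multi-head self-attention over the flattened spatial grid."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.mha = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.ln = nn.LayerNorm(channels)
        self.ff = nn.Sequential(
            nn.LayerNorm(channels),
            nn.Linear(channels, channels),
            nn.GELU(),
            nn.Linear(channels, channels),
        )

    def forward(self, x):
        # (B, C, H, W) -> (B, H*W, C): treat each spatial location as a token
        b, c, h, w = x.shape
        x = x.view(b, c, h * w).transpose(1, 2)
        x_ln = self.ln(x)
        attn, _ = self.mha(x_ln, x_ln, x_ln)
        x = x + attn        # residual connection around attention
        x = x + self.ff(x)  # residual connection around feed-forward
        return x.transpose(1, 2).view(b, c, h, w)
```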

Training Details

The diffusion model was trained on a landscape image dataset using the following configuration (a sketch of the resulting training loop follows the list):

  • Learning Rate: 3e-4
  • Optimizer: AdamW
  • Loss Function: Mean Squared Error (MSE)
  • Number of Epochs: 500
  • Batch Size: 24
  • Input Shape: 32 x 32 (RGB)
  • Output Shape: 32 x 32 (RGB)
  • Beta Noise Schedule: Linear schedule between 1e-4 and 0.02 for the forward process of adding noise during diffusion.
  • Model Architecture: UNet with self-attention layers
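Putting the pieces together, a training loop matching these settings can be sketched as follows. The `train` function and the data-loading details are illustrative; `diffusion` is the wrapper sketched earlier, and the dataset is assumed to yield (image, label) pairs:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, diffusion, dataset, epochs=500, batch_size=24, lr=3e-4,
          device="cuda"):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for epoch in range(epochs):
        for images, _ in loader:  # labels are unused for unconditional generation
            images = images.to(device)
            # Pick a random timestep per image, noise the batch, and train
            # the UNet to predict the injected noise (MSE objective).
            t = torch.randint(1, diffusion.noise_steps, (images.size(0),),
                              device=device)
            x_t, noise = diffusion.noise_images(images, t)
            predicted = model(x_t, t)
            loss = mse(noise, predicted)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```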

Hardware and Time

  • Hardware: The model was trained on a Kaggle notebook with a P100 GPU.
  • Training Time: Around 13 hours

Dataset

  • Dataset Used: The dataset consisted of around 4,200 landscape images, resized to 32x32 pixels due to limited resources and training time. The dataset can be found here.

Training Loss

The training loss over roughly 90,000 iterations is shown in the plot below:

(training loss plot)
