
Image Deblurring

Authors: addobosz, Wector1

Description of the data set

We used A Curated List of Image Deblurring Datasets from Kaggle, which is "a list of popular image deblurring datasets". We experimented with only a subset of those datasets, due to the enormous number of images in some of them. Nonetheless, aside from the smaller amount of training data, our model most likely did not suffer much from this limitation, because almost all of those datasets use the blur kernels proposed by Shen et al. 2018.

Example pairs of sharp and blurred images, respectively:

(three example sharp/blurred image pairs; see the repository)

Description of the problem

The image deblurring problem is a fundamental challenge in computer vision and image processing, involving the restoration of sharp images from their blurred counterparts. This task is complicated by the highly divergent and unpredictable nature of blur, which can stem from a variety of sources, such as motion, camera shake, or defocused lenses.

Blurring is typically modeled as the convolution of a sharp image with a blur kernel, which encapsulates the nature and extent of the degradation. However, in real-world scenarios, the blur kernels often vary significantly across different images—or even within the same image—making the deblurring process inherently ill-posed. Motion blur, for example, introduces additional complexity as it depends on the trajectory and speed of movement, leading to non-uniform and dynamic degradation patterns.
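
The forward model described above can be made concrete with a short NumPy sketch: a sharp grayscale image is convolved with a normalized horizontal motion-blur kernel. The kernel and helper names are our own illustration, not code from the repository.

```python
import numpy as np

def motion_blur_kernel(length=9):
    """Horizontal motion-blur kernel: a normalized 1 x `length` box filter."""
    k = np.ones((1, length), dtype=np.float64)
    return k / k.sum()

def blur(sharp, kernel):
    """Convolve a 2D grayscale image with a blur kernel.
    'Same'-size output with zero padding at the borders."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(sharp, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]  # flip for true convolution
    out = np.zeros(sharp.shape, dtype=np.float64)
    for i in range(sharp.shape[0]):
        for j in range(sharp.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out
```

Because the kernel is normalized, flat image regions pass through unchanged while edges are smeared along the kernel's direction, which is exactly the degradation the model must invert.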

To address this problem, deblurring approaches commonly leverage pairs of sharp and blurred images. These pairs provide crucial information for learning the intricate relationships between the two states, enabling the development of models capable of predicting sharp reconstructions. Despite advancements, the diversity of blur kernels and the unpredictable characteristics of real-world blur remain significant hurdles, pushing the boundaries of research in this field.

Description of used architectures

The final architecture

Architecture Layers

Image Deblurring Architecture

Overview

This architecture is designed for the challenging task of image deblurring, combining key features from two influential models: U-Net and ResNet. It leverages an encoder-decoder structure inspired by U-Net and incorporates residual blocks for efficient gradient flow and feature refinement.


Key Features

1. U-Net Inspiration
  • Encoder-Decoder Structure: The architecture uses a hierarchical feature learning approach with downsampling (via max pooling) and upsampling (via transposed convolutions).
  • Transposed Convolutions: These layers are used during the decoding phase to restore the spatial resolution of the image, making the model adept at reconstructing sharp images from blurred inputs.
2. Residual Blocks
  • Deep Feature Learning: Residual blocks form the core of the architecture, allowing the model to learn complex features without gradient vanishing issues.
  • Feature Reuse: Skip connections in residual blocks enable efficient feature propagation, critical for recovering fine image details.
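
The identity-skip pattern at the heart of a residual block is simple to express. The following framework-agnostic NumPy sketch is our own illustration (not the repository's ResidualBlock, which uses convolutional layers); dense matrices stand in for the convolutions to keep it short.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block on a flat feature vector:
    out = relu(x + F(x)), where F is two 'layers' with a ReLU between them.
    The addition of x is the skip connection that eases gradient flow."""
    residual = relu(x @ w1) @ w2  # F(x): the learned residual
    return relu(x + residual)     # identity skip connection, then activation
```

Because the input is added back unchanged, the block only needs to learn a correction on top of it, which is what makes very deep stacks trainable.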

Layer Breakdown

| Name | Type | # Parameters | Purpose |
| --- | --- | --- | --- |
| conv2d | Conv2D | 448 | Initial feature extraction |
| conv2d_1 | Conv2D | 11,008 | Deeper feature extraction |
| max_pooling2d | MaxPooling2D | 0 | Downsampling for hierarchical learning |
| conv2d_2 | Conv2D | 92,288 | Learning higher-level features |
| max_pooling2d_1 | MaxPooling2D | 0 | Further downsampling |
| residual_block | ResidualBlock | 333,184 | Complex feature learning (ResNet-inspired) |
| residual_block_1 | ResidualBlock | 312,704 | Additional feature refinement |
| conv2d_transpose | Conv2DTranspose | 147,584 | Upsampling (U-Net-inspired) |
| conv2d_transpose_1 | Conv2DTranspose | 73,792 | Final upsampling |
| conv2d_9 | Conv2D | 1,731 | Output refinement |

Highlights

  • U-Net Architecture: The encoder-decoder framework with transposed convolutions enables effective deblurring by processing features hierarchically and reconstructing images with high fidelity.
  • Residual Blocks: Inspired by ResNet, these blocks ensure robust learning of both local details and global context, critical for handling diverse and unpredictable blur patterns.

This hybrid architecture effectively combines the strengths of U-Net and ResNet, making it highly suited for the intricate task of image deblurring.

Model analysis: size in memory, number of parameters

Size in memory: 14.1 MB
Number of parameters: 972,739

Description of the training and the required commands to run it

To run the training, the following command is required:

python run.py --model_path --dataset_dir --dataset_name

Description of used metrics, loss, and evaluation

For metrics, we chose a number of loss functions and optimizers that we found reasonable and compared every combination of the two via grid search. The best results were achieved with the RMSprop optimizer and the MAE loss function; among the other metrics, MSE also looked promising. Ultimately, we evaluated the final model with both of those metrics, achieving satisfactory results in each case.
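
The grid search over optimizer/loss pairs can be sketched as below. The `evaluate` callable is a hypothetical placeholder for a full train-and-validate run; none of these names come from the repository.

```python
from itertools import product

OPTIMIZERS = ["rmsprop", "adam", "sgd"]
LOSSES = ["mae", "mse"]

def grid_search(evaluate):
    """Try every (optimizer, loss) pair and return the best one.
    `evaluate` maps a pair to a validation loss (lower is better)."""
    results = {
        (opt, loss): evaluate(opt, loss)
        for opt, loss in product(OPTIMIZERS, LOSSES)
    }
    best = min(results, key=results.get)
    return best, results
```

Exhaustive search is feasible here because the grid is tiny; with more hyperparameters, random or Bayesian search scales better.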

Furthermore, in the course of iterative development we introduced more advanced loss functions, merging them at the end. A quick description:

  • Sobel Loss: we apply the Sobel filter along both the x and y dimensions and take the absolute difference between the filtered $y$ and $\hat{y}$,
  • Fourier Loss: in a similar fashion to its predecessor, we compute the difference between the Fourier transforms of $y$ and $\hat{y}$.

The aforementioned losses provide an intuitive way to counteract the blurring effect. The Sobel filter puts emphasis on edges, which is crucial for reconstructing fuzzy images, in particular those degraded by motion blur. Fourier loss works in a similar spirit, although it is not exactly the same: by analyzing the frequency domain we can assess not only edges but also the specific distribution of frequencies in the ground-truth image.
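
Both losses can be sketched in NumPy as follows. This is an illustration of the idea, not the repository's training code (a deep-learning framework would use differentiable equivalents of these operations); for the Fourier loss we compare magnitudes, one common choice.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def _filter2d(img, kernel):
    """'Same'-size 2D cross-correlation with zero padding (3x3 kernels)."""
    padded = np.pad(img, 1)
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def sobel_loss(y, y_hat):
    """Mean absolute difference of Sobel gradients along x and y."""
    gx = np.abs(_filter2d(y, SOBEL_X) - _filter2d(y_hat, SOBEL_X))
    gy = np.abs(_filter2d(y, SOBEL_Y) - _filter2d(y_hat, SOBEL_Y))
    return float((gx + gy).mean())

def fourier_loss(y, y_hat):
    """Mean absolute difference of 2D Fourier magnitudes."""
    diff = np.abs(np.abs(np.fft.fft2(y)) - np.abs(np.fft.fft2(y_hat)))
    return float(diff.mean())
```

Blur attenuates high spatial frequencies, so both losses directly penalize exactly the information that blurring destroys.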

  • Perceptual Loss:

Perceptual loss functions are designed to capture perceptual differences between images, such as content and style discrepancies, which are not always evident at the pixel level. They are often employed in tasks where the goal is to generate images that are visually pleasing to humans, such as in neural style transfer, super-resolution, and image synthesis.

The core idea behind perceptual loss is to use the feature maps from various layers of a CNN, which has been pre-trained on a large dataset like ImageNet. By extracting these feature maps from both the target image and the generated image, we can compute the difference in the high-level features that the network has learned to detect, such as edges, textures, and patterns.

Source: Perceptual Loss Functions by DeepAI
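
The idea above can be sketched as follows: compare feature maps rather than raw pixels. The `extract_features` callable below is a stand-in for activations taken from selected layers of a pre-trained CNN (e.g. VGG); all names here are illustrative, not the repository's implementation.

```python
import numpy as np

def perceptual_loss(y, y_hat, extract_features):
    """Sum of MSEs between feature maps of the target and generated images.
    `extract_features` returns a list of feature arrays, one per chosen
    layer of a pre-trained network (stubbed out here)."""
    total = 0.0
    for f_y, f_hat in zip(extract_features(y), extract_features(y_hat)):
        total += float(np.mean((f_y - f_hat) ** 2))
    return total
```

Because the comparison happens in feature space, two images can score as similar even when their pixel values differ, which matches human perception better than plain MSE.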

Plots: training and validation loss, metrics

We monitored our model's training process via Weights & Biases. Comprehensive plots and other insightful metrics are contained in the following report.

Used hyperparameters

Hyperparameters were chosen experimentally, by comparing results across combinations of the learning rate and the rho parameter (RMSprop's discounting factor for the moving average of squared gradients). The best combination turned out to be:

Learning rate: 0.001
Rho: 0.9

The list of libraries and tools is available in the requirements.txt file.

Description of the runtime environment

Runtime environment: Kaggle Notebook
GPU count: 2
GPU type: Tesla T4

Training and inference time

Training duration: 2h 13m 17s
Inference time: 28 ms

Bibliography

Inspiration and Scientific works:

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Perceptual Loss Deep AI

Fourier Transform-Based U-Shaped Network for Single Image Motion Deblurring

Dataset

A Curated List of Image Deblurring Datasets

Helen

CelebA

Link to Git

Link to GitHub repository

About

We designed an architecture targeted at the image deblurring task, proposed a tailored loss function, optimised hyperparameters, and trained and compared different models.
