We used A Curated List of Image Deblurring Datasets from Kaggle, which is "a list of popular image deblurring datasets". We experimented with only a subset of those datasets, due to the enormous number of images in some of them. Nonetheless, aside from the smaller amount of training data, our model most likely did not suffer much from this, because almost all of those datasets used the kernels proposed by Shent et al. (2018).
Example pairs of sharp and blurred images, respectively:
The image deblurring problem is a fundamental challenge in computer vision and image processing, involving the restoration of sharp images from their blurred counterparts. This task is complicated by the highly divergent and unpredictable nature of blur, which can stem from a variety of sources, such as motion, camera shake, or defocused lenses.
Blurring is typically modeled as the convolution of a sharp image with a blur kernel, which encapsulates the nature and extent of the degradation. However, in real-world scenarios, the blur kernels often vary significantly across different images—or even within the same image—making the deblurring process inherently ill-posed. Motion blur, for example, introduces additional complexity as it depends on the trajectory and speed of movement, leading to non-uniform and dynamic degradation patterns.
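The convolutional model described above is commonly written as (a standard formulation, not taken from our code):

$$ b = k * s + n, $$

where $b$ is the blurred image, $s$ the sharp image, $k$ the blur kernel, $*$ denotes convolution, and $n$ is additive noise. The ill-posedness comes from the fact that neither $k$ nor $s$ is known.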
To address this problem, deblurring approaches commonly leverage pairs of sharp and blurred images. These pairs provide crucial information for learning the intricate relationships between the two states, enabling the development of models capable of predicting sharp reconstructions. Despite advancements, the diversity of blur kernels and the unpredictable characteristics of real-world blur remain significant hurdles, pushing the boundaries of research in this field.
This architecture is designed for the challenging task of image deblurring, combining key features from two influential models: U-Net and ResNet. It leverages an encoder-decoder structure inspired by U-Net and incorporates residual blocks for efficient gradient flow and feature refinement.
- Encoder-Decoder Structure: The architecture uses a hierarchical feature learning approach with downsampling (via max pooling) and upsampling (via transposed convolutions).
- Transposed Convolutions: These layers are used during the decoding phase to restore the spatial resolution of the image, making the model adept at reconstructing sharp images from blurred inputs.
- Deep Feature Learning: Residual blocks form the core of the architecture, allowing the model to learn complex features without gradient vanishing issues.
- Feature Reuse: Skip connections in residual blocks enable efficient feature propagation, critical for recovering fine image details.
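To make the structure above concrete, here is a minimal Keras sketch of such a hybrid encoder-decoder with residual blocks. The filter counts and exact wiring are illustrative assumptions, not a faithful reproduction of our trained model:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def residual_block(x, filters):
    """ResNet-style block: two 3x3 convolutions plus a skip connection."""
    shortcut = x
    if x.shape[-1] != filters:
        # 1x1 projection so the shortcut matches the channel count.
        shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

def build_deblurrer(input_shape=(64, 64, 3)):
    inputs = layers.Input(shape=input_shape)
    # Encoder: convolutions with max-pooling downsampling.
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Bottleneck: residual blocks for deep feature learning.
    x = residual_block(x, 128)
    x = residual_block(x, 128)
    # Decoder: transposed convolutions restore spatial resolution.
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return Model(inputs, outputs)

model = build_deblurrer((64, 64, 3))
```

Two pooling steps halve the resolution twice, and two stride-2 transposed convolutions restore it, so the output has the same spatial size as the input.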
| Name | Type | # Parameters | Purpose |
|---|---|---|---|
| conv2d | Conv2D | 448 | Initial feature extraction |
| conv2d_1 | Conv2D | 11,008 | Deeper feature extraction |
| max_pooling2d | MaxPooling2D | 0 | Downsampling for hierarchical learning |
| conv2d_2 | Conv2D | 92,288 | Learning higher-level features |
| max_pooling2d_1 | MaxPooling2D | 0 | Further downsampling |
| residual_block | ResidualBlock | 333,184 | Complex feature learning (ResNet-inspired) |
| residual_block_1 | ResidualBlock | 312,704 | Additional feature refinement |
| conv2d_transpose | Conv2DTranspose | 147,584 | Upsampling (U-Net-inspired) |
| conv2d_transpose_1 | Conv2DTranspose | 73,792 | Final upsampling |
| conv2d_9 | Conv2D | 1,731 | Output refinement |
- U-Net Architecture: The encoder-decoder framework with transposed convolutions enables effective deblurring by processing features hierarchically and reconstructing images with high fidelity.
- Residual Blocks: Inspired by ResNet, these blocks ensure robust learning of both local details and global context, critical for handling diverse and unpredictable blur patterns.
This hybrid architecture effectively combines the strengths of U-Net and ResNet, making it highly suited for the intricate task of image deblurring.
Size in memory: 14.1 MB
Number of parameters: 972,739
To run the training, the following command is required (with values supplied for each argument):

python run.py --model_path <model_path> --dataset_dir <dataset_dir> --dataset_name <dataset_name>
As for the metrics used, we selected a number of loss functions and optimizers that we found reasonable and compared every combination of the two via grid search. The best results were achieved by the RMSprop optimizer with the MAE loss function; among the other metrics, MSE also looked quite promising. Ultimately we evaluated the final model with both of those metrics, achieving satisfactory results with each.
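The grid search described above can be sketched as follows; the candidate sets, the tiny network, and the synthetic data are placeholders, not our actual configuration:

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for (blurred, sharp) image pairs.
x = np.random.rand(8, 16, 16, 3).astype("float32")
y = np.random.rand(8, 16, 16, 3).astype("float32")

results = {}
for opt_name in ["rmsprop", "adam"]:      # candidate optimizers
    for loss_name in ["mae", "mse"]:      # candidate loss functions
        inputs = tf.keras.Input(shape=(16, 16, 3))
        h = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu")(inputs)
        outputs = tf.keras.layers.Conv2D(3, 3, padding="same")(h)
        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer=opt_name, loss=loss_name)
        hist = model.fit(x, y, epochs=1, verbose=0)
        results[(opt_name, loss_name)] = hist.history["loss"][-1]

best = min(results, key=results.get)  # combination with the lowest loss
```

In our experiments this kind of sweep singled out the RMSprop/MAE pair.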
Furthermore, in the course of iterative development, we introduced more advanced loss functions, merging them at the end. A quick description:
- Sobel Loss: we apply the Sobel filter along both the x and y dimensions and take the absolute difference between the filtered $y$ and $\hat{y}$,
- Fourier Loss: in a similar fashion to its predecessor, we compute the difference between the Fourier transforms of $y$ and $\hat{y}$.
The aforementioned losses provide an intuitive way to counteract the blurring effect. The Sobel filter puts emphasis on edges, which is crucial when reconstructing fuzzy images, motion blur in particular. Fourier loss works in a similar spirit, although not identically: by analyzing the frequency domain we can assess not only edges but also the overall distribution of frequencies in the ground-truth image.
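A minimal NumPy/SciPy sketch of these two losses, written for single-channel images (our actual implementation operates on batched tensors inside the training loop):

```python
import numpy as np
from scipy.ndimage import sobel

def sobel_loss(y_true, y_pred):
    """L1 difference between Sobel edge maps along both image axes."""
    total = 0.0
    for axis in (0, 1):  # vertical and horizontal derivatives
        total += np.mean(np.abs(sobel(y_true, axis=axis) - sobel(y_pred, axis=axis)))
    return total

def fourier_loss(y_true, y_pred):
    """Mean absolute difference between the 2-D Fourier transforms."""
    return np.mean(np.abs(np.fft.fft2(y_true) - np.fft.fft2(y_pred)))
```

Identical images yield zero under both losses, while a blurred prediction is penalized for its missing edges and attenuated high frequencies.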
- Perceptual Loss:
Perceptual loss functions are designed to capture perceptual differences between images, such as content and style discrepancies, which are not always evident at the pixel level. They are often employed in tasks where the goal is to generate images that are visually pleasing to humans, such as in neural style transfer, super-resolution, and image synthesis.
The core idea behind perceptual loss is to use the feature maps from various layers of a CNN, which has been pre-trained on a large dataset like ImageNet. By extracting these feature maps from both the target image and the generated image, we can compute the difference in the high-level features that the network has learned to detect, such as edges, textures, and patterns.
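The idea above can be sketched in Keras using feature maps from an intermediate VGG16 layer. Here `weights=None` keeps the example self-contained and offline; in practice one would load `weights="imagenet"`, and the chosen layer (`block3_conv3`) is an illustrative assumption rather than the layer used in the cited approaches:

```python
import tensorflow as tf

# Feature extractor from an intermediate VGG16 layer.
# weights=None keeps this sketch offline; use weights="imagenet" in practice.
vgg = tf.keras.applications.VGG16(include_top=False, weights=None,
                                  input_shape=(64, 64, 3))
feature_extractor = tf.keras.Model(
    vgg.input, vgg.get_layer("block3_conv3").output)
feature_extractor.trainable = False

def perceptual_loss(y_true, y_pred):
    """MSE between high-level feature maps rather than raw pixels."""
    return tf.reduce_mean(tf.square(
        feature_extractor(y_true) - feature_extractor(y_pred)))
```

Because the comparison happens in feature space, small pixel-level misalignments that a human would not notice are penalized far less than missing textures or edges.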
Source: Perceptual Loss Functions by DeepAI
We monitored our model's training process via Weights & Biases. Comprehensive plots and other insightful metrics are contained in the following report.
Hyperparameters were chosen experimentally, by comparing results across combinations of learning rates and rho values (the RMSprop optimizer's parameter). The best combination turned out to be as follows:
Learning Rate: 0.001
Rho: 0.9
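In Keras terms, the selected configuration corresponds to (a sketch, not a line from our codebase):

```python
import tensorflow as tf

# RMSprop with the best hyperparameters found experimentally.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)
```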
The list of libraries and tools is available in the requirements.txt file.
Runtime environment: Kaggle Notebook
GPU count: 2
GPU type: Tesla T4
Training duration: 2h 13m 17s
Inference time: 28 ms
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Fourier Transform-Based U-Shaped Network for Single Image Motion Deblurring