- The project involves using a dataset of 100 high-resolution retinal fundus images for blood vessel segmentation to aid in early detection of retinal pathologies.
- Implemented the U-Net architecture from scratch; U-Net is known for its efficiency in semantic segmentation with limited data.
- Hyperparameter tuning was run in parallel across multiple GPUs to handle the computational demands efficiently.
- The final model achieved an IoU of 86% on the validation set and 70% on the test set.
- Data augmentation techniques were applied to the training dataset to enhance model robustness and generalization by artificially expanding the diversity of training samples.
- The codebase is organized into modular Python scripts for better reproducibility, built on PyTorch Lightning for a simplified codebase, scalability, and advanced features.
- This dataset contains a comprehensive collection of retinal fundus images, meticulously annotated for blood vessel segmentation. Accurate segmentation of blood vessels is a critical task in ophthalmology as it aids in the early detection and management of various retinal pathologies, such as diabetic retinopathy & glaucoma. The dataset comprises a total of 100 high-resolution retinal fundus images captured using state-of-the-art imaging equipment. Each image comes with corresponding pixel-level ground truth annotations indicating the exact location of blood vessels. These annotations facilitate the development and evaluation of advanced segmentation algorithms.
- The dataset comprises a total of 100 retinal fundus images divided into 80 train images & 20 test images.
- The 80 train images are divided into 60 images for training & 20 images for validation.
- Dataset link on Kaggle
U-Net is widely used in semantic segmentation because it excels at capturing fine-grained details and spatial context, thanks to its encoder-decoder architecture with skip connections. This design enables precise boundary delineation and efficient training even with a limited amount of labeled data. Moreover, U-Net's ability to preserve spatial information throughout the network significantly improves segmentation accuracy.
- Encoder (contracting path)
- Bottleneck
- Decoder (expansive path)
- Skip Connections
- Extract features from input images.
- Repeated 3x3 conv (valid conv) + ReLU layers.
- 2x2 max pooling to downsample (reduce spatial dimensions).
- Double the number of channels after each max pooling.
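The encoder stage described above can be sketched as a small PyTorch module. The 572×572 input size and the 1→64 channel counts follow the original U-Net paper and are illustrative assumptions here, not project specifics:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of one encoder stage: two unpadded ("valid") 3x3
# convolutions with ReLU, then 2x2 max pooling. Each valid conv trims
# 1 pixel per side, and the pooling halves H and W.
class EncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3),  # valid conv: no padding
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)  # 2x2 max pooling downsamples

    def forward(self, x):
        skip = self.double_conv(x)   # kept for the skip connection
        return self.pool(skip), skip

down, skip = EncoderBlock(1, 64)(torch.randn(1, 1, 572, 572))
print(skip.shape, down.shape)  # (1, 64, 568, 568) and (1, 64, 284, 284)
```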
- Pivotal role in bridging the encoder and decoder.
- Capture the most abstract and high-level features from the input image.
- Serves as a feature-rich layer that condenses the spatial dimensions while preserving the semantic information.
- Enable the decoder to reconstruct the output image with high fidelity.
- The large number of channels in the bottleneck balances the loss of spatial information caused by down-sampling by enriching the feature space.
- Repeated 3x3 conv (valid conv) + ReLU layers.
- Upsample using transpose convolution.
- Halve the number of channels after each transpose convolution.
- Successive decoder blocks perform gradual upsampling and refinement, which helps generate a high-quality segmentation map with accurate boundaries.
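Under the same assumptions, one decoder stage might look like the sketch below; the channel counts and spatial sizes in the usage example mirror the original U-Net figure and are illustrative, not taken from this project:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of one decoder stage: a 2x2 transpose convolution
# doubles the spatial size and halves the channels, the skip tensor is
# center-cropped and concatenated, then two valid 3x3 convs refine it.
class DecoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3),  # in_ch again after concat
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)  # e.g. (N, 512, 56, 56) -> (N, 256, 112, 112)
        # center-crop the skip tensor to match, then concatenate on channels
        dh = (skip.shape[2] - x.shape[2]) // 2
        dw = (skip.shape[3] - x.shape[3]) // 2
        skip = skip[:, :, dh:dh + x.shape[2], dw:dw + x.shape[3]]
        return self.double_conv(torch.cat([skip, x], dim=1))

dec = DecoderBlock(512, 256)
out = dec(torch.randn(1, 512, 56, 56), torch.randn(1, 256, 136, 136))
print(out.shape)  # torch.Size([1, 256, 108, 108])
```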
- Preservation of spatial information: spatial detail lost during downsampling in the encoder is reintroduced into the decoder.
- Combining Low-level and High-level Features.
- Gradient Flow Improvement.
- Better Localization.
- Cropping is used in U-Net skip connections primarily due to the following reasons:
- Size Mismatch: ensures that the sizes are compatible for concatenation.
- Aligning the central regions: which contain more reliable information.
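As a sketch of why cropping is needed: with valid convolutions the encoder feature map is larger than the upsampled decoder map, so only its central region is kept before concatenation. The sizes below follow the original U-Net figure and are illustrative:

```python
import torch

# Center-crop a skip tensor to the decoder's spatial size so the two
# can be concatenated along the channel axis.
def center_crop(skip, target_h, target_w):
    _, _, h, w = skip.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return skip[:, :, top:top + target_h, left:left + target_w]

skip = torch.randn(1, 64, 568, 568)  # first-stage encoder features
up = torch.randn(1, 64, 392, 392)    # upsampled decoder tensor
merged = torch.cat([center_crop(skip, 392, 392), up], dim=1)
print(merged.shape)  # torch.Size([1, 128, 392, 392])
```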
- The final layer of the U-Net decoder has a number of filters equal to the number of classes, producing one output feature map per class.
- The final layer of the U-Net can be a 1x1 convolution to map the feature maps to the desired number of output classes for segmentation.
- If there are C classes, the output will be of shape (H × W × C).
- Interpolation methods like bilinear or nearest-neighbor interpolation can be used at the final layer to adjust the output dimensions to match the input. This ensures that each pixel in the input image has a corresponding label in the output segmentation map.
- The softmax function is applied at each pixel location across all channels, yielding a per-class probability for every pixel.
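A minimal sketch of such a head, assuming 64 input feature channels and C = 2 classes (both numbers are illustrative):

```python
import torch
import torch.nn as nn

# A 1x1 convolution maps 64 feature channels to C = 2 class channels;
# softmax over the channel axis turns them into per-pixel probabilities.
head = nn.Conv2d(64, 2, kernel_size=1)
logits = head(torch.randn(1, 64, 388, 388))  # (1, 2, 388, 388)
probs = torch.softmax(logits, dim=1)         # softmax across classes per pixel
print(probs.shape)  # torch.Size([1, 2, 388, 388])
```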
- The choice of loss function is crucial for training a U-Net model for blood vessel segmentation.
- The Binary Cross-Entropy (BCE) loss is commonly used for binary segmentation tasks, such as blood vessel segmentation.
- BCE loss is well-suited for pixel-wise classification problems where each pixel is classified as either a blood vessel or background.
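For the binary case, a common setup is a single-logit output combined with `BCEWithLogitsLoss`, which fuses the sigmoid into the loss for numerical stability; the shapes below are illustrative:

```python
import torch
import torch.nn as nn

# BCE compares a raw per-pixel logit against the 0/1 ground-truth mask.
criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 1, 256, 256)                     # raw model outputs
target = torch.randint(0, 2, (4, 1, 256, 256)).float()   # ground-truth mask
loss = criterion(logits, target)                         # scalar, mean over pixels
print(loss.item())
```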
- IoU (Intersection over Union):
- Measures the overlap between the predicted segmentation and the ground truth.
- IoU is calculated as the ratio of the intersection area to the union area of the predicted and ground truth segmentation masks.
- A higher IoU indicates better segmentation accuracy.
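For binary masks the metric can be sketched as |A ∩ B| / |A ∪ B|, with a small epsilon guarding against an empty union (the helper name is illustrative):

```python
import torch

# IoU for a binary mask: intersection over union of predicted and
# ground-truth foreground pixels.
def iou_score(pred, target, eps=1e-7):
    pred = pred.bool()
    target = target.bool()
    inter = (pred & target).float().sum()
    union = (pred | target).float().sum()
    return (inter + eps) / (union + eps)

pred = torch.tensor([[1, 1, 0, 0]])
target = torch.tensor([[1, 0, 1, 0]])
iou = iou_score(pred, target)  # 1 overlapping pixel / 3 in union ≈ 0.3333
```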
Hyperparameters are passed to the training script via a JSON file.
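A minimal sketch of that pattern; the key names mirror the table's columns, while the inline string stands in for the project's actual (unspecified) config file:

```python
import json

# In the real script this would be json.load(open("<config>.json")).
config_text = '{"epochs": 1000, "batch_size": 4, "learning_rate": 1e-4}'
params = json.loads(config_text)
print(params["learning_rate"])  # 0.0001
```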
Job id | epochs | batch_size | learning_rate | val_IoU | val_loss | test_IoU | test_loss |
---|---|---|---|---|---|---|---|
17855 | 100 | 4 | 1e-04 | 0.6163 | 0.1475 | - | - |
17857 | 100 | 16 | 1e-04 | 0.4087 | 0.2136 | - | - |
17931 | 200 | 4 | 1e-04 | 0.6783 | 0.1251 | 0.6779 | 0.125 |
17932 | 200 | 4 | 5e-05 | 0.6466 | 0.1361 | - | - |
17939 | 200 | 4 | 1e-04 | 0.6204 | 0.1457 | - | - |
17941 | 300 | 4 | 1e-04 | 0.6126 | 0.1551 | - | - |
17942 | 300 | 4 | 5e-05 | 0.5701 | 0.1618 | - | - |
18049 | 400 | 4 | 1e-04 | 0.7242 | 0.0961 | 0.6827 | 0.1307 |
18808 | 1000 | 4 | 1e-04 | 0.8609 | 0.0454 | 0.701 | 0.1881 |
Experiments link on Comet
Training was run on 4 GPUs.
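A hedged sketch of a 4-GPU PyTorch Lightning setup; the DDP strategy and the `model`/`dm` names are assumptions for illustration, not confirmed project details:

```python
import pytorch_lightning as pl

def make_trainer() -> pl.Trainer:
    # 4 GPUs with DistributedDataParallel; max_epochs matches the best run
    return pl.Trainer(
        accelerator="gpu",
        devices=4,
        strategy="ddp",
        max_epochs=1000,
    )

# trainer = make_trainer()
# trainer.fit(model, datamodule=dm)  # model/dm are placeholders
```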
After multiple experiments, the best hyperparameters for this training were:
- Learning Rate: 0.0001
- Optimizer: Adam
- Batch Size: 4
- Epochs: 1000
The model performed best at epoch 992, with:
- IoU score = 0.8609
- Validation loss = 0.0454
The model is saved to disk for future use.