Assignment 2 for the Computer Vision & Deep Learning course at Innopolis University. This README contains a very short version of the report, which you can get by downloading A2_report.pdf.
It's highly advised to view the Jupyter notebooks in Colab because GitHub does not display some outputs.
YOLOv4-tiny | YOLOv5-tiny | MaskRCNN
parse_dataset.py is a script responsible for converting the Supervisely format to the format accepted by MaskRCNN.
This project aims to detect recycling codes: PP (5), PAP (20-22), ALU (41). To achieve that, I use three models: YOLOv4, YOLOv5, and MaskRCNN. You can find all Jupyter notebooks and resulting folders in this Google Drive folder.
I took photos of whatever I found at home. You can see examples below:
Since I didn't have many aluminium items, the dataset is a bit unbalanced. The statistics are the following:
Class name | Images count | Objects count |
---|---|---|
PAP | 31 | 32 |
POL | 36 | 37 |
ALU | 29 | 29 |
The numbers in the Objects Count column differ from the numbers in the Images Count column because some images contain more than one object of the same class.
I used Supervisely to create polygon annotations for the images. Examples of annotated objects are presented below.
I used Roboflow for data augmentation for the YOLO models.
Preprocessing
- Auto-Orient: Applied
Augmentations
- Outputs per training example: 3
- Rotation: Between -45° and +45°
- Shear: ±20° Horizontal, ±20° Vertical
- Hue: Between -180° and +180°
- Saturation: Between -50% and +50%
- Brightness: Between -30% and +30%
- Blur: Up to 2.75px
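The most aggressive setting above is the full ±180° hue rotation, which can completely change object colors. To illustrate what that does, here is a minimal sketch of a hue shift on a single pixel using only the standard-library `colorsys` module; this is an illustration of the idea, not Roboflow's actual implementation.

```python
import colorsys
import random

def shift_hue(rgb, degrees):
    """Rotate the hue of one RGB pixel (0-255 channels) by the given angle."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360.0) % 1.0  # hue wraps around the color wheel
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

# Sample a shift uniformly in [-180°, +180°], matching the Roboflow setting.
angle = random.uniform(-180, 180)
print(shift_hue((255, 0, 0), angle))

# A fixed 180° shift turns pure red into pure cyan:
print(shift_hue((255, 0, 0), 180))  # -> (0, 255, 255)
```

Saturation and brightness scale the `s` and `v` channels in the same HSV space.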
Examples of augmented images are presented below.
I used Supervisely to augment the dataset and increase its size. I defined the augmentations in Supervisely's DTL language, wrote a config, and ran the job. The result is a new dataset of 234 images.
Augmentations:
- Resize: 700x700, keep aspect ratio
- Rotate: Between -180° and +180°
- Gaussian Blur: sigma between 0.5 and 2
- Contrast: between 0.5 and 2
- Brightness: between -50 and 50
- Random Color
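For readers without a Supervisely account, the pipeline above can be roughly reproduced locally. Below is a minimal sketch using Pillow (assuming it is installed); the parameter ranges follow the DTL list, but note that Pillow's brightness is a multiplicative factor rather than the DTL's additive [-50, 50] offset, so that step is only approximate.

```python
import random
from PIL import Image, ImageEnhance, ImageFilter

def augment(img: Image.Image) -> Image.Image:
    """Approximate the DTL pipeline: resize, rotate, blur, contrast, brightness."""
    img = img.copy()
    # Resize to fit within 700x700, keeping aspect ratio.
    img.thumbnail((700, 700))
    # Rotate between -180 and +180 degrees, expanding the canvas.
    img = img.rotate(random.uniform(-180, 180), expand=True)
    # Gaussian blur with sigma in [0.5, 2].
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 2)))
    # Contrast factor in [0.5, 2] (1.0 = unchanged).
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.5, 2))
    # Brightness: multiplicative stand-in for the additive DTL range.
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))
    return img

sample = Image.new("RGB", (1400, 1050), color=(200, 120, 40))
print(augment(sample).size)
```

Running `augment` three times per source image would yield a dataset expansion similar to the one described above.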
I got the following statistics for this dataset:
Class name | Images count | Objects count |
---|---|---|
PAP | 93 | 96 |
POL | 108 | 111 |
ALU | 87 | 87 |
Finally, to get the required data representation, I wrote my own Python script, parse_dataset.py, which takes a path to the dataset and creates two .json files: one for training and one for validation (train.json and valid.json by default).
I used the tiny config and the Darknet data format to train YOLOv4. As described before, I used Roboflow to convert the dataset from Supervisely format to Darknet format.
- YOLOv4 trained for 6000 iterations
- Last accuracy: 79.66%; best accuracy: 83.51%
- The model saw 360000 images
Class Name | Average Precision, % | True Positive (TP) | False Positive (FP) |
---|---|---|---|
ALU | 70.24 | 4 | 0 |
PAP | 67.26 | 5 | 3 |
POL | 100.00 | 8 | 0 |
As I wrote before, the POL class had the highest number of images, and the PP recycling code is always associated with the number 5, whereas the PAP and ALU codes can be associated with multiple numbers. That is probably why the AP for the POL class is 100%.
I used the tiny config and the PyTorch YOLO data format to train YOLOv5. As described before, I used Roboflow to convert the dataset from Supervisely format to YOLOv5 format.
- YOLOv5 trained for 1750 epochs (early stopping due to small improvement over the last 1000 epochs);
- Last precision: 92%; Last recall: 45%; Last mAP: 48%;
- Reached 1400th epoch after 2hr 16mins - very slow!
The first and second images do not have any predictions at all. The third and fourth have correct predictions.
I also trained a YOLOv5 without pretrained weights; you can see a comparison below. Both results are frustrating.
YOLOv5 type | Precision | Recall | mAP |
---|---|---|---|
YOLOv5 | 86% | 47% | 29% |
Pretrained YOLOv5 | 92% | 45% | 48% |
These few lines contain the most important information about the configuration of the model:
- I use pretrained weights from COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml.
- Initial learning rate is 0.0002.
- Number of iterations: 4000.
- Decrease learning rate by a factor of 0.5 at iterations 2800 and 3600.
- Image size is 700x700.
- Finished after 58 minutes - very fast!
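The step schedule above works out to a simple piecewise-constant rule: the learning rate is halved at each milestone, going 0.0002 → 0.0001 → 0.00005. A small sketch of that arithmetic (mirroring a gamma-style step decay, with the values from this config):

```python
def lr_at(iteration: int, base_lr: float = 2e-4,
          steps: tuple = (2800, 3600), gamma: float = 0.5) -> float:
    """Learning rate after multiplying base_lr by gamma at each passed milestone."""
    passed = sum(1 for s in steps if iteration >= s)
    return base_lr * gamma ** passed

for it in (0, 2800, 3600):
    print(it, lr_at(it))  # 0.0002 -> 0.0001 -> 0.00005
```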
I have already described the way I augmented and transformed the data for MaskRCNN and visualized it, so just enjoy the statistics and results! :)
Evaluation results for segmentation:
AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|
89.765 | 100.000 | 100.000 | 85.050 | 90.040 | 94.175 |
Per-category segm AP:
Class Name | AP |
---|---|
PAP | 90.644 |
POL | 88.651 |
ALU | 90.000 |
It is very interesting that POL does not have the highest score here, given that it has slightly more samples than the other classes.
I am very satisfied with the results! Maybe the augmentation from Supervisely played a role here, but...
- The training of MaskRCNN was faster
- The results are awesome
- The model is confident
- The masks are ultra fitting