PyTorch-based Faster R-CNN for a custom dataset, with explanation
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Tutorial / Credits / Source : source || Github
In case of any confusion, please raise a query.
- Update the `config` class in `main.py` (assign indices to your custom classes)
- `tools.py` has all the bounding box / anchor box related code
- `dataset.py` manipulates bounding boxes with respect to various transformations
- `debug.py` has debugging code (some of it might not work; it was written while testing various functions)
- `plot.py` has the code to visualize anchor boxes and bounding boxes
- `loss.py` has the loss functions
- `utils.py` has basic functions: IoU calculations, saving/loading models, etc.
- `model.py` is a collection of 2 simple models (the most important manipulation of Faster R-CNN comes from `tools.py`)
Run something like:

```
python main.py --pretrained --data-format='bg_first' --save-evaluations --
```
- Create 2 JSON files for each set (train, validation, test). (I have only train and test sets, so 4 JSON files.) Converting dict to JSON: GUIDE
- Naming convention: `TRAIN_images.json`, `TRAIN_objects.json`, `TEST_images.json`, `TEST_objects.json`
- `TRAIN_images.json` contains paths to the images, e.g. `["../class_pics/IMG_0.jpg", "../class_pics/IMG_1.jpg", ...]`
- `TRAIN_objects.json` contains a list of dictionaries, the i-th dictionary corresponding to the i-th image in `TRAIN_images.json`. Each dictionary contains the keys:
  - `boxes` : list of all bounding boxes, each as [xmin, ymin, xmax, ymax]
  - `labels` : list of labels corresponding to each bounding box
- Please note the background class gets the last index, which means your indexing of classes should start with 0
- Sample `TEST_images.json` and `TEST_objects.json` have been uploaded
- I have removed the difficulty tag; it is not needed
- Just change the desired labels in the `config` class in `main.py` according to your new label set.
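As a minimal sketch of the JSON format described above (the paths, box coordinates, and label indices here are illustrative only, not from the actual dataset):

```python
import json

# Hypothetical example: two training images with their boxes and labels.
train_images = ["../class_pics/IMG_0.jpg", "../class_pics/IMG_1.jpg"]
train_objects = [
    {"boxes": [[12, 30, 200, 180]], "labels": [0]},                      # one object
    {"boxes": [[40, 50, 120, 160], [5, 8, 60, 90]], "labels": [1, 0]},   # two objects
]

# Dump the Python lists/dicts to the two JSON files the loader expects.
with open("TRAIN_images.json", "w") as f:
    json.dump(train_images, f)
with open("TRAIN_objects.json", "w") as f:
    json.dump(train_objects, f)
```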
In case of error, raise an issue :)
Sample training image, after training the model for 10 epochs on a dataset of 17 images with imbalanced bounding boxes.
Each point on the output layer of the ResNet / base model is known as an anchor point (2048 x 20 x 10 ==> 20 x 10 anchor points) and is the center of anchor boxes on the original image. Each point has 3 anchor box sizes (by default), and each of these boxes comes in 3 different aspect ratios (by default), keeping the area the same (see the first diagram on how aspect ratio works). Hence 9 anchor boxes are possible for each anchor point. Each of these anchor boxes (9 * 20 * 10) is compared with the golden bounding boxes (ground truth boxes); the result is either positive (overlap with the ground truth box > 0.7), negative (overlap with the ground truth box < 0.3), or neutral (0.3 < IoU < 0.7).
Each of these anchor boxes is given a label:
- +1 for a positive anchor box,
- -1 for a negative anchor box,
- 0 for a neutral (ambiguous) anchor box
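The labeling rule above can be sketched as follows (the function name and the precomputed IoU-matrix input are assumptions for illustration, not the repo's actual API):

```python
import numpy as np

def label_anchors(ious, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor from its best IoU against all ground truth boxes.
    ious: (num_anchors, num_gt) IoU matrix. Returns +1 / -1 / 0 per anchor."""
    best_iou = ious.max(axis=1)                        # best overlap per anchor
    labels = np.zeros(ious.shape[0], dtype=np.int64)   # 0 = neutral (ignored)
    labels[best_iou > pos_thresh] = 1                  # positive anchor
    labels[best_iou < neg_thresh] = -1                 # negative anchor
    return labels
```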
For each ground truth bounding box, the anchor box with the highest IoU is kept, along with (tx, ty, th, tw) (see figure below).
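The (tx, ty, tw, th) values follow the standard Faster R-CNN box parameterization; a sketch (the function name and the (cx, cy, w, h) center-plus-size convention are assumptions here):

```python
import numpy as np

def encode_box(anchor, gt):
    """Regression targets of a ground truth box relative to an anchor;
    both boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    tx = (gx - ax) / aw      # center shift, normalized by anchor size
    ty = (gy - ay) / ah
    tw = np.log(gw / aw)     # log of the size ratio
    th = np.log(gh / ah)
    return tx, ty, tw, th
```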
`num_anchors_for_bbox` stores the number of positive anchor boxes associated with each golden (ground truth) bounding box.
`default_anchors` is the set of (xmin, ymin, width, height) of all possible anchor boxes on the downscaled output feature space of the base model (e.g. for an 800x600 image, the downscaled output feature space is 50 x 38). `all_possible_anchor_boxes[:, i, j, k]` and `all_possible_anchor_boxes[:, m, j, k]` will have the same xmin, width and height but a different ymin. Similarly, `all_possible_anchor_boxes[:, i, j, k]` and `all_possible_anchor_boxes[:, i, m, k]` will have the same ymin, height and width.
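That layout can be sketched as follows; the function name, argument names, and default sizes/ratios are assumptions, not the repo's actual values:

```python
import numpy as np

def default_anchor_grid(fh, fw, sizes=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Build a (4, fh, fw, 9) tensor of (xmin, ymin, width, height) anchors
    over an fh x fw feature map. Varying the row index changes only ymin;
    varying the column index changes only xmin."""
    anchors = np.zeros((4, fh, fw, len(sizes) * len(ratios)))
    for i in range(fh):              # rows
        for j in range(fw):          # columns
            k = 0
            for s in sizes:
                for r in ratios:
                    w = s * np.sqrt(r)   # w * h == s**2, so the area stays
                    h = s / np.sqrt(r)   # the same across aspect ratios
                    anchors[:, i, j, k] = (j - w / 2, i - h / 2, w, h)
                    k += 1
    return anchors
```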
The model predicts reg_k : (tx, ty, tw, th). Given all the possible anchor boxes, one can estimate the model's predicted boxes from these values (the model can't predict coordinates of anchor boxes directly, apparently).
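Recovering a predicted box from an anchor plus its (tx, ty, tw, th) is just the inverse of the encoding; a sketch (names and the (cx, cy, w, h) convention are assumptions):

```python
import numpy as np

def decode_box(anchor, t):
    """Invert the regression parameterization: given an anchor (cx, cy, w, h)
    and predicted (tx, ty, tw, th), recover the predicted box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = t
    return (tx * aw + ax,      # predicted center x
            ty * ah + ay,      # predicted center y
            aw * np.exp(tw),   # predicted width
            ah * np.exp(th))   # predicted height
```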
Every RPN prediction (tx, ty, tw, th) is converted to its most probable bounding (anchor) box, and the closest ground truth box is associated with it. The anchor box qualifies only if the IoU is greater than 0.1: if IoU > 0.5 it is a positive anchor box, if 0.1 < IoU < 0.5 it is a negative anchor box, and below 0.1 it is ambiguous (ignored). For positive boxes, [tx, ty, tw, th] is calculated (used for training the model). It returns:
- X : positive/negative anchor boxes (n, 4)
- Y1 : label of the ground truth box associated with each qualified anchor box (n, # of classes)
- Y2 : concatenation of the labels (1 if positive else 0, per class) and the regression values, 4 per class because of the 4 values (tx, ty, tw, th) associated with the anchor boxes: (n, 4 * (# of classes - 1))
where n is the number of qualified anchor boxes.
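The IoU used in all these thresholds is the usual intersection-over-union; a minimal sketch for two boxes in the (xmin, ymin, xmax, ymax) format used above (the function name is an assumption):

```python
def iou(a, b):
    """Intersection over union of two boxes in (xmin, ymin, xmax, ymax) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```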
It returns 3 tensors: base_x, cls_k, reg_k.
base_x is the output of the base layer (ResNet in my case), the vector representation of the image.
cls_k is the model's prediction for each anchor box (9 in my case), the prediction being the probability of the anchor being a positive anchor (trained via cross entropy loss). Positive GT anchor box: +1; negative anchor box: 0. Neutral boxes are ignored.
reg_k holds the regression values corresponding to positive GT anchor boxes (trained via smooth L1 loss).
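A minimal sketch of an RPN head producing cls_k and reg_k from base_x (the class name, channel sizes, and layer structure are assumptions, not the repo's actual model):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """A shared 3x3 conv over base_x, then two 1x1 convs: one objectness
    score per anchor (cls_k) and 4 regression values per anchor (reg_k)."""
    def __init__(self, in_channels=2048, num_anchors=9):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, 512, 3, padding=1)
        self.cls = nn.Conv2d(512, num_anchors, 1)        # objectness per anchor
        self.reg = nn.Conv2d(512, num_anchors * 4, 1)    # (tx, ty, tw, th)

    def forward(self, base_x):
        x = torch.relu(self.shared(base_x))
        return self.cls(x), self.reg(x)
```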
For every qualified anchor box from model_rpn (in the calc_iou function):
- the model is tasked with predicting the class inside that qualified anchor box;
- the model is tasked with predicting (tx, ty, tw, th), the deviation of the anchor box from the actual GT box.
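The smooth L1 loss mentioned above for the regression targets can be sketched as follows (the beta hyperparameter is an assumption; the repo may use a different value):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) loss: quadratic for small errors,
    linear for large ones, averaged over elements."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_elem = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return per_elem.mean()
```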
- requirement.txt
- Save temp file for data loading
- parallelize the code (calc_rpn function) ??
- Remove log from tw and th to check if performance changes??