Comparison of Two-stage and One-stage Detection - [YouTube]
Find objects (Region Proposals) -> Recognize objects (Object Recognition)
-Models in the R-CNN family are all region-based - [R-CNN]
- First, the model proposes a set of regions of interest using selective search or a region proposal network. The proposed regions are sparse, since the number of potential bounding box candidates is effectively infinite.
- Then a classifier only processes the region candidates.
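The two stages above can be sketched as a toy pipeline. This is a minimal illustration, not how any R-CNN model is actually implemented: `propose_regions`, `two_stage_detect`, and `toy_classifier` are hypothetical names, and the coarse tiling stands in for selective search or a region proposal network.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def propose_regions(width: int, height: int, size: int, stride: int) -> List[Box]:
    """Stage 1 stand-in: emit a sparse set of candidate boxes.

    A real detector would use selective search or a region proposal
    network here; this toy version only tiles the image coarsely.
    """
    return [
        (x, y, x + size, y + size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def two_stage_detect(
    width: int,
    height: int,
    classify: Callable[[Box], Tuple[str, float]],
    threshold: float = 0.5,
) -> List[Tuple[Box, str, float]]:
    """Stage 2: run the classifier only on the proposed regions."""
    detections = []
    for box in propose_regions(width, height, size=64, stride=64):
        label, score = classify(box)
        if score >= threshold:
            detections.append((box, label, score))
    return detections


def toy_classifier(box: Box) -> Tuple[str, float]:
    # Hypothetical scorer: pretends an object sits in the top-left tile.
    return ("object", 0.9) if box[:2] == (0, 0) else ("background", 0.1)


print(two_stage_detect(128, 128, toy_classifier))
```

The point of the sketch is the control flow: the classifier is called once per proposal, so its cost scales with the number of proposed regions rather than with every possible location.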
The other approach skips the region proposal stage and runs detection directly over a dense sampling of possible locations. This is how a one-stage object detection algorithm works. It is faster and simpler, but may reduce accuracy somewhat.
Find and recognize objects at the same time - Detecting objects in images using a single deep neural network
-YOLO (You only look once): YOLOv1, YOLOv2, YOLOv3, Tiny YOLO - [YOLO]
-Single Shot Detector (SSD) - [SSD]
- A single convolutional network predicts the bounding boxes and the class probabilities for these boxes.
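As a rough sketch of the one-stage idea, the snippet below decodes a dense grid of predictions in a single pass. It assumes a heavily simplified, YOLOv1-like output layout (one box per cell, no anchors, no objectness score); the function name `decode_grid` and the tuple layout are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed per-cell prediction: (dx, dy, w, h, class_probs)
#   dx, dy: box centre offset inside the cell, in [0, 1]
#   w, h:   box size as a fraction of the whole image
Cell = Tuple[float, float, float, float, Sequence[float]]


def decode_grid(
    predictions: List[List[Cell]],
    grid_size: int,
    image_size: int,
    threshold: float = 0.5,
):
    """Turn every grid cell into a box and keep the confident ones."""
    cell = image_size / grid_size
    detections = []
    for row in range(grid_size):
        for col in range(grid_size):
            dx, dy, w, h, probs = predictions[row][col]
            best = max(range(len(probs)), key=lambda i: probs[i])
            if probs[best] < threshold:
                continue
            cx, cy = (col + dx) * cell, (row + dy) * cell
            bw, bh = w * image_size, h * image_size
            box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
            detections.append((box, best, probs[best]))
    return detections


background = (0.0, 0.0, 0.0, 0.0, [0.1, 0.1])
grid = [
    [background, background],
    [(0.5, 0.5, 0.5, 0.25, [0.1, 0.8]), background],
]
print(decode_grid(grid, grid_size=2, image_size=100))
```

Unlike the two-stage flow, every location is scored, and boxes and class probabilities come out of the same prediction tensor.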
For the last couple of years, many results have been measured exclusively on the COCO object detection dataset. COCO is harder for object detection, and detectors usually achieve a much lower mAP on it. Here is a comparison of some key detectors.
For the results presented below, the models are trained with both PASCAL VOC 2007 and 2012 data. The mAP is measured on the PASCAL VOC 2012 testing set. For SSD, the chart shows results for 300 × 300 and 512 × 512 input images. For YOLO, it has results for 288 × 288, 416 × 416 and 544 × 544 images. Higher-resolution input images give the same model a better mAP but are slower to process.
Input image resolution and the feature extractor both impact speed. Below are the highest and lowest FPS reported by the corresponding papers. However, the results below can be highly biased, in particular because they are measured at different mAP levels.
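Since the comparison hinges on mAP, here is a minimal sketch of how it is typically built up: boxes are matched by intersection over union (IoU), and average precision (AP) is the area under the precision-recall curve for one class. The helper names are hypothetical, the all-point area computation is an assumption about the evaluation protocol, and the matching of detections to ground truth is assumed to have been done already.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_ground_truth):
    """AP for one class.

    is_true_positive: one flag per detection, sorted by descending
    confidence, True if it matched a ground-truth box.
    """
    if num_ground_truth == 0:
        return 0.0
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing as recall grows (the PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(average_precision([True, False, True], num_ground_truth=2))
```

Because AP depends on the IoU threshold used to decide what counts as a match, mAP figures from different protocols are not directly comparable.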
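To compare speed on equal footing, FPS can be measured locally with the same harness for every model. The sketch below is one simple way to do that; `measure_fps` and `dummy_detector` are hypothetical names, and the dummy detector only mimics the fact that cost grows with input resolution.

```python
import time


def measure_fps(detector, images, warmup=2):
    """Time a detector callable over a list of images and return FPS."""
    for image in images[:warmup]:
        detector(image)  # warm-up runs are not timed
    start = time.perf_counter()
    for image in images:
        detector(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed if elapsed > 0 else float("inf")


def dummy_detector(image):
    # Hypothetical stand-in whose cost grows with the number of pixels.
    return sum(sum(row) for row in image)


small = [[[1] * 32 for _ in range(32)] for _ in range(20)]
large = [[[1] * 128 for _ in range(128)] for _ in range(20)]
print("32x32 FPS:  ", measure_fps(dummy_detector, small))
print("128x128 FPS:", measure_fps(dummy_detector, large))
```

Reporting each FPS figure together with the mAP obtained at the same resolution and backbone avoids the bias mentioned above.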
Comparison of the COCO and Pascal VOC datasets -> [Click Here]