
Related Work

myzwisc edited this page May 5, 2018 · 2 revisions

CNNs have achieved significant breakthroughs in image classification and object detection. Nevertheless, the sliding window approach must apply a CNN to many different windows, which amounts to repeating image classification on local regions. As a result, it is extremely expensive computationally, because the CNN computation is repeated for every window.
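To give a rough sense of that cost, the sketch below counts how many windows a brute-force detector would have to classify. The image size, window sizes, and stride are hypothetical values chosen only for illustration; they do not come from any of the cited methods.

```python
def count_sliding_windows(image_w, image_h, window_sizes, stride):
    """Count the windows a brute-force sliding window detector would classify."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > image_w or win_h > image_h:
            continue  # this window size does not fit in the image
        steps_x = (image_w - win_w) // stride + 1
        steps_y = (image_h - win_h) // stride + 1
        total += steps_x * steps_y
    return total


# Hypothetical settings: a 640x480 image, three square window sizes, stride 8.
n_windows = count_sliding_windows(
    640, 480, [(64, 64), (128, 128), (256, 256)], stride=8
)
print(n_windows)  # each of these windows would need its own CNN forward pass
```

Even with only three window sizes and a fairly coarse stride, the count runs into the thousands, and each window is a separate classification.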

To reduce this cost, researchers looked for ways to cut down the number of candidate locations instead of running the CNN at every sliding window position. Region proposal methods were developed to find regions that are likely to contain objects [1], which leaves fewer candidate regions than the sliding window approach. One of the most significant breakthroughs in object detection is Faster R-CNN [6]. Faster R-CNN performs Region of Interest (RoI) pooling and lets the CNN itself generate the region proposals: a Region Proposal Network (RPN) is inserted as part of the layers of the CNN model to predict the objectness of each candidate region.
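As a minimal sketch of what the RPN scores, the function below enumerates anchor boxes, the reference regions centered on each feature-map location for which objectness is predicted. The stride of 16, the three scales, and the three aspect ratios follow commonly cited Faster R-CNN defaults, but treat them and the 40x30 feature-map size as assumptions here; the function name is hypothetical.

```python
import math


def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x1, y1, x2, y2) in image coordinates.

    One anchor per (scale, ratio) pair is centered on every feature-map
    location; `ratio` is height / width and each anchor has area scale**2.
    """
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Center of this feature-map cell, projected back onto the image.
            cx = (fx + 0.5) * stride
            cy = (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


# Assumed 40x30 feature map (roughly a 640x480 image at stride 16).
anchors = generate_anchors(feat_w=40, feat_h=30)
print(len(anchors))  # 40 * 30 * 9 anchors
```

Unlike sliding windows, these anchors are all scored in a single pass over shared convolutional features, and only the highest-scoring ones are passed on as proposals.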

Some detection methods drop the region proposal step in order to reduce training and testing time. YOLO [5] and SSD [4] are two such methods. The input image goes directly into one large convolutional neural network. Inside the network, the image is first divided into a grid of cells, and the classification scores together with the bounding box coordinates and scales are predicted for each grid cell. The final object classes and bounding boxes are then computed from the per-cell results. These two approaches further reduce training and test time, but accuracy is compromised compared with methods that use region proposals [3].

In summary, region proposal methods reduce training and testing time compared with sliding window techniques and help increase detection accuracy; in fact, Faster R-CNN achieves the highest accuracy of the methods discussed here. In terms of training and testing time, SSD is significantly faster than the other methods because it does away with region proposals, at the cost of reduced accuracy.
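The per-cell prediction used by proposal-free detectors can be sketched as follows. This is a simplified, YOLO-like parametrization written for illustration, not the actual YOLO or SSD implementation: it assumes one box per cell, a center offset relative to the cell, and a width and height relative to the whole image. The function names, the tuple layout, and the score threshold are all hypothetical.

```python
def decode_cell(row, col, grid_size, image_w, image_h, pred):
    """Turn one cell's raw prediction into an image-space detection.

    pred = (tx, ty, tw, th, confidence, class_probs), where tx, ty in [0, 1]
    locate the box center inside the cell and tw, th in [0, 1] give the box
    size as a fraction of the image (an assumed, simplified layout).
    """
    tx, ty, tw, th, confidence, class_probs = pred
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    cx = (col + tx) * cell_w
    cy = (row + ty) * cell_h
    w = tw * image_w
    h = th * image_h
    best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
    score = confidence * class_probs[best_class]
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return box, best_class, score


def decode_grid(predictions, grid_size, image_w, image_h, score_threshold=0.5):
    """Collect detections from every cell whose score passes the threshold."""
    detections = []
    for row in range(grid_size):
        for col in range(grid_size):
            box, cls, score = decode_cell(
                row, col, grid_size, image_w, image_h, predictions[row][col]
            )
            if score >= score_threshold:
                detections.append((box, cls, score))
    return detections


# Toy 2x2 grid with made-up predictions: only the bottom-right cell is confident.
background = (0.5, 0.5, 0.1, 0.1, 0.05, [0.5, 0.5])
confident = (0.5, 0.5, 0.25, 0.25, 0.8, [0.1, 0.9])
grid = [[background, background], [background, confident]]
detections = decode_grid(grid, grid_size=2, image_w=448, image_h=448)
print(detections)
```

In practice the surviving boxes are usually passed through non-maximum suppression to merge overlapping detections; that step is omitted here to keep the sketch short.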

We also plan to improve a current state-of-the-art method or develop a new one to achieve real-time face detection and recognition. Finally, we will select the best method and model by evaluating the trade-off between accuracy and speed of detection and recognition.
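One simple way to frame that selection is to fix a speed budget and pick the most accurate candidate that meets it. The sketch below assumes this rule; the function name, the candidate names, and every number in it are placeholders for illustration, not measurements of any real model.

```python
def select_model(candidates, min_fps):
    """Pick the most accurate candidate that still meets the speed budget.

    `candidates` maps a model name to (accuracy, frames_per_second).
    Returns None when no candidate is fast enough.
    """
    fast_enough = {
        name: (accuracy, fps)
        for name, (accuracy, fps) in candidates.items()
        if fps >= min_fps
    }
    if not fast_enough:
        return None
    return max(fast_enough, key=lambda name: fast_enough[name][0])


# Placeholder values only; these are NOT benchmark results.
candidates = {
    "candidate_a": (0.90, 8.0),   # accurate but slow
    "candidate_b": (0.85, 25.0),
    "candidate_c": (0.80, 45.0),  # fast but less accurate
}
chosen = select_model(candidates, min_fps=20.0)
print(chosen)
```

Other rules, such as a weighted score over accuracy and speed, would work equally well; the threshold form is used here only because a real-time target naturally gives a minimum frame rate.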
