We provide a collection of classification and detection models pre-trained on the ImageNet and COCO datasets. The tables below summarize each pre-trained model, including:
- model name.
- model input size.
- model speed: frames per second (FPS) measured on our 520 and 720 hardware.
- model size.
- model performance on the ImageNet validation set and the COCO validation set.

Model | Input Size | FPS on 520 | FPS on 720 | Model Size | Rank 1 Accuracy | Rank 5 Accuracy |
---|---|---|---|---|---|---|
mobilenetv2 | 224x224 | 58.9418 | 620.677 | 14M | 69.82% | 89.29% |
resnet18 | 224x224 | 20.4376 | 141.371 | 46.9M | 66.46% | 87.09% |
resnet50 | 224x224 | 6.32576 | 49.0828 | 102.9M | 72.80% | 90.91% |
FP_classifier | 56x32 | 323.471 | 3370.47 | 5.1M | 94.13% | - |

mobilenetv2, resnet18, and resnet50 are pre-trained on the ImageNet classification dataset. FP_classifier is pre-trained on our own dataset to classify person and background images.
resnet50 is currently being trained for Kneron preprocessing.

Model | Input Size | FPS on 520 | FPS on 720 | Model Size | mAP |
---|---|---|---|---|---|
YOLOv5s (no upsample) | 640x640 | 4.91429 | - | 13.1M | 40.4% |
YOLOv5s (with upsample) | 640x640 | - | 24.4114 | 14.6M | 50.9% |
FCOS (darknet53s backbone) | 416x416 | 7.27369 | 48.8437 | 33.9M | 44.8% |

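
The FPS figures in the tables above can be turned into an approximate per-frame time and a 720-over-520 speedup. The following is a minimal sketch, not part of the released tooling: the names `FPS`, `latency_ms`, `speedup`, and `fmt` are hypothetical, and it assumes the reported FPS is a steady-state throughput, so `1000 / fps` is an average time per frame and not necessarily the end-to-end latency of a single inference.

```python
from typing import Optional

# FPS values copied from the tables above as (520, 720).
# None stands for a "-" entry, i.e. no figure is reported.
FPS = {
    "mobilenetv2": (58.9418, 620.677),
    "resnet18": (20.4376, 141.371),
    "resnet50": (6.32576, 49.0828),
    "FP_classifier": (323.471, 3370.47),
    "YOLOv5s (no upsample)": (4.91429, None),
    "YOLOv5s (with upsample)": (None, 24.4114),
    "FCOS (darknet53s backbone)": (7.27369, 48.8437),
}


def latency_ms(fps: Optional[float]) -> Optional[float]:
    """Average time per frame in milliseconds, or None if FPS is missing."""
    if fps is None or fps <= 0:
        return None
    return 1000.0 / fps


def speedup(fps_520: Optional[float], fps_720: Optional[float]) -> Optional[float]:
    """Ratio of 720 FPS to 520 FPS, or None if either figure is missing."""
    if fps_520 is None or fps_720 is None or fps_520 <= 0:
        return None
    return fps_720 / fps_520


def fmt(value: Optional[float], suffix: str) -> str:
    """Format a value with two decimals, keeping "-" for missing entries."""
    return "-" if value is None else f"{value:.2f}{suffix}"


if __name__ == "__main__":
    for name, (f520, f720) in FPS.items():
        print(
            f"{name}: "
            f"520 {fmt(latency_ms(f520), ' ms/frame')}, "
            f"720 {fmt(latency_ms(f720), ' ms/frame')}, "
            f"speedup {fmt(speedup(f520, f720), 'x')}"
        )
```

Running the script prints one line per model. Rows with a missing figure keep `-` in the affected columns, which mirrors how the tables mark unavailable measurements.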