YOLOv5 Edge TPU Performance Evaluation #9104
paradigmn started this conversation in Show and tell
@paradigmn very cool, thank you for your contributions!
Hello everyone,
we recently published our paper "Efficient Edge Deployment Demonstrated on YOLOv5 and Coral Edge TPU" at the International Workshop on Edge Artificial Intelligence for Industrial Applications (EAI4IA). We would like to share some insights from our research that could be relevant for deployment and future improvements. All our measurements are based on the optimizations proposed and merged in #6808. For our tests we used a Raspberry Pi 4B with a Coral Edge TPU USB accelerator. Inference speed was measured with the Google benchmark model tool and excludes pre- and post-processing. Mean average precision (mAP) was computed with pycocotools on the COCO evaluation dataset.
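The measurement setup can be approximated in Python. What follows is a minimal sketch, not the authors' tooling. The paper used the Google benchmark model tool, whereas this sketch times a caller-supplied `invoke` callable so that pre- and post-processing stay outside the timed region, and it scores detections with pycocotools' `COCOeval`. The helper names `mean_latency_ms` and `coco_map` are hypothetical. With the `tflite_runtime` package, one would presumably pass `interpreter.invoke` of an `Interpreter` created with the Edge TPU delegate (`load_delegate("libedgetpu.so.1")` on Linux), after the input tensor has already been set.

```python
import statistics
import time
from typing import Callable, List, Sequence, Tuple


def mean_latency_ms(invoke: Callable[[], None], warmup: int = 10, runs: int = 100) -> float:
    """Mean wall-clock latency of `invoke` in milliseconds (hypothetical helper).

    Only the call itself is timed, so input preparation (pre-processing) and
    output decoding/NMS (post-processing) must happen outside `invoke`.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    # Warm-up calls absorb one-off costs such as delegate initialization.
    for _ in range(warmup):
        invoke()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def coco_map(annotation_file: str, detections: Sequence[dict]) -> Tuple[float, float]:
    """Return (mAP@[.5:.95], mAP@.5) for COCO-format bbox detections (hypothetical helper).

    `detections` is a non-empty list of dicts with "image_id", "category_id",
    "bbox" ([x, y, w, h]) and "score", the format accepted by COCO.loadRes.
    """
    # Imported lazily so the timing helper works without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(annotation_file)
    coco_dt = coco_gt.loadRes(list(detections))
    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints the standard COCO metric summary
    return evaluator.stats[0], evaluator.stats[1]
```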
Speed-Accuracy Comparison
We wanted to compare the object detection performance of YOLOv5 in terms of speed and accuracy with other models provided by the Google Coral Model Zoo. Different model and input sizes were evaluated to determine the optimal configuration for deployment.
The figure shows the USB3 performance of the different models. YOLOv5 outperforms its competitors in most of the tested configurations. Only the nano model performs poorly: it shares the architecture of the small variant but has a significantly reduced number of weights. This costs accuracy without any speed gain, because the TPU was not running at full capacity. In general, the accelerator benefits from parallelism, so wider, shallower models perform better. Furthermore, reducing the input size can significantly improve inference speed while only slightly reducing accuracy. A good general-purpose TPU model would be YOLOv5 with a 320px input. If speed is the deciding criterion, SSDLite MobileDet would be the preferred solution.
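Picking a deployment configuration from such a speed-accuracy comparison amounts to discarding dominated configurations. A minimal sketch, assuming each measured configuration is recorded as a name, a latency in milliseconds, and an mAP value (the `Config` type and all functions here are hypothetical): a configuration is dominated if another one is at least as fast and at least as accurate, and strictly better in one of the two. A nano-style model that is no faster than its larger sibling but less accurate therefore drops out of the front.

```python
from typing import Iterable, List, NamedTuple, Optional


class Config(NamedTuple):
    name: str          # e.g. model variant plus input size
    latency_ms: float  # measured inference latency
    map_score: float   # mean average precision on the evaluation set


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no slower and no less accurate than `b`, and strictly better in one."""
    no_worse = a.latency_ms <= b.latency_ms and a.map_score >= b.map_score
    better = a.latency_ms < b.latency_ms or a.map_score > b.map_score
    return no_worse and better


def pareto_front(configs: Iterable[Config]) -> List[Config]:
    """Non-dominated configurations, sorted from fastest to slowest."""
    pool = list(configs)
    front = [c for c in pool if not any(dominates(o, c) for o in pool)]
    return sorted(front, key=lambda c: c.latency_ms)


def fastest_meeting(configs: Iterable[Config], min_map: float) -> Optional[Config]:
    """Fastest configuration whose mAP reaches `min_map`, or None if none does."""
    eligible = [c for c in configs if c.map_score >= min_map]
    return min(eligible, key=lambda c: c.latency_ms) if eligible else None
```

`pareto_front` answers "which configurations are worth considering at all", while `fastest_meeting` encodes the "speed is the deciding criterion, subject to an accuracy floor" choice.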
USB Speed Comparison
Low-power embedded devices often do not offer a USB3 or PCIe interface. Hence, we decided to evaluate the impact of connecting the dongle to a USB2 port.
As one can see, the slower interface causes a slowdown by a factor of roughly three. Ideally, the entire model is mapped to the accelerator so that all operations are performed by the coprocessor. For the YOLOv5 models, however, a full mapping was not possible. This results in additional tensor transfers between CPU and TPU, which make the models more sensitive to the interface bandwidth.
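A back-of-the-envelope estimate illustrates why the interface matters once tensors have to cross between CPU and TPU. The sketch below computes an idealized lower bound on transfer time, assuming nominal signaling rates (480 Mbit/s for USB 2.0 High-Speed, 5 Gbit/s for USB 3.x Gen 1) and uint8 tensors. Real throughput is lower because of protocol overhead, and the helper names are hypothetical. In a partially mapped graph, every CPU/TPU boundary adds such a transfer.

```python
import math
from typing import Iterable, Sequence

# Nominal signaling rates in bit/s; usable throughput is noticeably lower.
USB2_BITS_PER_S = 480e6  # USB 2.0 High-Speed
USB3_BITS_PER_S = 5e9    # USB 3.x Gen 1


def tensor_bytes(shape: Sequence[int], bytes_per_element: int = 1) -> int:
    """Size of a dense tensor; 1 byte per element for uint8/int8-quantized models."""
    return math.prod(shape) * bytes_per_element


def transfer_ms(shapes: Iterable[Sequence[int]], link_bits_per_s: float,
                bytes_per_element: int = 1) -> float:
    """Idealized time to move all given tensors across the link, in milliseconds."""
    total_bytes = sum(tensor_bytes(s, bytes_per_element) for s in shapes)
    return total_bytes * 8 / link_bits_per_s * 1000.0


if __name__ == "__main__":
    # Example: a single 320x320 RGB uint8 input tensor (NHWC layout).
    image = (1, 320, 320, 3)
    print(f"USB2: {transfer_ms([image], USB2_BITS_PER_S):.2f} ms")
    print(f"USB3: {transfer_ms([image], USB3_BITS_PER_S):.2f} ms")
```

For a 320×320 RGB input alone, this gives about 5.1 ms over USB2 versus about 0.5 ms over USB3. The link-rate ratio is roughly 10×, while the observed end-to-end slowdown is about 3×, presumably because on-TPU compute and CPU-side operators do not scale with link speed.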
We hope this short synopsis of our paper is helpful, and we are open to questions and suggestions.
Sincerely,
paradigm