YOLOv5 Edge TPU Performance Evaluation #9104

paradigmn · 2022-08-23T11:58:07Z

Hello everyone,

We recently released our paper Efficient Edge Deployment Demonstrated on YOLOv5 and Coral Edge TPU at the International Workshop on Edge Artificial Intelligence for Industrial Applications (EAI4IA). We would like to share some insights from our research in this discussion, which could be relevant for deployment and future improvements. All our measurements were based on the optimizations proposed and merged in #6808. We used a Raspberry Pi 4B with an Edge TPU accelerator for our tests. Inference speed was measured with the Google benchmark model tool, excluding pre- and post-processing. The mean average precision was determined with pycocotools on the COCO evaluation dataset.
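For readers who want to reproduce the "inference only" timing idea without the benchmark tool, here is a minimal sketch. It is not the tool used for the paper; `run_inference` is a hypothetical zero-argument callable that is assumed to run just the forward pass on an already prepared input, so pre- and post-processing stay outside the timed region.

```python
import statistics
import time


def time_inference(run_inference, warmup_runs=5, timed_runs=50):
    """Return (mean, standard deviation) of the latency in milliseconds.

    `run_inference` is a hypothetical zero-argument callable that performs
    only the forward pass on an already prepared input tensor.
    """
    # Warm-up runs are discarded so one-time setup cost does not skew results.
    for _ in range(warmup_runs):
        run_inference()

    latencies_ms = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a real model invocation.
    def dummy_inference():
        sum(i * i for i in range(10_000))

    mean_ms, std_ms = time_inference(dummy_inference)
    print(f"mean {mean_ms:.3f} ms, std {std_ms:.3f} ms")
```

Keeping image resizing and NMS out of the callable is what makes numbers comparable between models with different post-processing.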

Speed-Accuracy Comparison

We wanted to compare the object detection performance of YOLOv5, in terms of speed and accuracy, with other models provided by the Google Coral Model Zoo. Different model and input sizes were evaluated to determine the optimal configuration for deployment.

The figure shows the USB3 performance of the different models. YOLOv5 outperforms its competitors in most of the tested configurations. Only the nano model does not perform well. It has the same architecture as the small variant, with a significantly reduced number of weights. This introduces an accuracy penalty but no speed gain, as the TPU was not working to capacity. Generally, the accelerator benefits from parallelism, so wider models with less depth perform better. Furthermore, reducing the input size can significantly improve inference speed while only slightly impacting accuracy. A good general-purpose TPU model would be YOLOv5 with an input size of 320px. If speed is the deciding criterion, SSDLite MobileDet would be the preferred solution.
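One way to read such a speed-accuracy comparison is to keep only the configurations that no other configuration beats on both axes (a Pareto front). The sketch below illustrates that idea; the `Config` type, the model names and all numbers are placeholders made up for the example and are not measurements from the paper. A model that loses accuracy without gaining speed, as described for the nano variant, would show up as a dominated point.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    map_score: float   # higher is better


def pareto_front(configs):
    """Return configs that no other config beats on both speed and accuracy."""
    front = []
    for c in configs:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.map_score >= c.map_score
            and (o.latency_ms < c.latency_ms or o.map_score > c.map_score)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)


# Placeholder values for illustration only; NOT measurements from the paper.
candidates = [
    Config("model_a_320", 20.0, 0.30),
    Config("model_a_640", 80.0, 0.35),
    Config("model_b_320", 25.0, 0.25),  # slower and less accurate than model_a_320
    Config("model_c_320", 10.0, 0.20),
]

for cfg in pareto_front(candidates):
    print(cfg.name, cfg.latency_ms, cfg.map_score)
```

With real measurements in `candidates`, the remaining entries are the ones worth choosing between, depending on whether speed or accuracy matters more.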

USB Speed Comparison

Low-power embedded devices often do not offer USB3 or PCIe as an interface. Hence, we decided to evaluate the impact of using a USB2 port for the dongle.

As one can see, the slower interface causes a slowdown by a factor of roughly three. Ideally, the entire model is mapped to the accelerator and all operations are performed by the coprocessor. For the YOLOv5 models, however, a full mapping was not possible. This results in additional tensor transfers between CPU and TPU, which cost more time on the slower interface.
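To make the transfer argument concrete, here is a deliberately simple back-of-the-envelope model: latency is compute time plus bytes moved divided by interface bandwidth. This is an assumption for illustration, not the model used in the paper. The bandwidths are the nominal USB signaling rates, and the compute time and tensor sizes are placeholders rather than measured values.

```python
def estimated_latency_ms(compute_ms, transfer_bytes, bandwidth_bytes_per_s):
    """Compute time plus the time spent moving tensors over the interface."""
    transfer_ms = transfer_bytes / bandwidth_bytes_per_s * 1000.0
    return compute_ms + transfer_ms


# Nominal signaling rates; real-world throughput is lower.
USB2_BYTES_PER_S = 480e6 / 8  # USB 2.0 High Speed: 480 Mbit/s
USB3_BYTES_PER_S = 5e9 / 8    # USB 3.0 SuperSpeed: 5 Gbit/s

# Placeholder workload, NOT taken from the paper.
compute_ms = 10.0
fully_mapped_bytes = 320 * 320 * 3               # one uint8 input image
partially_mapped_bytes = fully_mapped_bytes * 4  # extra intermediate tensors (assumed)

for label, nbytes in [("fully mapped", fully_mapped_bytes),
                      ("partially mapped", partially_mapped_bytes)]:
    usb2 = estimated_latency_ms(compute_ms, nbytes, USB2_BYTES_PER_S)
    usb3 = estimated_latency_ms(compute_ms, nbytes, USB3_BYTES_PER_S)
    print(f"{label}: USB2 {usb2:.2f} ms, USB3 {usb3:.2f} ms, ratio {usb2 / usb3:.2f}")
```

In this toy model the USB2/USB3 ratio grows as more data crosses the interface, which is the qualitative effect described above. It ignores protocol overhead and per-transfer latency, so it should not be used to predict actual numbers.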

We hope this short synopsis of our paper is helpful. We are open to questions and suggestions.

Sincerely,
paradigm

glenn-jocher · 2022-08-30T13:53:14Z

glenn-jocher
Aug 30, 2022
Maintainer

@paradigmn very cool, thank you for your contributions!

@AyushExel
