Description
Search before asking
- I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Hi team,
I am working on an overhead object detection project using images with a resolution of 1280x1024. The objects are generally small (e.g., cars and people). Inference will run on a DPU (B4096 or B3136). I am weighing several approaches and would like your advice on the best configuration and workflow. Here is what I have explored so far:
• Training at 1280 resolution:
◦ If I train at 1280x1024, the native image size, I can avoid resize operations during preprocessing and postprocessing.
◦ This could save some time per frame, but I am not sure of the net effect on FPS, since the DPU will likely spend more time processing the larger input.
• Training at 640x640 or 640x512 (custom) resolution and using a Nano or Small model:
◦ This approach should increase FPS on the DPU.
◦ However, I have observed that small objects are often missed at this resolution.
• Dataset preparation:
◦ I am creating my own dataset.
◦ With augmentation, I can expand it to around 10,000 images.
◦ One challenge is that the object distribution between classes is not balanced (e.g., different frequencies of cars and people).
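To make the resolution trade-off above concrete, here is a rough sketch of the arithmetic. It assumes an aspect-preserving (letterbox-style) resize and a hypothetical object that is 16 pixels wide in the original 1280x1024 frame; both the helper name `scale_stats` and the 16-pixel figure are illustrative, not measured. The pixel ratio is only a first-order proxy for convolutional compute, and actual DPU latency may not scale linearly with it.

```python
def scale_stats(src_w, src_h, dst_w, dst_h, obj_px):
    """Return (pixel_ratio, scaled_obj_px) for an aspect-preserving resize.

    pixel_ratio   : input pixels at the target size relative to the source size
    scaled_obj_px : size of an object after resizing, given its source size in pixels
    """
    # Aspect-preserving resize uses the smaller of the two axis scale factors.
    scale = min(dst_w / src_w, dst_h / src_h)
    pixel_ratio = (dst_w * dst_h) / (src_w * src_h)
    return pixel_ratio, obj_px * scale


SRC = (1280, 1024)   # native image size from the question
OBJ_PX = 16          # hypothetical object width in the source image

for dst in [(1280, 1024), (640, 640), (640, 512)]:
    ratio, obj = scale_stats(*SRC, *dst, obj_px=OBJ_PX)
    print(f"{dst[0]}x{dst[1]}: {ratio:.4f}x the input pixels, "
          f"{OBJ_PX}px object becomes {obj:.1f}px")
```

Under these assumptions, 640x512 feeds the network a quarter of the pixels of 1280x1024 but also halves every object's width and height, which is consistent with small objects being missed at the lower resolution.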
Questions:
1. What do you recommend as the best resolution and model size combination to balance FPS and detection accuracy, especially for small objects?
2. Is it worth training directly at 1280 resolution to avoid resizing, or would you expect diminishing returns on accuracy vs. performance?
3. Do you have any recommendations for dataset preparation and augmentation to improve performance and balance between classes?
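For the class-balance question, one common starting point is inverse-frequency class weighting, which can drive oversampling of images that contain rare classes. The sketch below is a generic illustration, not a YOLOv5 API: the function name `inverse_frequency_weights` and the example label counts (8,000 car boxes, 2,000 person boxes) are assumptions made up for the example.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to total / (num_classes * class_count).

    A perfectly balanced dataset yields a weight of 1.0 for every class;
    rarer classes get weights above 1.0, frequent classes below 1.0.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Hypothetical box labels: cars are four times as frequent as people.
labels = ["car"] * 8000 + ["person"] * 2000
weights = inverse_frequency_weights(labels)
print(weights)
```

These weights could be used to sample images containing people more often during training, or as a guide for which class to target with additional data collection and augmentation.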
Any guidance or suggestions would be greatly appreciated.
Thank you!
Additional
No response