SegTrackDetect is a modular framework designed for accurate small object detection using a combination of segmentation and tracking techniques. It performs detection within selected Regions of Interest (ROIs), providing a highly efficient solution for scenarios where detecting tiny objects with precision is critical. The framework's modularity empowers users to easily customize key components, including the ROI Estimation Module, ROI Prediction Module, and Object Detector. It also features our Overlapping Box Suppression Algorithm that efficiently combines detected objects from multiple sub-windows, filtering them to overcome the limitations of window-based detection methods. See the following sections for more details on the framework, its components, and customization options:
- SegTrackDetect Architectural Design
- ROI Fusion Module
- Object Detection Module
- Detection Aggregation and Filtering.
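The Overlapping Box Suppression Algorithm is described in detail in the Detection Aggregation and Filtering section. As a rough illustration only (not the framework's actual implementation), IoU-based suppression of duplicate detections produced by overlapping sub-windows might look like:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def suppress_overlapping(boxes, scores, iou_th=0.7):
    """Keep the highest-scoring box among near-duplicates from different windows."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_th for j in keep):
            keep.append(i)
    return keep
```

With two near-identical boxes coming from adjacent windows, only the higher-scoring one survives; the `obs_iou_th` threshold below controls how much overlap is tolerated.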
To get started with the framework right away, head to the Getting Started section.
We provide a Dockerfile that handles all the dependencies for you. Simply install the Docker Engine and, if you plan to run detection on a GPU, the NVIDIA Container Toolkit.
To download all the trained models described in Model ZOO and build a Docker image, simply run:
```bash
./build_and_run.sh
```
We currently support four datasets, and we provide scripts that download the datasets and convert them into the supported format. To download and convert all of them, run:
```bash
./download_and_convert.sh
```
You can also download selected datasets by running the corresponding scripts in the scripts directory.
The SegTrackDetect framework supports tiny object detection on consecutive frames (video detection), as well as detection on independent windows.
To run detection on video data using one of the supported datasets, e.g. SeaDronesSee:
```bash
python inference_vid.py \
    --roi_model 'SDS_large' --det_model 'SDS' --tracker 'sort' \
    --ds 'SeaDronesSee' --split 'val' \
    --bbox_type 'sorted' --allow_resize --obs_iou_th 0.1 \
    --out_dir 'results/SDS/val' --debug
```
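In video mode, the tracker (here `sort`) feeds the ROI Prediction Module with estimates of where objects will appear in the next frame. The actual prediction logic is a configurable component of the framework; purely as a toy illustration (the function below is hypothetical, not the framework's API), next-frame ROIs could be obtained by extrapolating each box's motion and adding a context margin:

```python
def predict_rois(prev_boxes, curr_boxes, margin=0.2):
    """Linearly extrapolate each box's per-frame motion and expand it by a
    relative margin to obtain a search region (ROI) for the next frame."""
    rois = []
    for (px1, py1, px2, py2), (cx1, cy1, cx2, cy2) in zip(prev_boxes, curr_boxes):
        dx, dy = cx1 - px1, cy1 - py1          # per-frame displacement
        w, h = cx2 - cx1, cy2 - cy1
        mx, my = w * margin, h * margin        # context margin around the box
        rois.append((cx1 + dx - mx, cy1 + dy - my,
                     cx2 + dx + mx, cy2 + dy + my))
    return rois
```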
To run the detection on independent windows, e.g. MTSD, use:
```bash
python inference_img.py \
    --roi_model 'MTSD' --det_model 'MTSD' \
    --ds 'MTSD' --split 'val' \
    --bbox_type 'sorted' --allow_resize --obs_iou_th 0.7 \
    --out_dir 'results/MTSD/val' --debug
```
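In both modes, detections are produced inside cropped windows and must be mapped back to full-image coordinates before they are aggregated. A minimal sketch of that mapping (an illustrative helper, not part of the framework's API; `scale` undoes any resizing applied to the crop):

```python
def to_global(dets, win_x, win_y, scale=1.0):
    """Map (x1, y1, x2, y2, score) detections from window-local coordinates
    back to full-image coordinates."""
    return [
        (x1 / scale + win_x, y1 / scale + win_y,
         x2 / scale + win_x, y2 / scale + win_y, score)
        for x1, y1, x2, y2, score in dets
    ]
```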
Argument | Type | Description |
---|---|---|
`--roi_model` | `str` | Specifies the ROI model to use (e.g., `SDS_large`). All available ROI models are defined here. |
`--det_model` | `str` | Specifies the detection model to use (e.g., `SDS`). All available detectors are defined here. |
`--tracker` | `str` | Specifies the tracker to use (e.g., `sort`). All available trackers are defined here. |
`--ds` | `str` | Dataset to use for inference (e.g., `SeaDronesSee`). Available datasets. |
`--split` | `str` | Data split to use (e.g., `val` for validation). If present, the script saves the detections using the COCO image ids from `val.json`. |
`--flist` | `str` | An alternative way to provide the image list: a path to a file containing absolute paths to images. |
`--name` | `str` | A name for the provided `flist`; COCO annotations `name.json` will be generated and saved in the dataset root directory. |
`--bbox_type` | `str` | Type of the detection window filtering algorithm (`all` - no filtering, `naive`, `sorted`). |
`--allow_resize` | flag | Enables resizing of cropped detection windows. Otherwise, a sliding window is used within large ROIs. |
`--obs_iou_th` | `float` | Sets the IoU threshold for Overlapping Box Suppression (default 0.7). |
`--cpu` | flag | Runs computations on the CPU; if not set, CUDA is used. |
`--out_dir` | `str` | Directory to save output results (e.g., `results/SDS/val`). |
`--debug` | flag | Enables saving visualisations in `out_dir`. |
`--vis_conf_th` | `float` | Confidence threshold for detections in the visualisation (default 0.3). |
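When `--allow_resize` is not set, ROIs larger than the detector input are covered with a sliding window rather than being resized. As a hedged sketch of such tiling (illustrative only; the actual window size and stride handling inside the framework are assumptions here):

```python
def tile_roi(roi_w, roi_h, win_w, win_h, overlap=0.2):
    """Cover a ROI with overlapping windows of a fixed detector input size.
    Returns the (x, y) offsets of each window's top-left corner."""
    stride_x = max(1, int(win_w * (1 - overlap)))
    stride_y = max(1, int(win_h * (1 - overlap)))
    xs = list(range(0, max(roi_w - win_w, 0) + 1, stride_x))
    ys = list(range(0, max(roi_h - win_h, 0) + 1, stride_y))
    # make sure the right and bottom edges of the ROI are covered
    if xs[-1] != max(roi_w - win_w, 0):
        xs.append(max(roi_w - win_w, 0))
    if ys[-1] != max(roi_h - win_h, 0):
        ys.append(max(roi_h - win_h, 0))
    return [(x, y) for y in ys for x in xs]
```

Boxes detected in the overlap zones of adjacent windows are the duplicates that Overlapping Box Suppression (`--obs_iou_th`) later removes.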
All available models can be found in the Model ZOO. Currently, we provide trained models for 4 detection tasks. We convert all datasets to the COCO format, and we provide a script for metrics computation. All models we use are in TorchScript format.
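Since the datasets are converted to the COCO format, detections can be stored as per-box records keyed by the COCO image id. A minimal sketch of building such records (the helper function is illustrative; the field names follow the standard COCO detection-results format, where `bbox` is `[x, y, width, height]`):

```python
import json

def to_coco_detections(boxes, image_id):
    """Convert (x1, y1, x2, y2, score, category_id) tuples to COCO
    detection-results records."""
    return [
        {
            "image_id": image_id,
            "category_id": int(cat),
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": float(score),
        }
        for x1, y1, x2, y2, score, cat in boxes
    ]

records = to_coco_detections([(10.0, 20.0, 40.0, 60.0, 0.91, 1)], image_id=7)
print(json.dumps(records))  # serialize for writing to a results .json file
```

A results file in this shape can then be evaluated against the ground-truth `val.json` with standard COCO tooling (e.g., pycocotools' `COCOeval`).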
Model | Objects of Interest | Dataset | Model name | Input size | Weights |
---|---|---|---|---|---|
u2netp | traffic signs | MTSD | MTSD | 576x576 | here |
unet | fish | ZebraFish | ZeF20 | 160x256 | here |
unet | people | DroneCrowd | DC_tiny | 96x160 | here |
unet | people | DroneCrowd | DC_small | 192x320 | here |
unet | people | DroneCrowd | DC_medium | 384x640 | here |
unet | people, boats | SeaDronesSee | SDS_tiny | 64x96 | here |
unet | people, boats | SeaDronesSee | SDS_small | 128x192 | here |
unet | people, boats | SeaDronesSee | SDS_medium | 224x384 | here |
unet | people, boats | SeaDronesSee | SDS_large | 448x768 | here |
Model | Objects of Interest | Dataset | Model name | Input size | Weights |
---|---|---|---|---|---|
yolov4 | traffic signs | MTSD | MTSD | 960x960 | here |
yolov7 tiny | fish | ZebraFish | ZeF20 | 160x256 | here |
yolov7 tiny | people | DroneCrowd | DC | 320x512 | here |
yolov7 tiny | people, boats | SeaDronesSee | SDS | 320x512 | here |