YOLOv8s + OpenVINO + DeepSORT: A demo of threat detection and unattended baggage tracking.
A YOLOv8s model trained on the COCO dataset is converted to OpenVINO format. The model is then used to detect threats and unattended baggage in a video stream, and the detections are tracked using DeepSORT.
- Threat Detection: Identifies potential threats such as the `knife` and `scissors` classes.
- Multiple Tracking: Utilizes DeepSORT for tracking multiple persons and their associated baggage.
- Unattended Baggage Detection & Alerting: Flags and alerts when a person moves away from their associated baggage beyond a set threshold distance and time.
- Alert Resolution: Recognizes when unattended baggage is reclaimed by the associated person within a set "grace period" and resolves the alert.
- Grace Period Exceeded Alert: Issues an additional alert when unattended baggage remains unclaimed beyond the grace period.
- Leaving Scene Management: Handles situations where a person leaves the scene entirely, tracking and alerting on their unattended baggage.
- Historical Baggage Monitoring: Tracks baggage left unattended in the frame for longer than a set duration, even when no associated person is present.
- Prevents Repeated Alerts: Suppresses repeated alerts for the same unattended baggage until resolved, avoiding alert spam.
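The alerting features above can be read as a small per-bag state machine. The following is a minimal sketch of that idea, not the logic in `utils.py`: `BaggageState`, `update_alert`, the alert names, and the default thresholds (`max_dist`, `max_away`, `grace`) are all hypothetical, and pixel and second units are assumed.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class BaggageState:
    """Per-bag alert state (hypothetical structure, not taken from utils.py)."""
    away_since: Optional[float] = None  # time the owner first went out of range
    alerted: bool = False               # "unattended" alert already issued
    escalated: bool = False             # "grace_exceeded" alert already issued


def update_alert(state, bag_xy, owner_xy, now,
                 max_dist=150.0, max_away=5.0, grace=10.0):
    """Return the alert to raise for this frame, or None.

    owner_xy is None when the owner has left the scene.
    Distances are in pixels and times in seconds (assumed units).
    """
    near = owner_xy is not None and hypot(
        bag_xy[0] - owner_xy[0], bag_xy[1] - owner_xy[1]) <= max_dist

    if near:
        # Owner is back: clear the state and resolve any open alert.
        was_alerted = state.alerted
        state.away_since, state.alerted, state.escalated = None, False, False
        return "resolved" if was_alerted else None

    if state.away_since is None:
        state.away_since = now          # owner just moved out of range
    away = now - state.away_since

    if not state.alerted and away >= max_away:
        state.alerted = True            # first alert; repeats are suppressed
        return "unattended"
    if state.alerted and not state.escalated and away >= max_away + grace:
        state.escalated = True          # one extra alert after the grace period
        return "grace_exceeded"
    return None
```

Calling `update_alert` once per frame for each tracked bag returns at most one alert per state change, which is one way to avoid alert spam.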
Please note: The system's accuracy and effectiveness depend on the quality of the object detection model, the configuration of thresholds and durations, and the clarity of the video feed.
- The bounding box logic isn't perfect: boxes sometimes extend outside the frame, which can place their centre points outside the frame and cause tracking to fail.
- The model is limited by its training data (COCO dataset). It doesn't detect some classes (e.g., gun) because they weren't in the training data.
- We use the 's' version of YOLOv8 because the 'n' version had trouble keeping detections consistent across frames. The 'n' version is faster, but it may not be as accurate.
- Train the model on a custom dataset with specific classes we want to detect (e.g., person, knife, scissor, luggage, backpack, handbag, gun).
- Go back to the 'n' version of YOLOv8 and fix the consistency issue.
- Improve the bounding box logic to keep boxes inside the frame.
- Find a way for DeepSORT to handle the different aspect ratios and sizes that come with multiple classes. (Use a separate model for each class, or find a new technique altogether.)
- Since the model used for DeepSORT, `mars-small128`, is a TensorFlow model, we can convert it to OpenVINO IR format, add the preprocessing into the model, and then re-engineer or update the functions `extract_image_patch`, `create_box_encoder`, `ImageEncoderTF` and `_run_in_batches` in `utils.py` to be OpenVINO optimised using the latest libraries/helpers.
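The bounding-box item in the list above could start from a simple clamp. This is a minimal sketch under assumptions: boxes are `(x1, y1, x2, y2)` pixel coordinates, and `clip_box` is a hypothetical helper that does not exist in `utils.py`.

```python
def clip_box(box, frame_w, frame_h):
    """Clamp an (x1, y1, x2, y2) box to the frame and return (box, centre).

    The box format and this helper name are assumptions for illustration.
    """
    x1, y1, x2, y2 = box
    # Keep every corner inside [0, width - 1] x [0, height - 1].
    x1 = min(max(x1, 0), frame_w - 1)
    x2 = min(max(x2, 0), frame_w - 1)
    y1 = min(max(y1, 0), frame_h - 1)
    y2 = min(max(y2, 0), frame_h - 1)
    # A centre computed from clamped corners always lies inside the frame.
    centre = ((x1 + x2) / 2, (y1 + y2) / 2)
    return (x1, y1, x2, y2), centre
```

For a 640x480 frame, a box that spills over the left and right edges is pulled back so that the centre handed to the tracker stays within the image.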
Because of the complexity of testing the whole implementation, I won't be able to patch all issues before the Intel Chips' Challenge - Detect Faster deadline.
- The model is trained on the COCO dataset using YOLOv8s. (We used `s` instead of `n` because we were having trouble keeping consistency across the frames.)
- The model is then converted to OpenVINO format using the `convert.ipynb` notebook.
- The model is then used to detect threats and unattended baggage in a video stream:
  - Filter detections to only show the threats, persons and unattended baggage.
  - Create unique detections for each person and unattended baggage using DeepSORT.
  - Then create pairs for them, while also returning the alerts for threats.
  - Create a relationship between the unattended baggage and the closest person to track the unattended baggage.
  - Finally, draw the detections, keep track of the time period of the unattended baggage and draw the alerts.
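The filtering and pairing steps above can be sketched as follows. This is an illustration under assumptions rather than the code in `process_results` or `track_risk`: the detection dictionaries, the class-name sets (taken from the COCO label list) and the helper names `split_detections` and `pair_baggage` are all hypothetical.

```python
from math import hypot

# Assumed COCO class names; the sets used in the repo may differ.
THREATS = {"knife", "scissors"}
BAGGAGE = {"backpack", "handbag", "suitcase"}


def centre(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def split_detections(detections):
    """Keep only persons, baggage and threats; drop every other class."""
    persons = [d for d in detections if d["label"] == "person"]
    bags = [d for d in detections if d["label"] in BAGGAGE]
    threats = [d for d in detections if d["label"] in THREATS]
    return persons, bags, threats


def pair_baggage(persons, bags):
    """Map each bag track id to the track id of the closest person (or None)."""
    pairs = {}
    for bag in bags:
        bx, by = centre(bag["box"])
        closest = min(
            persons,
            key=lambda p: hypot(centre(p["box"])[0] - bx, centre(p["box"])[1] - by),
            default=None,  # no person in the frame
        )
        pairs[bag["track_id"]] = closest["track_id"] if closest else None
    return pairs
```

Here each detection is assumed to be a dict with `track_id`, `label` and `box` keys, as it might look after DeepSORT has assigned track ids.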
- `demo.ipynb` is the main file that runs the inference; you have to specify the video source and the path to the OpenVINO-optimised model. You may also play with the DeepSORT and threat alert configuration in there.
- `convert.ipynb` is copied straight from YOLOv8-OpenVINO-Optimised and is used purely to optimise the YOLO model using the OpenVINO Toolkit.
- `utils.py` contains helper functions and methods used for post-processing on the image. To be specific:
  - `log_output` writes log output to `log.txt`.
  - `VideoPlayer` class handles the counting of fps and management of frames (copied from OpenVINO Notebooks - YOLO optimization).
  - `plot_one_box` plots a single box in the given frame (copied from OpenVINO Notebooks - YOLO optimization).
  - `letterbox` resizes the image to fit into a new shape while preserving the original aspect ratio and pads it to meet stride-multiple constraints (copied from OpenVINO Notebooks - YOLO optimization).
  - `postprocess` applies the non-maximum suppression algorithm to detections and rescales boxes to their original image size (copied from OpenVINO Notebooks - YOLO optimization).
  - `process_results` manages the filtering and DeepSORT tracking for the objects.
  - `track_risk` manages the alerts and tracking of object pairs.
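To make the `letterbox` and box-rescaling descriptions above more concrete, here is a minimal sketch of the geometry only, with no image operations. `letterbox_geometry` and `rescale_box` are hypothetical names, not the functions in `utils.py`, and the 640x640 target and stride of 32 are assumed defaults.

```python
def letterbox_geometry(src_hw, dst_hw=(640, 640), stride=32):
    """Compute the resize ratio, resized size and per-side padding.

    Returns (ratio, (new_w, new_h), (pad_x, pad_y)). Padding is reduced to
    the minimum that keeps both sides a multiple of `stride`.
    """
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # One ratio for both axes preserves the aspect ratio.
    ratio = min(dst_h / src_h, dst_w / src_w)
    new_w, new_h = round(src_w * ratio), round(src_h * ratio)
    # Only pad up to the next stride multiple, split evenly on both sides.
    pad_x = ((dst_w - new_w) % stride) / 2
    pad_y = ((dst_h - new_h) % stride) / 2
    return ratio, (new_w, new_h), (pad_x, pad_y)


def rescale_box(box, ratio, pad):
    """Map an (x1, y1, x2, y2) box from letterboxed back to source coordinates."""
    pad_x, pad_y = pad
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / ratio, (y1 - pad_y) / ratio,
            (x2 - pad_x) / ratio, (y2 - pad_y) / ratio)
```

For a 1280x720 frame this gives a ratio of 0.5 and 12 pixels of padding above and below, so the network input is 640x384. Undoing the padding and the ratio maps detections back onto the original frame.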
- Install the requirements: `pip install -r requirements.txt`
- Create the optimized model by running the notebook cells in `convert.ipynb` to convert the model to OpenVINO format. (IMPORTANT)
- Run the cells in `demo.ipynb` to run the demo. (Make sure to change the video path to your own supported video source.)
Note: Demo won't work without creating the optimized model.
- To my sister for staying up late with me to test the demo and debug the detections. ❤️
- DeepSORT library: theAIGuysCode/yolov4-deepsort
- The binding logic for DeepSORT and YOLO: MatPiech/DeepSORT-YOLOv4-TensorRT-OpenVINO
- Used as a base repo for inference: AJV009/YOLOv8-OpenVINO-Optimised