This repository contains code to optimize the YOLOv12n-seg-residual model using Pruna AI's optimization tools. The optimization process significantly improves inference speed while maintaining model accuracy.
YOLOv12 is a state-of-the-art "Attention-Centric Real-Time Object Detector" developed by Yunjie Tian, Qixiang Ye, and David Doermann. The YOLOv12n-seg-residual variant adds instance segmentation capabilities to the base detection model. This repository specifically focuses on optimizing this model using Pruna AI's optimization tools, which leverage PyTorch's compilation capabilities to achieve significant performance improvements.
The optimization is performed using Pruna's `smash` functionality, which applies various optimization techniques, including graph transformations and compilation optimizations, to make the model run faster without compromising accuracy.
The YOLOv12n-seg-residual model used in this project was sourced from Weights & Biases: YOLOv12n-seg-residual Model
Note: YOLOv12 is relatively new, having been published in February 2025 on arXiv as "YOLOv12: Attention-Centric Real-Time Object Detectors."
```bash
git clone https://github.com/YOUR_USERNAME/yolov12n-seg-optimization-pruna-ai.git
cd yolov12n-seg-optimization-pruna-ai
```
```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
- Download the YOLOv12n-seg-residual model from Weights & Biases
- Place the downloaded `.pt` file in the `models/` directory:

```bash
# Create models directory if it doesn't exist
mkdir -p models

# Move the downloaded model to the models directory
mv path/to/downloaded/yolov12n-seg-residual.pt models/
```
```bash
python optimize_yolo.py
```
This will:
- Load the YOLOv12n-seg-residual model
- Benchmark the original model's performance
- Apply Pruna's optimization techniques
- Benchmark the optimized model's performance
- Save the optimized model to `models/yolov12n-seg-residual_smashed_tc_gpu.pt`
You can modify the following parameters in `optimize_yolo.py`:

- `MODEL_PATH`: Path to the original YOLOv12n-seg-residual model
- `SMASHED_MODEL_PATH`: Path to save the optimized model
- `NUM_WARMUP_RUNS`: Number of warm-up inference runs before benchmarking
- `NUM_TIMED_RUNS`: Number of inference runs for benchmarking
- `SAMPLE_INPUT_SHAPE`: Input shape for the model (default: `[1, 3, 640, 640]`)
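For reference, these parameters might sit at the top of `optimize_yolo.py` roughly as follows. The names come from this README; all values except the documented default input shape are illustrative, not the script's actual settings.

```python
# Illustrative configuration block; values other than SAMPLE_INPUT_SHAPE's
# documented default are examples only.
MODEL_PATH = "models/yolov12n-seg-residual.pt"
SMASHED_MODEL_PATH = "models/yolov12n-seg-residual_smashed_tc_gpu.pt"
NUM_WARMUP_RUNS = 10   # warm-up inferences, excluded from timing
NUM_TIMED_RUNS = 100   # timed inferences averaged for the benchmark
SAMPLE_INPUT_SHAPE = [1, 3, 640, 640]  # [batch, channels, height, width]
```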
The optimization process uses Pruna AI's `smash` functionality with PyTorch's compilation backend. The script:
- Loads the YOLOv12n-seg-residual model
- Configures Pruna's SmashConfig with appropriate compiler settings
- Applies PyTorch's inductor backend optimization for NVIDIA GPUs
- Benchmarks the original and optimized models to measure performance improvement
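The warm-up-then-time benchmarking pattern described above can be sketched with the standard library alone. The function names here are illustrative, not taken from `optimize_yolo.py`:

```python
import time

def benchmark(run_inference, num_warmup=10, num_timed=100):
    """Return mean latency in milliseconds for a zero-argument callable."""
    # Warm-up runs let caches, allocators, and (for compiled models)
    # the compiler settle before any timing begins.
    for _ in range(num_warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(num_timed):
        run_inference()
    elapsed = time.perf_counter() - start
    return elapsed / num_timed * 1000.0

def speedup(original_ms, optimized_ms):
    """Speedup factor: ratio of original latency to optimized latency."""
    return original_ms / optimized_ms
```

The same helper times both the original and the smashed model; the reported improvement is then `speedup(original_ms, optimized_ms)`.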
Typical performance improvements vary depending on hardware, but you can expect:
- 1.5-3x speedup on NVIDIA GPUs
- Improved throughput for real-time applications
YOLOv12 is an attention-centric YOLO framework that matches the speed of CNN-based models while harnessing the performance benefits of attention mechanisms. According to the authors:
- YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU
- It outperforms advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with comparable speed
- The model also surpasses end-to-end real-time detectors like RT-DETR and RT-DETRv2
The main dependencies are:
- PyTorch (>= 2.0.0)
- Ultralytics (provides the `YOLO` class used to load and run the model)
- Pruna (for optimization)
See `requirements.txt` for the complete list of dependencies.
MIT License