High-performance YOLO inference implementation in C++ for ROS2, optimized for robotics applications and real-time computer vision. This package provides a production-ready solution for deploying Ultralytics YOLO models on edge devices and robotic systems.
Compatible with Ultralytics YOLO models, including:
- YOLOv8 and YOLOv11 (all variants: n/s/m/l/x — detect, pose, segment tasks)
- Any future Ultralytics releases following the same export format
Models trained with Ultralytics can be exported directly to ONNX or TensorRT and deployed without modification.
- High Performance: Optimized for real-time inference on edge devices
- Dual Backend: TensorRT (best performance) and ONNX Runtime (cross-platform)
- Multiple Tasks: Object detection, pose estimation, instance segmentation, GateNet gate detection
- Comprehensive Profiling: Detailed timing for all processing stages
- ROS2 Integration: Native ROS2 Humble support with custom messages
- Jetson Optimized: Special optimizations for NVIDIA Jetson platforms
- NVIDIA Jetson (Xavier NX, Orin, AGX): Primary target for robotics applications
- x86_64 with NVIDIA GPU: Development and testing
- ARM64: General ARM64 support (not yet validated)
```bash
git clone <repository-url>
cd yolo-ros2-inference
chmod +x scripts/install_dependencies.sh
./scripts/install_dependencies.sh
chmod +x scripts/build_package.sh
./scripts/build_package.sh
```

```bash
ros2 launch yolo_inference_cpp yolo_pose.launch.py \
  model_path:=<your_model>.onnx \
  task:=segment \
  input_topic:=/your/camera/compressed \
  input_width:=832 \
  input_height:=832 \
  confidence_threshold:=0.25 \
  max_detections:=40 \
  publish_visualization:=true \
  draw_bboxes:=false \
  class_names:="['class_a', 'class_b', 'class_c']"
```

```bash
ros2 launch yolo_inference_cpp gatenet.launch.py \
  model_path:=<your_model>.onnx \
  task:=pose \
  input_topic:=/your/camera/compressed \
  input_width:=640 \
  input_height:=480 \
  confidence_threshold:=0.4 \
  publish_visualization:=true
```

Tip: Use `input_width` + `input_height` for non-square inputs. For square inputs, use `input_size` instead (e.g. `input_size:=640`).
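The width/height distinction matters when mapping detections back to the original camera frame. Below is a minimal sketch of the scale-and-pad (letterbox) arithmetic commonly used in YOLO preprocessing — an illustration of the general technique under that assumption, not this package's internal code, and `letterbox_params` is a hypothetical helper name:

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Scale factor and padding that fit (src_w, src_h) into (dst_w, dst_h)
    while preserving aspect ratio (standard YOLO-style letterbox)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) / 2  # horizontal padding on each side
    pad_y = (dst_h - new_h) / 2  # vertical padding on each side
    return scale, pad_x, pad_y

def unletterbox_point(x, y, scale, pad_x, pad_y):
    """Map a point from network-input coordinates back to the source image."""
    return (x - pad_x) / scale, (y - pad_y) / scale

# A 1920x1080 camera frame fed into a square 640x640 network input:
scale, pad_x, pad_y = letterbox_params(1920, 1080, 640, 640)
# scale = 1/3, pad_x = 0, pad_y = 140 (letterbox bars top and bottom)
```

With a non-square input such as 480x368 the padding shrinks, which is why matching `input_width`/`input_height` to the camera's aspect ratio wastes fewer input pixels.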
```bash
ros2 launch yolo_inference_cpp yolo_pose.launch.py \
  model_path:=yolo11n.onnx \
  task:=detect \
  input_topic:=/your/camera/compressed \
  input_size:=640 \
  confidence_threshold:=0.5
```

```bash
ros2 launch yolo_inference_cpp yolo_tensorrt.launch.py \
  model_path:=yolo11m-pose-fp16.engine \
  task:=pose \
  input_topic:=/your/camera/compressed \
  input_size:=640 \
  confidence_threshold:=0.3 \
  max_detections:=10 \
  publish_visualization:=false \
  enable_profiling:=true
```

```bash
ros2 launch yolo_inference_cpp gatenet.launch.py \
  model_path:=gatenet.onnx \
  task:=gatenet \
  input_topic:=/your/camera/compressed \
  input_width:=480 \
  input_height:=368
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | string | `yolo11n-pose.onnx` | Path to ONNX or TensorRT engine file |
| `task` | string | `pose` | Task type: `pose`, `detect`, `segment`, `gatenet` |
| `input_topic` | string | `/camera/image_raw/compressed` | Input compressed image topic |
| `input_size` | int | `640` | Input image size (square) |
| `input_width` | int | `-1` | Input width for non-square models |
| `input_height` | int | `-1` | Input height for non-square models |
| `confidence_threshold` | float | `0.5` | Detection confidence threshold |
| `nms_threshold` | float | `0.4` | Non-maximum suppression threshold |
| `keypoint_threshold` | float | `0.3` | Keypoint visibility threshold |
| `max_detections` | int | `20` | Maximum detections per frame |
| `class_names` | string[] | `[]` | Override class names (auto-detected from model if empty) |
| `publish_visualization` | bool | `false` | Enable visualization output |
| `draw_bboxes` | bool | `true` | Draw bounding boxes in visualization |
| `enable_profiling` | bool | `true` | Enable detailed profiling |
| `paf_threshold` | float | `0.3` | PAF affinity threshold (GateNet only) |
| `corner_threshold` | float | `0.5` | Corner confidence threshold (GateNet only) |
| `min_corners` | int | `3` | Minimum corners to form a valid gate (GateNet only) |
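`nms_threshold` is the IoU above which overlapping detections are suppressed in favor of the higher-scoring one. For reference, greedy non-maximum suppression fits in a few lines of Python — this illustrates the standard algorithm, not the package's C++ implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.4):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Raising `nms_threshold` keeps more overlapping boxes (useful for crowded scenes); lowering it suppresses more aggressively.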
| File | Backend | Typical Use |
|---|---|---|
| `yolo_pose.launch.py` | ONNX Runtime | Cross-platform, any task |
| `yolo_tensorrt.launch.py` | TensorRT | Maximum performance on GPU/Jetson |
| `gatenet.launch.py` | ONNX Runtime | Gate detection with non-square inputs |
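The backend follows from the launch file, and each launch file expects the matching model format. If you select models programmatically (e.g. from a deployment script), a simple extension-based dispatch keeps the pairing straight — `pick_launch_file` is a hypothetical helper based on the table above, not part of the package:

```python
def pick_launch_file(model_path: str) -> str:
    """Match the launch file to the model format: .engine files are
    TensorRT engines, everything else goes through ONNX Runtime
    (an assumed convention based on the launch-file table)."""
    if model_path.endswith(".engine"):
        return "yolo_tensorrt.launch.py"
    return "yolo_pose.launch.py"
```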
```bash
# View real-time performance metrics
ros2 topic echo /yolo/performance

# GPU and CPU usage
nvidia-smi -l 1
htop
```

Main output topic published on `/yolo/detections`:
```
std_msgs/Header header
string model_type
string task
KeypointDetection[] detections
float32[] raw_output
PerformanceInfo performance
```

`KeypointDetection`:

```
std_msgs/Header header
string label
int32 class_id
float32 confidence
BoundingBox bounding_box
Keypoint[] keypoints
```

`PerformanceInfo`:

```
float64 total_time_ms
float64 inference_ms
float64 image_conversion_ms
float64 message_creation_ms
int32 detections_count
float64 fps
```

Full documentation including training parameters, distillation, export, and benchmarking is available in `docs/training.md`.
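When profiling, per-frame numbers are noisy; aggregating several `PerformanceInfo` samples gives a steadier picture. A small sketch that averages the timing fields defined above (the field names come from the message; the aggregation itself is just an illustration, and deriving FPS as `1000 / total_time_ms` is an assumption):

```python
def summarize(samples):
    """Average per-stage timings over a list of PerformanceInfo-like dicts
    and derive an effective FPS from the mean total frame time."""
    n = len(samples)
    mean = lambda key: sum(s[key] for s in samples) / n
    total = mean("total_time_ms")
    return {
        "total_time_ms": total,
        "inference_ms": mean("inference_ms"),
        "effective_fps": 1000.0 / total,  # assumes total_time_ms is the full frame time
    }

samples = [
    {"total_time_ms": 20.0, "inference_ms": 12.0},
    {"total_time_ms": 30.0, "inference_ms": 18.0},
]
# mean total = 25 ms -> effective_fps = 40.0
```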
Quick reference:

**Train**

```bash
python scripts/yolo_training.py \
  --data dataset.yaml \
  --model yolo11x-pose.pt \
  --pretrained --batch-size 4 --imgsz 640 --multiscale
```

**Export**

```bash
python scripts/yolo_batch_exporter_validator.py \
  --model-folders <folder> --output-dir <output_dir> \
  --task pose --use-tensorrt
```

**Benchmark**

```bash
python scripts/yolo_benchmarking.py \
  --models-folder <folder> --dataset-yaml dataset.yaml \
  --output-dir <output_dir> --task pose
```

**CUDA Out of Memory**
- Reduce `input_size` (e.g. 416 instead of 640)
- Lower `max_detections`
- Use an FP16 TensorRT engine
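Reducing `input_size` helps more than it might look: activation memory scales roughly with input area, so the saving is quadratic in the side length. A back-of-the-envelope estimate (not a measured figure):

```python
# Relative activation footprint of a 416x416 input vs. a 640x640 input,
# assuming memory roughly proportional to input area.
ratio = (416 / 640) ** 2
print(f"~{ratio:.0%} of the 640x640 footprint")
```

By the same reasoning, an FP16 engine roughly halves the per-activation footprint on top of that.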
**Low FPS**

- Set `publish_visualization:=false`
- Switch to the TensorRT backend
- Use a smaller model variant (n/s instead of m/l/x)
**Missing Dependencies**

```bash
./scripts/install_dependencies.sh
ls -la /usr/local/onnxruntime/lib   # check ONNX Runtime
ldconfig -p | grep tensorrt         # check TensorRT
```

**Debug Logging**

```bash
ros2 launch yolo_inference_cpp yolo_pose.launch.py --log-level debug
```

- Issues: Report bugs via GitHub Issues
[Alejandro Rodríguez-Ramos](https://alejandrorodriguezramos.me)
Keywords: YOLO, YOLOv8, YOLOv11, Ultralytics, ROS2, C++, TensorRT, ONNX, pose estimation, object detection, instance segmentation, edge AI, Jetson, drone, UAV, real-time inference, computer vision, robotics
