
Conversation

@onuralpszr onuralpszr commented Jan 10, 2026

This pull request introduces several significant improvements and new features to the Ultralytics inference Rust library, focusing on enhanced performance, usability, and expanded functionality. Major highlights include support for rectangular and batch inference, improved hardware acceleration options, expanded CLI arguments, and optimizations for preprocessing and post-processing. The documentation and example outputs have also been updated to reflect these changes.

New Features and CLI Enhancements:

  • Added rectangular inference (--rect) and batch inference (--batch) support, both configurable via the CLI and passed through the inference pipeline (see the CLI sketch after this list).
  • Raised the default NMS IoU threshold to 0.7 and the default maximum detections to 300, and exposed both as CLI arguments.
  • Expanded device selection to cover more hardware acceleration backends (CUDA, TensorRT, CoreML, OpenVINO, XNNPACK), with improved CLI help and examples.
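
To make the new options concrete, here is a minimal sketch of the flag set, assuming a clap-style derive CLI; the struct and field names are illustrative, and only the defaults (rect on, iou = 0.7, max-det = 300) come from this PR:

```rust
use clap::Parser;

/// Hypothetical argument struct, not the crate's actual definition.
#[derive(Parser, Debug)]
struct PredictArgs {
    /// Rectangular inference: pad only to the model stride instead of a full
    /// square. Enabled by default; pass `--rect false` to disable.
    #[arg(long, action = clap::ArgAction::Set, default_value_t = true)]
    rect: bool,

    /// Number of images per inference batch.
    #[arg(long, default_value_t = 1)]
    batch: usize,

    /// IoU threshold for NMS (default raised from 0.45 to 0.7).
    #[arg(long, default_value_t = 0.7)]
    iou: f32,

    /// Maximum detections kept per image (exposed as --max-det).
    #[arg(long, default_value_t = 300)]
    max_det: usize,

    /// Execution provider: cpu, cuda, tensorrt, coreml, openvino, xnnpack.
    #[arg(long, default_value = "cpu")]
    device: String,
}

fn main() {
    println!("{:?}", PredictArgs::parse());
}
```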

Performance and Preprocessing Optimizations:

  • Added SIMD-accelerated preprocessing via the wide crate and an LRU cache for preprocessing LUTs, speeding up repeated image handling (sketched after this list).
  • Switched release builds to "fat" LTO for better optimization.
  • On Linux, configured an RPATH in .cargo/config.toml to simplify shared-library loading.
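
A minimal sketch of the two techniques named above, using the wide and lru crates this PR adds; the function shapes and cache sizing are illustrative, not the crate's actual preprocessing code:

```rust
use std::num::NonZeroUsize;

use lru::LruCache;
use wide::f32x8;

/// Normalize u8 pixels [0, 255] to f32 [0, 1], eight lanes at a time.
fn normalize_simd(src: &[u8], dst: &mut Vec<f32>) {
    let inv = f32x8::splat(1.0 / 255.0);
    let mut chunks = src.chunks_exact(8);
    for chunk in &mut chunks {
        let mut lane = [0.0f32; 8];
        for (l, &b) in lane.iter_mut().zip(chunk) {
            *l = f32::from(b);
        }
        dst.extend_from_slice(&(f32x8::from(lane) * inv).to_array());
    }
    // Scalar tail for lengths not divisible by 8.
    dst.extend(chunks.remainder().iter().map(|&b| f32::from(b) / 255.0));
}

/// Source-x lookup tables keyed by (src_w, dst_w): repeated frames of the
/// same size skip recomputing resize sample coordinates.
struct LutCache {
    cache: LruCache<(u32, u32), Vec<u32>>,
}

impl LutCache {
    fn new() -> Self {
        Self { cache: LruCache::new(NonZeroUsize::new(16).unwrap()) }
    }

    fn x_lut(&mut self, src_w: u32, dst_w: u32) -> &[u32] {
        self.cache.get_or_insert((src_w, dst_w), || {
            (0..dst_w)
                .map(|x| ((x as f32 + 0.5) * src_w as f32 / dst_w as f32) as u32)
                .collect()
        })
    }
}

fn main() {
    let mut out = Vec::new();
    normalize_simd(&[0, 64, 128, 255, 16, 32, 8, 4, 2], &mut out);
    let mut luts = LutCache::new();
    println!("{} pixels, lut[0..4] = {:?}", out.len(), &luts.x_lut(1280, 640)[..4]);
}
```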

Batch Processing and Pipeline Improvements:

  • Implemented a pipelined, multi-threaded batch processing system using bounded channels between frame decoding and inference, improving throughput and responsiveness (see the sketch after this list).
  • Centralized batch management in the prediction pipeline for more efficient processing.
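
The decode-then-infer pipeline pattern looks roughly like this sketch built on std::sync::mpsc::sync_channel; Frame, decode_next, and infer_batch are placeholders, not the crate's API:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

type Frame = Vec<u8>;

fn decode_next() -> Option<Frame> {
    None // placeholder: read and decode the next video frame
}

fn infer_batch(batch: &[Frame]) {
    let _ = batch; // placeholder: run the ONNX session on the whole batch
}

fn main() {
    let batch_size = 4;
    // Bounded channel: the decoder blocks when inference falls behind,
    // capping memory use while keeping both threads busy.
    let (tx, rx) = sync_channel::<Frame>(batch_size * 2);

    let producer = thread::spawn(move || {
        while let Some(frame) = decode_next() {
            if tx.send(frame).is_err() {
                break; // consumer hung up
            }
        }
    });

    let mut batch = Vec::with_capacity(batch_size);
    for frame in rx {
        batch.push(frame);
        if batch.len() == batch_size {
            infer_batch(&batch);
            batch.clear();
        }
    }
    if !batch.is_empty() {
        infer_batch(&batch); // flush the final partial batch
    }
    producer.join().unwrap();
}
```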

Documentation and Example Updates:

  • Updated README.md with new CLI options, example commands, output samples, and a detailed breakdown of the codebase structure and dependencies.
  • Revised example output to reflect new defaults, improved speed, and updated versioning.
  • Added new features to the "Features" checklist and clarified in-progress items.

Codebase and Dependency Updates:

  • Bumped the crate version to 0.0.8 and added new dependencies (wide, lru) for preprocessing and caching (Cargo.toml sketch after this list).
  • Expanded and clarified module structure in documentation, highlighting new modules for batch processing, device management, annotation, I/O, and logging.
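
Taken together with the "fat" LTO change above, the relevant manifest additions look roughly like this; the crate versions are illustrative, not taken from the PR:

```toml
# Cargo.toml (sketch)
[dependencies]
wide = "0.7"   # SIMD vector types used by the preprocessing path
lru = "0.12"   # LRU cache for preprocessing LUTs

[profile.release]
lto = "fat"    # whole-program LTO: slower builds, faster binaries
```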

These changes collectively make the library faster, more flexible, and easier to use for a wider range of inference scenarios.



🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Ultralytics Inference 0.0.8 boosts performance and usability with SIMD-accelerated pre/postprocessing, rectangular + batch inference, improved CLI defaults, and smoother Linux/device/runtime behavior 🚀

📊 Key Changes

  • Major speedups in preprocessing: replaced the old letterbox pipeline with a fused, zero-copy resize+pad+normalize path using SIMD (wide) plus an LRU-cached LUT (lru) to reduce repeated work.
  • 🧠 Faster detection postprocessing (NMS): moved to zero-copy output slicing and introduced a SIMD-accelerated per-class NMS path to cut allocations and speed up hot loops.
  • 📐 Rectangular inference (--rect) added + enabled by default: dynamically adjusts input shapes to reduce padding (when the ONNX model supports dynamic shapes) for better throughput/latency.
  • 📦 Batch inference support (--batch): batch size is now configurable, and the CLI pipeline was updated accordingly.
  • 🧵 Pipelined decoding + inference: decoding runs in a producer thread feeding a bounded channel, improving throughput for video/streams.
  • 🛠️ CLI & defaults updated to match Ultralytics Python behavior:
    • Default --iou changed to 0.7 (was 0.45) 🎛️
    • Default --max-det is 300 (and builder/docs updated) 🧾
    • Warns when using the default model (like Python) ⚠️
    • --save and --verbose now show defaults more explicitly
  • 🐧 Linux packaging improvement: sets RPATH to $ORIGIN so binaries can find libonnxruntime*.so placed beside the executable, with no need to set LD_LIBRARY_PATH (config sketch after this list) 📦
  • 🖥️ Visualization improvements: viewer now uses original image dimensions and avoids unnecessary resizing for display.
  • 🔧 Release build tuning: switched to fat LTO for potentially better runtime performance (at the cost of longer compile times) 🏗️
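
For the Linux RPATH item above, a .cargo/config.toml along these lines produces the described behavior; the exact target key used in the PR may differ:

```toml
# .cargo/config.toml (sketch): embed $ORIGIN as the RPATH so the loader
# searches the executable's own directory for libonnxruntime*.so,
# making LD_LIBRARY_PATH unnecessary.
[target.'cfg(target_os = "linux")']
rustflags = ["-C", "link-arg=-Wl,-rpath,$ORIGIN"]
```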

🎯 Purpose & Impact

  • 🚀 Much faster end-to-end inference, especially on CPU, due to fewer allocations, SIMD acceleration, and pipelined processing (notably improved preprocess/postprocess times).
  • 📐 Better efficiency on non-square images with rectangular inference (less padding → less compute), while safely falling back to square padding for mixed-size batches (see the sketch after this list).
  • 📦 Easier deployment on Linux: colocating ONNX Runtime shared libraries with the binary “just works” thanks to $ORIGIN RPATH.
  • 🎛️ More Ultralytics-consistent results and UX: updated defaults (iou=0.7, max_det=300, rect=true) align behavior with Ultralytics Python, reducing surprises when switching environments.
  • ⚠️ Potential behavior change: higher default IoU threshold and higher max detections may alter output counts compared to prior versions (but improves consistency with Ultralytics defaults).
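
As a worked example of the rectangular-inference saving: assuming the usual YOLO stride of 32, this hypothetical helper computes the rect input shape for a 1280x720 frame:

```rust
/// Scale the long side to `imgsz`, then round each side up to the stride.
fn rect_shape(w: u32, h: u32, imgsz: u32, stride: u32) -> (u32, u32) {
    let scale = imgsz as f32 / w.max(h) as f32;
    let round_up = |v: f32| (v / stride as f32).ceil() as u32 * stride;
    (round_up(w as f32 * scale), round_up(h as f32 * scale))
}

fn main() {
    let (w, h) = rect_shape(1280, 720, 640, 32);
    // 640x384 instead of a square 640x640 letterbox: 40% fewer pixels.
    println!("rect input: {w}x{h}");
}
```
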
📋 Skipped 1 file (lock files, generated, images, etc.)
  • Cargo.lock

Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
…ments

- Added `#[allow(clippy::struct_excessive_bools)]` to `InferenceConfig` to suppress excessive bool warnings.
- Removed unnecessary logging initialization code in `init_logging`.
- Suppressed unnecessary wraps in the `main` function.
- Enhanced `YOLOModel` with additional Clippy lints for better code quality.
- Optimized image processing in `YOLOModel` by reducing unnecessary allocations and improving data handling.
- Refactored post-processing to use zero-copy techniques and SIMD for faster detection extraction (see the NMS sketch after this list).
- Introduced a new zero-copy preprocessing function to minimize memory allocations during image processing.
- Improved letterbox resizing and bilinear interpolation with SIMD optimizations and LRU caching for X coordinate lookups.
- Cleaned up deprecated code and comments for better readability and maintainability.
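
For the post-processing item above, a simplified scalar per-class NMS looks like the sketch below; the PR's version additionally uses SIMD and zero-copy output slicing, and the types here are illustrative:

```rust
/// Boxes are (x1, y1, x2, y2); detections are assumed pre-sorted by score.
#[derive(Clone, Copy)]
struct Det {
    bbox: [f32; 4],
    score: f32,
    class: usize,
}

fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let ix = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let iy = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = ix * iy;
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    inter / (area_a + area_b - inter)
}

/// Greedy NMS applied independently per class: a box only suppresses
/// other boxes of the same class.
fn nms_per_class(dets: &[Det], iou_thres: f32, max_det: usize) -> Vec<Det> {
    let mut keep: Vec<Det> = Vec::new();
    for d in dets {
        let suppressed = keep
            .iter()
            .any(|k| k.class == d.class && iou(&k.bbox, &d.bbox) > iou_thres);
        if !suppressed {
            keep.push(*d);
            if keep.len() == max_det {
                break;
            }
        }
    }
    keep
}

fn main() {
    let dets = vec![
        Det { bbox: [0.0, 0.0, 10.0, 10.0], score: 0.9, class: 0 },
        Det { bbox: [1.0, 1.0, 10.0, 10.0], score: 0.8, class: 0 }, // overlaps, same class
        Det { bbox: [1.0, 1.0, 10.0, 10.0], score: 0.8, class: 1 }, // overlaps, other class
    ];
    let kept = nms_per_class(&dets, 0.7, 300);
    println!("kept {} of {}", kept.len(), dets.len()); // kept 2 of 3
}
```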

Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
@onuralpszr onuralpszr requested a review from picsalex January 10, 2026 15:05
Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
…tests

Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
@onuralpszr onuralpszr changed the title from "feat: ✨ Add rectangular inference support rect" to "feat: ✨ Add rectangular inference support rect and speed optimization for pre-processors and post-processors" Jan 10, 2026
codecov bot commented Jan 10, 2026

Codecov Report

❌ Patch coverage is 67.71218% with 175 lines in your changes missing coverage. Please review.

| File with missing lines | Patch % | Missing lines |
|---|---|---|
| src/postprocessing.rs | 61.70% | 90 ⚠️ |
| src/preprocessing.rs | 76.96% | 47 ⚠️ |
| src/model.rs | 65.11% | 15 ⚠️ |
| src/cli/predict.rs | 64.51% | 11 ⚠️ |
| src/source.rs | 0.00% | 6 ⚠️ |
| src/download.rs | 72.72% | 3 ⚠️ |
| src/main.rs | 0.00% | 3 ⚠️ |


onuralpszr and others added 6 commits January 10, 2026 19:35
…ze, adjust IoU threshold, and improve download image path handling

Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
…rgo.toml

Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
…onfig example

Signed-off-by: Onuralp SEZER <onuralp@ultralytics.com>
@onuralpszr onuralpszr merged commit 466710a into main Jan 19, 2026
9 checks passed
@onuralpszr onuralpszr deleted the feat/rect branch January 19, 2026 08:45
@UltralyticsAssistant

Merged! Huge thanks @onuralpszr for pushing Inference 0.0.8 forward with a seriously thoughtful set of performance + UX upgrades, and to @picsalex for the valuable contributions and collaboration.

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” — Antoine de Saint‑Exupéry

This PR embodies that idea: fewer copies, fewer allocations, smarter defaults, and smoother Linux deployment—resulting in faster end-to-end CPU inference, better throughput with rect + batch, and a CLI that feels more consistent with Ultralytics (including the iou=0.7 and max-det=300 alignment). The SIMD-accelerated pre/postprocessing and pipelined decoding are especially impactful—real, practical speed where it matters.

Appreciate the craftsmanship and attention to real-world usability—this is a big win for everyone building with Ultralytics Inference.
