Conversation

@onuralpszr (Member) commented Jan 10, 2026

This pull request introduces several significant improvements and new features to the Ultralytics inference Rust library, focusing on enhanced performance, usability, and expanded functionality. Major highlights include support for rectangular and batch inference, improved hardware acceleration options, expanded CLI arguments, and optimizations for preprocessing and post-processing. The documentation and example outputs have also been updated to reflect these changes.

New Features and CLI Enhancements:

  • Added rectangular inference (--rect) and batch inference (--batch) support, both enabled/configurable via CLI and passed through the inference pipeline. [1] [2] [3]
  • Increased default IoU threshold for NMS to 0.7, raised default max detections to 300, and exposed these as CLI arguments. [1] [2] [3] [4]
  • Expanded device selection to include more hardware acceleration options (CUDA, TensorRT, CoreML, OpenVINO, XNNPACK), and improved CLI help/examples.
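
As a rough illustration of the new options, they might surface in an inference config like the sketch below (all field names, types, and the struct itself are illustrative assumptions, not the crate's actual API; only the default values of 0.7 and 300 come from the PR):

```rust
// Illustrative config mirroring the new CLI flags. Names are assumptions.
#[derive(Debug, Clone)]
struct InferenceConfig {
    rect: bool,      // --rect: rectangular (non-square) inference
    batch: usize,    // --batch: images per forward pass
    iou: f32,        // --iou: NMS IoU threshold (new default 0.7)
    max_det: usize,  // --max-det: detection cap (new default 300)
    device: String,  // --device: cpu | cuda | tensorrt | coreml | openvino | xnnpack
}

impl Default for InferenceConfig {
    fn default() -> Self {
        Self {
            rect: false,
            batch: 1,
            iou: 0.7,
            max_det: 300,
            device: "cpu".into(),
        }
    }
}

fn main() {
    let cfg = InferenceConfig::default();
    println!("iou={} max_det={}", cfg.iou, cfg.max_det);
}
```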

Performance and Preprocessing Optimizations:

  • Added SIMD-accelerated preprocessing via the wide crate and introduced an LRU cache for preprocessing LUTs for faster image handling. [1] [2]
  • Switched to "fat" LTO in release builds for improved optimization.
  • On Linux, configured RPATH in .cargo/config.toml to simplify shared library loading.
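
The LUT-caching idea can be sketched as follows: for a given (source width, destination width) pair, the source-x coordinates used by horizontal resampling never change, so they are computed once and reused. The PR uses the `lru` crate plus SIMD via `wide`; this stdlib `HashMap` stand-in (with nearest-neighbour lookup rather than bilinear weights) only illustrates the caching concept:

```rust
use std::collections::HashMap;

// Stand-in for the LRU-cached preprocessing LUTs described above.
struct LutCache {
    cache: HashMap<(u32, u32), Vec<u32>>,
}

impl LutCache {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Lookup table mapping each destination column to a source column
    /// (nearest-neighbour shown for brevity; the library computes
    /// bilinear weights). Computed once per (src_w, dst_w) pair.
    fn x_lut(&mut self, src_w: u32, dst_w: u32) -> &Vec<u32> {
        self.cache.entry((src_w, dst_w)).or_insert_with(|| {
            let scale = src_w as f32 / dst_w as f32;
            (0..dst_w)
                .map(|x| (((x as f32 + 0.5) * scale) as u32).min(src_w - 1))
                .collect()
        })
    }
}

fn main() {
    let mut cache = LutCache::new();
    let lut = cache.x_lut(1280, 640); // computed once…
    println!("first entries: {:?}", &lut[..4]);
    let _again = cache.x_lut(1280, 640); // …then served from the cache
}
```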

Batch Processing and Pipeline Improvements:

  • Implemented a pipelined, multi-threaded batch processing system using bounded channels between frame decoding and inference, improving throughput and responsiveness.
  • Centralized batch management in the prediction pipeline for more efficient processing.
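
The pipelined design can be sketched with a bounded (`sync`) channel, which applies backpressure so decoding never runs unboundedly ahead of inference. The frame type, channel capacity, and the "inference" step below are all illustrative stand-ins, not the crate's actual code:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Decoder thread feeds frames through a bounded channel; the consumer
// groups them into batches (the batched forward pass is simulated).
fn run_pipeline(frames: usize, batch_size: usize) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(batch_size * 2);

    let decoder = thread::spawn(move || {
        for i in 0..frames {
            let frame = vec![i as u8; 8]; // stand-in for real decoding
            if tx.send(frame).is_err() {
                break; // inference side hung up
            }
        }
        // tx drops here, closing the channel and ending the loop below.
    });

    let mut batch = Vec::with_capacity(batch_size);
    let mut processed = 0;
    for frame in rx {
        batch.push(frame);
        if batch.len() == batch_size {
            processed += batch.len(); // stand-in for one batched forward pass
            batch.clear();
        }
    }
    processed += batch.len(); // flush the final partial batch
    decoder.join().unwrap();
    processed
}

fn main() {
    println!("processed {} frames", run_pipeline(10, 4));
}
```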

Documentation and Example Updates:

  • Updated README.md with new CLI options, example commands, output samples, and a detailed breakdown of the codebase structure and dependencies. [1] [2] [3] [4] [5]
  • Revised example output to reflect new defaults, improved speed, and updated versioning.
  • Added new features to the "Features" checklist and clarified in-progress items.

Codebase and Dependency Updates:

  • Bumped crate version to 0.0.8 and added new dependencies (wide, lru) for preprocessing and caching. [1] [2]
  • Expanded and clarified module structure in documentation, highlighting new modules for batch processing, device management, annotation, I/O, and logging.

These changes collectively make the library faster, more flexible, and easier to use for a wider range of inference scenarios.


…ments

- Added `#[allow(clippy::struct_excessive_bools)]` to `InferenceConfig` to suppress excessive bool warnings.
- Removed unnecessary logging initialization code in `init_logging`.
- Suppressed the `clippy::unnecessary_wraps` lint in the `main` function.
- Enhanced `YOLOModel` with additional Clippy lints for better code quality.
- Optimized image processing in `YOLOModel` by reducing unnecessary allocations and improving data handling.
- Refactored post-processing to use zero-copy techniques and SIMD for faster detection extraction.
- Introduced a new zero-copy preprocessing function to minimize memory allocations during image processing.
- Improved letterbox resizing and bilinear interpolation with SIMD optimizations and LRU caching for X coordinate lookups.
- Cleaned up deprecated code and comments for better readability and maintainability.
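
The rectangular letterbox step mentioned above can be sketched as follows. Under rectangular inference, the scaled image is padded only up to the next stride multiple per side rather than to a full square. The function name, rounding choices, and the stride value are assumptions; the real implementation also computes padding offsets and bilinear weights:

```rust
// Compute the padded output dimensions for rectangular letterbox resize.
fn letterbox_dims(src_w: u32, src_h: u32, target: u32, stride: u32) -> (u32, u32) {
    // Uniform scale that fits the longer side into the target size.
    let scale = (target as f32 / src_w as f32).min(target as f32 / src_h as f32);
    let new_w = (src_w as f32 * scale).round() as u32;
    let new_h = (src_h as f32 * scale).round() as u32;
    // Rectangular mode: pad each dimension only to the next stride multiple.
    let pad_to = |x: u32| (x + stride - 1) / stride * stride;
    (pad_to(new_w), pad_to(new_h))
}

fn main() {
    // A 1280x720 frame at target 640, stride 32: scaled to 640x360,
    // then padded to 640x384 instead of a full 640x640 square.
    println!("{:?}", letterbox_dims(1280, 720, 640, 32));
}
```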

Signed-off-by: Onuralp SEZER <[email protected]>
@onuralpszr onuralpszr requested a review from picsalex January 10, 2026 15:05
@onuralpszr changed the title from "feat: ✨ Add rectangular inference support rect" to "feat: ✨ Add rectangular inference support rect and speed optimization for pre-processors and post-processors" Jan 10, 2026
codecov bot commented Jan 10, 2026

Codecov Report

❌ Patch coverage is 67.71218% with 175 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Missing lines |
|---|---|---|
| src/postprocessing.rs | 61.70% | 90 ⚠️ |
| src/preprocessing.rs | 76.96% | 47 ⚠️ |
| src/model.rs | 65.11% | 15 ⚠️ |
| src/cli/predict.rs | 64.51% | 11 ⚠️ |
| src/source.rs | 0.00% | 6 ⚠️ |
| src/download.rs | 72.72% | 3 ⚠️ |
| src/main.rs | 0.00% | 3 ⚠️ |
