
antimora
Collaborator

@antimora antimora commented Sep 21, 2025

Introduces a model verification system for YOLO family models, including yolov5s, yolov8n, yolov8s, yolov10n, and yolo11x. This PR adds a new model-checks crate for Burn with a Cargo.toml, a build script for ONNX codegen, a Python script that downloads and processes the YOLO models and generates test data, and a Rust implementation that validates model outputs against the reference data. This enables automated verification of YOLO model ONNX import and inference correctness across the supported backends.
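Validating outputs against reference data usually comes down to an element-wise tolerance check. The following is a minimal plain-Rust sketch. It assumes that the model output and the reference have already been flattened into `f32` slices with the same layout. `compare_outputs`, `Comparison` and the tolerance values are hypothetical and are not taken from the crate:

```rust
/// Summary of an element-wise comparison between a model output and its reference.
#[derive(Debug)]
struct Comparison {
    max_abs_diff: f32,
    mismatches: usize,
}

/// Hypothetical helper: compares `output` against `reference` with an
/// allclose-style rule, where an element passes when
/// |out - ref| <= atol + rtol * |ref|.
fn compare_outputs(
    output: &[f32],
    reference: &[f32],
    rtol: f32,
    atol: f32,
) -> Result<Comparison, String> {
    if output.len() != reference.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            output.len(),
            reference.len()
        ));
    }
    let mut max_abs_diff = 0.0f32;
    let mut mismatches = 0;
    for (&o, &r) in output.iter().zip(reference) {
        let diff = (o - r).abs();
        max_abs_diff = max_abs_diff.max(diff);
        // Written as a negated `<=` so that a NaN difference counts as a mismatch.
        if !(diff <= atol + rtol * r.abs()) {
            mismatches += 1;
        }
    }
    Ok(Comparison { max_abs_diff, mismatches })
}
```

For example, comparing `[1.0, 2.00001, 3.5]` against `[1.0, 2.0, 3.0]` with `rtol = atol = 1e-4` reports one mismatch and a maximum absolute difference of `0.5`. Reporting the maximum difference alongside the mismatch count helps to distinguish small numerical drift on one backend from an outright wrong result.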

Pull Request Template

Checklist

  • Confirmed that cargo run-checks command has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

#2822

Changes

Added model-checks crate with support for YOLO family models (yolov5s, yolov8n, yolov8s, yolov10n, yolo11x)

Testing

Runs successfully with the tch and ndarray backends but fails with the metal backend. yolov10n also fails on the TopK and Mod ONNX ops; PRs to fix these ops will follow.

Introduces a new model-checks/yolov8n crate for Burn, including Cargo.toml, build script for ONNX codegen, Python script to download and process the YOLOv8n model and generate test data, and a Rust main.rs to validate model output against reference data. This enables automated verification of YOLOv8n ONNX import and inference correctness across supported backends.
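The reference output shape `[1, 84, 8400]` reported for YOLOv8n follows from its anchor-free detection head. The small sketch below derives it. It assumes the default 640×640 input, detection strides 8, 16 and 32, and the 80 COCO classes. Other variants in the family may export different output layouts. The helper names are hypothetical:

```rust
/// Number of prediction slots produced by an anchor-free YOLOv8-style head:
/// one per grid cell at each detection stride.
fn num_predictions(input_size: usize, strides: &[usize]) -> usize {
    strides
        .iter()
        .map(|&s| {
            let cells = input_size / s; // grid side length at this stride
            cells * cells
        })
        .sum()
}

/// Per-prediction channel count: 4 box values (cx, cy, w, h) plus one score per class.
fn num_channels(num_classes: usize) -> usize {
    4 + num_classes
}
```

Under those assumptions, `num_predictions(640, &[8, 16, 32])` gives 80² + 40² + 20² = 6400 + 1600 + 400 = 8400, and `num_channels(80)` gives 84.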
@antimora
Collaborator Author

Running with the metal backend fails. CC @wingertge

     Running `target/release/burn-import-model-checks-yolov8n`
========================================
YOLOv8n Burn Model Test
========================================

Initializing YOLOv8n model...
  Model initialized in 19.58ms

Saving model structure to artifacts/model.txt...
  Model structure saved

Loading test data from artifacts/test_data.pt...
  Data loaded in 3.87ms
  Loaded input tensor with shape: [1, 3, 640, 640]
  Loaded reference output with shape: [1, 84, 8400]

Running model inference with test input...

thread 'main' panicked at /Users/dilshod/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wgpu-26.0.1/src/backend/wgpu_core.rs:1055:30:
wgpu error: Validation Error

Caused by:
  In Device::create_shader_module_passthrough, label = 'reduce_kernel_f32_f32_f32'
    Failed to generate the backend-specific code


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'main' panicked at /Users/dilshod/Projects/burn/crates/burn-fusion/src/stream/execution/ordering.rs:67:38:
index out of bounds: the len is 0 but the index is 0
stack backtrace:
   0:        0x1036960f8 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::hf35f9734f9a29483
   1:        0x1036b0908 - core::fmt::write::h60ec6633daab7b35
   2:        0x103693908 - std::io::Write::write_fmt::hc29709fdab2e34e2
   3:        0x103695fac - std::sys::backtrace::BacktraceLock::print::hca95bffd78053951
   4:        0x103697418 - std::panicking::default_hook::{{closure}}::h357ed4fbef22679d
   5:        0x103697270 - std::panicking::default_hook::h0a4e133b151d5758
   6:        0x103697eb8 - std::panicking::rust_panic_with_hook::h557a23724a5de839
   7:        0x103697ad4 - std::panicking::begin_panic_handler::{{closure}}::h269cace6208fef05
   8:        0x1036965a8 - std::sys::backtrace::__rust_end_short_backtrace::h5be0da278f3aaec7
   9:        0x1036977b0 - __rustc[de2ca18b4c54d5b8]::rust_begin_unwind
  10:        0x103706280 - core::panicking::panic_fmt::h477ff48eff31ffa4
  11:        0x103706400 - core::panicking::panic_bounds_check::h0b28316d3be9e695
  12:        0x102ea9048 - burn_fusion::stream::queue::execution::QueueExecution<R>::execute_strategy::he93027f4b7b9dc86
  13:        0x102ea9200 - burn_fusion::stream::queue::execution::QueueExecution<R>::run::h69b4bb28492722c8
  14:        0x102acd914 - <burn_fusion::stream::multi::Segment<R> as burn_fusion::stream::execution::processor::StreamSegment<<R as burn_fusion::backend::FusionRuntime>::Optimization>>::execute::h1cd274a013d181b4
  15:        0x102a2669c - burn_fusion::stream::execution::processor::Processor<O>::process::hdefc40eb7e5b71bf
  16:        0x102a9b820 - burn_fusion::stream::multi::MultiStream<R>::register::h16b2ea700e134af6
  17:        0x102f03568 - <burn_fusion::client::mutex::MutexFusionClient<R> as burn_fusion::client::base::FusionClient<R>>::register::hdc69a0a44ed55156
  18:        0x102ade954 - <burn_fusion::tensor::FusionTensor<R> as core::ops::drop::Drop>::drop::hbaf7fa167023e4a4
  19:        0x102a3b044 - core::ptr::drop_in_place<burn_fusion::tensor::FusionTensor<burn_cubecl::fusion::FusionCubeRuntime<cubecl_wgpu::runtime::WgpuRuntime,u8>>>::h98fd97f45199f5e2
  20:        0x102a19b40 - burn_fusion::ops::float::<impl burn_tensor::tensor::ops::tensor::FloatTensorOps<burn_fusion::backend::Fusion<B>> for burn_fusion::backend::Fusion<B>>::float_max_dim::h13d519d724d0db1a
  21:        0x102eb2628 - burn_tensor::tensor::api::numeric::<impl burn_tensor::tensor::api::base::Tensor<B,_,K>>::max_dim::hf13f77fb25fa44c8
  22:        0x10306bb4c - burn_tensor::tensor::activation::base::softmax::h5f591d563e468f20
  23:        0x102a03dd8 - burn_import_model_checks_yolov8n::yolov8n::Model<B>::forward::he9f5aefc32089b0f
  24:        0x1029e8e80 - burn_import_model_checks_yolov8n::main::hbac23f6b259d61d0
  25:        0x102e2eb9c - std::sys::backtrace::__rust_begin_short_backtrace::h082d3750cc044d43
  26:        0x102eeaf70 - std::rt::lang_start::{{closure}}::h1550c5b9d1e1cfc9
  27:        0x10368e3c8 - std::rt::lang_start_internal::hdb28e94b6865fa11
  28:        0x102a0e238 - _main

thread 'main' panicked at library/core/src/panicking.rs:233:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
zsh: abort      cargo run --release --features metal --no-default-features
[yolov8n]%
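The backtrace places the failure inside the model's `softmax`, under `max_dim`. This is consistent with the `reduce_kernel_f32_f32_f32` shader named in the first error. A numerically stable softmax subtracts the per-axis maximum before exponentiating, so every call needs a max reduction. The following is an illustrative plain-Rust sketch of that formulation over a single row. It is not Burn's implementation:

```rust
/// Numerically stable softmax over a single row.
/// Subtracting the row maximum keeps `exp` from overflowing without changing
/// the result, because softmax(x) == softmax(x - c) for any constant c.
fn stable_softmax(row: &[f32]) -> Vec<f32> {
    // Max reduction: on a GPU backend this is the step that needs a reduce kernel.
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
    // Sum reduction for the normalizing constant.
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}
```

With an input like `[1000.0, 1000.0]`, the naive form overflows `exp` to infinity and produces NaN. This version returns `[0.5, 0.5]`.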


codecov bot commented Sep 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.01%. Comparing base (e80e648) to head (a1c69d5).
⚠️ Report is 35 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3750   +/-   ##
=======================================
  Coverage   64.01%   64.01%           
=======================================
  Files        1084     1084           
  Lines      126880   126880           
=======================================
  Hits        81228    81228           
  Misses      45652    45652           
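The report's figures are internally consistent. Hits plus Misses equals Lines (81228 + 45652 = 126880). The ratio 81228 / 126880 ≈ 64.0195% is displayed as 64.01%, which means it is truncated rather than rounded. The sketch below reproduces that arithmetic. It assumes a round-down display at two decimals, which is what Codecov's `round` and `precision` settings control; treat the exact defaults as an assumption:

```rust
/// Coverage in hundredths of a percent, rounded down (integer division truncates).
fn coverage_hundredths(hits: u64, lines: u64) -> u64 {
    hits * 10_000 / lines
}

/// Formats hundredths of a percent as e.g. "64.01%".
fn format_percent(hundredths: u64) -> String {
    format!("{}.{:02}%", hundredths / 100, hundredths % 100)
}
```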


@wingertge
Contributor

That looks like a fusion error, so I think @nathanielsimard would know more here.

@wingertge
Contributor

Actually it could also be an issue fixed by tracel-ai/cubecl#895

Renamed yolov8n model-checks crate to yolo and refactored code to support multiple YOLO variants (yolov5s, yolov8n, yolov8s, yolov10n, yolo11x). Added model selection via YOLO_MODEL environment variable, updated build script, Python model preparation script, and main.rs to handle dynamic model selection and output. Added README with usage instructions and supported models. Removed yolov8n-specific files.
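Selecting the variant through the `YOLO_MODEL` environment variable could look roughly like the sketch below. The enum, the function names and the `yolov8n` fallback are assumptions made for illustration. The crate's actual build script and `main.rs` may handle this differently:

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical enum of the YOLO variants listed as supported in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum YoloModel {
    Yolov5s,
    Yolov8n,
    Yolov8s,
    Yolov10n,
    Yolo11x,
}

impl FromStr for YoloModel {
    type Err = String;

    /// Accepts the lowercase variant name, ignoring surrounding whitespace and case.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "yolov5s" => Ok(Self::Yolov5s),
            "yolov8n" => Ok(Self::Yolov8n),
            "yolov8s" => Ok(Self::Yolov8s),
            "yolov10n" => Ok(Self::Yolov10n),
            "yolo11x" => Ok(Self::Yolo11x),
            other => Err(format!("unsupported YOLO_MODEL value: {other}")),
        }
    }
}

impl YoloModel {
    /// Canonical lowercase name, e.g. for building artifact file names.
    fn name(self) -> &'static str {
        match self {
            Self::Yolov5s => "yolov5s",
            Self::Yolov8n => "yolov8n",
            Self::Yolov8s => "yolov8s",
            Self::Yolov10n => "yolov10n",
            Self::Yolo11x => "yolo11x",
        }
    }
}

/// Reads `YOLO_MODEL`, falling back to yolov8n when it is unset (assumed default).
fn selected_model() -> Result<YoloModel, String> {
    match env::var("YOLO_MODEL") {
        Ok(value) => value.parse(),
        Err(env::VarError::NotPresent) => Ok(YoloModel::Yolov8n),
        Err(e) => Err(e.to_string()),
    }
}
```

If the variable is read by the build script that performs the ONNX codegen, the script can emit `cargo:rerun-if-env-changed=YOLO_MODEL` from `build.rs`. Cargo then regenerates the model code when the variable changes, for example between `YOLO_MODEL=yolov8n cargo run --release` and `YOLO_MODEL=yolov8s cargo run --release`.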
Deleted the YOLO11x model check directory, including Cargo.toml, build script, model preparation Python script, and main Rust source. This removes support and tests for the YOLO11x model from burn-import/model-checks.
@antimora antimora changed the title Add YOLOv8n model check with ONNX import and test Add YOLO model family check with ONNX import and test Sep 21, 2025
@vesuvisian

> Actually it could also be an issue fixed by tracel-ai/cubecl#895

I'm not sure. I'm running with the latest code and still seeing the issue with my YOLOv9m model when using Wgpu and Metal (but not with Wgpu alone).

thread 'main' panicked at /Users/andrewmartin/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/wgpu-26.0.1/src/backend/wgpu_core.rs:1055:30:
wgpu error: Validation Error

Caused by:
  In Device::create_shader_module_passthrough, label = 'reduce_kernel_f32_f32_f32'
    Failed to generate the backend-specific code


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'main' panicked at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/execution/ordering.rs:51:13:
Ordering is bigger than operations
stack backtrace:
   0:        0x1076b62dc - std::backtrace_rs::backtrace::libunwind::trace::h674dcd02776dcc9c
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
   1:        0x1076b62dc - std::backtrace_rs::backtrace::trace_unsynchronized::haccaae8fb80e4531
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/../../backtrace/src/backtrace/mod.rs:66:14
   2:        0x1076b62dc - std::sys::backtrace::_print_fmt::h3191fc6495b0a516
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/sys/backtrace.rs:66:9
   3:        0x1076b62dc - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h373e57e2286956dc
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/sys/backtrace.rs:39:26
   4:        0x1076d0cf0 - core::fmt::rt::Argument::fmt::hcee930b009d69e38
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/fmt/rt.rs:173:76
   5:        0x1076d0cf0 - core::fmt::write::h2c4a0b98b09e3b30
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/fmt/mod.rs:1465:25
   6:        0x1076b38f4 - std::io::default_write_fmt::h1b8f25d7cf9c86a4
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/io/mod.rs:639:11
   7:        0x1076b38f4 - std::io::Write::write_fmt::h00b4007fff731b84
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/io/mod.rs:1954:13
   8:        0x1076b6190 - std::sys::backtrace::BacktraceLock::print::h3eb1535b8d3666ca
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/sys/backtrace.rs:42:9
   9:        0x1076b7498 - std::panicking::default_hook::{{closure}}::hf623c44b740b115f
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:300:27
  10:        0x1076b72e8 - std::panicking::default_hook::h8875fb31ec87dfad
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:327:9
  11:        0x1076b7f20 - std::panicking::rust_panic_with_hook::hdd8ceeeb04975c2b
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:833:13
  12:        0x1076b7b2c - std::panicking::begin_panic_handler::{{closure}}::hdf417b72ab8ffff8
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:699:13
  13:        0x1076b6788 - std::sys::backtrace::__rust_end_short_backtrace::h507d79c50996742e
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/sys/backtrace.rs:168:18
  14:        0x1076b7830 - __rustc[5224e6b81cd82a8f]::rust_begin_unwind
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:697:5
  15:        0x10776fe24 - core::panicking::panic_fmt::h3505bfbec5a0b799
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/panicking.rs:75:14
  16:        0x104a70984 - burn_fusion::stream::execution::ordering::OrderedExecution<R>::execute_optimization::h029649116ba1c021
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/execution/ordering.rs:51:13
  17:        0x1053fe808 - burn_fusion::stream::queue::execution::QueueExecution<R>::execute_strategy::h249d9fa60aba2d5e
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/queue/execution.rs:139:31
  18:        0x1053fe92c - burn_fusion::stream::queue::execution::QueueExecution<R>::execute_strategy::h249d9fa60aba2d5e
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/queue/execution.rs:146:37
  19:        0x1053feb28 - burn_fusion::stream::queue::execution::QueueExecution<R>::run::hf2701ce49b0ae6f9
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/queue/execution.rs:100:25
  20:        0x104beb824 - burn_fusion::stream::queue::execution::<impl burn_fusion::stream::queue::base::OperationQueue<R>>::execute_block_optimization::hfb5d1ff2d967d2f7
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/queue/execution.rs:36:13
  21:        0x104beb960 - burn_fusion::stream::queue::execution::<impl burn_fusion::stream::queue::base::OperationQueue<R>>::execute::hee5ae9381b4fde25
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/queue/execution.rs:25:14
  22:        0x105da64c0 - <burn_fusion::stream::multi::Segment<R> as burn_fusion::stream::execution::processor::StreamSegment<<R as burn_fusion::backend::FusionRuntime>::Optimization>>::execute::h816663c55cfe7c6b
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/multi.rs:405:20
  23:        0x105f05ba4 - burn_fusion::stream::execution::processor::Processor<O>::explore::h76119a0ea32cc89e
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/execution/processor.rs:108:22
  24:        0x105f05d88 - burn_fusion::stream::execution::processor::Processor<O>::process::h280a884890a73855
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/execution/processor.rs:55:26
  25:        0x105d9e898 - burn_fusion::stream::multi::MultiStream<R>::enqueue_operation::h2b68cea0f738ba93
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/multi.rs:155:26
  26:        0x105d9ff88 - burn_fusion::stream::multi::MultiStream<R>::register::h6e48fc6952fc2ca1
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/stream/multi.rs:79:33
  27:        0x104af0758 - burn_fusion::server::FusionServer<R>::register::h0e4cbb11b3522c4d
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/server.rs:33:14
  28:        0x104be6884 - <burn_fusion::client::mutex::MutexFusionClient<R> as burn_fusion::client::base::FusionClient<R>>::register::hae5841312205ddba
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/client/mutex.rs:46:14
  29:        0x1054d199c - <burn_fusion::tensor::FusionTensor<R> as core::ops::drop::Drop>::drop::h4e5f0e72dcd05d77
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/tensor.rs:205:22
  30:        0x10545fa60 - core::ptr::drop_in_place<burn_fusion::tensor::FusionTensor<burn_cubecl::fusion::FusionCubeRuntime<cubecl_wgpu::runtime::WgpuRuntime,u32>>>::hbd630e8f6b8e1d15
                               at /Users/andrewmartin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:799:1
  31:        0x1056c56d4 - burn_fusion::ops::module::<impl burn_tensor::tensor::ops::modules::base::ModuleOps<burn_fusion::backend::Fusion<B>> for burn_fusion::backend::Fusion<B>>::conv2d::h8e7e81f5405279a4
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-fusion/src/ops/module.rs:157:5
  32:        0x1051bbbfc - burn_tensor::tensor::module::conv2d::ha1412c072b7368ad
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-tensor/src/tensor/module.rs:62:40
  33:        0x104ca6d60 - burn_nn::modules::conv::conv2d::Conv2d<B>::forward::hee39ac342334faa0
                               at /Users/andrewmartin/.cargo/git/checkouts/burn-6c277d792b0d5d7a/c339df5/crates/burn-nn/src/modules/conv/conv2d.rs:175:9
  34:        0x104c16604 - <yolov9::models::yolov9m::YOLOv9m<B> as yolov9::models::YOLOv9<B>>::forward::hc42acdc63b292977
                               at /Users/andrewmartin/Library/CloudStorage/OneDrive-BOOZALLENHAMILTON/Programming/Rust/burn/yolov9/src/models/yolov9m.rs:2647:45
  35:        0x105857d70 - yolov9::main::h52e70cf8405c396f
                               at /Users/andrewmartin/Library/CloudStorage/OneDrive-BOOZALLENHAMILTON/Programming/Rust/burn/yolov9/src/main.rs:64:39
  36:        0x105445234 - core::ops::function::FnOnce::call_once::h78fd0ecdf2fbb187
                               at /Users/andrewmartin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  37:        0x10599cb40 - std::sys::backtrace::__rust_begin_short_backtrace::haffc33aa50b81cd8
                               at /Users/andrewmartin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:152:18
  38:        0x105822f68 - std::rt::lang_start::{{closure}}::h35b829ef8123bd36
                               at /Users/andrewmartin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:206:18
  39:        0x1076ae474 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hbdf81bf8a260214b
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/ops/function.rs:284:21
  40:        0x1076ae474 - std::panicking::catch_unwind::do_call::ha8cb357339794fb8
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:589:40
  41:        0x1076ae474 - std::panicking::catch_unwind::ha8075325519406b1
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:552:19
  42:        0x1076ae474 - std::panic::catch_unwind::h2d5697971e0bffb5
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panic.rs:359:14
  43:        0x1076ae474 - std::rt::lang_start_internal::{{closure}}::h811a1dd64656dd34
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/rt.rs:175:24
  44:        0x1076ae474 - std::panicking::catch_unwind::do_call::h42d748045da7e361
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:589:40
  45:        0x1076ae474 - std::panicking::catch_unwind::h6429ebd5e89aecb4
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:552:19
  46:        0x1076ae474 - std::panic::catch_unwind::h84e5cb0ee5c2084e
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panic.rs:359:14
  47:        0x1076ae474 - std::rt::lang_start_internal::h9c67c334770c9206
                               at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/rt.rs:171:5
  48:        0x105822f40 - std::rt::lang_start::h383c2f8b12eff7d4
                               at /Users/andrewmartin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:205:5
  49:        0x10585a75c - _main

thread 'main' panicked at library/core/src/panicking.rs:233:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
zsh: abort      cargo run
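Both traces appear to end the same way:

1. The first panic (the wgpu shader validation error) starts unwinding.
2. A `FusionTensor` is dropped during that unwinding.
3. Its `Drop` impl registers work with the fusion stream, which panics again.

Rust aborts the process when a destructor panics while the thread is already unwinding. One common defensive pattern is to skip fallible cleanup in `Drop` when `std::thread::panicking()` is true. The sketch below shows this pattern in generic form; it is not the fix Burn adopted. `LazyHandle` and the counters are hypothetical stand-ins for a lazily executed tensor handle:

```rust
use std::cell::Cell;

thread_local! {
    // Counters so the destructor's behaviour can be observed from outside.
    static FLUSHED: Cell<usize> = Cell::new(0);
    static SKIPPED: Cell<usize> = Cell::new(0);
}

/// Hypothetical handle whose destructor submits pending work.
struct LazyHandle;

impl LazyHandle {
    fn flush(&self) {
        // A real implementation could fail (and panic) here, e.g. on a kernel compile error.
        FLUSHED.with(|c| c.set(c.get() + 1));
    }
}

impl Drop for LazyHandle {
    fn drop(&mut self) {
        if std::thread::panicking() {
            // Already unwinding: panicking again here would abort the process,
            // so the fallible cleanup is skipped instead.
            SKIPPED.with(|c| c.set(c.get() + 1));
            return;
        }
        self.flush();
    }
}
```

A normal drop still flushes. A drop that happens while a panic unwinds skips the flush, so the original panic message, not a double-panic abort, reaches the user.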

@antimora antimora requested a review from laggui September 29, 2025 20:00
Member

@laggui laggui left a comment


Thought this was being held up by the test failure, but actually this is captured in #3780 (comment).

LGTM!

@laggui laggui merged commit efa87b3 into tracel-ai:main Sep 29, 2025
10 checks passed
4 participants