Bump cubecl to use wgpu 26 #3657
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #3657      +/-   ##
==========================================
- Coverage   63.02%   63.01%   -0.01%
==========================================
  Files        1042     1042
  Lines      120602   120602
==========================================
- Hits        76008    75998      -10
- Misses      44594    44604      +10
```
GPU CI test failures are unrelated to the wgpu 26 upgrade
This is an autotune + fusion bug. Locally, I cannot reproduce it because autotune does not select the tunable that fails; it picks either the fallback or the simple unit matmul.
The simple and double vecmat tunables hit the same error, but are not selected (as expected).
Shared on Discord, but adding here for context (autotune cache log, hidden since it may not be the root cause):

```
{
  "key": {
    "key": {
      "matmul_key": {
        "definition": {
          "m": 2,
          "n": 2,
          "k": 2,
          "lhs_pow2_factor": 1,
          "rhs_pow2_factor": 1,
          "elem_lhs": { "Float": "F32" },
          "elem_rhs": { "Float": "F32" },
          "elem_out": { "Float": "F32" },
          "matrix_layout_lhs": { "MildlyPermuted": { "transposed": true, "batch_swap": false } },
          "matrix_layout_rhs": "Contiguous"
        },
        "analysis": { "scale_global": "Small", "kind": "General" }
      },
      "num_out_buffers": 0,
      "num_ops": 4
    },
    "checksum": "b460b2b5faab200d678115506a9ba0b1"
  },
  "value": {
    "fastest_index": 1,
    "results": [
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 1,
                "computation": { "mean": { "secs": 0, "nanos": 0 }, "median": { "secs": 0, "nanos": 0 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 0 }, "max": { "secs": 0, "nanos": 0 } } } },
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 2,
                "computation": { "mean": { "secs": 0, "nanos": 0 }, "median": { "secs": 0, "nanos": 0 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 0 }, "max": { "secs": 0, "nanos": 0 } } } },
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 3,
                "computation": { "mean": { "secs": 0, "nanos": 0 }, "median": { "secs": 0, "nanos": 0 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 0 }, "max": { "secs": 0, "nanos": 0 } } } },
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 5,
                "computation": { "mean": { "secs": 0, "nanos": 0 }, "median": { "secs": 0, "nanos": 0 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 0 }, "max": { "secs": 0, "nanos": 0 } } } },
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 6,
                "computation": { "mean": { "secs": 0, "nanos": 0 }, "median": { "secs": 0, "nanos": 0 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 0 }, "max": { "secs": 0, "nanos": 0 } } } },
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 7,
                "computation": { "mean": { "secs": 0, "nanos": 0 }, "median": { "secs": 0, "nanos": 0 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 0 }, "max": { "secs": 0, "nanos": 0 } } } },
      { "Ok": { "name": "cubecl_runtime::tune::function_tunable::FunctionTunable, fn(burn_cubecl_fusion::tune::TuneInput>) -> core::result::Result, alloc::string::String>>",
                "index": 0,
                "computation": { "mean": { "secs": 0, "nanos": 5316 }, "median": { "secs": 0, "nanos": 5320 }, "variance": { "secs": 0, "nanos": 0 }, "min": { "secs": 0, "nanos": 5080 }, "max": { "secs": 0, "nanos": 5760 } } } },
      { "Err": "Skip" },
      { "Err": "Skip" },
      { "Err": "Skip" }
    ]
  }
}
```

Seems that the kernels don't actually return an error during autotune, but they clearly didn't execute anything: all timings are zero except the fallback's (index 0).
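To make that failure mode concrete, here is a minimal Rust sketch (hypothetical tunables and a simplified timing loop, not cubecl's actual autotune API): a tunable that reports Ok without launching any work is measured at ~0 ns and so beats the fallback that really executed.

```rust
use std::time::{Duration, Instant};

fn main() {
    // Hypothetical tunables: the fallback does real work (~5 us, as in the
    // log above); the vecmat variants return Ok without launching anything.
    let tunables: Vec<(&str, Box<dyn Fn() -> Result<(), String>>)> = vec![
        ("fallback", Box::new(|| {
            std::thread::sleep(Duration::from_micros(5)); // stands in for a real kernel launch
            Ok(())
        })),
        ("vecmat", Box::new(|| Ok(()))),        // no work, no error
        ("double_vecmat", Box::new(|| Ok(()))), // no work, no error
    ];

    // Simplified timing-based selection: pick the fastest Ok result.
    let mut best = (usize::MAX, Duration::MAX);
    for (index, (name, run)) in tunables.iter().enumerate() {
        let start = Instant::now();
        let result = run();
        let elapsed = start.elapsed();
        println!("{name}: {result:?} in {elapsed:?}");
        if result.is_ok() && elapsed < best.1 {
            best = (index, elapsed);
        }
    }
    // Picks one of the no-op tunables (index 1 or 2): a "winner" that never
    // executed a kernel, mirroring the all-zero timings in the cache dump.
    println!("fastest_index = {}", best.0);
}
```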
So one of the failing kernels was selected, and the actual error only appears at runtime.

/edit: hmm, it might not be only a timing issue, according to the debug info I added to the CI in this run. Timings look OK, and some entries pick vecmat as the fastest index, but it fails at runtime. The same test doesn't fail in isolation because the vecmat matmul algo fails during autotune setup (expected). But when the whole test suite runs, another test with the same matmul config selects that algo (because the line sizes are OK in that case). So when the problematic test executes, it reuses the selected algo from the cache and breaks at runtime (unexpected).
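In other words, the cache key does not capture everything the setup check validates. A minimal sketch of that reuse pattern (hypothetical `MatmulKey` and `run_algo`, not burn/cubecl's real types), where the key ignores the line size that decides whether vecmat can actually run:

```rust
use std::collections::HashMap;

// Simplified stand-in for the matmul problem definition used as cache key.
// Crucially, it does not include the line size (vectorization width).
#[derive(Hash, PartialEq, Eq, Clone)]
struct MatmulKey { m: usize, n: usize, k: usize }

// Pretend index 1 is the vecmat algo, which only supports line_size >= 4.
fn run_algo(fastest_index: usize, line_size: usize) -> Result<(), String> {
    if fastest_index == 1 && line_size < 4 {
        return Err("vecmat: unsupported line size".into());
    }
    Ok(())
}

fn main() {
    let mut cache: HashMap<MatmulKey, usize> = HashMap::new();
    let key = MatmulKey { m: 2, n: 2, k: 2 };

    // Test A: line sizes are fine, autotune runs, vecmat (index 1) is cached.
    cache.insert(key.clone(), 1);

    // Test B: same key, but its tensors only allow line_size = 1. The cached
    // entry is reused, the setup check that would have rejected vecmat is
    // skipped, and the failure surfaces at runtime instead.
    let fastest = cache[&key];
    println!("{:?}", run_algo(fastest, 1)); // Err("vecmat: unsupported line size")
}
```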
Failing CI is unrelated to the wgpu 26 upgrade, so let's move forward with it
Pull Request Template

Checklist

- [x] `cargo run-checks` command has been executed.

Related Issues/PRs

tracel-ai/cubecl#850

Changes

Use wgpu 26 :)

Testing

CI and `cargo run-checks`