Fix Conv LHS packing padding/uninitialized ptrs V2 #27215
Pull request overview
Addresses non-deterministic correctness issues by hardening KleidiAI Conv LHS packing/padding behavior and adding an additional CUDA ConvTranspose bias validation.
Changes:
- Initialize all entries in the KleidiAI Conv LHS indirection table to padding pointers to avoid uninitialized reads for partial tiles.
- Replace the shared static padding buffer with a `thread_local` buffer and invalidate cached LHS pointer tables when the padding buffer reallocates.
- Add CUDA ConvTranspose bias shape validation against the computed output channel count.
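The first two changes can be illustrated with a minimal sketch. It is not the code from `convolve_kleidiai.cpp`: every name below (`PadBuffer`, `LhsTableCache`, `BuildLhsTable`, `m_step`) is hypothetical, and the layout is simplified to one pointer per input row. It assumes a per-thread zero-filled padding buffer, a table whose length is rounded up to a whole tile, and a cache that records which padding pointer the table was built against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; a simplified illustration, not the PR's code.

// Per-thread zero-filled padding buffer. Growing it may move its storage.
inline std::vector<float>& PadBuffer() {
    thread_local std::vector<float> pad;
    return pad;
}

// Per-thread cached indirection table plus the inputs it was built from.
struct LhsTableCache {
    std::vector<const float*> table;
    const float* input = nullptr;
    const float* pad_ptr = nullptr;  // padding pointer the table refers to
    std::size_t rows = 0;
    std::size_t channels = 0;
};

inline LhsTableCache& Cache() {
    thread_local LhsTableCache cache;
    return cache;
}

// Make sure the padding buffer holds at least `channels` zeros.
inline const float* EnsurePad(std::size_t channels) {
    std::vector<float>& pad = PadBuffer();
    if (pad.size() < channels) {
        pad.assign(channels, 0.0f);  // may reallocate and change data()
    }
    return pad.data();
}

// Build (or reuse) a table with one pointer per row, rounded up to a
// multiple of the tile height `m_step`.
inline const std::vector<const float*>& BuildLhsTable(const float* input,
                                                      std::size_t rows,
                                                      std::size_t channels,
                                                      std::size_t m_step) {
    assert(m_step > 0);
    const float* pad = EnsurePad(channels);
    LhsTableCache& cache = Cache();
    const std::size_t padded_rows = (rows + m_step - 1) / m_step * m_step;

    // A changed padding pointer means cached entries may dangle, so rebuild.
    const bool stale = cache.pad_ptr != pad || cache.input != input ||
                       cache.rows != rows || cache.channels != channels ||
                       cache.table.size() != padded_rows;
    if (stale) {
        // Start with every entry pointing at padding, so the tail of a
        // partial tile is never left uninitialized.
        cache.table.assign(padded_rows, pad);
        for (std::size_t r = 0; r < rows; ++r) {
            cache.table[r] = input + r * channels;
        }
        cache.input = input;
        cache.pad_ptr = pad;
        cache.rows = rows;
        cache.channels = channels;
    }
    return cache.table;
}
```

Filling the table with the padding pointer first and then overwriting the valid rows keeps the partial-tile case correct by construction, and comparing the stored padding pointer on every call is one simple way to detect that the buffer was reallocated.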
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cuda/nn/conv_transpose.cc | Adds runtime validation that bias is a 1-D tensor matching num_output_channels. |
| onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp | Initializes LHS pointer table padding entries and updates padding-buffer + cache invalidation behavior. |
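The bias check described for `conv_transpose.cc` could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: `ValidateConvTransposeBias` is a hypothetical helper, shapes are passed as plain vectors, and the output channel count is derived as `weight_shape[1] * group`, following the ONNX ConvTranspose weight layout `(C_in, C_out / group, kH, kW, ...)`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not the onnxruntime API.
// Returns an empty string when the bias shape is valid, otherwise an error message.
inline std::string ValidateConvTransposeBias(const std::vector<int64_t>& bias_shape,
                                             const std::vector<int64_t>& weight_shape,
                                             int64_t group) {
    if (weight_shape.size() < 2 || group <= 0) {
        return "invalid weight shape or group";
    }
    // ONNX ConvTranspose weights are laid out as (C_in, C_out / group, ...).
    const int64_t num_output_channels = weight_shape[1] * group;

    if (bias_shape.size() != 1) {
        return "bias must be a 1-D tensor";
    }
    if (bias_shape[0] != num_output_channels) {
        return "bias has " + std::to_string(bias_shape[0]) +
               " elements, expected " + std::to_string(num_output_channels);
    }
    return {};
}
```

For example, with weights of shape `{4, 4, 3, 3}` and `group = 2`, a bias of shape `{8}` passes, while `{4}` or `{1, 8}` is rejected before any kernel reads the bias.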
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Description
Refer to V1 of the fix here: #27214
This PR includes all fixes from the V1 PR, plus logic to invalidate the cached LHS pointers when the padding buffer's underlying storage has changed due to a resize. The ARM team will look at potentially enhancing this logic after the 1.24.0 release.
Motivation and Context
Fix #26669