@hariharans29 (Member) commented Jan 30, 2026

Description

Refer to V1 of the fix here: #27214

This PR includes all fixes from the V1 PR, plus logic to invalidate the cached LHS pointers when the pad buffer's underlying storage has changed because of a resize. The ARM team will look at potentially enhancing this logic after the 1.24.0 release.
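
For readers unfamiliar with the failure mode, here is a minimal sketch of the invalidation idea, not the code in this PR. All names (`g_pad_buffer`, `g_lhs_ptr_cache`, `EnsurePadBuffer`) are hypothetical, and the cache key is simplified to an integer. It assumes the cached tables store raw pointers into a growable per-thread pad buffer, so any reallocation of that buffer leaves them dangling.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical names; this only illustrates the idea described above.

// Per-thread zero-filled padding buffer. Growing it may reallocate, which
// changes the address that previously built pointer tables refer to.
thread_local std::vector<float> g_pad_buffer;

// Per-thread cache of LHS pointer tables, keyed by a caller-chosen id.
thread_local std::unordered_map<int, std::vector<const float*>> g_lhs_ptr_cache;

// Makes sure the pad buffer holds at least `needed` zeros and returns its
// base pointer. If the underlying storage moved, every cached table is
// dropped because it may still contain the old pad address.
const float* EnsurePadBuffer(std::size_t needed) {
    const float* before = g_pad_buffer.data();
    if (g_pad_buffer.size() < needed) {
        g_pad_buffer.resize(needed, 0.0f);
    }
    const float* after = g_pad_buffer.data();
    if (after != before) {
        g_lhs_ptr_cache.clear();
    }
    return after;
}
```

Clearing the whole cache is the bluntest possible policy; it trades some rebuild cost for never handing out a stale pointer.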

Motivation and Context

Fix #26669

@hariharans29 changed the title from "Fix Conv LHS packing padding/uninitialized ptrs" to "Fix Conv LHS packing padding/uninitialized ptrs V2" on Jan 30, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Addresses non-deterministic correctness issues by hardening KleidiAI Conv LHS packing/padding behavior and adding a CUDA ConvTranspose bias validation.

Changes:

  • Initialize all entries in the KleidiAI Conv LHS indirection table to padding pointers to avoid uninitialized reads for partial tiles.
  • Replace the shared static padding buffer with a thread_local buffer and invalidate cached LHS pointer tables when the padding buffer reallocates.
  • Add CUDA ConvTranspose bias shape validation against the computed output channel count.
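
The first bullet can be sketched as follows. This is an illustrative, one-dimensional simplification under stated assumptions, not the KleidiAI indirection layout: `BuildLhsPtrTable` and its parameters are hypothetical, and the table is assumed to be rounded up to a multiple of the tile height so that a kernel reading a full tile never touches an unset slot.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a table of row pointers whose length is
// `rows` rounded up to a multiple of `tile_rows` (tile_rows must be > 0).
std::vector<const float*> BuildLhsPtrTable(const float* input,
                                           std::size_t rows,
                                           std::size_t row_stride,
                                           std::size_t tile_rows,
                                           const float* pad_ptr) {
    const std::size_t rows_padded =
        (rows + tile_rows - 1) / tile_rows * tile_rows;

    // Step 1: every slot starts out pointing at the padding buffer, so the
    // tail of a partial tile is never left uninitialized.
    std::vector<const float*> table(rows_padded, pad_ptr);

    // Step 2: slots that correspond to real input rows are overwritten.
    for (std::size_t r = 0; r < rows; ++r) {
        table[r] = input + r * row_stride;
    }
    return table;
}
```

With 5 input rows and a tile height of 4, the table has 8 entries and the last 3 point at the padding buffer rather than at indeterminate memory.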

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| onnxruntime/core/providers/cuda/nn/conv_transpose.cc | Adds runtime validation that bias is a 1-D tensor matching num_output_channels. |
| onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp | Initializes LHS pointer table padding entries and updates padding-buffer + cache invalidation behavior. |
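
The bias check described for `conv_transpose.cc` amounts to a rank-and-length test. The sketch below is a standalone illustration only: `CheckBiasShape` is a hypothetical function, it returns a plain string instead of the status type ONNX Runtime uses, and the caller is assumed to have already computed `num_output_channels`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns an empty string when the bias shape is
// acceptable, otherwise a human-readable reason for rejecting it.
std::string CheckBiasShape(const std::vector<std::int64_t>& bias_dims,
                           std::int64_t num_output_channels) {
    if (bias_dims.size() != 1) {
        return "bias must be a 1-D tensor, got rank " +
               std::to_string(bias_dims.size());
    }
    if (bias_dims[0] != num_output_channels) {
        return "bias length " + std::to_string(bias_dims[0]) +
               " does not match output channels " +
               std::to_string(num_output_channels);
    }
    return std::string();
}
```

Rejecting a mismatched bias up front avoids reading past the end of the bias tensor later in the kernel.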

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


@hariharans29 enabled auto-merge (squash) on January 30, 2026 23:33
Development

Successfully merging this pull request may close these issues.

[cpu] Loading certain models leads to global error state on M4 Max

4 participants