CVS-175980: Disable pre-allocation of memory for dynamic models through onnxruntime_perf_test #896
Conversation
b29a692 to 91c444c (Compare)
adrianlizarraga left a comment:
This looks solid to me. Thank you for fixing it.
```cpp
bool has_dynamic_output = false;

for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
  auto type_info = session_.GetOutputTypeInfo(i);
```
nit: some minor indentation issues. I only point this out because it will be flagged in the main ORT repo CI.
Thanks, will get this resolved.
Pull request overview
This PR fixes memory allocation issues for dynamic models when using device memory with the OpenVINO execution provider. Previously, the code pre-allocated output buffers by treating dynamic dimensions (-1) as 1, which caused crashes when users specified larger dimensions via the -f or reshape_input options.
Key Changes:
- Detects dynamic dimensions in model outputs before pre-allocating device memory (see the sketch after this list)
- Skips pre-allocation when dynamic outputs are present, allowing runtime dimension resolution
- Preserves existing pre-allocation behavior for static models
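For reference, here is a self-contained sketch of that detection step, mirroring the loop added in this PR; the helper name HasDynamicOutput is illustrative and not part of the patch:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>
#include <onnxruntime_cxx_api.h>

// Illustrative helper (not in the patch): returns true if any model output
// declares a dynamic (-1) dimension, in which case pre-allocation is skipped.
bool HasDynamicOutput(const Ort::Session& session) {
  for (size_t i = 0; i < session.GetOutputCount(); ++i) {
    Ort::TypeInfo type_info = session.GetOutputTypeInfo(i);
    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
    std::vector<int64_t> shape = tensor_info.GetShape();
    if (std::any_of(shape.begin(), shape.end(),
                    [](int64_t d) { return d == -1; })) {
      return true;
    }
  }
  return false;
}
```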
```cpp
    return Ort::Value(nullptr);
  };
} else {
  new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
          Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
    return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
```
Copilot AI commented on Jan 8, 2026:
Inconsistent indentation: the lambda body at line 1052 should be indented consistently with other lambda bodies in the same context (compare with lines 1023-1025 and 1055-1059).
Suggested change:

```cpp
    return Ort::Value(nullptr);
  };
} else {
  new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
                 Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
    return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
```
```cpp
auto transform_fcn = std::function<int64_t(int64_t)>();
auto new_value = std::function<Ort::Value(OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&)>();
transform_fcn = [](int64_t input) { return input; };
```
The declaration of transform_fcn is just a few lines above. If the variable is unconditionally set to an identity function, I think it makes sense to instead remove the variable and skip the transform call below; it's not doing anything.
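For illustration only (the exact call site is not quoted in this thread), the pattern being described looks roughly like the hypothetical reconstruction below; with transform_fcn fixed to the identity, the transform call is a no-op:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical reconstruction of the usage around line 1067.
void ApplyTransform() {
  std::function<int64_t(int64_t)> transform_fcn = [](int64_t input) { return input; };
  std::vector<int64_t> output_shape = {-1, 10};
  // The identity mapping leaves output_shape unchanged, so both the variable
  // and this call could be deleted without changing behavior.
  std::transform(output_shape.begin(), output_shape.end(), output_shape.begin(),
                 transform_fcn);
}
```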
@javier-intel: transform_fcn is used at line 1067; however, there is no conditional setting it anymore.
@adrianlizarraga: can you comment on whether the transform function can be removed altogether?
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
```cpp
auto transform_fcn = std::function<int64_t(int64_t)>();
auto new_value = std::function<Ort::Value(OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&)>();
transform_fcn = [](int64_t input) { return input; };
if (device_memory_name_.empty()) {
  transform_fcn = [](int64_t input) { return input; };
  new_value = [](OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&) {
    return Ort::Value(nullptr);
  };
} else {
  Ort::MemoryInfo memory_info(nullptr);  // Default initialize, will be overwritten
  if (device_memory_name_ == CUDA) {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeDefault);
  } else {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeCPUOutput);
  }
  custom_allocator_ = Ort::Allocator(session_, memory_info);
  // Switch to custom
  allocator_ = Ort::UnownedAllocator(custom_allocator_);

  // free dimensions are treated as 1 if not overridden
  transform_fcn = [](int64_t input) { return (input == -1) ? -input : input; };
  new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape, Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
    return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(), tensor_info.GetElementType());
  };
  // Do not pre-allocate if dynamic dimensions are present
  bool has_dynamic_output = false;

  for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
    auto type_info = session_.GetOutputTypeInfo(i);
    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
    auto shape = tensor_info.GetShape();
    if (std::any_of(shape.begin(), shape.end(), [](int64_t d) { return d == -1; })) {
      has_dynamic_output = true;
      break;
    }
  }

  if (has_dynamic_output) {
    new_value = [](OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&) {
      return Ort::Value(nullptr);
    };
  } else {
    new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
                   Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
      return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
                                      tensor_info.GetElementType());
    };
  }
```
I'd replace everything in lines 1019-1069 with the following:
```cpp
if (!device_memory_name_.empty()) {
  Ort::MemoryInfo memory_info(nullptr);  // Default initialize, will be overwritten
  if (device_memory_name_ == CUDA) {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeDefault);
  } else {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeCPUOutput);
  }
  custom_allocator_ = Ort::Allocator(session_, memory_info);
  // Switch to custom
  allocator_ = Ort::UnownedAllocator(custom_allocator_);
}
for (size_t i = 0; i < output_names_raw_ptr.size(); i++) {
  Ort::TypeInfo type_info = session_.GetOutputTypeInfo(i);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
  std::vector<int64_t> output_shape = tensor_info.GetShape();
  auto is_dynamic = std::find(output_shape.begin(), output_shape.end(), -1) != output_shape.end();
  if (is_dynamic || device_memory_name_.empty()) {
    outputs_.emplace_back(Ort::Value(nullptr));
  } else {
    auto new_value = Ort::Value::CreateTensor(allocator_, output_shape.data(), output_shape.size(), tensor_info.GetElementType());
    outputs_.emplace_back(std::move(new_value));
  }
}
```
Thanks @javier-intel, I've updated the code to check each output and pre-allocate only the static ones as suggested.
fc9569c to e46e233 (Compare)
cd6516f to 35e276d (Compare)
Description
This PR disables pre-allocating output buffer memory on device for dynamic models. Now the device_memory_name|OpenVINO_RT_NPU option can be used with the free dimension override (-f) or the reshape_input option, as follows:
```
onnxruntime_perf_test -e openvino -m duration -t 30 -o 0 -C "session.disable_cpu_ep_fallback|1" -i "device_type|NPU device_memory_name|OpenVINO_RT_NPU reshape_input|input[]"
```

or

```
onnxruntime_perf_test -e openvino -m duration -t 30 -o 0 -C "session.disable_cpu_ep_fallback|1" -I -f "dynamic_dimension:100" -i "device_type|NPU device_memory_name|OpenVINO_RT_NPU"
```
Motivation and Context
This PR addresses the following open-source issue: microsoft#26217
When using device memory allocation (via the device_memory_name|OpenVINO_RT_NPU flag) with models containing dynamic dimensions, the current code pre-allocates output buffers by converting all -1 dimensions to 1. This prevents the free dimension override (-f) and the reshape_input option from working as required. For example, if users specify -f batch:5, the pre-allocated buffer (size 1) is too small for the actual inference (size 5), leading to memory crashes, as illustrated below.
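As a concrete illustration of the mismatch (the shapes here are hypothetical, not taken from the issue):

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

int main() {
  // Old behavior: a declared output shape of [-1, 10] is pre-allocated
  // with the -1 dimension treated as 1.
  std::vector<int64_t> preallocated = {1, 10};  // buffer for 10 elements
  // With -f batch:5, inference actually produces:
  std::vector<int64_t> runtime = {5, 10};       // 50 elements

  auto element_count = [](const std::vector<int64_t>& s) {
    return std::accumulate(s.begin(), s.end(), int64_t{1},
                           std::multiplies<int64_t>());
  };
  // 50 elements written through a 10-element buffer overruns device memory.
  return element_count(runtime) > element_count(preallocated) ? 1 : 0;
}
```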