
Conversation


@n1harika n1harika commented Jan 8, 2026

Description

This PR disables pre-allocating output buffer memory on the device for dynamic models. With this change, the device_memory_name|OpenVINO_RT_NPU option can be used together with the free dimension override (-f) option or the reshape_input option, as shown below:
onnxruntime_perf_test -e openvino -m duration -t 30 -o 0 -C "session.disable_cpu_ep_fallback|1" -i "device_type|NPU device_memory_name|OpenVINO_RT_NPU reshape_input|input[]"

or

onnxruntime_perf_test -e openvino -m duration -t 30 -o 0 -C "session.disable_cpu_ep_fallback|1" -I -f "dynamic_dimension:100" -i "device_type|NPU device_memory_name|OpenVINO_RT_NPU"

Motivation and Context

This PR addresses the following open-source issue: microsoft#26217

When using device memory allocation (via the device_memory_name|OpenVINO_RT_NPU flag) with models containing dynamic dimensions, the current code pre-allocates output buffers by converting all -1 dimensions to 1. This prevents the free dimension override (-f) and the reshape_input option from working as required. For example, if a user specifies -f batch:5, the pre-allocated buffer (size 1) is too small for the actual inference (size 5), leading to memory crashes.
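
To make the failure mode concrete, the old pre-allocation path amounts to roughly the sketch below (function and variable names here are illustrative, not the exact onnxruntime_perf_test code):

#include <onnxruntime_cxx_api.h>
#include <vector>

// Illustrative sketch of the old behavior: every output tensor is pre-allocated
// with dynamic (-1) dimensions clamped to 1.
void PreallocateOutputsOld(Ort::Session& session, OrtAllocator* allocator,
                           std::vector<Ort::Value>& outputs) {
  for (size_t i = 0; i < session.GetOutputCount(); ++i) {
    auto type_info = session.GetOutputTypeInfo(i);
    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
    std::vector<int64_t> shape = tensor_info.GetShape();
    for (auto& dim : shape) {
      if (dim == -1) dim = 1;  // dynamic dimension treated as 1
    }
    // The buffer created here is sized for a batch of 1; a run that uses
    // -f batch:5 or reshape_input produces a larger output and overruns it.
    outputs.emplace_back(Ort::Value::CreateTensor(allocator, shape.data(), shape.size(),
                                                  tensor_info.GetElementType()));
  }
}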

@n1harika n1harika requested a review from MayureshV1 January 8, 2026 08:22
@n1harika n1harika force-pushed the niharika/perf_test_device_mem_changes branch from b29a692 to 91c444c on January 8, 2026 16:36
@MayureshV1 MayureshV1 changed the title from "Disable pre-allocation of memory for dynamic models through onnxruntime_perf_test" to "CVS-175980: Disable pre-allocation of memory for dynamic models through onnxruntime_perf_test" on Jan 8, 2026

@adrianlizarraga adrianlizarraga left a comment


This looks solid to me. Thank you for fixing it.

  bool has_dynamic_output = false;

  for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
    auto type_info = session_.GetOutputTypeInfo(i);


nit: some minor indentation issues. I only point this out because it will be flagged in the main ORT repo CI.


Thanks, will get this resolved.

@MayureshV1 MayureshV1 requested a review from Copilot January 8, 2026 18:32

Copilot AI left a comment


Pull request overview

This PR fixes memory allocation issues for dynamic models when using device memory with the OpenVINO execution provider. Previously, the code pre-allocated output buffers by treating dynamic dimensions (-1) as 1, which caused crashes when users specified larger dimensions via -f or reshape_input options.

Key Changes:

  • Detects dynamic dimensions in model outputs before pre-allocating device memory
  • Skips pre-allocation when dynamic outputs are present, allowing runtime dimension resolution
  • Preserves existing pre-allocation behavior for static models


Comment on lines 1052 to 1057
      return Ort::Value(nullptr);
    };
  } else {
    new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
                   Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
      return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),

Copilot AI Jan 8, 2026


Inconsistent indentation: the lambda body at line 1052 should be indented consistently with other lambda bodies in the same context (compare with lines 1023-1025 and 1055-1059).

Suggested change (indentation only):

      return Ort::Value(nullptr);
    };
  } else {
    new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
                   Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
      return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),


auto transform_fcn = std::function<int64_t(int64_t)>();
auto new_value = std::function<Ort::Value(OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&)>();
transform_fcn = [](int64_t input) { return input; };


The declaration of transform_fcn is just a few lines above. If the variable is unconditionally changed to an identity function I think it makes sense to instead remove the variable and skip the transform call below; it's not doing anything.


@javier-intel .. transform_fcn is used in line 1067.
However, there is no conditional setting it anymore.

@adrianlizarraga .. Can you comment on whether the transform function can be removed altogether?

@MayureshV1 MayureshV1 requested a review from Copilot January 8, 2026 21:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Comment on lines 1019 to 1060
auto transform_fcn = std::function<int64_t(int64_t)>();
auto new_value = std::function<Ort::Value(OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&)>();
transform_fcn = [](int64_t input) { return input; };
if (device_memory_name_.empty()) {
  transform_fcn = [](int64_t input) { return input; };
  new_value = [](OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&) {
    return Ort::Value(nullptr);
  };
} else {
  Ort::MemoryInfo memory_info(nullptr);  // Default initialize, will be overwritten
  if (device_memory_name_ == CUDA) {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeDefault);
  } else {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeCPUOutput);
  }
  custom_allocator_ = Ort::Allocator(session_, memory_info);
  // Switch to custom
  allocator_ = Ort::UnownedAllocator(custom_allocator_);

  // free dimensions are treated as 1 if not overridden
  transform_fcn = [](int64_t input) { return (input == -1) ? -input : input; };
  new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape, Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
    return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(), tensor_info.GetElementType());
  };
  // Do not pre-allocate if dynamic dimensions are present
  bool has_dynamic_output = false;

  for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
    auto type_info = session_.GetOutputTypeInfo(i);
    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
    auto shape = tensor_info.GetShape();
    if (std::any_of(shape.begin(), shape.end(), [](int64_t d) { return d == -1; })) {
      has_dynamic_output = true;
      break;
    }
  }

  if (has_dynamic_output) {
    new_value = [](OrtAllocator*, const std::vector<int64_t>&, Ort::ConstTensorTypeAndShapeInfo&) {
      return Ort::Value(nullptr);
    };
  } else {
    new_value = [](OrtAllocator* allocator, const std::vector<int64_t>& output_shape,
                   Ort::ConstTensorTypeAndShapeInfo& tensor_info) {
      return Ort::Value::CreateTensor(allocator, output_shape.data(), output_shape.size(),
                                      tensor_info.GetElementType());
    };
  }


I'd replace everything from lines 1019-1069 with the following:

Suggested change

if (!device_memory_name_.empty()) {
  Ort::MemoryInfo memory_info(nullptr);  // Default initialize, will be overwritten
  if (device_memory_name_ == CUDA) {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeDefault);
  } else {
    memory_info = Ort::MemoryInfo(device_memory_name_.data(), OrtArenaAllocator, 0, OrtMemTypeCPUOutput);
  }
  custom_allocator_ = Ort::Allocator(session_, memory_info);
  // Switch to custom
  allocator_ = Ort::UnownedAllocator(custom_allocator_);
}
for (size_t i = 0; i < output_names_raw_ptr.size(); i++) {
  Ort::TypeInfo type_info = session_.GetOutputTypeInfo(i);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
  std::vector<int64_t> output_shape = tensor_info.GetShape();
  auto is_dynamic = std::find(output_shape.begin(), output_shape.end(), -1) != output_shape.end();
  if (is_dynamic || device_memory_name_.empty()) {
    outputs_.emplace_back(Ort::Value(nullptr));
  } else {
    auto new_value = Ort::Value::CreateTensor(allocator_, output_shape.data(), output_shape.size(), tensor_info.GetElementType());
    outputs_.emplace_back(std::move(new_value));
  }
}

n1harika (Author)


Thanks @javier-intel, I've updated the code to check each output and pre-allocate only the static ones as suggested.
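
For reference, the per-output approach described here amounts to roughly the following sketch (member names follow the hunks quoted earlier in this thread; this is not the exact merged code, and it assumes the surrounding file already pulls in <algorithm> and the ONNX Runtime C++ headers):

// Pre-allocate only outputs whose shapes are fully static; dynamic outputs are
// left as null Ort::Value so ONNX Runtime resolves their shapes at run time.
for (size_t i = 0; i < session_.GetOutputCount(); ++i) {
  auto type_info = session_.GetOutputTypeInfo(i);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
  std::vector<int64_t> output_shape = tensor_info.GetShape();
  bool is_dynamic = std::any_of(output_shape.begin(), output_shape.end(),
                                [](int64_t d) { return d == -1; });
  if (is_dynamic || device_memory_name_.empty()) {
    outputs_.emplace_back(Ort::Value(nullptr));
  } else {
    outputs_.emplace_back(Ort::Value::CreateTensor(allocator_, output_shape.data(),
                                                   output_shape.size(),
                                                   tensor_info.GetElementType()));
  }
}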

@n1harika n1harika force-pushed the niharika/perf_test_device_mem_changes branch from fc9569c to e46e233 on January 9, 2026 04:36
@n1harika n1harika force-pushed the niharika/perf_test_device_mem_changes branch from cd6516f to 35e276d on January 9, 2026 05:48
@MayureshV1 MayureshV1 merged commit b51ec77 into ovep-develop Jan 9, 2026
6 of 8 checks passed
