Added load_config feature details and updated OV version

TejalKhade28 · TejalKhade28 · commit 0ec2dc775322 · 2025-04-09T01:22:54.000-07:00
diff --git a/docs/build/eps.md b/docs/build/eps.md
@@ -277,11 +277,11 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
 
 1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™<sup>TM</sup> Toolkit **Release 2024.3** for the appropriate OS and target hardware:
    * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE).
-   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
+   * [Linux - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
 
    Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.
 
-  *2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is minimal OpenVINO™ version requirement.*
+  *2024.5 is the current recommended OpenVINO™ version. [OpenVINO™ 2024.5](https://docs.openvino.ai/2024/index.html) is minimal OpenVINO™ version requirement.*
 
 2. Configure the target hardware with specific follow on instructions:
    * To configure Intel<sup>®</sup> Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#linux)
diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md
@@ -20,7 +20,7 @@ Accelerate ONNX models on Intel CPUs, GPUs, NPU with Intel OpenVINO™ Execution
 ## Install
 
 Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release.
-* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.4 Release](https://github.com/intel/onnxruntime/releases)
+* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.6 Release](https://github.com/intel/onnxruntime/releases)
 * Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/)
 * Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20)
 
@@ -230,6 +230,46 @@ Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/in
 Optimizes ORT quantized models for the NPU device to only keep QDQs for supported ops and optimize for performance and accuracy.Generally this feature will give better performance/accuracy with ORT Optimizations disabled. 
 Refer to [Configuration Options](#configuration-options) for more information about using these runtime options.
 
+### Loading Custom JSON OV Config During Runtime
+This feature is developed to facilitate loading of OVEP parameters from a single JSON configuration file.
+The JSON input schema must be of format -
+```
+{
+    "DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"}
+}
+```
+where "DEVICE_KEY" can be CPU, NPU or GPU , "PROPERTY" must be a valid entity defined in OV from its properties.hpp sections and "PROPERTY_VALUE" must be passed in as a string. If we pass any other type like int/bool we encounter errors from ORT like below -
+
+Exception during initialization: [json.exception.type_error.302] type must be string, but is a number.
+
+While one can set the int/bool values like this "NPU_TILES": "2" which is valid (refer to the example given below).
+If someone passes incorrect keys, it will be skipped with a warning while incorrect values assigned to a valid key will result in an exception arising from OV framework.
+ 
+The valid properties are of 2 types viz. MUTABLE (R/W) & IMMUTABLE (R ONLY) these are also governed while setting the same. If an IMMUTABLE property is being set, we skip setting the same with a similar warning.
+
+Example:
+
+The usage of this functionality using onnxruntime_perf_test application is as below – 
+
+```
+onnxruntime_perf_test.exe -e openvino -m times -r 1 -i "device_type|NPU load_config|npu_config.json" model.onnx 
+```
+where the npu_config.json file is defined as below –
+
+```bash
+{
+  "NPU": {
+    "PERFORMANCE_HINT": "THROUGHPUT",
+    "WORKLOAD_TYPE": "Efficient",
+    "NPU_TILES": "2",
+    "LOG_LEVEL": "LOG_DEBUG",
+    "NPU_COMPILATION_MODE_PARAMS": "enable-weights-swizzling=false enable-activation-swizzling=false enable-grouped-matmul=false"
+  }
+}
+
+```
+To explicitly enable logs one must use "LOG_LEVEL": "LOG_DEBUG"  in the JSON device configuration property. The log verifies that the correct device parameters and properties are being set / populated during runtime with OVEP.
+
 ### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions
 The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences.
 
@@ -238,7 +278,7 @@ With EP-Weight Sharing, prefill and KV cache models can now reuse the same set o
 These changes enable weight sharing between two models using the session context option: ep.share_ep_contexts.
 Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option.
 
- ### OVEP suppports CreateSessionFromArray API
+ ### OVEP supports CreateSessionFromArray API 
  The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports creating sessions from memory using the CreateSessionFromArray API. This allows loading models directly from memory buffers instead of file paths. The CreateSessionFromArray loads the model in memory then creates a session from the in-memory byte array.
  
  Note:
@@ -268,11 +308,12 @@ options[num_streams] = "8";
 options[cache_dir] = "";
 options[context] = "0x123456ff";
 options[enable_qdq_optimizer] = "True";
+options[load_config] = "config_path.json";
 session_options.AppendExecutionProvider_OpenVINO_V2(options);
 ```
 
 ### C/C++ Legacy API 
-Note: This api is no longer officially supported. Users are requested to move to V2 API. 
+Note: This API is no longer officially supported. Users are requested to move to V2 API. 
 
 The session configuration options are passed to SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in an example below for GPU device type:
 
@@ -315,6 +356,7 @@ The following table lists all the available configuration options for API 2.0 an
 | context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context i.e the cl_context address as a void pointer.|
 | enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
 | enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. |
+| load_config | string | Any custom JSON path | string | This option enables a feature for loading custom JSON OV config during runtime which sets OV parameters. |
 
 
 Valid Hetero or Multi or Auto Device combinations: