docs/build/eps.md
+2 -2 lines changed: 2 additions & 2 deletions
@@ -277,11 +277,11 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-p
1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™<sup>TM</sup> Toolkit **Release 2024.3** for the appropriate OS and target hardware:
Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.
-*2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is minimal OpenVINO™ version requirement.*
+*2024.5 is the current recommended OpenVINO™ version. [OpenVINO™ 2024.5](https://docs.openvino.ai/2024/index.html) is the minimum required OpenVINO™ version.*
2. Configure the target hardware with the specific follow-on instructions:
* To configure Intel<sup>®</sup> Processor Graphics(GPU) please follow these instructions: [Windows](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#windows), [Linux](https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-gpu.html#linux)
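Because the note above sets a minimum OpenVINO™ release, a setup script may want to check the installed version before building. The sketch below assumes the version string reported by your OpenVINO™ installation starts with `YEAR.RELEASE` (for example `2024.5.0-...`); the helper names `parse_release` and `meets_minimum` are hypothetical.

```python
import re

# Minimum OpenVINO release stated in the note above, as (YEAR, RELEASE).
MINIMUM_OPENVINO = (2024, 5)


def parse_release(version):
    """Return (year, release) from a string such as "2024.5.0-17288-abc"."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized OpenVINO version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MINIMUM_OPENVINO):
    """True if the given release is at least the documented minimum."""
    # Tuples compare element-wise, so (2025, 0) >= (2024, 5) holds.
    return parse_release(version) >= minimum
```

Comparing `(year, release)` tuples avoids the pitfalls of comparing version strings lexically.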
@@ -230,6 +230,46 @@ Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/in
Optimizes ORT quantized models for the NPU device to keep QDQs only for supported ops and optimize for performance and accuracy. This feature generally gives better performance/accuracy when ORT optimizations are disabled.
Refer to [Configuration Options](#configuration-options) for more information about using these runtime options.
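The QDQ optimizer paragraph above pairs an OVEP option with turning ORT's own graph optimizations off. The following is a minimal sketch of that pairing, assuming the Python `onnxruntime.SessionOptions.graph_optimization_level` attribute and `onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL`; `configure_npu_qdq` and `PlainOptions` are hypothetical names, and the stand-in class only lets the sketch run without ORT installed.

```python
def configure_npu_qdq(session_options, disable_all_level):
    """Prepare a session for a QDQ model on NPU; return OVEP provider options.

    ORT graph optimizations are switched off because the text above notes the
    QDQ optimizer generally works best that way. With onnxruntime installed,
    pass onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL as disable_all_level.
    """
    session_options.graph_optimization_level = disable_all_level
    # Option names and string values follow the configuration-options table.
    return {"device_type": "NPU", "enable_qdq_optimizer": "True"}


class PlainOptions:
    """Hypothetical stand-in for onnxruntime.SessionOptions, for demonstration."""

    graph_optimization_level = None


demo_options = PlainOptions()
npu_provider_options = configure_npu_qdq(demo_options, "ORT_DISABLE_ALL")
```

With ORT, the returned dict would be passed as `provider_options=[npu_provider_options]` alongside `providers=["OpenVINOExecutionProvider"]`.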
+### Loading Custom JSON OV Config During Runtime
+This feature allows OVEP parameters to be loaded from a single JSON configuration file.
+The JSON input must follow this schema:
+```
+{
+"DEVICE_KEY": {"PROPERTY": "PROPERTY_VALUE"}
+}
+```
+where "DEVICE_KEY" can be CPU, NPU, or GPU; "PROPERTY" must be a valid property defined in OpenVINO's properties.hpp; and "PROPERTY_VALUE" must be passed as a string. Passing any other type, such as an int or bool, causes ORT to raise an error like the following:
+
+Exception during initialization: [json.exception.type_error.302] type must be string, but is a number.
+
+Integer and boolean values can still be set by passing them as strings; for example, "NPU_TILES": "2" is valid (refer to the example given below).
+If an incorrect key is passed, it is skipped with a warning, while an incorrect value assigned to a valid key results in an exception raised by the OpenVINO framework.
+
+Valid properties are of two types: MUTABLE (read/write) and IMMUTABLE (read-only), and this distinction is enforced when they are set. If an IMMUTABLE property is set, it is skipped with a similar warning.
+
+Example:
+
+The usage of this functionality with the onnxruntime_perf_test application is as follows:
+To enable logs explicitly, set "LOG_LEVEL": "LOG_DEBUG" in the JSON device configuration. The logs verify that the correct device parameters and properties are being set during runtime with OVEP.
+
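The schema rules above (device keys limited to CPU, NPU, or GPU; every property value a string) can be checked before ORT ever reads the file. This is a sketch that mirrors the documented rules, not ORT's own parser; `build_ov_config`, `load_config_provider_options`, and the file name `ov_config.json` are hypothetical, while `device_type` and `load_config` are the OVEP option names from the configuration-options table.

```python
import json

# DEVICE_KEY values accepted by the custom JSON OV config, per the text above.
VALID_DEVICE_KEYS = {"CPU", "NPU", "GPU"}


def build_ov_config(device_properties):
    """Serialize {DEVICE_KEY: {PROPERTY: PROPERTY_VALUE}} to JSON text.

    Rejects unknown device keys and non-string values up front, mirroring the
    documented rules (ORT reports non-string values as a json type_error).
    """
    for device, properties in device_properties.items():
        if device not in VALID_DEVICE_KEYS:
            raise ValueError(f"unsupported DEVICE_KEY {device!r}")
        for name, value in properties.items():
            if not isinstance(value, str):
                raise TypeError(
                    f"{device}.{name} must be a string, got {type(value).__name__}"
                )
    return json.dumps(device_properties, indent=2)


def load_config_provider_options(device_type, config_path):
    """Provider options that point OVEP at the JSON file via load_config."""
    return {"device_type": device_type, "load_config": config_path}


# "2" rather than 2, as the text requires; LOG_LEVEL enables debug logs.
npu_config = build_ov_config({"NPU": {"NPU_TILES": "2", "LOG_LEVEL": "LOG_DEBUG"}})
```

After writing `npu_config` to a file, the path goes into `load_config`; the key-level warnings for unknown or IMMUTABLE properties still come from ORT/OpenVINO at session creation.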
### OpenVINO Execution Provider Supports EP-Weight Sharing across sessions
The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports EP-Weight Sharing, enabling models to efficiently share weights across multiple inference sessions. This feature enhances the execution of Large Language Models (LLMs) with prefill and KV cache, reducing memory consumption and improving performance when running multiple inferences.
@@ -238,7 +278,7 @@ With EP-Weight Sharing, prefill and KV cache models can now reuse the same set o
These changes enable weight sharing between two models using the session context option `ep.share_ep_contexts`.
Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/5068ab9b190c549b546241aa7ffbe5007868f595/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L319) for more details on configuring this runtime option.
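Since `ep.share_ep_contexts` is a session config entry, both the prefill session and the KV-cache session need it set. The sketch below assumes the Python `onnxruntime.SessionOptions.add_session_config_entry(key, value)` method and assumes `"1"` is the enabling value (check the linked header for the accepted values); `enable_ep_context_sharing` and `RecordingOptions` are hypothetical, the latter only so the sketch runs without ORT.

```python
# Session config key named in the text above.
SHARE_EP_CONTEXTS_KEY = "ep.share_ep_contexts"


def enable_ep_context_sharing(*session_options):
    """Add the EP-context sharing entry to each SessionOptions-like object.

    "1" as the enabling value is an assumption, not confirmed by the text.
    """
    for options in session_options:
        options.add_session_config_entry(SHARE_EP_CONTEXTS_KEY, "1")
    return session_options


class RecordingOptions:
    """Hypothetical stand-in for onnxruntime.SessionOptions."""

    def __init__(self):
        self.entries = {}

    def add_session_config_entry(self, key, value):
        self.entries[key] = value


prefill_options, kv_cache_options = RecordingOptions(), RecordingOptions()
enable_ep_context_sharing(prefill_options, kv_cache_options)
```

Applying the same entry to both option objects reflects the text's point that the two models reuse one set of weights only when each session opts in.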
-### OVEP suppports CreateSessionFromArray API
+### OVEP supports CreateSessionFromArray API
The OpenVINO Execution Provider (OVEP) in ONNX Runtime supports creating sessions from memory using the CreateSessionFromArray API. This allows loading models directly from memory buffers instead of file paths. The CreateSessionFromArray loads the model in memory then creates a session from the in-memory byte array.
-Note: This api is no longer officially supported. Users are requested to move to V2 API.
+Note: This API is no longer officially supported. Users are requested to move to the V2 API.
The session configuration options are passed to SessionOptionsAppendExecutionProvider_OpenVINO() API as shown in an example below for GPU device type:
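The session configuration options handed to the OpenVINO EP are string key/value pairs, with booleans spelled `True`/`False` as in the configuration-options table. A minimal Python sketch of building such options for the GPU device type, assuming `device_type` plus table option names; `ov_provider_options` is a hypothetical helper, not part of ONNX Runtime.

```python
def ov_provider_options(device_type, **options):
    """Build OVEP provider options with every value rendered as a string.

    Booleans become "True"/"False", matching the values listed in the
    configuration-options table; other values go through str().
    """
    result = {"device_type": device_type}
    for key, value in options.items():
        if isinstance(value, bool):
            result[key] = "True" if value else "False"
        else:
            result[key] = str(value)
    return result


gpu_options = ov_provider_options("GPU", enable_opencl_throttling=True)
```

In Python, `onnxruntime.InferenceSession` accepts either a model path or serialized model bytes, so the same dict could be passed as `provider_options=[gpu_options]` with `providers=["OpenVINOExecutionProvider"]` when creating a session from an in-memory model.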
@@ -315,6 +356,7 @@ The following table lists all the available configuration options for API 2.0 an
| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context, i.e., the cl_context address as a void pointer.|
| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. |
+| load_config | string | Any custom JSON path | string | This option enables loading a custom JSON OV config at runtime to set OV parameters. |
Valid Hetero or Multi or Auto Device combinations: