intel gpu support via target_device parameter #74

Open · wants to merge 12 commits into base: main
Conversation

dtrawins (Collaborator)

No description provided.

@dtrawins changed the title from "WIP update openvino backend version to 2024.0" to "WIP update openvino backend version to 2024.0 with intel gpu support" on Mar 27, 2024
@dtrawins changed the title from "WIP update openvino backend version to 2024.0 with intel gpu support" to "intel gpu support via target_device parameter" on Apr 9, 2024
@dtrawins marked this pull request as ready for review on April 9, 2024 14:58
@tanmayv25 (Contributor) left a comment:

We should add some testing in our CI for the TARGET_DEVICE support.
Something like this: https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_inference_mode/test.sh

We can call it L0_openvino_target_device?

-  LOG_IF_ERROR(
-      ReadParameter(params, "CPU_EXTENSION_PATH", &(cpu_ext_path)),
-      "error when reading parameters");
+  ReadParameter(params, "CPU_EXTENSION_PATH", &(cpu_ext_path));
Contributor:

What is the reason for removing the error checking on reading and parsing the parameters, here and everywhere?

Collaborator (Author):

That is not related to the GPU support, but those error messages were misleading. The error suggests that something is wrong with the setup, while it is perfectly fine to skip those parameters.

@nnshah1 (Contributor) commented on May 21, 2024:

Suggestion: update ReadParameter to take a flag indicating whether a missing value is acceptable, or to accept a default value, e.g.:

RETURN_IF_ERROR(ReadParameter(params, "OPTIONAL_KEY", &(cpu_ext_path), /*default_value=*/"foo"));

That way the intent is a little clearer.

Note: suggestion only.
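
A minimal sketch of what such an overload might look like, assuming the existing helper returns a TRITONSERVER_Error* and fills a std::string from the model config parameters (the names and signature here are illustrative, not the backend's final API):

  // Hypothetical variant: fall back to a default instead of reporting an
  // error when the key is absent from the "parameters" section.
  TRITONSERVER_Error*
  ReadParameter(
      triton::common::TritonJson::Value& params, const std::string& key,
      std::string* value, const std::string& default_value)
  {
    triton::common::TritonJson::Value json_value;
    if (!params.Find(key.c_str(), &json_value)) {
      *value = default_value;  // missing key is acceptable, use the default
      return nullptr;          // success, no error reported
    }
    RETURN_IF_ERROR(json_value.MemberAsString("string_value", value));
    return nullptr;
  }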

@@ -88,6 +98,7 @@ to skip the dynamic batch sizes in backend.
* `ENABLE_BATCH_PADDING`: By default, an error will be generated if the backend receives a request with a batch size less than the max_batch_size specified in the configuration. This error can be avoided at a cost of performance by setting the `ENABLE_BATCH_PADDING` parameter to `YES`.
* `RESHAPE_IO_LAYERS`: By setting this parameter to `YES`, the IO layers are reshaped to the dimensions provided in
the model configuration. By default, the dimensions in the model are used.
* `TARGET_DEVICE`: Choose the OpenVINO device for running the inference. It can be CPU (default), GPU, or any of the virtual devices like AUTO, MULTI, or HETERO. Note: using an Intel GPU is possible only if `--device /dev/dri` is passed to the container, and it is supported only on Linux with x86_64 architecture.
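
For illustration, a model could select the device through this parameter in its config.pbtxt roughly as follows (the device strings shown here are examples only):

  parameters: {
    key: "TARGET_DEVICE"
    value: {
      string_value: "GPU"  # or e.g. "AUTO", "MULTI:GPU,CPU", "HETERO:GPU,CPU"
    }
  }
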
Contributor:

Does OpenVINO support models whose computation is spread across CPU and GPU cores?

Collaborator (Author):

Yes, that is possible by using the virtual target device MULTI, which load-balances requests across devices, or the HETERO target device, which spreads a single inference across devices.

   {
-    if (Kind() != TRITONSERVER_INSTANCEGROUPKIND_CPU) {
+    if ((Kind() != TRITONSERVER_INSTANCEGROUPKIND_CPU) &&
Contributor:

I believe this check needs to be updated. From what I see, we support the following kinds (a sketch of the broadened check follows the list):

  1. CPU
  2. GPU
  3. AUTO: If GPU cores are available and the model supports GPU deployment, then use GPU; otherwise use CPU.
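
A rough sketch of what that broadened check could look like, assuming Kind() returns a TRITONSERVER_InstanceGroupKind (illustrative only, not the actual diff in this PR):

  // Accept CPU, GPU, and AUTO instance-group kinds instead of CPU only.
  const auto kind = Kind();
  if ((kind != TRITONSERVER_INSTANCEGROUPKIND_CPU) &&
      (kind != TRITONSERVER_INSTANCEGROUPKIND_GPU) &&
      (kind != TRITONSERVER_INSTANCEGROUPKIND_AUTO)) {
    return TRITONSERVER_ErrorNew(
        TRITONSERVER_ERROR_INVALID_ARG,
        (std::string("unsupported instance group kind '") +
         TRITONSERVER_InstanceGroupKindString(kind) + "' for the OpenVINO backend")
            .c_str());
  }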

@@ -88,6 +98,7 @@ to skip the dynamic batch sizes in backend.
* `ENABLE_BATCH_PADDING`: By default, an error will be generated if the backend receives a request with a batch size less than the max_batch_size specified in the configuration. This error can be avoided at a cost of performance by setting the `ENABLE_BATCH_PADDING` parameter to `YES`.
* `RESHAPE_IO_LAYERS`: By setting this parameter to `YES`, the IO layers are reshaped to the dimensions provided in
the model configuration. By default, the dimensions in the model are used.
* `TARGET_DEVICE`: Choose the OpenVINO device for running the inference. It can be CPU (default), GPU, or any of the virtual devices like AUTO, MULTI, or HETERO. Note: using an Intel GPU is possible only if `--device /dev/dri` is passed to the container, and it is supported only on Linux with x86_64 architecture.
Contributor:

I don't think it is a good idea to introduce a model-level TARGET_DEVICE parameter in the model config. Triton allows you to have multiple model instances, and each instance can specify which device to use for inference.
So they can have model A, which can specify something like below:

  instance_group [
    {
      count: 1
      kind: KIND_GPU
    },
    {
      count: 1
      kind: KIND_CPU
    }
  ]

You can rely on information from the TRITONBACKEND_ModelInstanceKind API call to determine the kind of the instance.
If the kind is CPU and the model is not loaded (within model_state_), then load the model on CPU and use it.
If the kind is GPU and the model is not loaded (within model_state_), then load the model on GPU and use it.
This allows sharing of the model across the model instances.
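
A minimal sketch of that lookup during instance initialization, assuming the standard Triton backend API (the mapping to OpenVINO device strings is illustrative):

  // Query the kind reported by the Triton core for this instance and map it
  // to an OpenVINO device string used when compiling the model.
  TRITONSERVER_InstanceGroupKind kind;
  RETURN_IF_ERROR(TRITONBACKEND_ModelInstanceKind(instance, &kind));

  const std::string target_device =
      (kind == TRITONSERVER_INSTANCEGROUPKIND_GPU) ? "GPU" : "CPU";
  // e.g. auto compiled_model = ov_core.compile_model(ov_model, target_device);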

Contributor:

Even if the use-case of loading two separate kinds of model instances is not required, I would still prefer using the TRITONBACKEND_ModelInstanceKind API instead of another config parameter.

Collaborator (Author):

In OpenVINO, the target device offers more options than just a selection between CPU and GPU. It can also be a virtual device like MULTI, HETERO, AUTO, or BATCH, and it can include extra options like a priority list, e.g. AUTO:GPU,CPU. My understanding is that setting the kind to GPU validates whether a CUDA GPU is present, so that could not be used with other types of devices like an iGPU. I couldn't find a way to use KIND as the target device without changes outside of the OpenVINO backend code.

> In OpenVINO, the target device offers more options than just a selection between CPU and GPU. It can also be a virtual device like MULTI, HETERO, AUTO, or BATCH, and it can include extra options like a priority list, e.g. AUTO:GPU,CPU. My understanding is that setting the kind to GPU validates whether a CUDA GPU is present, so that could not be used with other types of devices like an iGPU. I couldn't find a way to use KIND as the target device without changes outside of the OpenVINO backend code.

@tanmayv25 do you think we can move forward? Do you need more input from @dtrawins?

Contributor:

@ryanloney - proposal in the works - will update

@tanmayv25 (Contributor) commented on May 28, 2024:

We are in the middle of discussing how best to support new hardware platforms. Directing this to @nnshah1.

Contributor:

I think we can update the model configuration to accommodate these fields. It would help keep the device specification in a single place. The benefit is that in the future we might get other backends targeting these devices besides the current OpenVINO one.

@dtrawins (Collaborator, Author) commented on May 4, 2024:

> We should add some testing in our CI for the TARGET_DEVICE support. Something like this: https://github.com/triton-inference-server/server/blob/main/qa/L0_libtorch_inference_mode/test.sh
>
> We can call it L0_openvino_target_device?

Functional tests are under preparation in another PR. They should be easy to integrate into the CI.

@nnshah1 (Contributor) left a comment:

pending discussion
