Intel GPU support via target_device parameter #74
@@ -71,6 +71,16 @@ but the listed CMake argument can be used to override.
 * triton-inference-server/core: -DTRITON_CORE_REPO_TAG=[tag]
 * triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=[tag]

+## Build a complete image with OpenVINO backend including Intel GPU drivers
+
+Build the custom Triton image with the required runtime drivers using the script from [build.py](https://github.com/dtrawins/server/blob/igpu/build.py):
+
+```
+python3 build.py --target-platform linux --enable-logging --enable-stats --enable-metrics --enable-cpu-metrics --endpoint grpc --endpoint http --filesystem s3 \
+    --backend openvino
+```
+
 ## Using the OpenVINO Backend

 ### Parameters
@@ -88,6 +98,7 @@ to skip the dynamic batch sizes in backend.
 * `ENABLE_BATCH_PADDING`: By default, an error is generated if the backend receives a request with a batch size smaller than the max_batch_size specified in the configuration. This error can be avoided, at a cost of performance, by setting the `ENABLE_BATCH_PADDING` parameter to `YES`.
 * `RESHAPE_IO_LAYERS`: By setting this parameter to `YES`, the IO layers are reshaped to the dimensions provided in the model configuration. By default, the dimensions in the model are used.
+* `TARGET_DEVICE`: Choose the OpenVINO device for running inference. It can be CPU (default), GPU, or any of the virtual devices like AUTO, MULTI, or HETERO. Note: using an Intel GPU is possible only if `--device /dev/dri` is passed to the container, and it is supported only on Linux with x86_64 arch.
Review discussion on `TARGET_DEVICE`:

> I don't think it is a good idea to introduce a model-level TARGET_DEVICE parameter in the model config. Triton allows you to have multiple model instances, and each instance can specify which device to use for inference. You can rely on information from the TRITONBACKEND_ModelInstanceKind API call to determine the kind of the instance.

> Even if the use case of loading two separate kinds of model instances is not required, I would still prefer using the TRITONBACKEND_ModelInstanceKind API instead of another config parameter.

> (author) In OpenVINO, the target device contains more options than just the selection between CPU and GPU. It can also set a virtual device like MULTI, HETERO, AUTO, or BATCH, and it can include extra options like a priority list, e.g. AUTO:GPU,CPU. My understanding is that setting kind as GPU validates that a CUDA GPU is present, so that could not be used with other types of devices like an iGPU. I couldn't find a way to use KIND as the target device without changes outside of the OpenVINO backend code.

> @tanmayv25 do you think we can move forward? Do you need more input from @dtrawins?

> @ryanloney - proposal in works - will update

> We are in the middle of discussing how to best support new hardware platforms. Directing this to @nnshah1.

> I think we can update the model configuration to accommodate these fields. It would help keep the device specification in a single place. The benefit is that in the future we might get other backends targeting these devices besides the current OpenVINO.
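For context, here is a minimal sketch (not part of this PR) of the instance-kind approach the reviewer describes. `TRITONBACKEND_ModelInstanceKind` and the `TRITONSERVER_INSTANCEGROUPKIND_*` values come from the Triton backend API; the helper name and the fallback behavior are assumptions for illustration:

```
#include <string>

#include "triton/backend/backend_common.h"

// Hypothetical helper: map the instance-group kind of this instance to an
// OpenVINO device string. Virtual devices (AUTO, MULTI, HETERO) have no
// corresponding kind, which is the limitation the author raises above.
TRITONSERVER_Error*
DeviceFromInstanceKind(
    TRITONBACKEND_ModelInstance* instance, std::string* device)
{
  TRITONSERVER_InstanceGroupKind kind;
  RETURN_IF_ERROR(TRITONBACKEND_ModelInstanceKind(instance, &kind));
  switch (kind) {
    case TRITONSERVER_INSTANCEGROUPKIND_CPU:
      *device = "CPU";
      break;
    case TRITONSERVER_INSTANCEGROUPKIND_GPU:
      *device = "GPU";
      break;
    default:
      // KIND_AUTO / KIND_MODEL: leave empty so the caller can fall back to
      // a model-level setting such as the proposed TARGET_DEVICE.
      device->clear();
      break;
  }
  return nullptr;  // success
}
```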
@@ -231,6 +242,36 @@ string_value:"yes"
 }
 }
 ```
+### Running the models on Intel GPU
+
+Add a `TARGET_DEVICE` parameter to your config.pbtxt:
+```
+parameters: [
+{
+    key: "NUM_STREAMS"
+    value: {
+      string_value: "1"
+    }
+},
+{
+    key: "PERFORMANCE_HINT"
+    value: {
+      string_value: "THROUGHPUT"
+    }
+},
+{
+    key: "TARGET_DEVICE"
+    value: {
+      string_value: "GPU"
+    }
+}
+]
+```
+
+Start the container with an extra parameter to pass the device `/dev/dri`:
+```
+docker run -it --rm --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* ) tritonserver:latest
+```
+
 ## Known Issues
---
@@ -84,6 +84,9 @@ class ModelState : public BackendModel {
   TRITONSERVER_Error* ParseParameter(
       const std::string& mkey, triton::common::TritonJson::Value& params,
       std::vector<std::pair<std::string, ov::Any>>* device_config);
+  TRITONSERVER_Error* ParseStringParameter(
+      const std::string& mkey, triton::common::TritonJson::Value& params,
+      std::string* value);
   TRITONSERVER_Error* ParseParameterHelper(
       const std::string& mkey, std::string* value,
       std::pair<std::string, ov::Any>* ov_property);
@@ -118,6 +121,7 @@ class ModelState : public BackendModel {

   bool SkipDynamicBatchSize() { return skip_dynamic_batchsize_; }
   bool EnableBatchPadding() { return enable_padding_; }
+  std::string TargetDevice() { return target_device_; }

  private:
   ModelState(TRITONBACKEND_Model* triton_model);
@@ -140,6 +144,7 @@ class ModelState : public BackendModel {
   bool skip_dynamic_batchsize_;
   bool enable_padding_;
   bool reshape_io_layers_;
+  std::string target_device_;
 };

 TRITONSERVER_Error*
@@ -179,7 +184,7 @@ ModelState::Create(TRITONBACKEND_Model* triton_model, ModelState** state)
 ModelState::ModelState(TRITONBACKEND_Model* triton_model)
     : BackendModel(triton_model), model_read_(false),
       skip_dynamic_batchsize_(false), enable_padding_(false),
-      reshape_io_layers_(false)
+      reshape_io_layers_(false), target_device_("CPU")
 {
 }

@@ -238,12 +243,11 @@ ModelState::ParseParameters()
   bool status = model_config_.Find("parameters", &params);
   if (status) {
     RETURN_IF_ERROR(LoadCpuExtensions(params));
-    RETURN_IF_ERROR(ParseBoolParameter(
-        "SKIP_OV_DYNAMIC_BATCHSIZE", params, &skip_dynamic_batchsize_));
-    RETURN_IF_ERROR(
-        ParseBoolParameter("ENABLE_BATCH_PADDING", params, &enable_padding_));
-    RETURN_IF_ERROR(
-        ParseBoolParameter("RESHAPE_IO_LAYERS", params, &reshape_io_layers_));
+    ParseBoolParameter(
+        "SKIP_OV_DYNAMIC_BATCHSIZE", params, &skip_dynamic_batchsize_);
+    ParseBoolParameter("ENABLE_BATCH_PADDING", params, &enable_padding_);
+    ParseBoolParameter("RESHAPE_IO_LAYERS", params, &reshape_io_layers_);
+    ParseStringParameter("TARGET_DEVICE", params, &target_device_);
   }

   return nullptr;
@@ -256,18 +260,13 @@ ModelState::ParseParameters(const std::string& device)
   triton::common::TritonJson::Value params;
   bool status = model_config_.Find("parameters", &params);
   if (status) {
-    if (device == "CPU") {
-      config_[device] = {};
-      auto& device_config = config_.at(device);
-      RETURN_IF_ERROR(
-          ParseParameter("INFERENCE_NUM_THREADS", params, &device_config));
-      RETURN_IF_ERROR(
-          ParseParameter("COMPILATION_NUM_THREADS", params, &device_config));
-      RETURN_IF_ERROR(ParseParameter("HINT_BF16", params, &device_config));
-      RETURN_IF_ERROR(ParseParameter("NUM_STREAMS", params, &device_config));
-      RETURN_IF_ERROR(
-          ParseParameter("PERFORMANCE_HINT", params, &device_config));
-    }
+    config_[device] = {};
+    auto& device_config = config_.at(device);
+    ParseParameter("INFERENCE_NUM_THREADS", params, &device_config);
+    ParseParameter("COMPILATION_NUM_THREADS", params, &device_config);
+    ParseParameter("HINT_BF16", params, &device_config);
+    ParseParameter("NUM_STREAMS", params, &device_config);
+    ParseParameter("PERFORMANCE_HINT", params, &device_config);
   }

   return nullptr;
@@ -277,9 +276,7 @@ TRITONSERVER_Error*
 ModelState::LoadCpuExtensions(triton::common::TritonJson::Value& params)
 {
   std::string cpu_ext_path;
-  LOG_IF_ERROR(
-      ReadParameter(params, "CPU_EXTENSION_PATH", &(cpu_ext_path)),
-      "error when reading parameters");
+  ReadParameter(params, "CPU_EXTENSION_PATH", &(cpu_ext_path));
   if (!cpu_ext_path.empty()) {
     // CPU (MKLDNN) extensions is loaded as a shared library and passed as a
     // pointer to base extension

Review discussion on the removed error checking:

> What is the reason for removing the error checking on reading and parsing the parameters, here and everywhere?

> (author) That is not related to the GPU support, but those error messages were misleading. The error suggests that something is wrong with the setup, while it is perfectly fine to skip those parameters.

> Suggestion: update ReadParameter to take a value indicating whether a missing value is OK, or to provide a default value. That way the intent is a little clearer. Note: suggestion only.
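A rough sketch of that suggestion (explicitly a suggestion only, not part of this PR), assuming `ReadParameter` is implemented over `TritonJson` as elsewhere in this backend; the `default_value` argument is the hypothetical addition:

```
// Hypothetical variant of ReadParameter: a missing parameter is not an
// error; the caller-supplied default is used instead.
TRITONSERVER_Error*
ReadParameter(
    triton::common::TritonJson::Value& params, const std::string& key,
    std::string* value, const std::string& default_value)
{
  triton::common::TritonJson::Value json_value;
  if (params.Find(key.c_str(), &json_value)) {
    // Parameter present: parse it as before, still surfacing real errors.
    RETURN_IF_ERROR(json_value.MemberAsString("string_value", value));
  } else {
    // Parameter absent: perfectly fine, take the default silently.
    *value = default_value;
  }
  return nullptr;  // success
}
```

A call like `ReadParameter(params, "CPU_EXTENSION_PATH", &cpu_ext_path, "")` would then make the skip-if-missing intent explicit at the call site.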
@@ -301,8 +298,7 @@ ModelState::ParseBoolParameter(
     bool* setting)
 {
   std::string value;
-  LOG_IF_ERROR(
-      ReadParameter(params, mkey, &(value)), "error when reading parameters");
+  RETURN_IF_ERROR(ReadParameter(params, mkey, &(value)));
   std::transform(
       value.begin(), value.end(), value.begin(),
       [](unsigned char c) { return std::tolower(c); });
@@ -313,14 +309,30 @@ ModelState::ParseBoolParameter(
   return nullptr;
 }

+TRITONSERVER_Error*
+ModelState::ParseStringParameter(
+    const std::string& mkey, triton::common::TritonJson::Value& params,
+    std::string* setting)
+{
+  std::string value;
+  RETURN_IF_ERROR(ReadParameter(params, mkey, &(value)));
+  std::transform(
+      value.begin(), value.end(), value.begin(),
+      [](unsigned char c) { return std::toupper(c); });
+  if (value.length() > 0) {
+    *setting = value;
+  }
+
+  return nullptr;
+}
+
 TRITONSERVER_Error*
 ModelState::ParseParameter(
     const std::string& mkey, triton::common::TritonJson::Value& params,
     std::vector<std::pair<std::string, ov::Any>>* device_config)
 {
   std::string value;
-  LOG_IF_ERROR(
-      ReadParameter(params, mkey, &(value)), "error when reading parameters");
+  RETURN_IF_ERROR(ReadParameter(params, mkey, &(value)));
   if (!value.empty()) {
     std::pair<std::string, ov::Any> ov_property;
     RETURN_IF_ERROR(ParseParameterHelper(mkey, &value, &ov_property));
@@ -410,6 +422,16 @@ ModelState::ParseParameterHelper(
 TRITONSERVER_Error*
 ModelState::ConfigureOpenvinoCore()
 {
+  auto availableDevices = ov_core_.get_available_devices();
+  std::stringstream list_of_devices;
+
+  for (auto& element : availableDevices) {
+    list_of_devices << element << ",";
+  }
+  LOG_MESSAGE(
+      TRITONSERVER_LOG_VERBOSE,
+      (std::string("Available OpenVINO devices: " + list_of_devices.str()))
+          .c_str());
   for (auto&& item : config_) {
     std::string device_name = item.first;
     std::vector<std::pair<std::string, ov::Any>> properties = item.second;
@@ -438,9 +460,10 @@ ModelState::LoadModel(
           std::to_string(OPENVINO_VERSION_MINOR) + "." +
           std::to_string(OPENVINO_VERSION_PATCH))
              .c_str());
+
   LOG_MESSAGE(
       TRITONSERVER_LOG_VERBOSE,
-      (std::string("Device info: \n") +
+      (std::string("Device info: ") +
        ConvertVersionMapToString(ov_core_.get_versions(device)))
           .c_str());

@@ -932,19 +955,27 @@ ModelInstanceState::Create(
 ModelInstanceState::ModelInstanceState(
     ModelState* model_state, TRITONBACKEND_ModelInstance* triton_model_instance)
     : BackendModelInstance(model_state, triton_model_instance),
-      model_state_(model_state), device_("CPU"), batch_pad_size_(0)
+      model_state_(model_state), device_(model_state->TargetDevice()),
+      batch_pad_size_(0)
 {
-  if (Kind() != TRITONSERVER_INSTANCEGROUPKIND_CPU) {
+  if ((Kind() != TRITONSERVER_INSTANCEGROUPKIND_CPU) &&
+      (Kind() != TRITONSERVER_INSTANCEGROUPKIND_AUTO)) {
     throw triton::backend::BackendModelInstanceException(TRITONSERVER_ErrorNew(
         TRITONSERVER_ERROR_INVALID_ARG,
         (std::string("unable to load model '") + model_state_->Name() +
-         "', Triton openVINO backend supports only CPU device")
+         "', Triton OpenVINO backend supports only Kind CPU and AUTO")
             .c_str()));
   }

   if (model_state_->ModelNotRead()) {
     std::string model_path;
     THROW_IF_BACKEND_INSTANCE_ERROR(model_state_->ParseParameters());
+    device_ = model_state->TargetDevice();
+    LOG_MESSAGE(
+        TRITONSERVER_LOG_INFO,
+        (std::string("Target device " + device_)).c_str());
+
     THROW_IF_BACKEND_INSTANCE_ERROR(
         model_state_->ReadModel(ArtifactFilename(), &model_path));
     THROW_IF_BACKEND_INSTANCE_ERROR(model_state_->ValidateConfigureModel());

Review discussion on the kind check:

> I believe this check needs to be updated. From what I see we support the following kinds:
@@ -1519,8 +1550,7 @@ TRITONBACKEND_ModelInstanceInitialize(TRITONBACKEND_ModelInstance* instance)
   LOG_MESSAGE(
       TRITONSERVER_LOG_INFO,
       (std::string("TRITONBACKEND_ModelInstanceInitialize: ") + name + " (" +
-       TRITONSERVER_InstanceGroupKindString(kind) + " device " +
-       std::to_string(device_id) + ")")
+       TRITONSERVER_InstanceGroupKindString(kind) + ")")
           .c_str());

   // Get the model state associated with this instance's model.
@@ -1608,7 +1638,7 @@ TRITONBACKEND_GetBackendAttribute(
       TRITONSERVER_LOG_VERBOSE,
       "TRITONBACKEND_GetBackendAttribute: setting attributes");
   RETURN_IF_ERROR(TRITONBACKEND_BackendAttributeAddPreferredInstanceGroup(
-      backend_attributes, TRITONSERVER_INSTANCEGROUPKIND_CPU, 0, nullptr, 0));
+      backend_attributes, TRITONSERVER_INSTANCEGROUPKIND_AUTO, 0, nullptr, 0));

   return nullptr;
 }
Review discussion on device placement:

> Does OpenVINO support models whose computation is spread across CPU and GPU cores?

> (author) Yes, that is possible by using the virtual target device MULTI, which load-balances requests, or the HETERO target device, which spreads a single inference across devices.
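To make the virtual devices concrete, here is a minimal standalone sketch using the public OpenVINO 2.x C++ API (the model path is a placeholder, and device availability depends on the host):

```
#include <openvino/openvino.hpp>

int main() {
  ov::Core core;
  // Placeholder model path.
  auto model = core.read_model("model.xml");

  // MULTI: load-balances whole inference requests across the listed devices.
  auto on_multi = core.compile_model(model, "MULTI:GPU,CPU");

  // HETERO: splits a single inference graph across devices, running layers
  // unsupported on GPU on the CPU fallback.
  auto on_hetero = core.compile_model(model, "HETERO:GPU,CPU");

  // AUTO with a device priority list, as mentioned earlier in the review.
  auto on_auto = core.compile_model(model, "AUTO:GPU,CPU");
  return 0;
}
```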