Can I serve my model with Triton Inference Server? #761
Comments
after torch-tensorrt compile |
We are currently working on integration with Triton's LibTorch backend so that this workflow is supported out of the box. cc: @borisfom |
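Until that integration is available, one workaround (a minimal sketch, not an official recipe; the model, input shape, and output path below are placeholder assumptions) is to compile with Torch-TensorRT, keep the result as TorchScript, and serve it through Triton's PyTorch (LibTorch) backend:

import torch
import torch_tensorrt

# Placeholder model and input; substitute your own network and shape.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval().cuda()
example_input = torch.randn(1, 3, 224, 224).cuda()

# torch_tensorrt.compile returns a TorchScript module with embedded TRT engines.
trt_module = torch_tensorrt.compile(
    model,
    inputs=[example_input],
    enabled_precisions={torch.half},
)

# Save it where Triton's PyTorch backend expects it:
# model_repository/<model_name>/1/model.pt
torch.jit.save(trt_module, "model.pt")

With the saved model.pt in a model repository and a config.pbtxt that sets platform: "pytorch_libtorch", Triton should be able to load it as a regular TorchScript model, provided the PyTorch and Triton containers ship compatible TensorRT/libtorch builds.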
Can we serve it with the tensorrt backend in Triton? I tried converting to a TRT engine using the following code:

trt_ts_engine = torch_tensorrt.ts.convert_method_to_trt_engine(
    traced_script_module,
    method_name='forward',
    inputs=inputs,
    truncate_long_and_double=True,
)
with open(f"{OUT_PATH}/model.plan", 'wb') as f:
    f.write(trt_ts_engine)

But I get a "Version tag does not match" error in Triton. |
Using: |
It's working for me for fp32/fp16 with nvcr.io/nvidia/pytorch:21.11-py3 and nvcr.io/nvidia/tritonserver:21.11-py3 images. |
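For reference, a sketch of the model repository layout the tensorrt backend expects (the repository path and model name are placeholders; input/output sections are omitted, since Triton can usually auto-complete them from a TensorRT plan, but they can also be spelled out explicitly):

import pathlib
import shutil

OUT_PATH = "."  # same placeholder as in the conversion snippet above
repo = pathlib.Path("model_repository/my_model")
(repo / "1").mkdir(parents=True, exist_ok=True)

# Copy the serialized engine into version directory 1.
shutil.copy(f"{OUT_PATH}/model.plan", repo / "1" / "model.plan")

# Minimal config: "tensorrt_plan" is the platform for serialized TensorRT engines.
(repo / "config.pbtxt").write_text(
    'name: "my_model"\n'
    'platform: "tensorrt_plan"\n'
)

The key constraint, as noted above, is that the TensorRT version in the PyTorch container used to build model.plan matches the one in the Triton image (e.g. both 21.11).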
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days. |
Any updates? Is this working yet on Triton server? |