Is your feature request related to a problem? Please describe.
I am aware that PyTriton already has an example of using PyTriton with tensorrt_llm, but I noticed that the example only supports single-GPU inference. Are there any other examples or reference docs that use tensorrt_llm with PyTriton and support tensor parallelism?
Describe the solution you'd like
I think the current example is excellent, but it would be more comprehensive if a multi-GPU (tensor parallelism) inference example were added, since this will be one of the most common use cases.
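For reference, here is a rough sketch of the kind of setup I mean (not an official example): the script is launched with `mpirun -n <tp_size>`, every rank loads its shard of the tensor-parallel engine, and only rank 0 binds PyTriton and broadcasts each request to the worker ranks. The `run_generation` helper and the echo logic inside it are placeholders standing in for the collective TensorRT-LLM generate call; the exact runtime API depends on the TensorRT-LLM version.

```python
# Sketch only: launch as `mpirun -n <tp_size> python server.py`.
# Only the PyTriton/MPI wiring is shown; the TensorRT-LLM calls are placeholders.
import numpy as np
from mpi4py import MPI

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Placeholder: each rank would load its shard of the tensor-parallel engine here,
# e.g. with tensorrt_llm.runtime.ModelRunner.from_dir(engine_dir, rank=rank).


def run_generation(prompts: np.ndarray) -> np.ndarray:
    # Placeholder for the collective generate() call; with tensor parallelism,
    # all ranks must enter it together. Here it just echoes the input.
    decoded = [p[0].decode("utf-8") for p in prompts]
    return np.array([[f"{text} -> generated".encode("utf-8")] for text in decoded], dtype=object)


@batch
def infer_fn(prompts: np.ndarray):
    comm.bcast(prompts, root=0)        # hand the batch to the worker ranks
    outputs = run_generation(prompts)  # rank 0 joins the collective call
    return {"outputs": outputs}


if rank == 0:
    # Rank 0 exposes the model over HTTP/gRPC via PyTriton.
    with Triton() as triton:
        triton.bind(
            model_name="trtllm_tp",
            infer_func=infer_fn,
            inputs=[Tensor(name="prompts", dtype=bytes, shape=(1,))],
            outputs=[Tensor(name="outputs", dtype=bytes, shape=(1,))],
            config=ModelConfig(max_batch_size=8),
        )
        triton.serve()
else:
    # Worker ranks wait for broadcast requests and join the collective call.
    # A real deployment would also need a shutdown sentinel.
    while True:
        prompts = comm.bcast(None, root=0)
        run_generation(prompts)
```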
JoeLiu996 changed the title from "Tensor parallelism for tensorrt_llm" to "[Question] Tensor parallelism for tensorrt_llm" on Jul 5, 2024