Is your feature request related to a problem? Please describe.
I am aware that PyTriton already has an example of using PyTriton with tensorrt_llm, but I noticed that the example only supports single-GPU inference. Are there any other examples or reference docs that use tensorrt_llm with PyTriton and support tensor parallelism?
Describe the solution you'd like
I think the current example is excellent, but it would be more comprehensive if a multi-GPU (tensor parallelism) inference example were added, since this will be one of the most common use cases.
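For reference, here is a rough sketch of the kind of setup I mean (not an official example): the script is launched with `mpirun -n <tp_size>`, every rank loads its shard of the tensor-parallel engine, and only rank 0 binds PyTriton and broadcasts each request to the worker ranks. The `run_generation` helper and the echo logic inside it are placeholders standing in for the collective TensorRT-LLM generate call; the exact runtime API depends on the TensorRT-LLM version.

```python
# Sketch only: launch as `mpirun -n <tp_size> python server.py`.
# Only the PyTriton/MPI wiring is shown; the TensorRT-LLM calls are placeholders.
import numpy as np
from mpi4py import MPI

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Placeholder: each rank would load its shard of the tensor-parallel engine here,
# e.g. with tensorrt_llm.runtime.ModelRunner.from_dir(engine_dir, rank=rank).


def run_generation(prompts: np.ndarray) -> np.ndarray:
    # Placeholder for the collective generate() call; with tensor parallelism,
    # all ranks must enter it together. Here it just echoes the input.
    decoded = [p[0].decode("utf-8") for p in prompts]
    return np.array([[f"{text} -> generated".encode("utf-8")] for text in decoded], dtype=object)


@batch
def infer_fn(prompts: np.ndarray):
    comm.bcast(prompts, root=0)        # hand the batch to the worker ranks
    outputs = run_generation(prompts)  # rank 0 joins the collective call
    return {"outputs": outputs}


if rank == 0:
    # Rank 0 exposes the model over HTTP/gRPC via PyTriton.
    with Triton() as triton:
        triton.bind(
            model_name="trtllm_tp",
            infer_func=infer_fn,
            inputs=[Tensor(name="prompts", dtype=bytes, shape=(1,))],
            outputs=[Tensor(name="outputs", dtype=bytes, shape=(1,))],
            config=ModelConfig(max_batch_size=8),
        )
        triton.serve()
else:
    # Worker ranks wait for broadcast requests and join the collective call.
    # A real deployment would also need a shutdown sentinel.
    while True:
        prompts = comm.bcast(None, root=0)
        run_generation(prompts)
```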
JoeLiu996 changed the title from "Tensor parallelism for tensorrt_llm" to "[Question] Tensor parallelism for tensorrt_llm" on Jul 5, 2024