
Is it even possible to have multiple input layers #641

Closed
forrestjgq opened this issue Dec 12, 2023 · 7 comments
Labels: feature request (New feature or request), triaged (Issue has been triaged by maintainers)

@forrestjgq

forrestjgq commented Dec 12, 2023

Hi guys:

We're developing a new model using tensorrt-llm, and this model has more than one input layer. I checked the GptSession and GptDecoder code, and it seems that only input IDs can be passed to the model?

Is it possible for us to pass more data to the model? That data will NOT change throughout the whole generation process.
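For example, something like this hypothetical call (the `generate` signature and the `extra_feature_*` names below are made up to illustrate what we want, not real TensorRT-LLM API):

```python
# Hypothetical sketch (not real TensorRT-LLM API): besides input_ids,
# pass two static tensors that stay constant across generation steps.
outputs = session.generate(
    input_ids=input_ids,        # standard token input
    extra_feature_a=feature_a,  # static side input 1, fixed for the whole request
    extra_feature_b=feature_b,  # static side input 2, fixed for the whole request
)
```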

Thanks!

@forrestjgq
Author

Or do you have a plan to support the LLaVA model or other multimodal models?

@byshiue
Collaborator

byshiue commented Dec 13, 2023

You can use several input layers. Currently, all k/v caches are passed via input layers.

LLaVA is on our roadmap; adding @ncomly-nvidia to share more details.
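For example, at the plain TensorRT level a network can declare any number of named inputs. A minimal sketch with the TensorRT Python API (not TensorRT-LLM internals; the extra tensor names and shapes are illustrative):

```python
import tensorrt as trt

# Declare several named network inputs; the runtime binds each one by
# name at inference time, so extra inputs work the same way as input_ids.
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

input_ids = network.add_input("input_ids", trt.int32, (-1, -1))
extra_a = network.add_input("extra_a", trt.float32, (-1, -1, 4096))
extra_b = network.add_input("extra_b", trt.float32, (-1, 16))
```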

@byshiue byshiue assigned byshiue and ncomly-nvidia and unassigned byshiue Dec 13, 2023
@byshiue byshiue added the feature request New feature or request label Dec 13, 2023
@forrestjgq
Author

> You can use several input layers. Currently, all k/v caches are passed via input layers.
>
> LLaVA is on our roadmap; adding @ncomly-nvidia to share more details.

Our plan is to add 2 extra input layers besides input_ids, based on Llama, and build the model into a TRT engine.

In the ensemble pipeline, the data for these 2 layers will be produced by the preprocessing model and passed to trtllm as named tensors (see the sketch after the questions below).

We have 2 questions:

  1. Is this plan feasible?
  2. Will these 2 extra inputs be delivered to the TRT engine on every forward pass by the GPT manager (or something else)?
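As a sketch of what we mean (the tensor and model names are placeholders; the preprocessing model would produce the real data, which we fake with numpy here):

```python
import numpy as np
import tritonclient.http as httpclient

# Send input_ids plus two extra named tensors to a Triton ensemble.
client = httpclient.InferenceServerClient(url="localhost:8000")

tensors = [
    ("input_ids", np.zeros((1, 32), dtype=np.int32), "INT32"),
    ("extra_a", np.zeros((1, 32, 4096), dtype=np.float32), "FP32"),
    ("extra_b", np.zeros((1, 16), dtype=np.float32), "FP32"),
]
inputs = []
for name, arr, dtype in tensors:
    inp = httpclient.InferInput(name, list(arr.shape), dtype)
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

result = client.infer(model_name="ensemble", inputs=inputs)
```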

@ncomly-nvidia ncomly-nvidia added the triaged Issue has been triaged by maintainers label Dec 18, 2023
@symphonylyh
Collaborator

@forrestjgq for 2 extra layers do you mean 2 extra inputs? Here are some examples that might be useful to you:

  1. position_ids and token_type_ids: encoder-decoder models like BART require an extra input called position_ids, as you can see here
  2. The BLIP-2 example has input_embeds instead of input_ids to handle the visual encoder's embeddings, treating them as a prompt tuning table; see here (a sketch of this pattern follows the list)
  3. Last, if your goal is to enable LLaVA, the good news is that we have internally supported LLaVA and, more generally, a multi-modal family class. Stay tuned, we'll release it soon. We can use TensorRT-LLM Requests #632 as the tracker
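To make item 2 concrete, here is a rough sketch of that pattern in plain PyTorch (illustrative shapes, not the actual TensorRT-LLM implementation):

```python
import torch

# Build the embedding sequence outside the LM: embed the text tokens,
# then prepend the visual encoder's embeddings, so the engine consumes
# an input_embeds tensor in place of input_ids.
embed = torch.nn.Embedding(32000, 4096)      # stand-in for the LM's token embedding
text_ids = torch.randint(0, 32000, (1, 16))  # tokenized prompt
visual_embeds = torch.randn(1, 32, 4096)     # assumed visual encoder output

text_embeds = embed(text_ids)                                  # (1, 16, 4096)
input_embeds = torch.cat([visual_embeds, text_embeds], dim=1)  # (1, 48, 4096)
```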

@symphonylyh
Collaborator

@forrestjgq we have now released multi-modal support for BLIP with OPT or T5, and LLaVA. Please take a look :)
Announcement: #847

Closing the issue for now since LLaVA is supported. Feel free to re-open, or open a new issue if you encounter any problems.

@forrestjgq
Author

forrestjgq commented Jan 16, 2024

@symphonylyh
Great!

One more question: how do we deploy it in Triton server?

@symphonylyh
Collaborator

@forrestjgq the simplest approach is to write a Triton Python backend; there are some examples in the Triton repo. For more general support and the in-flight batching feature, work is still in progress, and we expect to have an update in Feb.
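For reference, a minimal Python-backend skeleton looks like this (the tensor names are placeholders, and a real backend would invoke the TensorRT-LLM runtime inside execute()):

```python
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the named input tensor from the request.
            input_ids = pb_utils.get_input_tensor_by_name(
                request, "input_ids").as_numpy()
            # ... run generation here; this sketch just echoes zeros.
            output_ids = np.zeros_like(input_ids)
            out = pb_utils.Tensor("output_ids", output_ids)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```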
