Is it even possible to have multiple input layers #641
Hi guys:
We're developing a new model using tensorrt-llm, and this model has more than one input layer. I checked the GptSession and GptDecoder code, and it seems that only input ids can be passed to the model.
Is it possible for us to pass more data to the model? That data will NOT be changed throughout the whole generation process.
Thanks!

Comments
Or do you have a plan to support the llava model or other multimodal models?
You could use several input layers. Currently, all k/v caches are passed as input layers. llava is on our roadmap; adding @ncomly-nvidia to share more details.
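For illustration, here is a minimal sketch of declaring more than one network input using the plain TensorRT Python API that TensorRT-LLM builds on; the second tensor's name and shape are hypothetical, not taken from the TensorRT-LLM codebase:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# The usual token-id input.
input_ids = network.add_input("input_ids", trt.int32, (-1, -1))

# A hypothetical second input that stays constant during generation,
# e.g. pre-computed image features from a vision encoder.
image_features = network.add_input("image_features", trt.float32, (-1, 576, 4096))
```

Each extra input only needs an optimization profile covering its dynamic dimensions; at runtime it is fed by name like any other tensor.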
Our plan is to design 2 extra input layers. In the ensemble pipeline, data for those 2 layers will be processed by a preprocessing model and passed to trtllm as named tensors. We have 2 questions:
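If it helps, here is a hedged sketch of the client side of such an ensemble, sending the extra data as named tensors with the standard tritonclient API; the model name ("ensemble") and tensor names ("input_ids", "image_features") are placeholders, not names TensorRT-LLM actually expects:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

# Token ids, as usual.
input_ids = httpclient.InferInput("input_ids", [1, 32], "INT32")
input_ids.set_data_from_numpy(np.zeros((1, 32), dtype=np.int32))

# The extra, generation-invariant data travels the same way: as a
# named tensor that the ensemble routes from preprocessing to trtllm.
image_features = httpclient.InferInput("image_features", [1, 576, 4096], "FP32")
image_features.set_data_from_numpy(
    np.random.rand(1, 576, 4096).astype(np.float32))

result = client.infer("ensemble", inputs=[input_ids, image_features])
```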
@forrestjgq for the 2 extra layers, do you mean 2 extra inputs? Here are some examples that might be useful to you:
@forrestjgq we have now released multi-modal support for BLIP with OPT or T5, and for LLaVA. Please take a look :) Closing the issue for now since LLaVA is supported. Feel free to re-open or open a new issue if you encounter any problems.
@symphonylyh One more question: how do we deploy it in Triton server?
@forrestjgq the simplest approach is to write a Triton Python backend; some examples are in the Triton repo. More general support and the inflight-batching feature are still in progress, and we expect to have an update in February.
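As a rough illustration of that approach, here is a minimal Triton Python backend skeleton; the tensor names and the generation step are placeholders, assuming matching inputs/outputs are declared in the model's config.pbtxt:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load the TensorRT-LLM engine/session here.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            input_ids = pb_utils.get_input_tensor_by_name(
                request, "input_ids").as_numpy()

            # Placeholder: run TensorRT-LLM generation on input_ids here.
            output_ids = input_ids  # echo back, just to keep the skeleton runnable

            out = pb_utils.Tensor("output_ids", output_ids.astype(np.int32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```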