Would it be possible to deploy exllama with NVIDIA's open-source Triton Inference Server? Triton has several useful features for deploying LLMs in production.
https://developer.nvidia.com/triton-inference-server
As I understand it, Triton allows custom backends, so I was wondering whether this could work, or whether anyone has tried it:
https://github.com/triton-inference-server/backend
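For what it's worth, the lowest-friction route is probably Triton's Python backend rather than a full C++ backend: Triton loads a `model.py` that defines a `TritonPythonModel` class, and that class could hold an exllama generator. Below is a minimal sketch; the Triton-side API (`triton_python_backend_utils`, `initialize`/`execute`/`finalize`) is real, but `load_exllama_model` and `generator.generate` are hypothetical placeholders for however you wire in exllama, since exllama doesn't ship a Triton integration.

```python
# model.py — sketch of a Triton Python backend wrapping exllama.

import numpy as np
import triton_python_backend_utils as pb_utils


def load_exllama_model(repo_dir):
    """Hypothetical loader: a real backend would build an exllama
    model/generator from the GPTQ weights stored under repo_dir."""
    raise NotImplementedError("wire in exllama's model loading here")


class TritonPythonModel:
    def initialize(self, args):
        # args["model_repository"] points at this model's directory in
        # the Triton model repository; load the weights from there.
        self.generator = load_exllama_model(args["model_repository"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # "PROMPT" must match an input tensor declared in config.pbtxt.
            prompt = pb_utils.get_input_tensor_by_name(request, "PROMPT")
            text = prompt.as_numpy()[0].decode("utf-8")

            # Hypothetical exllama generation call.
            completion = self.generator.generate(text)

            out = pb_utils.Tensor(
                "COMPLETION",
                np.array([completion.encode("utf-8")], dtype=np.object_),
            )
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out])
            )
        return responses

    def finalize(self):
        # Release the generator (and its GPU memory) on model unload.
        self.generator = None
```

The matching `config.pbtxt` would set `backend: "python"` and declare `PROMPT` and `COMPLETION` as `TYPE_STRING` tensors. The open question is less the plumbing than the features: Triton's dynamic batching and streaming (decoupled mode) would need extra work to line up with exllama's token-by-token generation loop.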