
Does tensorRT-LLM support serving 4bit quantised unsloth Llama model #2472

Status: Open
Opened by jayakommuru on Nov 20, 2024 · 2 comments
Labels: quantization (Issue about lower bit quantization, including int8, int4, fp8), question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments

jayakommuru commented:

We want to deploy https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit, a 4-bit quantized version of the Llama-3.2-1B model produced with bitsandbytes. Can we deploy this using the TensorRT-LLM backend? If so, is there any documentation to refer to?

jayakommuru changed the title from "Does tensor RT support loading 4bit quantised unsloth model" to "Does tensor RT support serving 4bit quantised unsloth Llama model" on Nov 20, 2024
jayakommuru changed the title to "Does tensorRT-LLM support serving 4bit quantised unsloth Llama model" on Nov 20, 2024
hello-11 added the labels question, triaged, and quantization on Nov 21, 2024
Tracin (Collaborator) commented Nov 21, 2024:

Sorry, we cannot support that for now.

jayakommuru (Author) commented:

@Tracin is it because of the NF4 quantization from bitsandbytes used in the model?
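For context on the question above: bitsandbytes' NF4 format stores each weight as a 4-bit index into a fixed, nonlinear codebook (normal-float quantiles) plus a per-block absmax scale, whereas TensorRT-LLM's weight-only int4 paths (e.g. INT4 AWQ/GPTQ) assume a linear integer grid, so NF4 checkpoints cannot simply be reinterpreted. A minimal pure-Python sketch of the linear blockwise int4 scheme, for illustration only (these function names are hypothetical and this is not either library's actual kernel):

```python
# Illustrative sketch (not bitsandbytes' or TensorRT-LLM's actual code):
# linear, symmetric, blockwise int4 quantization. NF4 differs in that each
# 4-bit code indexes a nonlinear codebook rather than a uniform grid.

def quantize_int4_blockwise(weights, block_size=64):
    """Quantize a flat list of floats to signed int4 codes (-7..7),
    with one absmax-derived scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0
        scale = absmax / 7.0          # map [-absmax, absmax] onto [-7, 7]
        scales.append(scale)
        codes.extend(max(-7, min(7, round(w / scale))) for w in block)
    return codes, scales

def dequantize_int4_blockwise(codes, scales, block_size=64):
    """Invert the mapping: weight ≈ code * per-block scale."""
    return [code * scales[i // block_size] for i, code in enumerate(codes)]

# Round-trip example: reconstruction error is bounded by half a
# quantization step (scale / 2) within each block.
w = [0.05 * i - 1.6 for i in range(64)]
codes, scales = quantize_int4_blockwise(w)
w_hat = dequantize_int4_blockwise(codes, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
assert all(-7 <= c <= 7 for c in codes)
assert max_err <= scales[0] / 2 + 1e-12
```

The nonlinear codebook is what makes NF4 a poor fit for engines that expect integer weights multiplied by a scale, which is consistent with the maintainer's answer above.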
