[Feature] Does SGLang support AWQ W4Afp8? #1964

vkc1vk · 2024-11-08T21:29:15Z

Checklist

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
2. Please use English; otherwise, it will be closed.

Motivation

AWQ with INT4 weights and FP8 activations / KV cache (W4Afp8) works fairly well with Llama-3 models and is a useful quantization technique in the high-throughput regime. Is this quantization format supported by SGLang?

Related resources

https://github.com/NVIDIA/TensorRT-LLM/blob/b7868dd1bd1186840e3755b97ea3d3a73ddd76c5/examples/falcon/README.md?plain=1#L311
