Hello sglang community,

I am currently working on implementing the SmoothQuant quantization algorithm within the sglang framework to optimize inference for a large language model. I have a general understanding of the SmoothQuant approach, but I am looking for guidance on how to integrate it into sglang efficiently.

Specific Questions:

Implementation Reference: Could you point me to any existing references or similar implementations that could serve as a guide for integrating SmoothQuant into sglang? I am aware that the vLLM inference framework does not support SmoothQuant quantization on the main branch, so I am particularly interested in any alternative frameworks or branches that have implemented something similar.

Best Practices: What are the best practices when implementing quantized inference algorithms like SmoothQuant in sglang? Are there any specific features or optimizations within sglang that I should leverage to ensure efficient performance?

Potential Challenges: Are there any known challenges or pitfalls when implementing SmoothQuant in sglang that I should be aware of? How can these be mitigated?

I would greatly appreciate any advice, references, or examples that the community could share. Your expertise and experience will be invaluable as I work through this implementation.

Thank you in advance for your support!

Best regards,
Replies: 1 comment
Some earlier work by my former colleagues may be a useful reference:
https://github.com/vllm-project/vllm/pull/1508/files
https://github.com/vllm-project/vllm/pull/5218/files
https://github.com/InternLM/lmdeploy/pull/2274/files
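
For orientation while reading those PRs, here is a minimal sketch of the smoothing step described in the SmoothQuant paper: per-input-channel scales `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)` are divided out of the activations and multiplied into the weights, so the layer output is unchanged while activation outliers shrink. This is plain Python for illustration only. The function names (`smooth_scales`, `apply_smoothing`, `linear`) are hypothetical and are not part of sglang, vLLM, or lmdeploy, and the sketch does not cover calibration over a dataset or the INT8 kernels themselves.

```python
def smooth_scales(act_absmax, weight, alpha=0.5, eps=1e-5):
    """Per-input-channel scales: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    act_absmax: per-input-channel absolute maxima of the activations
                (assumed to come from a calibration pass).
    weight:     nested list with layout [out_features][in_features].
    """
    scales = []
    for j, a_max in enumerate(act_absmax):
        # Largest weight magnitude feeding from input channel j.
        w_max = max(abs(row[j]) for row in weight)
        # Clamp to eps to avoid dividing by zero on dead channels.
        a = max(a_max, eps)
        w = max(w_max, eps)
        scales.append(a ** alpha / w ** (1.0 - alpha))
    return scales


def apply_smoothing(x, weight, scales):
    """Return (X / s, W * s); their product matches the original X @ W^T."""
    x_s = [[v / s for v, s in zip(row, scales)] for row in x]
    w_s = [[v * s for v, s in zip(row, scales)] for row in weight]
    return x_s, w_s


def linear(x, weight):
    """Reference y = x @ weight^T for nested lists (no bias)."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in weight] for xr in x]


# Toy example: input channel 1 carries an activation outlier.
x = [[1.0, 100.0], [-2.0, 50.0]]
w = [[0.5, 0.1], [-0.3, 0.2]]
act_absmax = [max(abs(row[j]) for row in x) for j in range(2)]

s = smooth_scales(act_absmax, w, alpha=0.5)
x_s, w_s = apply_smoothing(x, w, s)
```

In a real integration the division by `s` is usually folded into the preceding LayerNorm or linear layer offline rather than applied at runtime, which is the part worth looking for in the PRs above.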