Is your feature request related to a problem? Please describe.
Our FP8 implementation currently works by patching vLLM's FP8 quantization module for linear layers. To support MoE models, we need to extend this patching to vLLM's fused MoE module:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/fp8.py#L476
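The patching approach described above can be sketched generically. This is a minimal, hypothetical Python sketch, not vLLM's API. `FakeFp8LinearMethod` and `FakeFp8MoEMethod` are stand-ins for the linear and fused-MoE quantization method classes in the linked `fp8.py`. Their real names and method signatures vary across vLLM versions and are not reproduced here. `patch_methods` and `make_logging_wrapper` are illustrative helpers. The sketch shows how a patch table that covers only linear layers could gain one more target for MoE layers, and how an undo function can restore the original methods.

```python
import functools
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the FP8 linear and fused-MoE method classes in
# vLLM's fp8.py. The real classes and signatures are NOT reproduced here.
class FakeFp8LinearMethod:
    def apply(self, x: float) -> float:
        return x * 2.0


class FakeFp8MoEMethod:
    def apply(self, x: float) -> float:
        return x * 3.0


# Records which patched classes were called (for demonstration only).
CALLS: List[str] = []


def patch_methods(
    targets: List[Tuple[type, str]],
    make_wrapper: Callable[[Callable], Callable],
) -> Callable[[], None]:
    """Replace each (class, attribute) in targets with a wrapped version.

    Returns an undo function that restores the original attributes.
    """
    originals: Dict[Tuple[type, str], Callable] = {}
    for cls, name in targets:
        original = getattr(cls, name)
        originals[(cls, name)] = original
        setattr(cls, name, make_wrapper(original))

    def undo() -> None:
        for (cls, name), original in originals.items():
            setattr(cls, name, original)

    return undo


def make_logging_wrapper(original: Callable) -> Callable:
    # Hypothetical hook: a real FP8 patch would dispatch to custom kernels here.
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        CALLS.append(type(self).__name__)
        return original(self, *args, **kwargs)

    return wrapper


# Previously only the linear method was patched; MoE support adds one target.
PATCH_TARGETS = [
    (FakeFp8LinearMethod, "apply"),
    (FakeFp8MoEMethod, "apply"),
]

if __name__ == "__main__":
    undo = patch_methods(PATCH_TARGETS, make_logging_wrapper)
    print(FakeFp8LinearMethod().apply(1.0), FakeFp8MoEMethod().apply(1.0))
    print(CALLS)
    undo()
```

Keeping all targets in one table makes MoE coverage a one-line addition. However, the fused MoE path operates on stacked per-expert weights rather than a single weight matrix, so the replacement logic itself would likely need a separate implementation from the linear case.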