Is your feature request related to a problem? Please describe.
Users integrating Transformer Engine (TE) with HuggingFace models are unable to find examples or guidance for Mixture-of-Experts (MoE) architectures.
Currently, TE provides an excellent tutorial for Llama models, but there are no equivalent examples for MoE models. This gap is causing friction for customers trying to integrate TE with MoE models outside of the Megatron framework.
Describe the solution you'd like
Add tutorial/example notebooks demonstrating how to integrate TE with popular MoE architectures, similar to the existing Llama tutorial. Examples:
- Mixtral integration example - showing how to wrap a HuggingFace Mixtral model with TE (a rough sketch of the idea follows this list)
- General MoE integration guide - documenting the workflow for arbitrary MoE architectures
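To make the request concrete, below is a minimal sketch of the kind of integration such a tutorial could cover: swapping the `nn.Linear` projections inside each Mixtral expert MLP for `te.Linear` and running the forward pass under `te.fp8_autocast`. The Mixtral attribute names (`model.model.layers`, `block_sparse_moe.experts`, `w1`/`w2`/`w3`) are assumed from the current HuggingFace implementation and may change; this is not an official recipe, just an illustration of the workflow the tutorial would document properly.

```python
# Rough sketch only: replace each Mixtral expert's nn.Linear projections with
# te.Linear so the expert GEMMs can run in FP8. Assumes current HF Mixtral
# module layout (block_sparse_moe.experts with w1/w2/w3 projections).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe
from transformers import AutoModelForCausalLM


def swap_linear(module: torch.nn.Linear) -> te.Linear:
    """Copy an nn.Linear's parameters into an equivalent te.Linear."""
    te_linear = te.Linear(
        module.in_features,
        module.out_features,
        bias=module.bias is not None,
        params_dtype=module.weight.dtype,
    )
    with torch.no_grad():
        te_linear.weight.copy_(module.weight)
        if module.bias is not None:
            te_linear.bias.copy_(module.bias)
    return te_linear


model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
).cuda()

# Replace the three projections inside every expert MLP.
for layer in model.model.layers:
    for expert in layer.block_sparse_moe.experts:
        expert.w1 = swap_linear(expert.w1)
        expert.w2 = swap_linear(expert.w2)
        expert.w3 = swap_linear(expert.w3)

# Dummy input; FP8 GEMMs need the token count divisible by 8.
input_ids = torch.randint(0, model.config.vocab_size, (1, 16), device="cuda")

fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)
with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(input_ids)
```

The existing Llama tutorial goes further by replacing whole decoder layers with `te.TransformerLayer`; a proper MoE tutorial would also need to explain how to handle the router and expert layers, which is exactly the guidance that is currently missing.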
Describe alternatives you've considered
Using Megatron-based MoE recipes (e.g. https://github.com/yanring/Megatron-MoE-ModelZoo). This works, but many users want to stay within the HuggingFace ecosystem.
Additional context