Is your feature request related to a problem? Please describe.
Users integrating Transformer Engine (TE) with HuggingFace models are unable to find examples or guidance for Mixture-of-Experts (MoE) architectures.
Currently, TE provides an excellent tutorial for Llama models, but there are no equivalent examples for MoE models. This gap is causing friction for customers trying to integrate TE with MoE models outside of the Megatron framework.
Describe the solution you'd like
Add tutorial/example notebooks demonstrating how to integrate TE with popular MoE architectures, similar to the existing Llama tutorial. Examples:
- Mixtral integration example - showing how to wrap a HuggingFace Mixtral model with TE (a rough sketch of the idea follows this list)
- General MoE integration guide - documenting the workflow for arbitrary MoE architectures
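To make the request concrete, below is a minimal sketch of the kind of integration such a tutorial could cover: swapping the `nn.Linear` projections inside each Mixtral expert MLP for `te.Linear` and running the forward pass under `te.fp8_autocast`. The Mixtral attribute names (`model.model.layers`, `block_sparse_moe.experts`, `w1`/`w2`/`w3`) are assumed from the current HuggingFace implementation and may change; this is not an official recipe, just an illustration of the workflow the tutorial would document properly.

```python
# Rough sketch only: replace each Mixtral expert's nn.Linear projections with
# te.Linear so the expert GEMMs can run in FP8. Assumes current HF Mixtral
# module layout (block_sparse_moe.experts with w1/w2/w3 projections).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe
from transformers import AutoModelForCausalLM


def swap_linear(module: torch.nn.Linear) -> te.Linear:
    """Copy an nn.Linear's parameters into an equivalent te.Linear."""
    te_linear = te.Linear(
        module.in_features,
        module.out_features,
        bias=module.bias is not None,
        params_dtype=module.weight.dtype,
    )
    with torch.no_grad():
        te_linear.weight.copy_(module.weight)
        if module.bias is not None:
            te_linear.bias.copy_(module.bias)
    return te_linear


model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
).cuda()

# Replace the three projections inside every expert MLP.
for layer in model.model.layers:
    for expert in layer.block_sparse_moe.experts:
        expert.w1 = swap_linear(expert.w1)
        expert.w2 = swap_linear(expert.w2)
        expert.w3 = swap_linear(expert.w3)

# Dummy input; FP8 GEMMs need the token count divisible by 8.
input_ids = torch.randint(0, model.config.vocab_size, (1, 16), device="cuda")

fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)
with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(input_ids)
```

The existing Llama tutorial goes further by replacing whole decoder layers with `te.TransformerLayer`; a proper MoE tutorial would also need to explain how to handle the router and expert layers, which is exactly the guidance that is currently missing.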
Describe alternatives you've considered
Using Megatron-based MoE recipes (e.g. https://github.com/yanring/Megatron-MoE-ModelZoo). This works, but many users want to stay within the HuggingFace ecosystem.
Additional context