
Fix fake_initialize_model_parallel for MoE models #11441

Open · wants to merge 4 commits into base: main

Commits on Nov 30, 2024

  1. Fix fake_initialize_model_parallel for MoE models

    * Make the use of RankGenerator consistent with the recent Mcore change !1940
      (NVIDIA/Megatron-LM@7f22e21):
      - use ep=1 for decoder_rank_generator, making it treat EP as part of DP
      - define a new expert_decoder_rank_generator to handle EP groups/ranks only

    Signed-off-by: Guyue Huang <[email protected]>
    guyueh1 committed Nov 30, 2024
    SHA: a23345b
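The split described in the commit message above can be sketched as follows. This is a minimal, self-contained illustration of the idea (a decoder-side generator built with ep=1 so its DP groups absorb the expert-parallel dimension, plus a separate generator that produces only the EP groups); `make_groups`, the dimension sizes, and the order list are hypothetical stand-ins, not Megatron-Core's actual RankGenerator API.

```python
import itertools

def make_groups(sizes, order, dims):
    """Return groups of global ranks that vary only along the dimensions in
    `dims`. `sizes` maps dimension name -> parallel size; `order` lists the
    dimensions from fastest- to slowest-varying in the global rank layout.
    (Hypothetical helper for illustration, not Megatron-Core's API.)"""
    strides, s = {}, 1
    for d in order:
        strides[d] = s
        s *= sizes[d]
    fixed = [d for d in order if d not in dims]
    groups = []
    for fixed_idx in itertools.product(*(range(sizes[d]) for d in fixed)):
        base = sum(i * strides[d] for d, i in zip(fixed, fixed_idx))
        group = [
            base + sum(i * strides[d] for d, i in zip(dims, var_idx))
            for var_idx in itertools.product(*(range(sizes[d]) for d in dims))
        ]
        groups.append(sorted(group))
    return groups

order = ["tp", "ep", "dp", "pp"]  # fastest -> slowest varying (illustrative)

# Decoder generator: built with ep=1, so its DP groups absorb the
# expert-parallel dimension (EP is treated as part of DP).
decoder_dp = make_groups(dict(tp=2, ep=1, dp=4, pp=1), order, ["ep", "dp"])

# Expert generator: handles only the EP groups, carved out of that same space.
expert_ep = make_groups(dict(tp=2, ep=2, dp=2, pp=1), order, ["ep"])

print(decoder_dp)  # DP groups spanning ep*dp ranks each
print(expert_ep)   # EP groups of size ep
```

With tp=2, ep=2, dp=2, pp=1 on 8 ranks, the decoder generator yields two DP groups of four ranks each (EP folded in), while the expert generator yields four EP groups of two ranks; this mirrors why the fake initialization has to keep two generators rather than one.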
  2. Apply isort and black reformatting

    Signed-off-by: guyueh1 <[email protected]>
    guyueh1 committed Nov 30, 2024
    SHA: b12892b

Commits on Dec 3, 2024

  1. Fix expert rank generator

    Signed-off-by: Guyue Huang <[email protected]>
    guyueh1 committed Dec 3, 2024
    SHA: 23befa1
  2. SHA: 689ffee