
Fix fake_initialize_model_parallel for MoE models #11441

Open · wants to merge 4 commits into base: main

Commits on Nov 30, 2024

  1. Fix fake_initialize_model_parallel for MoE models

    * Make the use of RankGenerator consistent with the recent Mcore change !1940
      (NVIDIA/Megatron-LM@7f22e21):
      - use ep=1 for decoder_rank_generator, making it treat EP as part of DP
      - define a new expert_decoder_rank_generator to handle EP groups/ranks only

    Signed-off-by: Guyue Huang <[email protected]>
    guyueh1 committed Nov 30, 2024
    SHA: a23345b
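The split described in the commit message above can be sketched as follows. This is a minimal, self-contained illustration of the idea (a decoder-side generator built with ep=1 so its DP groups absorb the expert-parallel dimension, plus a separate generator that produces only the EP groups); `make_groups`, the dimension sizes, and the order list are hypothetical stand-ins, not Megatron-Core's actual RankGenerator API.

```python
import itertools

def make_groups(sizes, order, dims):
    """Return groups of global ranks that vary only along the dimensions in
    `dims`. `sizes` maps dimension name -> parallel size; `order` lists the
    dimensions from fastest- to slowest-varying in the global rank layout.
    (Hypothetical helper for illustration, not Megatron-Core's API.)"""
    strides, s = {}, 1
    for d in order:
        strides[d] = s
        s *= sizes[d]
    fixed = [d for d in order if d not in dims]
    groups = []
    for fixed_idx in itertools.product(*(range(sizes[d]) for d in fixed)):
        base = sum(i * strides[d] for d, i in zip(fixed, fixed_idx))
        group = [
            base + sum(i * strides[d] for d, i in zip(dims, var_idx))
            for var_idx in itertools.product(*(range(sizes[d]) for d in dims))
        ]
        groups.append(sorted(group))
    return groups

order = ["tp", "ep", "dp", "pp"]  # fastest -> slowest varying (illustrative)

# Decoder generator: built with ep=1, so its DP groups absorb the
# expert-parallel dimension (EP is treated as part of DP).
decoder_dp = make_groups(dict(tp=2, ep=1, dp=4, pp=1), order, ["ep", "dp"])

# Expert generator: handles only the EP groups, carved out of that same space.
expert_ep = make_groups(dict(tp=2, ep=2, dp=2, pp=1), order, ["ep"])

print(decoder_dp)  # DP groups spanning ep*dp ranks each
print(expert_ep)   # EP groups of size ep
```

With tp=2, ep=2, dp=2, pp=1 on 8 ranks, the decoder generator yields two DP groups of four ranks each (EP folded in), while the expert generator yields four EP groups of two ranks; this mirrors why the fake initialization has to keep two generators rather than one.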
  2. Apply isort and black reformatting

    Signed-off-by: guyueh1 <[email protected]>
    guyueh1 committed Nov 30, 2024
    SHA: b12892b

Commits on Dec 3, 2024

  1. Fix expert rank generator

    Signed-off-by: Guyue Huang <[email protected]>
    guyueh1 committed Dec 3, 2024
    SHA: 23befa1
  2. SHA: 689ffee