Conversation

@xutizhou
Contributor

Power-Law Distribution Support for Prefill MoE Benchmarking

This change adds power-law token distribution simulation for MoE prefill phase benchmarking.

Overview:

  • Simulates realistic token-to-expert assignment patterns observed in production workloads
  • Configurable alpha parameter controls distribution skewness:
    • alpha < 1.0: More uniform distribution (e.g., 0.6, 0.8)
    • alpha ~ 1.0: Zipf-like distribution (e.g., 1.02)
    • alpha > 1.0: Heavy-tailed distribution with few dominant experts (e.g., 1.2)
  • Multiple samples (5x) are generated per configuration to reduce variance from
    single-sample outliers
  • Ensures max tokens per expert stays within bounds via power_law_logits_v3/v4
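
The effect of the alpha knob can be illustrated with a minimal sketch. `sample_power_law` is named in the diff but its body isn't shown here, so the following is an assumed implementation: rank-based weights proportional to rank^(-alpha), rescaled into [min_val, max_val].

```python
import numpy as np

def sample_power_law(num_experts, alpha, min_val, max_val, seed=None):
    # Assumed body (the real sample_power_law is not shown in this diff):
    # rank-based weights ~ rank**(-alpha), rescaled into [min_val, max_val].
    rng = np.random.default_rng(seed)
    weights = np.arange(1, num_experts + 1, dtype=np.float64) ** (-alpha)
    rng.shuffle(weights)  # which expert is "hot" is random per sample
    span = weights.max() - weights.min()
    counts = min_val + (weights - weights.min()) / span * (max_val - min_val)
    return np.round(counts).astype(np.int64)

# Larger alpha concentrates tokens on fewer experts; with the same scaling,
# mid-rank experts receive less, so the total shrinks as alpha grows.
skewed = sample_power_law(num_experts=8, alpha=1.2, min_val=1, max_val=100, seed=0)
flat = sample_power_law(num_experts=8, alpha=0.6, min_val=1, max_val=100, seed=0)
```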

Implementation:

  • power_law_logits_v3: Generates power-law distributed topk_idx, topk_weights, and
    num_recv_tokens_per_expert for prefill phase
  • power_law_logits_v4: Similar to v3 but ensures max tokens per expert <= num_tokens,
    used for decode phase
  • Results are logged with distribution type (uniform/power_law_{alpha}) for analysis
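
As a rough illustration of the v3 output shape (`build_topk_assignment` is a hypothetical helper, and NumPy stands in for the PyTorch tensors the benchmark actually uses): expand per-expert counts into a flat assignment, reshape to (num_tokens, topk), and mask unfilled slots to -1 as the PR does.

```python
import numpy as np

def build_topk_assignment(num_tokens, topk, tokens_per_expert):
    # Hypothetical sketch: one expert id per assigned token slot.
    counts = np.asarray(tokens_per_expert, dtype=np.int64)
    expert_ids = np.repeat(np.arange(counts.size, dtype=np.int64), counts)
    slots = num_tokens * topk
    if expert_ids.size < slots:  # mask unfilled slots to -1, not 0
        pad = np.full(slots - expert_ids.size, -1, dtype=np.int64)
        expert_ids = np.concatenate([expert_ids, pad])
    topk_idx = expert_ids[:slots].reshape(num_tokens, topk)
    topk_weights = (topk_idx >= 0).astype(np.float64) / topk
    return topk_idx, topk_weights

idx, w = build_topk_assignment(num_tokens=4, topk=2, tokens_per_expert=[3, 2, 2, 1])
```

A real implementation would also permute slots so each token's topk experts are distinct; this sketch only shows the shapes and the -1 masking.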

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

- Introduced `power_law_logits_v3` function to calculate token distribution among experts based on a power law.
- Added debug option for logging expert token assignments.
- Enhanced the handling of token distribution adjustments to ensure proper expert allocation.
- Expanded `get_moe_prefill_test_cases` to include distribution types and power law alpha values.
- Updated `benchmark_moe_layer_prefill` to handle new test case format and added logging for distribution type.
- Refined token distribution logic in `power_law_logits_v3` to return additional metrics for expert allocation.
- Improved handling of weights and indices for uniform and power-law distributions.
- Removed unnecessary debug print statements in `power_law_logits_v3` and `benchmark_moe_layer_prefill`.
- Updated `expert_assignments` tensor type to `int64` for consistency.
- Changed masking logic in `topk_idx` to set out-of-range indices to -1 instead of 0 for better clarity.
- Ensured `topk_idx_iter` is directly moved to the device without type conversion.
- Updated the `num_experts` descriptions to accurately reflect the experts-per-GPU calculations for different expert parallel sizes.
- Ensured clarity in the documentation for users configuring expert settings.
- Implemented multiple sampling in `benchmark_moe_layer_prefill` for power law distribution to mitigate outlier effects.
- Adjusted handling of weights and indices for both power law and uniform distributions.
- Ensured consistent processing of samples during warmup and iteration phases to improve performance and reliability.
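
The multi-sampling described above can be sketched as follows (`benchmark_with_samples` and `run_once` are hypothetical names, not from the PR): average several independent draws so a single outlier power-law sample does not dominate the reported result.

```python
import numpy as np

def benchmark_with_samples(run_once, num_samples=5, seed=0):
    # Hypothetical harness: run_once(rng) draws one power-law assignment,
    # runs the layer, and returns a latency; we aggregate across samples.
    rng = np.random.default_rng(seed)
    latencies = [run_once(rng) for _ in range(num_samples)]
    return float(np.mean(latencies)), float(np.std(latencies))

mean_ms, std_ms = benchmark_with_samples(lambda rng: 2.0)
```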
@xutizhou xutizhou requested a review from AichenF as a code owner November 26, 2025 04:14
@copy-pr-bot

copy-pr-bot bot commented Nov 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@github-actions github-actions bot added the feat label Nov 26, 2025
@xutizhou xutizhou changed the title feat: power law feat: add sglang prefill moe power law Nov 26, 2025
Contributor

@AichenF AichenF left a comment

self.context_ops.extend(
could also be modified, as the context_moe uses "uniform" as the default workload_distribution for now.


def power_law_logits_v3(num_tokens, num_experts, topk, ep, alpha):
    if num_tokens * topk > num_experts:
        num_tokens_per_expert = sample_power_law(num_experts, alpha, 1, num_tokens * 0.8)
Contributor


Why is 0.8 here? Should this parameter be fixed?

Contributor Author

It's a hyperparameter, referring to:

num_tokens_per_expert = sample_power_law(num_experts, alpha, 1, num_tokens * 0.8)

Contributor

0.8 means the number of tokens for the heaviest expert is 0.8 * num_tokens, which comes from observed statistics. You can change the coefficient if you need to.
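
A minimal sketch of the cap being discussed (`capped_power_law_counts` is a hypothetical name, and the Pareto draw is only a stand-in for the PR's actual sampler): draw heavy-tailed counts, then clamp the heaviest expert to the empirical 0.8 * num_tokens coefficient.

```python
import numpy as np

def capped_power_law_counts(num_tokens, num_experts, alpha, heavy_frac=0.8, seed=0):
    # Hypothetical sketch: heavy-tailed draws normalized to the token budget,
    # with the heaviest expert clamped to heavy_frac * num_tokens.
    rng = np.random.default_rng(seed)
    raw = rng.pareto(alpha, num_experts) + 1.0   # heavy-tailed draws
    counts = raw / raw.sum() * num_tokens        # normalize to the token budget
    counts = np.minimum(counts, heavy_frac * num_tokens)
    return np.round(counts).astype(np.int64)

counts = capped_power_law_counts(num_tokens=1000, num_experts=16, alpha=1.02)
```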

Contributor

@AichenF AichenF left a comment

self._power_law_alpha
Do we need to use different _power_law_alpha values for prefill and decode in the modeling?

@xutizhou
Contributor Author

self.context_ops.extend(

could also be modified, as the context_moe uses "uniform" as the default workload_distribution for now.

We can modify it in another PR.

@xutizhou
Contributor Author

self._power_law_alpha Do we need to use different _power_law_alpha values for prefill and decode in the modeling?

Yes, I think it's more flexible.

@tianhaox tianhaox self-requested a review November 27, 2025 17:05
- Changed the hardcoded "uniform" string to the variable `workload_distribution` for improved flexibility in model configuration.
@xutizhou
Contributor Author

xutizhou commented Dec 1, 2025

self.context_ops.extend(

could also be modified, as the context_moe uses "uniform" as the default workload_distribution for now.

done
