feat: add sglang prefill moe power law #134
base: main
Conversation
- Introduced `power_law_logits_v3` function to calculate token distribution among experts based on a power law (sketched after this list).
- Added a debug option for logging expert token assignments.
- Enhanced the handling of token distribution adjustments to ensure proper expert allocation.
- Expanded `get_moe_prefill_test_cases` to include distribution types and power-law alpha values.
- Updated `benchmark_moe_layer_prefill` to handle the new test case format and added logging for the distribution type.
- Refined token distribution logic in `power_law_logits_v3` to return additional metrics for expert allocation.
- Improved handling of weights and indices for uniform and power-law distributions.
- Removed unnecessary debug print statements in `power_law_logits_v3` and `benchmark_moe_layer_prefill`.
- Updated the `expert_assignments` tensor type to `int64` for consistency.
- Changed masking logic in `topk_idx` to set out-of-range indices to -1 instead of 0 for clarity.
- Ensured `topk_idx_iter` is moved directly to the device without type conversion.
- Updated the `num_experts` descriptions to accurately reflect the per-GPU expert calculations for different expert parallel sizes.
- Ensured clarity in the documentation for users configuring expert settings.
- Implemented multiple sampling in `benchmark_moe_layer_prefill` for the power-law distribution to mitigate outlier effects.
- Adjusted handling of weights and indices for both power-law and uniform distributions.
- Ensured consistent processing of samples during warmup and iteration phases to improve performance and reliability.
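As context for these changes, here is a minimal sketch of the idea, assuming NumPy; `sample_power_law` and `tokens_per_expert` are reconstructions guessed from the diff excerpts below, not the repository's exact code:

```python
import numpy as np

def sample_power_law(n, alpha, x_min, x_max, rng=None):
    """Draw n samples from a truncated power law p(x) ~ x^(-alpha) on [x_min, x_max]."""
    rng = rng or np.random.default_rng()
    u = rng.random(n)
    if alpha == 1.0:
        return x_min * (x_max / x_min) ** u
    a = 1.0 - alpha
    # Inverse-CDF sampling for the truncated power law.
    return (u * (x_max ** a - x_min ** a) + x_min ** a) ** (1.0 / a)

def tokens_per_expert(num_tokens, num_experts, topk, alpha):
    """Skewed per-expert token counts that sum to num_tokens * topk."""
    raw = sample_power_law(num_experts, alpha, 1, num_tokens * 0.8)
    counts = np.floor(raw / raw.sum() * num_tokens * topk).astype(np.int64)
    # Hand the rounding remainder to the heaviest experts.
    remainder = num_tokens * topk - counts.sum()
    counts[np.argsort(-counts)[:remainder]] += 1
    return counts
```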
AichenF
left a comment
aiconfigurator/src/aiconfigurator/sdk/models.py
Line 1323 in 24479e6

```python
self.context_ops.extend(
```

Diff excerpt under review:

```python
def power_law_logits_v3(num_tokens, num_experts, topk, ep, alpha):
    if num_tokens * topk > num_experts:
        num_tokens_per_expert = sample_power_law(num_experts, alpha, 1, num_tokens * 0.8)
```
Why is 0.8 here? Should this parameter be fixed?
It's a hyperparameter, referring to:
aiconfigurator/collector/helper.py
Line 304 in 24479e6

```python
num_tokens_per_expert = sample_power_law(num_experts, alpha, 1, num_tokens * 0.8)
```
0.8 means the number of tokens for the heaviest expert is 0.8 * num_tokens, which comes from observed statistics. You can change the coefficient if you need to.
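To make the cap concrete, a quick check against the `sample_power_law` sketch shown earlier (hypothetical code, not the repository's):

```python
# With num_tokens = 1024, x_max = 0.8 * 1024 = 819.2, so no expert's raw
# draw exceeds ~819 tokens (before any normalization the benchmark applies).
raw = sample_power_law(64, 1.2, 1, 1024 * 0.8)
assert raw.max() <= 1024 * 0.8
```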
AichenF
left a comment
`self._power_law_alpha`
Do we need to use different _power_law_alpha values for prefill and decode in the modeling?
We can modify it in another PR.
Yes, I think it's more flexible.
- Changed the hardcoded "uniform" string to the variable `workload_distribution` for improved flexibility in model configuration.
done
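For illustration, the commit above roughly amounts to a dispatch like the following sketch (assuming PyTorch; `build_routing` is a hypothetical name, and only the uniform branch is filled in here):

```python
import torch

def build_routing(workload_distribution, num_tokens, num_experts, topk):
    """Dispatch on the configured distribution instead of a hard-coded "uniform"."""
    if workload_distribution == "power_law":
        # In the PR this path would go through power_law_logits_v3; stubbed here.
        raise NotImplementedError("see power_law_logits_v3 in the diff")
    # Uniform: every token routes to topk experts drawn uniformly at random.
    topk_idx = torch.randint(0, num_experts, (num_tokens, topk), dtype=torch.int64)
    topk_weights = torch.full((num_tokens, topk), 1.0 / topk)
    return topk_idx, topk_weights
```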
Overview:
Power-Law Distribution Support for Prefill MoE Benchmarking
This change adds power-law token distribution simulation for MoE prefill phase benchmarking.

Details:
- Multiple sampling to mitigate single-sample outliers.
- `power_law_logits_v3`/`v4` implementation:
  - `power_law_logits_v3`: generates power-law distributed `topk_idx`, `topk_weights`, and `num_recv_tokens_per_expert` for the prefill phase.
  - `power_law_logits_v4`: similar to v3 but ensures max tokens per expert <= `num_tokens`; used for the decode phase.
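A hedged sketch of the v4 constraint described above (`cap_per_expert` is a hypothetical helper, not the PR's code): clip each expert at `num_tokens`, then redistribute the clipped excess so the total stays at `num_tokens * topk`:

```python
import numpy as np

def cap_per_expert(counts, num_tokens):
    """Enforce max tokens per expert <= num_tokens while preserving the total."""
    counts = counts.copy()
    excess = int(np.maximum(counts - num_tokens, 0).sum())
    counts = np.minimum(counts, num_tokens)
    while excess > 0:
        room = num_tokens - counts
        i = int(np.argmax(room))      # expert with the most spare capacity
        take = min(excess, int(room[i]))
        if take == 0:                 # every expert is full; nothing left to place
            break
        counts[i] += take
        excess -= take
    return counts
```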
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)