
Conversation

@Jackmin801 Jackmin801 commented Nov 24, 2025

Note

Updates get_load_balance_stats to optionally ignore the top-k most-loaded routed experts (likely "padding experts") when computing the max violation, with a proper reset of the counter buffer.

  • MoE Load Balance Stats (src/prime_rl/trainer/model.py):
    • Add try_to_avoid_padding_experts: bool = True to get_load_balance_stats.
    • When enabled, sort mlp.tokens_per_expert in descending order and slice off the first router.top_k entries, excluding the top-k routed experts from the mean/max violation calculation (see the sketch after this list).
    • Ensure the reset zeros the original mlp.tokens_per_expert buffer (not the sorted copy).
    • Add an explicit tensor type annotation for tokens_per_expert.
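
For reference, a minimal sketch of what the described change could look like. The module layout (model.layers, layer.mlp, mlp.router.top_k, the mlp.tokens_per_expert buffer) and the returned stats format are assumptions for illustration; the actual implementation in src/prime_rl/trainer/model.py may differ.

```python
import torch


def get_load_balance_stats(model, try_to_avoid_padding_experts: bool = True) -> dict[str, torch.Tensor]:
    """Sketch: per-MoE-layer load-balance stats, optionally ignoring the
    router.top_k most-loaded experts (which tend to absorb padding tokens)."""
    max_vios = []
    for layer in model.layers:  # hypothetical module layout
        mlp = getattr(layer, "mlp", None)
        if mlp is None or not hasattr(mlp, "tokens_per_expert"):
            continue  # skip dense layers
        # Work on a float copy so the original buffer can be reset afterwards.
        tokens_per_expert: torch.Tensor = mlp.tokens_per_expert.float()
        if try_to_avoid_padding_experts:
            # Sort descending and drop the top-k entries, so experts that mostly
            # soak up padding tokens do not dominate the violation metric.
            tokens_per_expert = tokens_per_expert.sort(descending=True).values[mlp.router.top_k:]
        mean_load = tokens_per_expert.mean()
        # Max violation: relative deviation of the most-loaded remaining expert
        # from a perfectly balanced load.
        max_vios.append((tokens_per_expert.max() - mean_load) / mean_load)
        # Reset the original buffer (not the sliced copy) for the next step.
        mlp.tokens_per_expert.zero_()
    return {"max_vio": torch.stack(max_vios).mean()}
```

As the review below points out, this intentionally trades off sensitivity: a genuine imbalance concentrated in the top-k most-loaded experts is also excluded from the metric.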

Written by Cursor Bugbot for commit a1d15ea. This will update automatically on new commits.


@mikasenghaas mikasenghaas left a comment


this would make us blind to very heavy expert imbalance where all tokens are routed to a singular expert?

@Jackmin801 (Member, Author)

yeah it would, but it's better than not caring about the metric because we always assume it's just padding tokens ruining it

@Jackmin801 changed the title from "use second expert" to "exclude top k from max vio calculation to try and skip the 'padding experts'" on Nov 24, 2025
@mikasenghaas force-pushed the feat-use-second-expert-for-max-vio branch from 99d2b21 to a1d15ea on December 3, 2025 at 20:24
@mikasenghaas merged commit 28c93fa into main on Dec 4, 2025 (5 checks passed)
