
Conversation

@Jackmin801 Jackmin801 commented Nov 24, 2025

Note

Updates get_load_balance_stats to optionally ignore the top-k most-loaded routed experts (likely "padding experts") when computing the max violation, with a proper reset of the counter buffer.

  • MoE Load Balance Stats (src/prime_rl/trainer/model.py):
    • Add try_to_avoid_padding_experts: bool = True to get_load_balance_stats.
    • When enabled, sort mlp.tokens_per_expert in descending order and slice off the first router.top_k entries, excluding the top-k routed experts from the mean/max violation calculation (see the sketch after this list).
    • Ensure the reset zeros the original mlp.tokens_per_expert buffer (not the sorted copy).
    • Add an explicit tensor type annotation for tokens_per_expert.
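
For reference, a minimal sketch of what the described change could look like. The module layout (model.layers, layer.mlp, mlp.router.top_k, the mlp.tokens_per_expert buffer) and the returned stats format are assumptions for illustration; the actual implementation in src/prime_rl/trainer/model.py may differ.

```python
import torch


def get_load_balance_stats(model, try_to_avoid_padding_experts: bool = True) -> dict[str, torch.Tensor]:
    """Sketch: per-MoE-layer load-balance stats, optionally ignoring the
    router.top_k most-loaded experts (which tend to absorb padding tokens)."""
    max_vios = []
    for layer in model.layers:  # hypothetical module layout
        mlp = getattr(layer, "mlp", None)
        if mlp is None or not hasattr(mlp, "tokens_per_expert"):
            continue  # skip dense layers
        # Work on a float copy so the original buffer can be reset afterwards.
        tokens_per_expert: torch.Tensor = mlp.tokens_per_expert.float()
        if try_to_avoid_padding_experts:
            # Sort descending and drop the top-k entries, so experts that mostly
            # soak up padding tokens do not dominate the violation metric.
            tokens_per_expert = tokens_per_expert.sort(descending=True).values[mlp.router.top_k:]
        mean_load = tokens_per_expert.mean()
        # Max violation: relative deviation of the most-loaded remaining expert
        # from a perfectly balanced load.
        max_vios.append((tokens_per_expert.max() - mean_load) / mean_load)
        # Reset the original buffer (not the sliced copy) for the next step.
        mlp.tokens_per_expert.zero_()
    return {"max_vio": torch.stack(max_vios).mean()}
```

As the review below points out, this intentionally trades off sensitivity: a genuine imbalance concentrated in the top-k most-loaded experts is also excluded from the metric.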

Written by Cursor Bugbot for commit a1d15ea. This will update automatically on new commits.


@mikasenghaas mikasenghaas left a comment


this would make us blind to very heavy expert imbalance where all tokens are routed to a singular expert?

@Jackmin801 (Member, Author)

yeah it would, but it's better than not caring about the metric because we always assume it's just padding tokens ruining it

@Jackmin801 changed the title from "use second expert" to "exclude top k from max vio calculation to try and skip the 'padding experts'" on Nov 24, 2025
@mikasenghaas force-pushed the feat-use-second-expert-for-max-vio branch from 99d2b21 to a1d15ea on December 3, 2025 at 20:24
@mikasenghaas merged commit 28c93fa into main on Dec 4, 2025 (5 checks passed)
