@faresobeid faresobeid commented Oct 27, 2025

Have currently matched the RoPE implementations (vLLM seemingly applies RoPE in bf16 instead of fp32, which is odd)
Add residuals in fp32 instead of bf16
Stop using the Liger kernel for our RMSNorm
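
To illustrate why the residual add is moved to fp32, here is a minimal pure-Python sketch that emulates bf16 rounding with `struct`. It is not code from this PR: the helper names (`to_bf16`, `to_fp32`, `run_residual_stream`), the 32-layer depth, and the 0.001 block output are all assumptions chosen for illustration, and NaN/inf inputs are not handled.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bf16 value (round-to-nearest-even).

    bf16 is the top 16 bits of the fp32 bit pattern, so we round the
    fp32 pattern to 16 bits and zero the low half.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<I", struct.pack("<I", bits))[0] and \
        struct.unpack("<f", struct.pack("<I", bits))[0] or 0.0


def run_residual_stream(block_outputs, cast):
    """Accumulate block outputs into a residual stream kept in `cast` precision."""
    hidden = cast(1.0)
    for out in block_outputs:
        hidden = cast(hidden + out)
    return hidden


# Hypothetical setup: 32 layers, each block emits a small bf16 output.
block_outputs = [to_bf16(0.001)] * 32

bf16_stream = run_residual_stream(block_outputs, to_bf16)
fp32_stream = run_residual_stream(block_outputs, to_fp32)
exact = 1.0 + sum(block_outputs)

print(bf16_stream, fp32_stream, exact)
```

Near 1.0 the spacing between adjacent bf16 values is 2^-7, so each small update is rounded away and the bf16 stream never moves, while the fp32 stream tracks the exact sum closely. Real activations are less extreme, but the same rounding accumulates across layers.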

Planned:
Check MoE difference
vLLM fuses the residual add and RMSNorm; not sure if we can do the equivalent
torch.topk is apparently unstable, so will try to find an alternative
Test the actual mismatch_kl and MFU differences (although mismatch_kl might not be a perfect measure here, as the effect can show up late in training)
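
For the top-k item, one possible direction is a selection with deterministic tie-breaking. The sketch below is pure Python and only shows the intended semantics (ties go to the lower index); `stable_topk` and the example router scores are hypothetical, not something this PR implements. In PyTorch, `torch.sort(..., descending=True, stable=True)` followed by slicing the first `k` entries is one candidate with the same tie-breaking behavior, though whether it matches vLLM's routing is untested here.

```python
def stable_topk(scores, k):
    """Return (values, indices) of the k largest scores.

    Ties are broken by the lower index, so the result is deterministic
    regardless of how equal scores are laid out.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    # Sort indices by descending score, then ascending index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    return [scores[i] for i in order], order


# Hypothetical router scores for 5 experts with two tied pairs.
router_scores = [0.25, 0.5, 0.25, 0.5, 0.1]
values, experts = stable_topk(router_scores, 3)
print(values, experts)
```

With tied scores, trainer and inference engine can otherwise pick different experts for the same token, which would show up as a logprob mismatch even when the math is otherwise identical.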

@mikasenghaas mikasenghaas left a comment

god's work

faresobeid and others added 6 commits October 27, 2025 19:41
Signed-off-by: faresobeid <[email protected]>
Removed unused import of Optional from typing.

Signed-off-by: faresobeid <[email protected]>
Signed-off-by: faresobeid <[email protected]>
Signed-off-by: faresobeid <[email protected]>

4 participants