Add Muonremez and test #2285
base: main
Conversation
Pull request overview
This PR adds the MuonRemez optimizer, a variant of the Muon optimizer that uses a coupled Newton-Schulz iteration to compute matrix square roots (U·Σ^(1/2)·V^T) of the update instead of the Newton-Schulz orthogonalization used by Muon (a sketch of the iteration follows the key changes below). The implementation includes configuration, state management, and supporting mathematical functions, along with an experiment script to test the optimizer through learning rate sweeps on small Llama models.
Key changes:
- Implemented MuonRemez optimizer with coupled Newton-Schulz iteration for computing matrix square roots
- Integrated optimizer into the public API with proper registration
- Created experiment script for learning rate sweep testing on 300M Llama model
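For reviewers unfamiliar with the technique, here is a minimal sketch in JAX (the framework Levanter is built on) of a coupled Newton-Schulz square-root iteration and how it can produce a U·Σ^(1/2)·V^T-style update. This is not the PR's code: the PR describes a quintic coupled iteration, whereas the sketch uses the classic cubic one, and the function names, iteration count, and Frobenius normalization are illustrative assumptions. The link to the update is the identity U·Σ^(1/2)·V^T = G·(GᵀG)^(-1/4) for G = U·Σ·Vᵀ, so a routine that returns both A^(1/2) and A^(-1/2) can simply be applied twice to GᵀG.

```python
import jax
import jax.numpy as jnp


def coupled_newton_schulz_sqrt(a, num_iters: int = 15, eps: float = 1e-7):
    """Approximate A^(1/2) and A^(-1/2) for a symmetric PSD matrix A.

    Classic cubic coupled Newton-Schulz iteration (not the PR's quintic variant):
        Y_0 = A / ||A||_F,  Z_0 = I
        T_k = (3 I - Z_k Y_k) / 2
        Y_{k+1} = Y_k T_k,  Z_{k+1} = T_k Z_k
    Then Y_k -> (A/||A||_F)^(1/2) and Z_k -> (A/||A||_F)^(-1/2); rescale at the end.
    """
    dim = a.shape[-1]
    identity = jnp.eye(dim, dtype=a.dtype)
    norm = jnp.linalg.norm(a) + eps  # Frobenius norm keeps the spectrum in the convergence region
    y, z = a / norm, identity

    def body(_, carry):
        y, z = carry
        t = 0.5 * (3.0 * identity - z @ y)
        return y @ t, t @ z

    y, z = jax.lax.fori_loop(0, num_iters, body, (y, z))
    return y * jnp.sqrt(norm), z / jnp.sqrt(norm)  # ≈ A^(1/2), A^(-1/2)


def sqrt_sign_update(grad):
    """Map a 2D gradient G = U Σ Vᵀ to U Σ^(1/2) Vᵀ via G @ (GᵀG)^(-1/4).

    Two applications of the square-root routine give the inverse fourth root:
    the first yields (GᵀG)^(1/2), the second the inverse square root of that.
    """
    gram = grad.T @ grad
    gram_sqrt, _ = coupled_newton_schulz_sqrt(gram)
    _, gram_inv_quarter = coupled_newton_schulz_sqrt(gram_sqrt)  # ≈ (GᵀG)^(-1/4)
    return grad @ gram_inv_quarter
```

In an optimizer, something like `sqrt_sign_update` would replace the orthogonalization step applied to the (momentum-averaged) gradient of each 2D parameter; the sketch assumes GᵀG is reasonably well conditioned, and a small ridge term may be needed when G is rank deficient.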
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| lib/levanter/src/levanter/optim/muonremez.py | Complete MuonRemez optimizer implementation with config class, gradient transformation, and coupled Newton-Schulz quintic algorithm for matrix square root computation |
| lib/levanter/src/levanter/optim/__init__.py | Registered and exported MuonRemezConfig in the optimizer module's public interface |
| experiments/exp2284_test_remez.py | Experiment setup for testing MuonRemez with learning rate sweeps on a 300M parameter Llama model |
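To make the experiment row above concrete, here is a heavily simplified, hypothetical sketch of a learning rate sweep: `TrainConfig`, `run_training`, and the config fields are illustrative stand-ins, not the actual exp2284_test_remez.py script or the repository's experiment API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MuonRemezConfig:
    # Hypothetical fields for illustration; the real config lives in muonremez.py.
    learning_rate: float = 0.02
    weight_decay: float = 0.0


@dataclass(frozen=True)
class TrainConfig:
    # Stand-in for a full training configuration (model, data, optimizer, ...).
    model_size: str = "llama-300m"
    optimizer: MuonRemezConfig = MuonRemezConfig()


def run_training(config: TrainConfig) -> float:
    """Placeholder for launching a training run; returns the final eval loss."""
    raise NotImplementedError


def lr_sweep(base: TrainConfig, learning_rates: list[float]) -> dict[float, float]:
    # One run per learning rate, identical otherwise, so differences in the
    # final loss can be attributed to the learning rate alone.
    results = {}
    for lr in learning_rates:
        cfg = replace(base, optimizer=replace(base.optimizer, learning_rate=lr))
        results[lr] = run_training(cfg)
    return results


# Typical usage: sweep a log-spaced grid and pick the best-performing run.
# results = lr_sweep(TrainConfig(), [1e-3, 3e-3, 1e-2, 3e-2])
```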
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
This pull request introduces a new optimizer, MuonRemez, proposed by @mahyarjn80, and sets up an experiment to perform learning rate sweeps on small Llama models using this optimizer. The changes span the addition of the optimizer implementation, its integration into the codebase, and the creation of an experiment script to test its performance.
Addition of MuonRemez optimizer:
- Added the MuonRemezConfig optimizer, which uses a coupled Newton-Schulz iteration to compute matrix square roots for weight updates, along with supporting functions and state management in muonremez.py.
- Registered MuonRemezConfig in the optimizer module's public interface (__init__.py). [1] [2]

Experiment setup for MuonRemez:
- Added exp2284_test_remez.py, which defines and runs a learning rate sweep experiment comparing MuonRemez and non-MuP variants on a small Llama model using the new optimizer configuration.

## Description

Fixes #2284