@WhenWen WhenWen commented Jan 6, 2026

This pull request introduces a new optimizer, MuonRemez, proposed by @mahyarjn80, and sets up an experiment that runs learning rate sweeps on small Llama models with it. The changes cover the optimizer implementation, its integration into the codebase, and an experiment script to test its performance.

Addition of MuonRemez optimizer:

  • Implemented the MuonRemez optimizer (configured through MuonRemezConfig) in muonremez.py, together with its supporting functions and state management. The optimizer uses a coupled Newton-Schulz iteration to compute matrix square roots for weight updates.
  • Registered and imported MuonRemezConfig in the optimizer module's public interface (__init__.py).
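
For readers unfamiliar with the technique, here is a minimal sketch of the textbook coupled Newton-Schulz iteration for the square root of a symmetric positive-definite matrix. It is not the code in muonremez.py, which may differ (for example in polynomial order or in how it handles rectangular matrices). The function name, the Frobenius-norm scaling, and the iteration count are assumptions chosen for illustration, and the sketch uses NumPy rather than JAX to stay small.

```python
import numpy as np


def coupled_newton_schulz_sqrt(a, num_iters=30):
    """Approximate A^(1/2) and A^(-1/2) for a symmetric positive-definite A.

    Illustrative only: a cubic coupled iteration, not the PR's implementation.
    """
    n = a.shape[0]
    eye = np.eye(n)
    # Scale so every eigenvalue lies in (0, 1]; the iteration needs
    # ||I - A/c|| < 1 to converge.
    c = np.linalg.norm(a)  # Frobenius norm
    y = a / c              # converges to (A/c)^(1/2)
    z = eye.copy()         # converges to (A/c)^(-1/2)
    for _ in range(num_iters):
        t = 0.5 * (3.0 * eye - z @ y)
        y = y @ t
        z = t @ z
    # Undo the scaling.
    return y * np.sqrt(c), z / np.sqrt(c)


a = np.array([[4.0, 1.0], [1.0, 3.0]])
sqrt_a, inv_sqrt_a = coupled_newton_schulz_sqrt(a)
print(np.allclose(sqrt_a @ sqrt_a, a))
```

The two sequences are updated together, so the iteration needs only matrix multiplications and no explicit inverse or eigendecomposition, which is what makes this family of methods attractive on accelerators.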

Experiment setup for MuonRemez:

  • Added exp2284_test_remez.py, which defines and runs a learning rate sweep comparing MuonRemez and non-MuP variants on a small Llama model using the new optimizer configuration.

## Description

Fixes #2284
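
As a rough illustration of what a learning rate sweep involves, the sketch below builds a log-spaced grid of learning rates and one run description per (optimizer, learning rate) pair. It does not reproduce exp2284_test_remez.py: the helper names, the dictionary keys, the optimizer labels, and the learning rate range are all hypothetical.

```python
import math


def log_spaced_learning_rates(low, high, num_points):
    """Return num_points learning rates spaced evenly in log10 between low and high."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (num_points - 1)
    return [10 ** (log_low + i * step) for i in range(num_points)]


def build_sweep(optimizer_names, learning_rates):
    """Return one run description per (optimizer, learning rate) pair."""
    return [
        {
            "optimizer": name,
            "learning_rate": lr,
            # Hypothetical naming scheme for telling runs apart.
            "run_name": f"{name}-lr{lr:.0e}",
        }
        for name in optimizer_names
        for lr in learning_rates
    ]


# Hypothetical labels and range, not taken from the experiment script.
rates = log_spaced_learning_rates(1e-4, 1e-2, 3)
for run in build_sweep(["muonremez", "baseline"], rates):
    print(run["run_name"], run["learning_rate"])
```

Log spacing is the usual choice because optimizer sensitivity to the learning rate tends to vary by orders of magnitude rather than by fixed increments.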

Copilot AI review requested due to automatic review settings January 6, 2026 06:39

Copilot AI left a comment

Pull request overview

This PR adds the MuonRemez optimizer, a variant of the Muon optimizer that uses coupled Newton-Schulz iteration to compute matrix square roots (U·Σ^(1/2)·V^T) instead of Newton-Schulz orthogonalization. The implementation includes configuration, state management, and supporting mathematical functions, along with an experiment script to test the optimizer through learning rate sweeps on small Llama models.
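
To make the distinction concrete, the following sketch computes both targets directly from an SVD: the orthogonalized update U·V^T and the square-root-spectrum update U·Σ^(1/2)·V^T. This is a reference computation only, under the assumption that the two targets are as described above. The function name is hypothetical, and the PR approximates the second target iteratively rather than through an explicit SVD.

```python
import numpy as np


def svd_targets(g):
    """Return (U V^T, U Sigma^(1/2) V^T) for a matrix g via an explicit SVD."""
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    orthogonalized = u @ vt                # every singular value mapped to 1
    sqrt_spectrum = (u * np.sqrt(s)) @ vt  # every singular value s mapped to sqrt(s)
    return orthogonalized, sqrt_spectrum


g = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
orth, root = svd_targets(g)
print(np.linalg.svd(orth, compute_uv=False))
print(np.linalg.svd(root, compute_uv=False))
```

Orthogonalization discards the relative magnitudes of the singular values entirely, while the square-root target only compresses them.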

Key changes:

  • Implemented MuonRemez optimizer with coupled Newton-Schulz iteration for computing matrix square roots
  • Integrated optimizer into the public API with proper registration
  • Created experiment script for learning rate sweep testing on 300M Llama model

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| lib/levanter/src/levanter/optim/muonremez.py | Complete MuonRemez optimizer implementation with config class, gradient transformation, and coupled Newton-Schulz quintic algorithm for matrix square root computation |
| lib/levanter/src/levanter/optim/__init__.py | Registered and exported MuonRemezConfig in the optimizer module's public interface |
| experiments/exp2284_test_remez.py | Experiment setup for testing MuonRemez with learning rate sweeps on a 300M parameter Llama model |

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Development

Successfully merging this pull request may close these issues.

Test a new optimizer called MuonRemez
