What would be the best way to keep track of the EMA of your model parameters in NNX? #4528
-
Replies: 2 comments
-
I created a sample that performs Model EMA with Flax NNX. It is based on the MNIST example of Flax NNX.
Even for a model that includes BatchNorm, this can be achieved by changing the wrt argument passed to the nnx.Optimizer constructor; nothing special is needed in the training loop or in ema_step. If you want to change the EMA logic, you can define your own optax GradientTransformation (a sketch follows below).
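For illustration, here is a minimal sketch of such a custom transformation, assuming a fixed decay and no debiasing. The name simple_ema is hypothetical and the update rule simply reproduces plain EMA, so it is only a starting point for changing the logic:

```python
import jax
import jax.numpy as jnp
import optax


def simple_ema(decay: float) -> optax.GradientTransformation:
    """A plain EMA transformation: ema <- ema + (1 - decay) * (value - ema)."""

    def init_fn(params):
        # Start the EMA at the current values (no zero-init / debiasing here).
        return optax.EmaState(
            count=jnp.zeros([], jnp.int32),
            ema=jax.tree_util.tree_map(jnp.asarray, params),
        )

    def update_fn(updates, state, params=None):
        del params  # the "updates" passed in are the live parameter values
        new_ema = jax.tree_util.tree_map(
            lambda v, e: e + (1.0 - decay) * (v - e), updates, state.ema
        )
        return new_ema, optax.EmaState(count=state.count + 1, ema=new_ema)

    return optax.GradientTransformation(init_fn, update_fn)
```

Passing this (or optax.ema) as tx keeps the EMA bookkeeping inside the transformation, which is why the training loop itself does not need to change.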
-
Thanks for the very nice example.

```python
import optax
from flax import nnx


class ModelEMA(nnx.Optimizer):
    def __init__(
        self,
        model: nnx.Module,
        tx: optax.GradientTransformation,
    ):
        # Track both the parameters and the BatchNorm statistics.
        super().__init__(model, tx, wrt=[nnx.Param, nnx.BatchStat])

    def update(self, model: nnx.Module, model_original: nnx.Module):
        # `model` holds the EMA weights; `model_original` is the model being trained.
        params = nnx.state(model_original, self.wrt)
        ema_params = nnx.state(model, self.wrt)
        self.step.value += 1
        ema_state = optax.EmaState(count=self.step, ema=ema_params)
        _, new_ema_state = self.tx.update(params, ema_state)
        nnx.update(model, new_ema_state.ema)
```

An updated version of the notebook is available here: https://colab.research.google.com/gist/aurelio-amerio/afa5b4da0c3a2b881250e490c8688345/flax-nnx-model-ema.ipynb
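For completeness, a minimal usage sketch. The assumptions here are mine, not from the notebook: `Model` is a placeholder for the module you train, the EMA copy is created with `nnx.clone`, and the 0.999 decay is arbitrary.

```python
import optax
from flax import nnx

# `Model` is a placeholder for whatever nnx.Module you train
# (e.g. the CNN from the Flax NNX MNIST example); it is not defined here.
model = Model(rngs=nnx.Rngs(0))

# A separate copy of the model that will hold the EMA weights.
ema_model = nnx.clone(model)
ema = ModelEMA(ema_model, optax.ema(decay=0.999))

# In the training loop, after the regular optimizer step:
#   ema.update(ema_model, model)
# `ema_model` then holds the smoothed weights for evaluation or checkpointing.
```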