0.12.0 #4984

cgarciae · 2025-09-25T23:58:18Z

cgarciae
Sep 25, 2025
Maintainer

Flax 0.12.0 includes many updates and some important breaking changes to the NNX API.

Breaking Changes

Pytree Strict Attributes

nnx.Pytree, and therefore nnx.Module, is now stricter about attributes that contain Arrays and about changing the status of attributes. For example, the code below now fails:

from flax import nnx
import jax
import jax.numpy as jnp

class Foo(nnx.Module):
  def __init__(self, use_bias, rngs):
    self.layers = [  # ERROR
      nnx.Linear(3, 3, rngs=rngs) for _ in range(5)
    ]
    self.bias = None # status = static
    if use_bias:
      self.bias = nnx.Param(rngs.params.uniform((3,))) # ERROR

This happens for two reasons:

JAX pytree structures that contain Arrays now have to be marked with nnx.data. Alternatively, if the container pytree is a list or a dict, you can use nnx.List or nnx.Dict, which additionally allow mixed "data" and "static" elements.
Attributes will no longer automatically change their status—this now has to be done explicitly using nnx.data or nnx.static. Additionally, assigning Arrays or structures with Arrays to static attributes is now an error, as they will not automatically change to data.

To fix the above, create layers as a List Module, which is automatically recognized as data, and be explicit about bias being a data attribute on its first assignment by using nnx.data:

class Foo(nnx.Module):
  def __init__(self, use_bias, rngs):
    self.layers = nnx.List([  # nnx.data also works but List is recommended
      nnx.Linear(3, 3, rngs=rngs) for _ in range(5)
    ])
    self.bias = nnx.data(None)
    if use_bias:
      self.bias = nnx.Param(rngs.params.uniform((3,)))  # shape passed as a tuple

For more information check the Module & Pytree guide.

Eager Sharding

Variables will now eagerly shard their values when sharding_names metadata is provided. A mesh is required: provide it either by passing a mesh metadata attribute or by setting the global mesh context via jax.set_mesh. This moves the sharding of a Variable to construction time:

jax.config.update('jax_num_cpu_devices', 8)
mesh = jax.make_mesh((2, 4), ('data', 'model'))

with jax.set_mesh(mesh):
  variable = nnx.Param(jnp.ones((16, 32)), sharding_names=(None, 'model'))
  
print(variable.value.sharding)

Eager sharding will also occur when using the nnx.with_partitioning initializer decorator and will automatically extend to the Optimizer. This means that both model and optimizer will be sharded at construction without the need for the somewhat cumbersome nnx.get_partition_spec + jax.lax.with_sharding_constraint + nnx.update pattern:

import optax

with jax.set_mesh(mesh):
  linear = nnx.Linear(
    in_features=16, out_features=16, use_bias=False,
    kernel_init=nnx.with_partitioning(
      nnx.initializers.lecun_normal(), (None, 'model')
    ),
    rngs=nnx.Rngs(0),
  )
  optimizer = nnx.Optimizer(linear, optax.adam(1e-3), wrt=nnx.Param)
  
print(linear.kernel.value.sharding)
print(optimizer.opt_state[0].mu.kernel.value.sharding)

For projects that currently rely on other means for sharding, eager sharding can be turned off by passing eager_sharding=False to the Variable constructor, either directly or through initializer decorators like nnx.with_partitioning:

linear = nnx.Linear(
  in_features=16, out_features=16, use_bias=False,
  kernel_init=nnx.with_partitioning(
    nnx.initializers.lecun_normal(), (None, 'model'), eager_sharding=False
  ),
  rngs=nnx.Rngs(0),
)
optimizer = nnx.Optimizer(linear, optax.adam(1e-3), wrt=nnx.Param)
  
print(linear.kernel.value.sharding)
print(optimizer.opt_state[0].mu.kernel.value.sharding)

Eager sharding can also be turned off globally via the flax_always_shard_variable config flag or the FLAX_ALWAYS_SHARD_VARIABLE environment variable:

import flax
flax.config.update('flax_always_shard_variable', False)

For more information, check out the Variable eager sharding FLIP.

In-Place Operators No Longer Allowed

In-place operators will now raise an error. This is done as part of the push for Variables to be compatible with Tracer semantics:

w = nnx.Variable(jnp.array(0))
w += 1  # ERROR

The fix is to simply operate on the .value property instead:

w.value += 1

All Changes

Doc fix: remove dead link to pre-Orbax checkpointing. by @copybara-service[bot] in #4914
Fix typo in unflatten docs by @copybara-service[bot] in #4918
fix RNN by @copybara-service[bot] in #4917
Update optimizer.py to support masked variable from optax. by @ywrt in #4904
Added missing functions to graph.rst by @vfdev-5 in #4922
Update flax/docs_nnx/guides/performance.md and .ipynb by @hanrach9 in #4919
Added preferred_element_type arg to nnx.Linear*, nnx.Conv*, nnx.Einsum by @vfdev-5 in #4920
Update README badges and remove invalid ones by @IvyZX in #4905
static + pytree guide by @cgarciae in #4897
fix mypy by @copybara-service[bot] in #4931
Avoid passing non-boolean mask to where argument of jax.numpy reductions. Non-boolean mask inputs have been deprecated for several releases, and will result in an error starting in JAX v0.8.0. by @copybara-service[bot] in #4923
Ported nnx.PReLU from linen by @vfdev-5 in #4934
Added nnx.scan docs and few minor docs fixes by @vfdev-5 in #4930
add variables argument to nnx.clone by @cgarciae in #4945
only copy dicts on State.__getitem__ by @cgarciae in #4946
always differentiate standalone Variables in nnx.grad by @cgarciae in #4947
Implement instance norm in NNX by @mattbahr in #4939
Automatically apply sharding constraints to sharded models by @IvyZX in #4844
Add reference of flip doc to gspmd guide by @IvyZX in #4949
Fixed nnx.is_data docstring rendering by @vfdev-5 in #4957
expose pytree guide by @cgarciae in #4951
fix toy examples by @cgarciae in #4952
Explicitly cast attribute names to string before checking for private attributes. by @copybara-service[bot] in #4955
add flax_hijax_variable flag by @cgarciae in #4953
mark shard_map as implemented in transforms guide by @cgarciae in #4738
improve Variable flatten by @cgarciae in #4954
Minor typo fix in nnx.call docstring by @vfdev-5 in #4959
allow split tuples in Rngs.fork by @cgarciae in #4958
Fixed Gemma example using Gemma2 models by @vfdev-5 in #4830
finish pytree guide by @cgarciae in #4929
update bridge wrappers from maxtext by @cgarciae in #4937
fix HashableMapping hash definition for mixed key types by @copybara-service[bot] in #4936
Flax RNG guide for jax.jit: clarify rng outputs are shared but not inputs. by @copybara-service[bot] in #4956
fix Variable pytree flatten by @copybara-service[bot] in #4962
import PathParts from flax.typing by @cgarciae in #4966
Correctly expose flax.config.temp_flip_flag by @IvyZX in #4969
raise on Variable inplace operators by @cgarciae in #4967
Copybara import of the project: by @copybara-service[bot] in #4976
update to version 0.12.0 by @cgarciae in #4982
Minor typo fixes in flax gspmd guide by @vfdev-5 in #4970
ignore uv.lock by @copybara-service[bot] in #4974
[nnx] preserve the function's type information in jit by @cgarciae in #4981
add Variable.set_metadata by @cgarciae in #4968
propagate eager sharding by @cgarciae in #4983

New Contributors

@ywrt made their first contribution in #4904
@hanrach9 made their first contribution in #4919

Full Changelog: v0.11.2...v0.12.0

This discussion was created from the release 0.12.0.
