Replies: 1 comment
-
I think this is a mistake made when porting the old BatchNorm to linen. It should be an argument to `__call__`. We have so far not included a global training=False/True switch on the Module like many other NN APIs have. One nice pattern to avoid errors is the following:
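A minimal sketch of such a pattern (the module name and layer sizes here are just illustrative): derive both `use_running_average` and `deterministic` from a single `train` flag on the parent module's `__call__`, so the mode is decided in exactly one place.

```python
import flax.linen as nn


class Classifier(nn.Module):
    """Illustrative model: one `train` flag drives both normalization and dropout modes."""

    @nn.compact
    def __call__(self, x, *, train: bool):
        x = nn.Dense(features=128)(x)
        # Use batch statistics while training, running averages at eval time.
        x = nn.BatchNorm(use_running_average=not train)(x)
        x = nn.relu(x)
        # Stochastic while training, a no-op at eval time.
        x = nn.Dropout(rate=0.5, deterministic=not train)(x)
        return nn.Dense(features=10)(x)
```

Applying it then looks roughly like `model.apply(variables, x, train=True, rngs={'dropout': key}, mutable=['batch_stats'])` for a training step and `model.apply(variables, x, train=False)` for evaluation (again, just a sketch).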
#683 should allow for a
-
The training state for BatchNorm is set via the `self.use_running_average` attribute. For Dropout, it is passed via the `deterministic` arg in `__call__`. (I realize those modes are not strictly specific to training vs. not training.)
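That is, something like (illustrative rate):

```python
import flax.linen as nn

# BatchNorm: the mode is baked in as a module attribute at construction time.
norm = nn.BatchNorm(use_running_average=True)

# Dropout: the mode is supplied per call instead, e.g. inside a parent module:
#     y = nn.Dropout(rate=0.1)(x, deterministic=True)
drop = nn.Dropout(rate=0.1)
```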
Is there any reason for this difference? I was planning to use a `training=False/True` arg in my `__call__` chain to pass the training state, as opposed to binding it at layer creation. I believe that still works fine with `jit` as long as the flag is marked as static (see the sketch at the end of this post)?

Having the training/not-training state passed in some cases as an arg to the layer constructor and in others as an arg to `__call__` is a bit jarring and seems error prone (I've already messed it up once).
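To make the `jit` point concrete, this is the kind of thing I mean by marking the flag static (a self-contained sketch; the function body is just a stand-in for a real forward pass):

```python
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnames=('train',))
def forward(x, *, train: bool):
    # A plain Python bool can branch freely here because jit treats it as
    # static and simply compiles one version per value (True and False).
    return x * 0.5 if train else x


forward(jnp.ones((4, 8)), train=True)
forward(jnp.ones((4, 8)), train=False)
```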