
Weight Decay gradient question #18

@kogolobo

Description


In the WeightDecay regularization class, the code replaces the parameter's gradient with the gradient of the regularization term:

param.grad = self.regularize(param)

Should it instead add the regularization gradient to the existing parameter gradient, i.e.:

param.grad.add_(self.regularize(param))
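To illustrate the difference, here is a minimal NumPy sketch (the `Param` class, `regularize` function, and `WEIGHT_DECAY` coefficient are hypothetical stand-ins for the code in question). Replacing the gradient discards the gradient of the loss entirely, while adding keeps both contributions:

```python
import numpy as np

WEIGHT_DECAY = 0.01  # hypothetical L2 coefficient

class Param:
    """Stand-in for a parameter with a .data tensor and a .grad from the loss."""
    def __init__(self, data, grad):
        self.data = data
        self.grad = grad

def regularize(param):
    # Gradient of (WEIGHT_DECAY / 2) * ||w||^2 with respect to w is WEIGHT_DECAY * w
    return WEIGHT_DECAY * param.data

# Gradient computed from the loss, before regularization is applied
param = Param(data=np.array([1.0, -2.0]), grad=np.array([0.5, 0.5]))

# Replacement (current behavior): the loss gradient is thrown away
replaced = regularize(param)            # only the decay term: [0.01, -0.02]

# Addition (proposed behavior): loss gradient plus decay term
added = param.grad + regularize(param)  # [0.51, 0.48]
```

With replacement, the update direction no longer depends on the loss at all; with addition, the parameter is pulled toward the loss minimum while also being shrunk toward zero, which is the usual behavior of L2 weight decay.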
