Improve formatting of Implementation details for AGC (#970)
eracah authored May 3, 2022
1 parent a950de6 commit e313462
Showing 1 changed file with 5 additions and 8 deletions.
13 changes: 5 additions & 8 deletions composer/algorithms/agc/README.md
@@ -59,15 +59,12 @@ trainer.fit()

### Implementation Details

-AGC is implemented as follows. On `Event.AFTER_TRAIN_BATCH`:
-1. For every parameter in the model that has gradients:
-a. The L2 norm of the parameter is computed (normalized across rows for MLP's, across entire filters for CNN's, and across the entire vector for biases).
-b. The L2 norm of each parameters corresponding gradients are computed in a similar fashion.
-c. If a norm of some gradients is greater than the norm of the correpsonding weights multiplied by the `clipping_threshold`:
-* Scale all the gradients that contributed to that norm by the clipping threshold multiplied by the ratio of weight norm to the gradient norm.
-Otherwise:
-* Keep those gradients the same.
+AGC is implemented as follows:

+On `Event.AFTER_TRAIN_BATCH`, for every parameter in the model that has gradients:
+1. Compute the parameter's weight norm with an L2 norm (normalized across rows for MLP's, across entire filters for CNN's, and across the entire vector for biases).
+2. Compute the parameter's gradient norm with an L2 norm.
+3. If `grad_norm > weight_norm * clipping_threshold`, scale all the contributing gradients by `clipping_threshold * (weight_norm / grad_norm)`.
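
The three steps in the revised text can be sketched in plain Python for a single 2-D weight matrix. This is only an illustration under stated assumptions, not Composer's implementation: the names `unitwise_norm` and `agc_clip` are hypothetical, tensors are modelled as nested lists, and only the row-wise (MLP) case is covered, not convolutional filters or biases.

```python
import math


def unitwise_norm(rows):
    """Return the L2 norm of each row of a 2-D list (one norm per output unit)."""
    return [math.sqrt(sum(v * v for v in row)) for row in rows]


def agc_clip(weights, grads, clipping_threshold=0.01):
    """Return a clipped copy of `grads` (hypothetical helper, not Composer's API).

    `weights` and `grads` are 2-D lists of the same shape. Each row is
    treated as one unit, matching the "normalized across rows" case.
    """
    weight_norms = unitwise_norm(weights)  # step 1
    grad_norms = unitwise_norm(grads)      # step 2
    clipped = []
    for grad_row, weight_norm, grad_norm in zip(grads, weight_norms, grad_norms):
        # Step 3: rescale only the rows whose gradient norm exceeds the limit.
        # grad_norm is strictly positive here, so the division is safe.
        if grad_norm > weight_norm * clipping_threshold:
            scale = clipping_threshold * (weight_norm / grad_norm)
            clipped.append([g * scale for g in grad_row])
        else:
            clipped.append(list(grad_row))
    return clipped
```

After clipping, a rescaled row has gradient norm equal to `weight_norm * clipping_threshold`, while rows already under the limit are left unchanged.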


## Suggested Hyperparameters