From e3134621404d9b047579b0440d6c8813f1fc0c73 Mon Sep 17 00:00:00 2001
From: Evan Racah
Date: Tue, 3 May 2022 15:19:52 -0700
Subject: [PATCH] Improve formatting of Implementation details for AGC (#970)

---
 composer/algorithms/agc/README.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/composer/algorithms/agc/README.md b/composer/algorithms/agc/README.md
index 1211013a56..f5c5a636b2 100644
--- a/composer/algorithms/agc/README.md
+++ b/composer/algorithms/agc/README.md
@@ -59,15 +59,12 @@ trainer.fit()
 
 ### Implementation Details
 
-AGC is implemented as follows. On `Event.AFTER_TRAIN_BATCH`:
-1. For every parameter in the model that has gradients:
-  a. The L2 norm of the parameter is computed (normalized across rows for MLP's, across entire filters for CNN's, and across the entire vector for biases).
-  b. The L2 norm of each parameters corresponding gradients are computed in a similar fashion.
-  c. If a norm of some gradients is greater than the norm of the correpsonding weights multiplied by the `clipping_threshold`:
-    * Scale all the gradients that contributed to that norm by the clipping threshold multiplied by the ratio of weight norm to the gradient norm.
-  Otherwise:
-    * Keep those gradients the same.
+AGC is implemented as follows:
+On `Event.AFTER_TRAIN_BATCH`, for every parameter in the model that has gradients:
+1. Compute the parameter's weight norm with an L2 norm (normalized across rows for MLP's, across entire filters for CNN's, and across the entire vector for biases).
+2. Compute the parameter's gradient norm with an L2 norm.
+3. If `grad_norm > weight_norm * clipping_threshold`, scale all the contributing gradients by `clipping_threshold * (weight_norm / grad_norm)`.
 
 ## Suggested Hyperparameters
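
The three steps in the reworded Implementation Details above can be sketched in NumPy. This is a minimal illustration, not Composer's actual implementation (which operates on PyTorch parameters inside the trainer loop); the helper names `unitwise_norm` and `agc_clip`, and the `eps` floor on the weight norm, are assumptions for this sketch:

```python
import numpy as np

def unitwise_norm(x):
    """L2 norm per 'unit': per-row for 2-D (MLP) weights, per-filter for
    4-D (CNN) weights, and over the whole vector for 1-D biases."""
    if x.ndim <= 1:
        return np.linalg.norm(x)
    # Reduce over every axis except the first (output) axis.
    axes = tuple(range(1, x.ndim))
    return np.sqrt((x ** 2).sum(axis=axes, keepdims=True))

def agc_clip(weight, grad, clipping_threshold=0.01, eps=1e-3):
    """Rescale gradients whose unit-wise norm exceeds
    clipping_threshold * weight_norm; leave the rest unchanged."""
    w_norm = np.maximum(unitwise_norm(weight), eps)  # step 1 (eps avoids clipping zero-init weights)
    g_norm = unitwise_norm(grad)                     # step 2
    max_norm = w_norm * clipping_threshold           # step 3: allowed gradient norm
    scale = np.where(g_norm > max_norm,
                     max_norm / np.maximum(g_norm, 1e-12),
                     1.0)
    return grad * scale
```

Multiplying by `max_norm / g_norm` is the same quantity as `clipping_threshold * (weight_norm / grad_norm)` in step 3, so an over-large gradient row is scaled down to exactly the allowed norm while smaller rows pass through untouched.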