From e3134621404d9b047579b0440d6c8813f1fc0c73 Mon Sep 17 00:00:00 2001
From: Evan Racah
Date: Tue, 3 May 2022 15:19:52 -0700
Subject: [PATCH] Improve formatting of Implementation details for AGC (#970)

---
 composer/algorithms/agc/README.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/composer/algorithms/agc/README.md b/composer/algorithms/agc/README.md
index 1211013a56..f5c5a636b2 100644
--- a/composer/algorithms/agc/README.md
+++ b/composer/algorithms/agc/README.md
@@ -59,15 +59,12 @@ trainer.fit()
 
 ### Implementation Details
 
-AGC is implemented as follows. On `Event.AFTER_TRAIN_BATCH`:
-1. For every parameter in the model that has gradients:
-  a. The L2 norm of the parameter is computed (normalized across rows for MLP's, across entire filters for CNN's, and across the entire vector for biases).
-  b. The L2 norm of each parameters corresponding gradients are computed in a similar fashion.
-  c. If a norm of some gradients is greater than the norm of the correpsonding weights multiplied by the `clipping_threshold`:
-    * Scale all the gradients that contributed to that norm by the clipping threshold multiplied by the ratio of weight norm to the gradient norm.
-  Otherwise:
-    * Keep those gradients the same.
+AGC is implemented as follows:
+On `Event.AFTER_TRAIN_BATCH`, for every parameter in the model that has gradients:
+1. Compute the parameter's weight norm with an L2 norm (normalized across rows for MLP's, across entire filters for CNN's, and across the entire vector for biases).
+2. Compute the parameter's gradient norm with an L2 norm.
+3. If `grad_norm > weight_norm * clipping_threshold`, scale all the contributing gradients by `clipping_threshold * (weight_norm / grad_norm)`.
 
 ## Suggested Hyperparameters
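
The three steps in the reworded Implementation Details above can be sketched in NumPy. This is a minimal illustration, not Composer's actual implementation (which operates on PyTorch parameters inside the trainer loop); the helper names `unitwise_norm` and `agc_clip`, and the `eps` floor on the weight norm, are assumptions for this sketch:

```python
import numpy as np

def unitwise_norm(x):
    """L2 norm per 'unit': per-row for 2-D (MLP) weights, per-filter for
    4-D (CNN) weights, and over the whole vector for 1-D biases."""
    if x.ndim <= 1:
        return np.linalg.norm(x)
    # Reduce over every axis except the first (output) axis.
    axes = tuple(range(1, x.ndim))
    return np.sqrt((x ** 2).sum(axis=axes, keepdims=True))

def agc_clip(weight, grad, clipping_threshold=0.01, eps=1e-3):
    """Rescale gradients whose unit-wise norm exceeds
    clipping_threshold * weight_norm; leave the rest unchanged."""
    w_norm = np.maximum(unitwise_norm(weight), eps)  # step 1 (eps avoids clipping zero-init weights)
    g_norm = unitwise_norm(grad)                     # step 2
    max_norm = w_norm * clipping_threshold           # step 3: allowed gradient norm
    scale = np.where(g_norm > max_norm,
                     max_norm / np.maximum(g_norm, 1e-12),
                     1.0)
    return grad * scale
```

Multiplying by `max_norm / g_norm` is the same quantity as `clipping_threshold * (weight_norm / grad_norm)` in step 3, so an over-large gradient row is scaled down to exactly the allowed norm while smaller rows pass through untouched.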