fix doc

RayenTian · RayenTian · commit d77b56ad967a · 2025-11-30T22:35:05.000-08:00
Signed-off-by: ruit <ruit@nvidia.com>
diff --git a/docs/guides/grpo.md b/docs/guides/grpo.md
@@ -221,7 +221,7 @@ RL generations typically produce highly variable sequence lengths, which result
 We use the [ClippedPGLossFn](../../nemo_rl/algorithms/loss_functions.py) to calculate the loss for GRPO. Formally,
 
 $$
-L(\theta) = E_{x \sim \pi_{\theta_{\¬text{old}}}} \Big[ \min \Big(\frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}A_t, \text{clip} \big( \frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}, 1 - \varepsilon, 1 + \varepsilon \big) A_t \Big) \Big] - \beta D_{\text{KL}} (\pi_\theta \| \pi_\text{ref})
+L(\theta) = E_{x \sim \pi_{\theta_{\text{old}}}} \Big[ \min \Big(\frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}A_t, \text{clip} \big( \frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}, 1 - \varepsilon, 1 + \varepsilon \big) A_t \Big) \Big] - \beta D_{\text{KL}} (\pi_\theta \| \pi_\text{ref})
 $$
 
 where:
@@ -391,4 +391,4 @@ We use this to track if our models are entropy-collapsing too quickly during tra
 
 ## Evaluate the Trained Model
 
-Upon completion of the training process, you can refer to our [evaluation guide](eval.md) to assess model capabilities.
+Upon completion of the training process, you can refer to our [evaluation guide](eval.md) to assess model capabilities.
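
Reviewer note: to sanity-check how the corrected formula in the first hunk reads, here is a minimal sketch of the clipped objective as it is written there. It is not the code in `ClippedPGLossFn`; the function name `clipped_pg_objective`, its parameters, and the default `epsilon` are hypothetical, and the KL term is assumed to be passed in as a precomputed scalar estimate.

```python
import math


def clipped_pg_objective(logp_new, logp_old, advantages,
                         epsilon=0.2, beta=0.0, kl_estimate=0.0):
    """Evaluate the formula from the diff on a batch of samples.

    Hypothetical helper for illustration only (not the NeMo RL API).
    logp_new / logp_old: log-probabilities under pi_theta and pi_theta_old.
    advantages: advantage estimate A_t for each sample.
    kl_estimate: assumed precomputed estimate of D_KL(pi_theta || pi_ref).
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_theta(x) / pi_theta_old(x), from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # clip(ratio, 1 - epsilon, 1 + epsilon)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # min(ratio * A_t, clipped_ratio * A_t)
        terms.append(min(ratio * adv, clipped * adv))
    # Sample mean stands in for the expectation; subtract the KL penalty.
    return sum(terms) / len(terms) - beta * kl_estimate
```

With a positive advantage the ratio is capped at `1 + epsilon`, so a large policy shift earns no extra credit; with a negative advantage the `min` keeps the unclipped (more pessimistic) term. Whether the real implementation negates this value before minimizing is not visible in this diff.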