Commit d77b56a

fix doc
Signed-off-by: ruit <[email protected]>
1 parent 2ea47c1 commit d77b56a

1 file changed: +2 -2 lines changed

docs/guides/grpo.md

Lines changed: 2 additions & 2 deletions
@@ -221,7 +221,7 @@ RL generations typically produce highly variable sequence lengths, which result
 We use the [ClippedPGLossFn](../../nemo_rl/algorithms/loss_functions.py) to calculate the loss for GRPO. Formally,

 $$
-L(\theta) = E_{x \sim \pi_{\theta_{\¬text{old}}}} \Big[ \min \Big(\frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}A_t, \text{clip} \big( \frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}, 1 - \varepsilon, 1 + \varepsilon \big) A_t \Big) \Big] - \beta D_{\text{KL}} (\pi_\theta \| \pi_\text{ref})
+L(\theta) = E_{x \sim \pi_{\theta_{\text{old}}}} \Big[ \min \Big(\frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}A_t, \text{clip} \big( \frac{\pi_\theta(x)}{\pi_{\theta_{\text{old}}}(x)}, 1 - \varepsilon, 1 + \varepsilon \big) A_t \Big) \Big] - \beta D_{\text{KL}} (\pi_\theta \| \pi_\text{ref})
 $$

 where:
@@ -391,4 +391,4 @@ We use this to track if our models are entropy-collapsing too quickly during tra

 ## Evaluate the Trained Model

-Upon completion of the training process, you can refer to our [evaluation guide](eval.md) to assess model capabilities.
+Upon completion of the training process, you can refer to our [evaluation guide](eval.md) to assess model capabilities.
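
For readers checking the corrected formula in the first hunk, the clipped objective can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `ClippedPGLossFn` implementation in NeMo RL: the function name `clipped_pg_objective`, its parameters, and the use of a precomputed scalar `kl` for the KL term are all hypothetical and chosen only for this example.

```python
import math


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clipped_pg_objective(logp_new, logp_old, advantages, eps=0.2, beta=0.0, kl=0.0):
    """Illustrative clipped policy-gradient objective (hypothetical helper).

    logp_new / logp_old: log-probabilities of each sample under the current
    and the old policy. advantages: the A_t value for each sample.
    kl: an already-computed KL(pi_theta || pi_ref) estimate (an assumption;
    how the KL term is estimated is not shown in this diff).
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_theta(x) / pi_theta_old(x), from log-probs.
        ratio = math.exp(lp_new - lp_old)
        unclipped = ratio * adv
        clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * adv
        # Take the more pessimistic of the two surrogate terms.
        terms.append(min(unclipped, clipped))
    # Sample mean stands in for the expectation; subtract the KL penalty.
    return sum(terms) / len(terms) - beta * kl


if __name__ == "__main__":
    # A ratio of 2 with a positive advantage is clipped at 1 + eps.
    print(clipped_pg_objective([math.log(2.0)], [0.0], [1.0], eps=0.2))
```

The function returns the objective as written in the formula, which is a quantity to maximize; a training loss would typically be its negative.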
