use log1p in orpo loss
hiyouga committed Mar 31, 2024
1 parent 099db6a commit 68aaa49
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/llmtuner/train/orpo/trainer.py
@@ -84,7 +84,7 @@ def odds_ratio_loss(

         # Derived from Eqs. (4) and (7) from https://arxiv.org/abs/2403.07691 by using log identities and exp(log(P(y|x)) = P(y|x)
         log_odds = (chosen_logps - rejected_logps) - (
-            torch.log(1 - torch.exp(chosen_logps)) - torch.log(1 - torch.exp(rejected_logps))
+            torch.log1p(-torch.exp(chosen_logps)) - torch.log1p(-torch.exp(rejected_logps))
         )
         ratio = F.logsigmoid(log_odds)
         losses = self.beta * ratio
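
A likely motivation for this change is numerical precision: when a log-probability is very negative, `1 - exp(logp)` rounds to exactly 1 and the subsequent `log` returns 0, whereas `log1p(-exp(logp))` keeps the small term. The sketch below illustrates this with scalar floats and Python's standard `math` module rather than the trainer's tensors; the helper names `log1m_exp_naive`, `log1m_exp_stable`, and `log_odds` are hypothetical and not part of the repository.

```python
import math


def log1m_exp_naive(logp: float) -> float:
    """log(1 - p) computed as in the removed line, given logp = log(p)."""
    return math.log(1.0 - math.exp(logp))


def log1m_exp_stable(logp: float) -> float:
    """log(1 - p) computed as in the added line, via log1p."""
    return math.log1p(-math.exp(logp))


def log_odds(chosen_logp: float, rejected_logp: float) -> float:
    """Scalar analogue of the log odds ratio expression in the diff (illustrative only)."""
    return (chosen_logp - rejected_logp) - (
        log1m_exp_stable(chosen_logp) - log1m_exp_stable(rejected_logp)
    )


# For a tiny probability, exp(logp) is far below double-precision epsilon,
# so 1.0 - exp(logp) rounds to 1.0 and the naive form loses the term entirely.
logp = -40.0
naive = log1m_exp_naive(logp)    # exactly 0.0
stable = log1m_exp_stable(logp)  # small negative number, close to -exp(logp)

# Sanity check of the log odds ratio: p_chosen = 0.5 has odds 1,
# p_rejected = 0.25 has odds 1/3, so the log odds ratio is log(3).
ratio = log_odds(math.log(0.5), math.log(0.25))
```

The same rounding happens earlier in lower-precision floating-point formats, since their machine epsilon is larger than that of the double-precision floats used here.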
