
Loss fn might be wrong #1

Open · KangOxford opened this issue Sep 2, 2024 · 1 comment
Comments

@KangOxford

[three screenshots of the repository's loss and reward code attached]
They use the common PPO loss for the policy update, so the algorithm maximises the reward. But the discriminator loss pushes the expert samples toward zero: the logit (the output of discriminator.forward()) goes to 0 for expert behaviour and to 1 for fake behaviour. Since the reward is the log of that logit, the reward is larger for fake behaviour.
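
To make the concern concrete, here is a minimal PyTorch sketch of the mismatch being described. The discriminator architecture, the batch tensors, and all names here are placeholders I made up, not the repo's actual code; the loss and reward lines paraphrase the screenshots rather than quoting them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the repo's discriminator and (s, a) batches.
obs_dim, act_dim = 4, 2
discriminator = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1)
)
expert_batch = torch.randn(32, obs_dim + act_dim)   # expert (s, a) pairs
policy_batch = torch.randn(32, obs_dim + act_dim)   # generated (s, a) pairs

d_expert = torch.sigmoid(discriminator(expert_batch))
d_policy = torch.sigmoid(discriminator(policy_batch))

# Discriminator loss as described in the issue:
# expert outputs are pushed toward 0, generated outputs toward 1.
disc_loss = F.binary_cross_entropy(d_expert, torch.zeros_like(d_expert)) + \
            F.binary_cross_entropy(d_policy, torch.ones_like(d_policy))

# PPO reward as described: the log of the discriminator output.
# With the labels above, D -> 1 on generated behaviour, so maximising this
# reward pushes the policy to look *fake*, not to imitate the expert.
reward = torch.log(d_policy + 1e-8)
```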

@KangOxford (Author)

The generated samples are compared with ones, and the expert samples are compared with zeros.
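
For reference, the usual GAIL-style convention makes the two pieces consistent: either label the expert 1 so that log D is a sensible reward, or keep the labels above and flip the reward's sign. A sketch of the first option, reusing the placeholder tensors from the sketch above (again an assumption about the intended fix, not the repo's code):

```python
# One consistent convention: expert labelled 1, generated labelled 0.
disc_loss = F.binary_cross_entropy(d_expert, torch.ones_like(d_expert)) + \
            F.binary_cross_entropy(d_policy, torch.zeros_like(d_policy))

# Now D -> 1 means "expert-like", so maximising log D rewards imitation.
reward = torch.log(d_policy + 1e-8)

# Alternatively, keep the original labels and invert the reward instead,
# e.g. reward = -torch.log(d_policy + 1e-8) or torch.log(1 - d_policy + 1e-8).
```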
