Fix and refactor SAC #985
Codecov Report
@@ Coverage Diff @@
## main #985 +/- ##
========================================
+ Coverage 0.00% 1.69% +1.69%
========================================
Files 212 213 +1
Lines 7445 7508 +63
========================================
+ Hits 0 127 +127
+ Misses 7445 7381 -64
Ready for review.
LGTM. One note: I would bump the versions by a whole minor release, e.g.
See above
Correct, there you go.
Sorry, I keep finding compat entries that need updating before the tests can run. Maybe wait for the tests to be green before approving.
Following #970, I took it upon myself to fix this (@gggoes you don't need to do it, but you can review the PR if you want). I did it because I'll actually need SAC, but also because I think RL.jl is so obscure in its internal workings that it's probably way too much work for someone who's not already as deep into it as I am.
Anyway. To fix SAC I had to implement a variant of GaussianNetwork. I called it SoftGaussianNetwork, but that's not a very good name because there's nothing soft about it (I'm open to alternative ideas). The difference between the two is that SoftGaussianNetwork is differentiable through action sampling (via the reparameterization trick) and that it necessarily uses a tanh squash function. The latter could have been optional, but I decided to stick with the algorithm described in the literature for now. A corollary of the discussion in #970 is that GaussianNetwork had an incorrect logpdf since I removed the tanh activation. It now accepts both identity and tanh and gives the correct output in both cases.
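For readers unfamiliar with the tanh correction, here is a rough one-dimensional sketch of the idea in plain Python. It is not the RL.jl code: the function names (`gaussian_logpdf`, `sample_squashed`, `squashed_logpdf`) and the small `1e-6` stabilizer are assumptions made for illustration. The action is sampled as `a = tanh(mu + sigma * eps)`, so it stays a deterministic function of `mu` and `sigma`, and the log-density of the squashed action is the Gaussian logpdf minus `log(1 - a^2)`.

```python
import math
import random


def gaussian_logpdf(u, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at u."""
    z = (u - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def sample_squashed(mu, sigma, rng=random):
    """Reparameterized sample a = tanh(mu + sigma * eps) and its log-density.

    The noise eps does not depend on mu or sigma, which is what lets an
    autodiff framework differentiate the sample with respect to them.
    """
    eps = rng.gauss(0.0, 1.0)
    u = mu + sigma * eps  # pre-squash Gaussian sample
    a = math.tanh(u)      # squashed action in (-1, 1)
    # Change of variables: subtract log|d tanh(u)/du| = log(1 - a^2).
    # The 1e-6 term is an assumed stabilizer against log(0) near |a| = 1.
    logp = gaussian_logpdf(u, mu, sigma) - math.log(1.0 - a * a + 1e-6)
    return a, logp


def squashed_logpdf(a, mu, sigma):
    """Exact log-density of a = tanh(u), u ~ N(mu, sigma^2), for |a| < 1."""
    u = math.atanh(a)
    return gaussian_logpdf(u, mu, sigma) - math.log(1.0 - a * a)
```

With an identity activation the `log(1 - a^2)` term is simply absent and the plain Gaussian logpdf is already correct; omitting the term when tanh is applied gives a density that no longer integrates to one over (-1, 1).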
The batch_size => batchsize changes are me being annoyed that we sometimes had one and sometimes the other, so I changed them all to batchsize.
PR Checklist
closes #970.