Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and refactor SAC #985

Merged
merged 25 commits into from
Oct 12, 2023
Merged

Fix and refactor SAC #985

merged 25 commits into from
Oct 12, 2023

Conversation

HenriDeh
Copy link
Member

Following #970, I took it up on me to fix this (@gggoes you don't need to do it, but you can review the PR if you want). I did it because I'll actually need SAC but also because I think RL.jl is so obscure in it's internal workings that it's probably way too much work for someone who's not already into deep like me.

Anyway. To fix SAC I had to implement a variant of the GaussianNetwork, I called it SoftGaussianNetwork but it's not a very good name because there's nothing soft about it (I'm open to alternative ideas). The difference between the two is that SoftGaussianNetwork is differentiable through the reparameterization trick (action sampling) and that it necessarily uses a tanh squash function. The latter could have been optional but I decided to stick with the algorithm described in the literature for now. A corollary of the discussion in #970 is that GaussianNetwork had an incorrect logpdf since I removed the tanh activation. It now accepts both identity and tanh and will have the correct output in both cases.

The batch_size => batchsize changes are me being annoyed that we sometimes had one and sometimes the other, so I changed them all to batchsize.

PR Checklist

  • Update NEWS.md?
  • Unit tests for all structs / functions?
  • Integration and correctness tests using a simple env?
  • PR Review?
  • Add or update documentation?
  • Write docstrings for new methods?

closes #970.

@HenriDeh
Copy link
Member Author

HenriDeh commented Oct 10, 2023

@jeremiahpslewis CI for RLCore fails but it's related to multi-agent stuff that I did not touch
Okay it was my bad. Looks fixed.

@codecov
Copy link

codecov bot commented Oct 11, 2023

Codecov Report

Merging #985 (9a37ef4) into main (3b301af) will increase coverage by 1.69%.
Report is 1 commits behind head on main.
The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##            main    #985      +/-   ##
========================================
+ Coverage   0.00%   1.69%   +1.69%     
========================================
  Files        212     213       +1     
  Lines       7445    7508      +63     
========================================
+ Hits           0     127     +127     
+ Misses      7445    7381      -64     
Files Coverage Δ
...inforcementLearningCore/src/utils/distributions.jl 0.00% <ø> (ø)
...forcementLearningCore/test/core/stop_conditions.jl 0.00% <ø> (ø)
...nforcementLearningCore/test/utils/distributions.jl 0.00% <ø> (ø)
...c/ReinforcementLearningCore/test/utils/networks.jl 0.00% <ø> (ø)
...ments/experiments/CFR/JuliaRL_DeepCFR_OpenSpiel.jl 0.00% <ø> (ø)
...eps/experiments/experiments/DQN/DQN_CartPoleGPU.jl 0.00% <ø> (ø)
...ments/experiments/DQN/JuliaRL_BasicDQN_CartPole.jl 0.00% <ø> (ø)
...ts/experiments/DQN/JuliaRL_BasicDQN_MountainCar.jl 0.00% <ø> (ø)
...periments/DQN/JuliaRL_BasicDQN_PendulumDiscrete.jl 0.00% <ø> (ø)
...ments/DQN/JuliaRL_BasicDQN_SingleRoomUndirected.jl 0.00% <ø> (ø)
... and 62 more

... and 8 files with indirect coverage changes

@HenriDeh
Copy link
Member Author

ready for review

@jeremiahpslewis
Copy link
Member

LGTM. One note, I would bump the versions by a whole minor release, e.g. 0.14 -> 0.15, as the batchsize renaming is definitely a breaking change.

Copy link
Member

@jeremiahpslewis jeremiahpslewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@HenriDeh
Copy link
Member Author

LGTM. One note, I would bump the versions by a whole minor release, e.g. 0.14 -> 0.15, as the batchsize renaming is definitely a breaking change.

Correct, there you go.

@HenriDeh
Copy link
Member Author

Sorry I keep finding compat entries that need updating before the tests can run. Maybe wait for the tests to be green before approving.

@jeremiahpslewis jeremiahpslewis merged commit e772d6f into main Oct 12, 2023
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fixing SAC Policy
2 participants