Fix and refactor SAC #985

HenriDeh · 2023-10-10T14:47:05Z

Following #970, I took it upon myself to fix this (@gggoes you don't need to do it, but you can review the PR if you want). I did it because I'll actually need SAC, but also because I think RL.jl is so obscure in its internal workings that it's probably way too much work for someone who isn't already deep into it like me.

Anyway, to fix SAC I had to implement a variant of GaussianNetwork. I called it SoftGaussianNetwork, but that's not a very good name because there's nothing soft about it (I'm open to alternative ideas). The difference between the two is that SoftGaussianNetwork is differentiable through the action sampling (via the reparameterization trick) and that it necessarily uses a tanh squash function. The latter could have been optional, but I decided to stick with the algorithm described in the literature for now. A corollary of the discussion in #970 is that GaussianNetwork had an incorrect logpdf since I removed the tanh activation. It now accepts both identity and tanh and gives the correct output in both cases.
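
For readers following along, the tanh correction mentioned above can be illustrated with a small standalone sketch. This is not the RL.jl code: it is plain Python using only the standard library, and the names `gaussian_logpdf`, `log_det_tanh`, and `sample_squashed` are hypothetical, chosen for illustration. It draws u = mu + sigma * eps (the reparameterization trick), optionally squashes with tanh, and, when squashing is applied, subtracts log(1 - tanh(u)^2) from the Gaussian log-density.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a scalar N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * LOG_2PI


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_det_tanh(u):
    """log(1 - tanh(u)^2), written in a form that stays finite for large |u|."""
    return 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))


def sample_squashed(mu, log_sigma, rng=random, squash=True):
    """Draw one scalar action and its log-density (illustrative helper).

    The noise eps does not depend on (mu, log_sigma), so the action is a
    deterministic function of the parameters once eps is fixed.
    """
    sigma = math.exp(log_sigma)
    eps = rng.gauss(0.0, 1.0)
    u = mu + sigma * eps                  # pre-squash sample
    logp = gaussian_logpdf(u, mu, sigma)
    if not squash:
        return u, logp                    # identity: plain Gaussian density
    # Change of variables for a = tanh(u): log p(a) = log p(u) - log|da/du|
    return math.tanh(u), logp - log_det_tanh(u)
```

With `squash=False` the returned log-density is the unmodified Gaussian one, which mirrors the "identity or tanh" behaviour described above. For a multi-dimensional action with a diagonal covariance, the per-dimension terms would simply be summed.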

The batch_size => batchsize changes are me being annoyed that we sometimes had one and sometimes the other, so I changed them all to batchsize.

PR Checklist

Update NEWS.md?
Unit tests for all structs / functions?
Integration and correctness tests using a simple env?
PR Review?
Add or update documentation?
Write docstrings for new methods?

closes #970.

HenriDeh · 2023-10-10T15:41:45Z

~~@jeremiahpslewis CI for RLCore fails but it's related to multi-agent stuff that I did not touch~~
Okay it was my bad. Looks fixed.

codecov · 2023-10-11T12:24:18Z

Codecov Report

Merging #985 (9a37ef4) into main (3b301af) will increase coverage by 1.69%.
Report is 1 commits behind head on main.
The diff coverage is 0.00%.

@@           Coverage Diff            @@
##            main    #985      +/-   ##
========================================
+ Coverage   0.00%   1.69%   +1.69%     
========================================
  Files        212     213       +1     
  Lines       7445    7508      +63     
========================================
+ Hits           0     127     +127     
+ Misses      7445    7381      -64

Files	Coverage Δ
...inforcementLearningCore/src/utils/distributions.jl	`0.00% <ø> (ø)`
...forcementLearningCore/test/core/stop_conditions.jl	`0.00% <ø> (ø)`
...nforcementLearningCore/test/utils/distributions.jl	`0.00% <ø> (ø)`
...c/ReinforcementLearningCore/test/utils/networks.jl	`0.00% <ø> (ø)`
...ments/experiments/CFR/JuliaRL_DeepCFR_OpenSpiel.jl	`0.00% <ø> (ø)`
...eps/experiments/experiments/DQN/DQN_CartPoleGPU.jl	`0.00% <ø> (ø)`
...ments/experiments/DQN/JuliaRL_BasicDQN_CartPole.jl	`0.00% <ø> (ø)`
...ts/experiments/DQN/JuliaRL_BasicDQN_MountainCar.jl	`0.00% <ø> (ø)`
...periments/DQN/JuliaRL_BasicDQN_PendulumDiscrete.jl	`0.00% <ø> (ø)`
...ments/DQN/JuliaRL_BasicDQN_SingleRoomUndirected.jl	`0.00% <ø> (ø)`
... and 62 more

... and 8 files with indirect coverage changes

HenriDeh · 2023-10-11T14:10:03Z

ready for review

src/ReinforcementLearningDatasets/README.md

jeremiahpslewis · 2023-10-12T10:27:51Z

LGTM. One note, I would bump the versions by a whole minor release, e.g. 0.14 -> 0.15, as the batchsize renaming is definitely a breaking change.

jeremiahpslewis

See above

Co-authored-by: Jeremiah <[email protected]>

HenriDeh · 2023-10-12T10:40:31Z

> LGTM. One note, I would bump the versions by a whole minor release, e.g. 0.14 -> 0.15, as the batchsize renaming is definitely a breaking change.

Correct, there you go.

HenriDeh · 2023-10-12T13:21:11Z

Sorry, I keep finding compat entries that need updating before the tests can run. Maybe wait for the tests to be green before approving.

HenriDeh added 10 commits October 10, 2023 11:12

make softgaussian

3342071

add tanh

fd13e6f

Update docstring

0e54d98

fixing SAC

3d570e6

enable tests

cec1be1

Improve correctness of GaussianNetwork

d9c8435

update CUDA

6c169d9

use the new TargetNetwork

5019b7f

fix test

3b6ac40

fix diaglogpdf

365201c

HenriDeh requested a review from jeremiahpslewis October 10, 2023 15:41

HenriDeh added 2 commits October 11, 2023 13:59

fix tests

cf89b5c

RLCore

9a37ef4

HenriDeh added 3 commits October 11, 2023 14:26

bump versions and compats

5a40538

Core import

3417284

reomve DomainSets 0.7 compat

e233f39

jeremiahpslewis reviewed Oct 12, 2023

src/ReinforcementLearningDatasets/README.md (outdated)

jeremiahpslewis requested changes Oct 12, 2023

HenriDeh and others added 5 commits October 12, 2023 12:37

Update src/ReinforcementLearningDatasets/README.md

62f30dd

Co-authored-by: Jeremiah <[email protected]>

Update Project.toml

26bcf97

Update Project.toml

0573ca0

Update Project.toml

a65d172

Update Project.toml

65c5529

Merge branch 'main' into sac

2faf648

jeremiahpslewis previously approved these changes Oct 12, 2023

HenriDeh added 2 commits October 12, 2023 14:34

Bump compat

e5ed79d

Merge branch 'sac' of https://github.com/JuliaReinforcementLearning/R…

afd21b8

…einforcementLearning.jl into sac

HenriDeh dismissed jeremiahpslewis’s stale review via afd21b8 October 12, 2023 12:35

Update Project.toml

126fce1

jeremiahpslewis previously approved these changes Oct 12, 2023

HenriDeh dismissed jeremiahpslewis’s stale review via 03e87cb October 12, 2023 13:19

Update Project.toml

03e87cb

HenriDeh requested a review from jeremiahpslewis October 12, 2023 13:57

jeremiahpslewis approved these changes Oct 12, 2023

jeremiahpslewis merged commit e772d6f into main Oct 12, 2023
10 of 12 checks passed