Fixed Acrobot Rewards (#140)

* Fixed Acrobot Rewards

  1. Previously, the reward was 0 until success, at which point it was set to -1. This is the opposite of the desired behavior, since it discourages the agent from succeeding.
  2. The reward was previously set whenever the environment finished, whether by exceeding `max_steps` or by the agent succeeding. Thus it did not matter what the agent did, so long as `max_steps` was exceeded before the `stop_condition` was met. Now the reward depends only on whether the agent succeeded: it is -1.0 on every step until success and 0.0 on the step that succeeds.

* Reward is now reset to -1 on `reset!`
JuliaReinforcementLearning · Apr 28, 2021 · 29bb58e
1 parent 1c0d284
commit 29bb58e
Showing 1 changed file with 4 additions and 2 deletions.
diff --git a/src/environments/3rd_party/AcrobotEnv.jl b/src/environments/3rd_party/AcrobotEnv.jl
@@ -86,6 +86,7 @@ function RLBase.reset!(env::AcrobotEnv{T}) where {T <: Number}
     env.t = 0
     env.action = 2
     env.done = false
+    env.reward = -1
     nothing
 end
 
@@ -117,8 +118,9 @@ function (env::AcrobotEnv{T})(a) where {T <: Number}
     ns[4] = bound(ns[4], -env.params.max_vel_b, env.params.max_vel_b)
     env.state = ns
     # termination criterion
-    env.done = (-cos(ns[1]) - cos(ns[2] + ns[1]) > 1.0) || env.t > env.params.max_steps
-    env.reward = env.done ? -1.0 : 0.0
+    succeeded = -cos(ns[1]) - cos(ns[2] + ns[1]) > 1.0
+    env.done = succeeded || env.t > env.params.max_steps
+    env.reward = succeeded ? 0.0 : -1.0
     nothing
 end
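
To make the effect of the change concrete, here is a minimal standalone sketch that compares the episode return under the old and new reward rules. It does not use `AcrobotEnv` or `RLBase`; the names `old_reward`, `new_reward`, and `episode_return` are hypothetical helpers introduced only for illustration, and the episode is simplified to "ends on its final step, either by success or by timeout".

```julia
# Hypothetical stand-ins for the reward rule before and after this commit.
# `succeeded` and `timed_out` are plain Bools, not fields of AcrobotEnv.
old_reward(succeeded::Bool, timed_out::Bool) = (succeeded || timed_out) ? -1.0 : 0.0
new_reward(succeeded::Bool, timed_out::Bool) = succeeded ? 0.0 : -1.0

# Sum the per-step rewards of an episode that ends on step `n_steps`,
# either by success (`success = true`) or by running out of steps.
function episode_return(reward_fn, n_steps::Int; success::Bool)
    total = 0.0
    for t in 1:n_steps
        is_final = t == n_steps
        total += reward_fn(is_final && success, is_final && !success)
    end
    return total
end

# Old rule: the return is the same whether the agent succeeds or times out.
println(episode_return(old_reward, 50; success = true))    # -1.0
println(episode_return(old_reward, 500; success = false))  # -1.0

# New rule: each step before success costs -1, so succeeding early pays off.
println(episode_return(new_reward, 50; success = true))    # -49.0
println(episode_return(new_reward, 500; success = false))  # -500.0
```

Under the old rule both episodes return -1.0, so the agent had no incentive to succeed. Under the new rule the return gets worse with every step taken, which is what rewards swinging up quickly.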