This repository has been archived by the owner on May 6, 2021. It is now read-only.

Fixed Acrobot Rewards (#140)
* Fixed Acrobot Rewards

1. Previously, the reward was 0 on every step until success, at which point it was set to -1. This is the opposite of the desired behavior, since it penalizes the agent for succeeding.
2. The reward was previously set whenever the environment finished, whether by exceeding `max_steps` or by the agent succeeding. The agent's behavior therefore made no difference to the reward, as long as `max_steps` was exceeded before the `stop_condition` was met. Now the reward is tied to success alone: each step yields -1 until the agent succeeds, and the succeeding step yields 0.

* Reward is now reset to -1 on `reset!`
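The corrected reward rule can be exercised on its own. The following is a minimal sketch, not part of the commit: `acrobot_step_outcome` is a hypothetical helper that mirrors the new termination and reward lines from `AcrobotEnv.jl`, taking the two joint angles (`ns[1]`, `ns[2]`), the step counter, and `max_steps` as plain arguments instead of an `AcrobotEnv`.

```julia
# Hypothetical helper (not part of AcrobotEnv.jl): mirrors the corrected
# termination and reward logic of this commit for a single step.
#   theta1, theta2 : joint angles, i.e. ns[1] and ns[2] in the diff
#   t, max_steps   : step counter and step budget (env.t, env.params.max_steps)
function acrobot_step_outcome(theta1::Real, theta2::Real, t::Integer, max_steps::Integer)
    # Same height test the environment uses to detect success.
    succeeded = -cos(theta1) - cos(theta2 + theta1) > 1.0
    # The episode ends on success or when the step budget is exceeded.
    done = succeeded || t > max_steps
    # Every step costs -1 until the agent succeeds; timing out earns no bonus.
    reward = succeeded ? 0.0 : -1.0
    return (done = done, reward = reward)
end

# Links hanging straight down, within budget: episode continues.
println(acrobot_step_outcome(0.0, 0.0, 10, 500))
# Links pointing straight up: success, episode ends.
println(acrobot_step_outcome(pi, 0.0, 10, 500))
# Still hanging down after the budget is exceeded: episode ends without success.
println(acrobot_step_outcome(0.0, 0.0, 501, 500))
```

Under the old rule (`env.reward = env.done ? -1.0 : 0.0`), the second and third calls would both have yielded -1 and the first would have yielded 0, so success and timeout were indistinguishable and success was penalized relative to continuing.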
JBoerma authored Apr 28, 2021
1 parent 1c0d284 commit 29bb58e
Showing 1 changed file with 4 additions and 2 deletions: src/environments/3rd_party/AcrobotEnv.jl
@@ -86,6 +86,7 @@ function RLBase.reset!(env::AcrobotEnv{T}) where {T <: Number}
     env.t = 0
     env.action = 2
     env.done = false
+    env.reward = -1
     nothing
 end

@@ -117,8 +118,9 @@ function (env::AcrobotEnv{T})(a) where {T <: Number}
     ns[4] = bound(ns[4], -env.params.max_vel_b, env.params.max_vel_b)
     env.state = ns
     # termination criterion
-    env.done = (-cos(ns[1]) - cos(ns[2] + ns[1]) > 1.0) || env.t > env.params.max_steps
-    env.reward = env.done ? -1.0 : 0.0
+    succeeded = -cos(ns[1]) - cos(ns[2] + ns[1]) > 1.0
+    env.done = succeeded || env.t > env.params.max_steps
+    env.reward = succeeded ? 0.0 : -1.0
     nothing
 end
