This repository has been archived by the owner on May 6, 2021. It is now read-only.
Commit
* Fixed Acrobot Rewards
  1. Previously, the reward was 0 until success, at which point it was set to -1. This is the opposite of the desired behavior, since it discourages the agent from succeeding.
  2. The reward was previously set whenever the environment finished, whether by exceeding `max_steps` or by having the agent succeed. Thus it didn't really matter whether or not the agent did anything, so long as `max_steps` was exceeded before the `stop_condition` was met. Now the reward is only set when the agent succeeds.
* Reward is now reset on `reset!`
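The reward behavior described in this commit message can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the repository's code: `ToyEnv`, `goal`, `position`, and the concrete values (-1 per step, 0 on success) are assumptions inferred from the message. Only `max_steps`, the stop condition, and the reset of the reward come from the text above.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in environment, not the repository's Acrobot."""
    max_steps: int = 5
    goal: int = 3            # hypothetical success threshold
    position: int = 0
    steps: int = 0
    reward: float = -1.0     # assumed per-step penalty until success
    done: bool = False

    def reset(self) -> None:
        # Mirrors the commit: the reward is reset together with the state.
        self.position = 0
        self.steps = 0
        self.reward = -1.0
        self.done = False

    def step(self, action: int) -> tuple[float, bool]:
        self.position += action
        self.steps += 1
        succeeded = self.position >= self.goal    # stand-in for `stop_condition`
        timed_out = self.steps >= self.max_steps  # episode ends without success
        # The reward changes only on success; a timeout leaves the penalty in place.
        self.reward = 0.0 if succeeded else -1.0
        self.done = succeeded or timed_out
        return self.reward, self.done
```

Under these assumptions, an agent that does nothing until `max_steps` still ends the episode with the penalty, while an agent that reaches the goal sees the reward change, so finishing by timeout and finishing by success are no longer indistinguishable.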