-
I simply moved the implementation from ReinforcementLearningEnvironmentClassicControl.jl.
-
Another possible issue: refer to the outer constructor of `PendulumEnv`. The function:

```julia
function _step!(env::PendulumEnv, a)
    env.t += 1
    th, thdot = env.state # two state variables
    # ...
```

Also in the outer constructor itself:

```julia
env = PendulumEnv(
    PendulumEnvParams(max_speed, max_torque, g, m, l, dt, max_steps),
    action_space,
    Space(ClosedInterval{T}.(-high, high)),
    zeros(T, 2), # two state variables
    # ...
```

Did I miss anything? Fortunately, in experiments like JuliaRL_DDPG_Pendulum, the number of state variables is obtained by …. It seems that most RL algorithms do not touch ….
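For concreteness, here is a rough sketch of the pattern I mean. It only assumes the generic `state(env)` accessor; it is not the exact line from the experiment script:

```julia
using ReinforcementLearning

env = PendulumEnv()

# The agent works with the observation returned by `state(env)`, so an
# experiment can size its networks from that, without reading the internal
# two-element `env.state` directly.
ns = length(state(env))
@show ns
```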
-
This may not be a serious issue, so I decided to put it in the Discussions part. Currently, I am trying to implement a new environment and have referred to PendulumEnv.jl as an example. (My background is mainly in control theory, which is closely related to RL, though.)

The `_step!` function seems interesting to me. The basic interaction in RL is `(s, a) --> (s', r)`, where `r` denotes the reward we get by taking action `a` at state `s`. However, in `_step!`, the reward (i.e., `-costs` therein) is calculated before `a` is applied: it depends on the old state `s` and the action `a`.

In my opinion, it seems to be an improper choice, and the reward should depend on `s'` and `a` instead (i.e., it should be computed after `a` updates the environment). A specific scenario illustrates the rationale: in the line computing `costs`, the sign of `a` does not matter, which means that even if you apply a reverse force (torque), you still get the same reward, which is not what we would expect. (This tutorial explains the dynamics of a simple pendulum.)

I can make a PR possibly next week if you think the above statement is reasonable.
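To make the suggestion concrete, here is a rough sketch of the change I have in mind. The dynamics, `angle_normalize`, and the cost expression are paraphrased from the usual pendulum swing-up formulation, and the field names follow the `PendulumEnvParams` shown above but may not match PendulumEnv.jl exactly, so please treat it as a sketch rather than a patch:

```julia
# Sketch only: compute the reward from the *new* state s' and the action a.
angle_normalize(x) = mod(x + π, 2 * π) - π

function _step_sketch!(env, a)
    env.t += 1
    th, thdot = env.state
    g, m, l, dt = env.params.g, env.params.m, env.params.l, env.params.dt
    a = clamp(a, -env.params.max_torque, env.params.max_torque)

    # 1. Advance the dynamics first ...
    new_thdot = thdot + (-3 * g / (2 * l) * sin(th + π) + 3 * a / (m * l^2)) * dt
    new_th = th + new_thdot * dt
    new_thdot = clamp(new_thdot, -env.params.max_speed, env.params.max_speed)
    env.state[1] = new_th
    env.state[2] = new_thdot

    # 2. ... then compute the cost from the updated state s' (and the action a),
    #    instead of from the old (th, thdot).
    costs = angle_normalize(new_th)^2 + 0.1 * new_thdot^2 + 0.001 * a^2
    env.reward = -costs
    env.done = env.t >= env.params.max_steps
    nothing
end
```

With this ordering, `a` and `-a` generally lead to different `new_th` and `new_thdot`, hence different rewards, which matches the `(s, a) --> (s', r)` convention above.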