-
I simply moved the implementation from ReinforcementLearningEnvironmentClassicControl.jl.
-
Another possible issue: refer to the outer constructor of `PendulumEnv`. The function:

```julia
function _step!(env::PendulumEnv, a)
    env.t += 1
    th, thdot = env.state # two state variables
    # ...
```

Also in the outer constructor itself:

```julia
env = PendulumEnv(
    PendulumEnvParams(max_speed, max_torque, g, m, l, dt, max_steps),
    action_space,
    Space(ClosedInterval{T}.(-high, high)),
    zeros(T, 2), # two state variables
    # ...
```

Did I miss anything? Fortunately, in experiments like JuliaRL_DDPG_Pendulum, the number of state variables is obtained by …. It seems that most RL algorithms do not touch ….
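For concreteness, here is a rough sketch of the pattern I mean. It only assumes the generic `state(env)` accessor; it is not the exact line from the experiment script:

```julia
using ReinforcementLearning

env = PendulumEnv()

# The agent works with the observation returned by `state(env)`, so an
# experiment can size its networks from that, without reading the internal
# two-element `env.state` directly.
ns = length(state(env))
@show ns
```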
-
This may not be a serious issue, so I decided to put it in the Discussions part. Currently, I am trying to implement a new environment and have referred to PendulumEnv.jl as an example. (My background is mainly in control theory, which is closely related to RL, though.)

The `_step!` function seems interesting to me. The basic interaction in RL is `(s, a) --> (s', r)`, where `r` denotes the reward we get by taking action `a` at state `s`. However, in `_step!`, the reward (i.e., `-costs` therein) is calculated before `a` is applied: it depends on the old state `s` and the action `a`.

In my opinion, it seems to be an improper choice, and the reward should depend on `s'` and `a` instead (i.e., it should be computed after `a` updates the environment). A specific scenario illustrates the rationale: in the line computing `costs`, the sign of `a` does not matter, which means that even if you apply a reverse force (torque), you still get the same reward, which is not what we would expect. (This tutorial explains the dynamics of a simple pendulum.)

I can make a PR possibly next week if you think the above statement is reasonable.
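To make the suggestion concrete, here is a rough sketch of the change I have in mind. The dynamics, `angle_normalize`, and the cost expression are paraphrased from the usual pendulum swing-up formulation, and the field names follow the `PendulumEnvParams` shown above but may not match PendulumEnv.jl exactly, so please treat it as a sketch rather than a patch:

```julia
# Sketch only: compute the reward from the *new* state s' and the action a.
angle_normalize(x) = mod(x + π, 2 * π) - π

function _step_sketch!(env, a)
    env.t += 1
    th, thdot = env.state
    g, m, l, dt = env.params.g, env.params.m, env.params.l, env.params.dt
    a = clamp(a, -env.params.max_torque, env.params.max_torque)

    # 1. Advance the dynamics first ...
    new_thdot = thdot + (-3 * g / (2 * l) * sin(th + π) + 3 * a / (m * l^2)) * dt
    new_th = th + new_thdot * dt
    new_thdot = clamp(new_thdot, -env.params.max_speed, env.params.max_speed)
    env.state[1] = new_th
    env.state[2] = new_thdot

    # 2. ... then compute the cost from the updated state s' (and the action a),
    #    instead of from the old (th, thdot).
    costs = angle_normalize(new_th)^2 + 0.1 * new_thdot^2 + 0.001 * a^2
    env.reward = -costs
    env.done = env.t >= env.params.max_steps
    nothing
end
```

With this ordering, `a` and `-a` generally lead to different `new_th` and `new_thdot`, hence different rewards, which matches the `(s, a) --> (s', r)` convention above.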