Hello, I have also been working through 莫烦 (Morvan)'s reinforcement learning course recently and have been porting the TensorFlow code to PyTorch. I've been referring to your code and have a small question. When computing td_error, why is the action passed into q_v = self.Critic_eval(bs, ba) the buffered action ba rather than the action output by Actor_eval, and what is the difference between these two actions? The code is as follows:
a_ = self.Actor_target(bs_)       # this network's parameters are not updated right away; used to predict the action for the Critic's Q_target
q_ = self.Critic_target(bs_, a_)  # this network's parameters are not updated right away; used to give the gradient-ascent strength when the Actor is updated
q_target = br + GAMMA * q_        # q_target = negative
#print(q_target)
q_v = self.Critic_eval(bs, ba)
#print(q_v)
td_error = self.loss_td(q_target, q_v)
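To frame the question, here is a minimal, self-contained sketch of the usual DDPG update step (not the repository's actual code). The names actor_eval, critic_eval, bs, ba, br, bs_ and GAMMA mirror the snippet above; the tiny MLPs and the random minibatch are assumptions made only to keep the example runnable. It shows that the critic's TD error is evaluated at the stored action ba, while an action produced by the evaluation actor only appears in the actor's own loss, where gradients need to flow back into the actor's parameters.

```python
import torch
import torch.nn as nn

state_dim, action_dim, batch, GAMMA = 3, 1, 32, 0.9  # assumed sizes for the sketch

def make_actor():
    return nn.Sequential(nn.Linear(state_dim, 30), nn.ReLU(),
                         nn.Linear(30, action_dim), nn.Tanh())

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 30),
                                 nn.ReLU(), nn.Linear(30, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=1))

actor_eval, actor_target = make_actor(), make_actor()
critic_eval, critic_target = Critic(), Critic()
loss_td = nn.MSELoss()
opt_c = torch.optim.Adam(critic_eval.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(actor_eval.parameters(), lr=1e-3)

# Random minibatch standing in for a sample from the replay buffer.
bs  = torch.randn(batch, state_dim)    # states s
ba  = torch.randn(batch, action_dim)   # actions actually taken, stored with s
br  = torch.randn(batch, 1)            # rewards r
bs_ = torch.randn(batch, state_dim)    # next states s'

# Critic update: Q(s, a) is evaluated at the *stored* action ba, because the
# TD target r + gamma * Q'(s', mu'(s')) is defined for the sampled transition.
with torch.no_grad():
    a_ = actor_target(bs_)                          # target policy's action at s'
    q_target = br + GAMMA * critic_target(bs_, a_)
q_v = critic_eval(bs, ba)
td_error = loss_td(q_target, q_v)
opt_c.zero_grad()
td_error.backward()
opt_c.step()

# Actor update: only here does the action come from actor_eval, so that the
# critic's value can be back-propagated into the actor's parameters.
loss_a = -critic_eval(bs, actor_eval(bs)).mean()    # gradient ascent on Q
opt_a.zero_grad()
loss_a.backward()
opt_a.step()
```

As far as I understand, if Actor_eval(bs) were used inside the critic loss instead of ba, the critic would be fitted to an action that was never executed in that transition, and the critic's loss would also leak gradients into the actor.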