A few small questions about parameter updates #1

Open
Zwette opened this issue Aug 2, 2020 · 0 comments
Comments

Zwette commented Aug 2, 2020

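Hello! I have also been working through 莫烦 (Morvan)'s reinforcement learning course recently, and I rewrote its tf code in torch.
I have been using your code as a reference, and I have one small question. When computing td_error, why is the input in q_v = self.Critic_eval(bs,ba) the action ba rather than the action output by actor-eval? And what is the difference between these two actions?
The code is as follows: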

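a_ = self.Actor_target(bs_)  # target network (parameters are not updated every step); predicts the action used in the Critic's Q_target
q_ = self.Critic_target(bs_, a_)  # target network (parameters are not updated every step); used to give the gradient ascent strength when the Actor updates its parameters
q_target = br + GAMMA * q_  # q_target is negative
# print(q_target)
q_v = self.Critic_eval(bs, ba)
# print(q_v)
td_error = self.loss_td(q_target, q_v)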
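For readers with the same question, the two actions can be contrasted in a minimal sketch of a standard DDPG update. This is not the repository's actual code: the network sizes, `GAMMA`, the learning rates, and the random batch below are assumptions for illustration, and only the names `bs`, `ba`, `br`, `bs_`, `q_target`, `q_v`, and `td_error` follow the snippet above. In the usual formulation, `ba` is the action that was actually taken when the reward `br` was observed, so the TD target is an estimate for Q(bs, ba) specifically. The action produced by the eval actor is used in a separate step, the actor update.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
S_DIM, A_DIM, BATCH, GAMMA = 3, 1, 8, 0.9  # assumed values, for illustration only


class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(S_DIM, 16), nn.ReLU(), nn.Linear(16, A_DIM), nn.Tanh()
        )

    def forward(self, s):
        return self.net(s)


class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(S_DIM + A_DIM, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=1))


actor_eval, actor_target = Actor(), Actor()
critic_eval, critic_target = Critic(), Critic()
actor_target.load_state_dict(actor_eval.state_dict())
critic_target.load_state_dict(critic_eval.state_dict())
actor_opt = torch.optim.Adam(actor_eval.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic_eval.parameters(), lr=2e-3)
loss_td = nn.MSELoss()

# A fake replay-buffer batch (random data standing in for stored transitions).
bs = torch.randn(BATCH, S_DIM)         # states
ba = torch.rand(BATCH, A_DIM) * 2 - 1  # actions that were actually executed
br = torch.randn(BATCH, 1)             # rewards observed after executing ba
bs_ = torch.randn(BATCH, S_DIM)        # next states

# Critic update: the reward br belongs to the stored action ba, so Q(bs, ba) is regressed.
with torch.no_grad():  # the target networks only provide a fixed regression target
    a_ = actor_target(bs_)
    q_target = br + GAMMA * critic_target(bs_, a_)
q_v = critic_eval(bs, ba)
td_error = loss_td(q_v, q_target)
critic_opt.zero_grad()
td_error.backward()
# The critic loss never touched actor_eval, so its parameters received no gradient.
actor_grads_after_critic = [p.grad for p in actor_eval.parameters()]
critic_opt.step()

# Actor update: here the eval actor's own output is fed to the critic.
a = actor_eval(bs)
actor_loss = -critic_eval(bs, a).mean()  # gradient ascent on Q(bs, actor(bs))
actor_opt.zero_grad()
actor_loss.backward()  # also leaves gradients on critic_eval; the next zero_grad clears them
actor_opt.step()
```

Under these assumptions, feeding `actor_eval(bs)` into the critic loss would pair the reward `br` with an action that did not produce it, which is why the stored `ba` appears there and the actor's output appears only in `actor_loss`.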
Zwette closed this as completed Aug 2, 2020
Zwette reopened this Aug 2, 2020