Hello
Has anybody successfully trained an agent using this code? We can't get it to do anything useful on pinball (VideoPinball-v0).
There seems to be a subtle bug in the calculation of the loss function. According to the Nature paper (see Algorithm 1), the target should use the maximum Q-value of the target network over all actions. However, in the dqn code, in the function doMinibatch (line 122), it is computed as:
q_target_max = np.argmax(q_target, axis=1)
which returns the index of the maximum rather than the maximum itself. Shouldn't that be:
q_target_max = np.amax(q_target, axis=1)
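To illustrate the difference, here is a minimal sketch with made-up numbers. The Q-values, rewards, terminal flags and gamma below are assumptions for illustration only and are not taken from the repository; the target follows the form in Algorithm 1 of the Nature paper (reward only for terminal transitions, otherwise reward plus the discounted maximum target Q-value).

```python
import numpy as np

# Hypothetical target-network output: 3 transitions, 4 actions each.
q_target = np.array([
    [0.1, 2.5, -1.0, 0.3],
    [4.0, 0.0, 1.5, 3.9],
    [-0.2, -0.1, -0.7, -0.4],
])

# argmax gives the *index* of the best action in each row ...
idx = np.argmax(q_target, axis=1)
# ... while amax gives the *value* of the best action in each row.
val = np.amax(q_target, axis=1)

# Hypothetical rewards, terminal flags and discount factor.
rewards = np.array([1.0, 0.0, -1.0])
terminal = np.array([False, False, True])
gamma = 0.99

# Target per transition: r if terminal, else r + gamma * max_a Q_target(s', a).
y = rewards + gamma * val * (~terminal)

print("argmax:", idx)
print("amax:  ", val)
print("target:", y)
```

With argmax, the target would be built from action indices (integers between 0 and the number of actions minus one) instead of Q-values, so the loss would be regressing towards values unrelated to the estimated return.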
Cheers,
Oliver