Wrong Loss Function? #7

@oduerr

Hello

Has anybody successfully trained an agent using this code? We can't get it to do anything useful on pinball (VideoPinball-v0).

There seems to be a subtle bug in the calculation of the loss function. According to the Nature paper (see Algorithm 1), the target should use the maximum Q-value of the target network. However, in the dqn code, in the function doMinibatch (line 122), it is

q_target_max = np.argmax(q_target, axis=1)

which returns the index of the best action and thus not the maximum Q-value. Shouldn't that be

q_target_max = np.amax(q_target, axis=1)
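
To illustrate the difference, here is a minimal sketch with made-up numbers. The Q-values, `rewards`, `terminal`, and `gamma` are assumptions for illustration only and are not taken from the repository; the target follows Algorithm 1 of the Nature paper, y = r for terminal transitions and y = r + gamma * max_a' Q_target(s', a') otherwise.

```python
import numpy as np

# Hypothetical target-network outputs for a minibatch of 3 states and 4 actions.
q_target = np.array([
    [0.1, 2.5, 0.3, 0.2],
    [1.0, 0.4, 0.9, 3.2],
    [0.7, 0.6, 0.5, 0.1],
])

# argmax returns the *index* of the best action in each row ...
best_actions = np.argmax(q_target, axis=1)

# ... while amax returns the *value* of the best action, which is what
# the target in Algorithm 1 needs.
q_target_max = np.amax(q_target, axis=1)

# Illustrative rewards, terminal flags, and discount factor (assumed values).
rewards = np.array([1.0, 0.0, -1.0])
terminal = np.array([False, False, True])
gamma = 0.99

# Terminal transitions use the reward alone; the boolean mask zeroes out
# the bootstrapped term for them.
y = rewards + gamma * q_target_max * (~terminal)

print("argmax (indices):", best_actions)
print("amax (values):   ", q_target_max)
print("targets y:       ", y)
```

With argmax, the "Q-value" fed into the target would be a small integer between 0 and the number of actions minus one, unrelated to the network's actual estimate, which might explain why training does not converge.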

Cheers,
Oliver
