
the necessity of copying current_q? #3

Open
Bennett561 opened this issue Sep 17, 2021 · 1 comment

@Bennett561

I don't quite see the necessity of copying current_q in the train function (dqn_agent). Since you only need its shape, right, why not use np.zeros_like...?

    def train(self, batch):
        """
        Trains the underlying network with a batch of gameplay experiences to
        help it better predict the Q values.
        :param batch: a batch of gameplay experiences
        :return: training loss
        """
        state_batch, next_state_batch, action_batch, reward_batch, done_batch \
            = batch
        current_q = self.q_net(state_batch).numpy()
        target_q = np.copy(current_q)
        next_q = self.target_q_net(next_state_batch).numpy()
        max_next_q = np.amax(next_q, axis=1)
        for i in range(state_batch.shape[0]):
            target_q_val = reward_batch[i]
            if not done_batch[i]:
                target_q_val += 0.95 * max_next_q[i]
            target_q[i][action_batch[i]] = target_q_val
        training_history = self.q_net.fit(x=state_batch, y=target_q, verbose=0)
        loss = training_history.history['loss']
        return loss

SebastianKotstein commented Feb 11, 2022

According to my understanding, computing the current Q-value matrix and copying it into the target Q-value matrix is required because the following for loop updates only those elements of the target matrix whose indices match the chosen actions. All other elements of the target matrix must therefore equal the current Q-values; if the targets were instead initialized with zeros or random values, the optimizer would try to fit the model against those zero or random values for every action that was not taken.
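
A minimal sketch of that point (the numbers, shapes, and variable names below are illustrative, not taken from the repository):

    import numpy as np

    # Toy batch: 3 states, 2 actions. All numbers are made up for illustration.
    current_q = np.array([[1.2, 0.4],
                          [0.3, 0.9],
                          [0.7, 0.1]])        # what q_net currently predicts
    action_batch = np.array([0, 1, 1])        # actions actually taken
    new_targets = np.array([1.5, 0.8, 0.2])   # reward + 0.95 * max_next_q

    # What the repo does: copy, then overwrite only the taken actions.
    target_from_copy = np.copy(current_q)
    target_from_copy[np.arange(3), action_batch] = new_targets
    # Entries for untaken actions still equal the current predictions,
    # so they contribute essentially zero loss and fit() leaves them alone.

    # The suggested alternative: start from zeros.
    target_from_zeros = np.zeros_like(current_q)
    target_from_zeros[np.arange(3), action_batch] = new_targets
    # Entries for untaken actions are now 0, so fit() would pull those
    # Q-values toward 0 even though nothing was observed about them.

    print(target_from_copy)   # untaken actions keep 0.4, 0.3, 0.7
    print(target_from_zeros)  # untaken actions become 0.0

So the copy is not about the shape: it makes the loss on every untaken action essentially zero, and the gradient step only moves the Q-values that the loop actually assigned a Bellman target to.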
