
the necessity of copying current_q? #3

Open
Bennett561 opened this issue Sep 17, 2021 · 1 comment

@Bennett561

I don't quite see the necessity of copying current_q in the train function (dqn_agent). Since you only need its shape, right, why not use np.zeros_like...?

    def train(self, batch):
        """
        Trains the underlying network with a batch of gameplay experiences to
        help it better predict the Q values.
        :param batch: a batch of gameplay experiences
        :return: training loss
        """
        state_batch, next_state_batch, action_batch, reward_batch, done_batch \
            = batch
        current_q = self.q_net(state_batch).numpy()
        target_q = np.copy(current_q)
        next_q = self.target_q_net(next_state_batch).numpy()
        max_next_q = np.amax(next_q, axis=1)
        for i in range(state_batch.shape[0]):
            target_q_val = reward_batch[i]
            if not done_batch[i]:
                target_q_val += 0.95 * max_next_q[i]
            target_q[i][action_batch[i]] = target_q_val
        training_history = self.q_net.fit(x=state_batch, y=target_q, verbose=0)
        loss = training_history.history['loss']
        return loss

SebastianKotstein commented Feb 11, 2022

According to my understanding, computing the current Q-value matrix and copying it into the target Q-value matrix is required because the following for loop updates only those elements of the target matrix whose indices match the chosen actions. All other elements of the target matrix must therefore equal the current Q-values; if the targets were instead initialized with zeros or random values, the optimizer would try to fit the model against those zero or random values for every action that was not taken.
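
A minimal sketch of that point (the numbers, shapes, and variable names below are illustrative, not taken from the repository):

    import numpy as np

    # Toy batch: 3 states, 2 actions. All numbers are made up for illustration.
    current_q = np.array([[1.2, 0.4],
                          [0.3, 0.9],
                          [0.7, 0.1]])        # what q_net currently predicts
    action_batch = np.array([0, 1, 1])        # actions actually taken
    new_targets = np.array([1.5, 0.8, 0.2])   # reward + 0.95 * max_next_q

    # What the repo does: copy, then overwrite only the taken actions.
    target_from_copy = np.copy(current_q)
    target_from_copy[np.arange(3), action_batch] = new_targets
    # Entries for untaken actions still equal the current predictions,
    # so they contribute essentially zero loss and fit() leaves them alone.

    # The suggested alternative: start from zeros.
    target_from_zeros = np.zeros_like(current_q)
    target_from_zeros[np.arange(3), action_batch] = new_targets
    # Entries for untaken actions are now 0, so fit() would pull those
    # Q-values toward 0 even though nothing was observed about them.

    print(target_from_copy)   # untaken actions keep 0.4, 0.3, 0.7
    print(target_from_zeros)  # untaken actions become 0.0

So the copy is not about the shape: it makes the loss on every untaken action essentially zero, and the gradient step only moves the Q-values that the loop actually assigned a Bellman target to.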
