I don't quite see the necessity of copying current_q in the train function (dqn_agent). Since you only need its shape, why not use np.zeros_like?
def train(self, batch):
    """Trains the underlying network with a batch of gameplay experiences
    to help it better predict the Q values.

    :param batch: a batch of gameplay experiences
    :return: training loss
    """
    state_batch, next_state_batch, action_batch, reward_batch, done_batch = batch
    current_q = self.q_net(state_batch).numpy()
    target_q = np.copy(current_q)
    next_q = self.target_q_net(next_state_batch).numpy()
    max_next_q = np.amax(next_q, axis=1)
    for i in range(state_batch.shape[0]):
        # Bellman target: reward plus discounted max next-state Q, except at terminal states.
        target_q_val = reward_batch[i]
        if not done_batch[i]:
            target_q_val += 0.95 * max_next_q[i]
        target_q[i][action_batch[i]] = target_q_val
    training_history = self.q_net.fit(x=state_batch, y=target_q, verbose=0)
    loss = training_history.history['loss']
    return loss
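In other words, something like this (a minimal sketch of the alternative I have in mind; the random array is just a stand-in for self.q_net(state_batch).numpy()):

```python
import numpy as np

# Stand-in for current_q = self.q_net(state_batch).numpy() (shape: batch_size x num_actions).
current_q = np.random.rand(32, 2)

# Proposed alternative: build the target matrix from zeros instead of copying.
target_q = np.zeros_like(current_q)   # instead of target_q = np.copy(current_q)
```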
According to my understanding, computing the current Q-value matrix and copying it into the target Q-value matrix is required because the following for loop updates only those elements of the target matrix whose indices match the chosen actions. All other elements of the target matrix must therefore stay equal to the current Q-values; if the target matrix were instead initialized with zeros or random values, the optimizer would also try to fit the model against those zero or random values for every action that was not taken. A toy example illustrating the difference is sketched below.
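Here is a standalone sketch of that point, with made-up numbers and assuming the network is fit against the target matrix with a mean-squared-error loss (as model.fit does in this kind of setup). With np.copy, the per-element error for untaken actions is exactly zero; with a zeros-initialized target, every untaken action is also pulled toward 0.

```python
import numpy as np

# Hypothetical Q-value predictions for a batch of 2 states with 3 actions each.
current_q = np.array([[1.0, 2.0, 3.0],
                      [0.5, 0.1, 0.9]])
actions = np.array([2, 0])               # actions actually taken
bellman_targets = np.array([3.5, 1.2])   # stand-ins for r + 0.95 * max_a' Q_target(s', a')

# Copying the current predictions: only the taken actions contribute to the error.
target_copy = np.copy(current_q)
target_copy[np.arange(2), actions] = bellman_targets
mse_copy = np.mean((current_q - target_copy) ** 2)

# Zero-initialized targets: every untaken action is also dragged toward 0.
target_zeros = np.zeros_like(current_q)
target_zeros[np.arange(2), actions] = bellman_targets
mse_zeros = np.mean((current_q - target_zeros) ** 2)

print(mse_copy)   # ~0.12 -- error comes only from the taken actions
print(mse_zeros)  # ~1.09 -- much larger, dominated by the untaken actions
```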