Figure: 10000 episodes with a maximum cutoff of 100, using FTA
Figure: 10000 episodes with a maximum cutoff of 100, using ReLU
Figure: ReLU activation function with 50000 episodes
Figure: First experiment, DQN with soft update of the target network
In my next experiments, instead of using a soft update of the target network's weights, I replaced the target weights with the policy network's weights outright (a hard update). The results improved but were still quite noisy. Rather than testing this version with a larger number of episodes, I reduced the maximum number of timesteps per episode: the default episode length limit for the Taxi environment is 200, but cutting it to 100 improved the learning process.
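To make the change concrete, here is a minimal sketch of the two target-update styles and of capping Taxi episodes at 100 timesteps. It assumes PyTorch and Gymnasium (the text does not name the libraries), and the network size, `tau`, and `update_every` values are illustrative placeholders rather than the settings used in the experiments.

```python
# Minimal sketch: hard vs. soft target updates for DQN on Taxi-v3,
# with the episode length capped at 100 steps instead of the default 200.
# Assumes PyTorch and Gymnasium; hyperparameters below are placeholders.
import gymnasium as gym
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP Q-network over one-hot encoded Taxi states (size is illustrative)."""

    def __init__(self, n_states: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Cap each episode at 100 timesteps (Taxi's default limit is 200).
env = gym.make("Taxi-v3", max_episode_steps=100)
n_states = env.observation_space.n
n_actions = env.action_space.n

policy_net = QNetwork(n_states, n_actions)
target_net = QNetwork(n_states, n_actions)
target_net.load_state_dict(policy_net.state_dict())


def soft_update(target: nn.Module, policy: nn.Module, tau: float = 0.005) -> None:
    """Soft update: target <- tau * policy + (1 - tau) * target."""
    for t_param, p_param in zip(target.parameters(), policy.parameters()):
        t_param.data.copy_(tau * p_param.data + (1.0 - tau) * t_param.data)


def hard_update(target: nn.Module, policy: nn.Module) -> None:
    """Hard update: replace the target weights with the policy network's weights."""
    target.load_state_dict(policy.state_dict())


# Inside the training loop, the hard update replaces the soft one,
# applied every `update_every` steps (hypothetical schedule).
update_every = 500
for step in range(1, 2001):
    # ... collect a transition and take a gradient step on policy_net here ...
    if step % update_every == 0:
        hard_update(target_net, policy_net)
```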
Figure: Second experiment, replacing the soft update with a hard target update