*Figure: 10000 episodes with a maximum cutoff of 100, using FTA*
*Figure: 10000 episodes with a maximum cutoff of 100, using ReLU*
*Figure: ReLU activation function with 50000 episodes*
*Figure: First experiment, DQN with soft update of the target network*
In my next experiments, instead of using a soft update of the target network's weights, I replaced the target weights with the policy network's weights outright (a hard update). The results improved but were still very noisy. Rather than testing this version with more episodes, I reduced the maximum number of timesteps per episode. The default episode length in the Taxi environment is 200; reducing it to 100 improved the learning process.
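The difference between the two target-update schemes can be sketched as follows. This is a minimal illustration with plain numpy weight dictionaries, not the actual training code; the function names and the `tau` value are my own choices. (The episode cutoff itself would typically be set via the environment, e.g. a time-limit of 100 steps instead of Taxi's default 200.)

```python
import numpy as np

def soft_update(target, policy, tau=0.005):
    """Polyak averaging: target <- tau * policy + (1 - tau) * target."""
    return {k: tau * policy[k] + (1.0 - tau) * target[k] for k in target}

def hard_update(target, policy):
    """Hard update: replace the target weights with a copy of the policy weights."""
    return {k: policy[k].copy() for k in policy}

# Toy "networks": one weight vector each.
policy = {"w": np.ones(3)}
target = {"w": np.zeros(3)}

target = soft_update(target, policy, tau=0.1)  # target drifts 10% toward policy
target = hard_update(target, policy)           # target becomes an exact copy
```

The hard update is typically applied only every fixed number of steps, whereas the soft update runs after every training step with a small `tau`.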
*Figure: Second experiment, replacing the soft update with a hard update*