Figure: 10000 episodes with a maximum cutoff of 100, using FTA
Figure: 10000 episodes with a maximum cutoff of 100, using ReLU
Figure: ReLU activation function with 50000 episodes
Figure: First experiment, DQN with soft update of the target network
In my next experiments, instead of using a soft update of the target network's weights, I replaced the target weights with the policy network's weights outright (a hard update). The results improved but were still quite noisy. Rather than testing this version with a larger number of episodes, I reduced the maximum number of timesteps per episode: the default episode length limit for the Taxi environment is 200, but cutting it to 100 improved the learning process.
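To make the change concrete, here is a minimal sketch of the two target-update styles and of capping Taxi episodes at 100 timesteps. It assumes PyTorch and Gymnasium (the text does not name the libraries), and the network size, `tau`, and `update_every` values are illustrative placeholders rather than the settings used in the experiments.

```python
# Minimal sketch: hard vs. soft target updates for DQN on Taxi-v3,
# with the episode length capped at 100 steps instead of the default 200.
# Assumes PyTorch and Gymnasium; hyperparameters below are placeholders.
import gymnasium as gym
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP Q-network over one-hot encoded Taxi states (size is illustrative)."""

    def __init__(self, n_states: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Cap each episode at 100 timesteps (Taxi's default limit is 200).
env = gym.make("Taxi-v3", max_episode_steps=100)
n_states = env.observation_space.n
n_actions = env.action_space.n

policy_net = QNetwork(n_states, n_actions)
target_net = QNetwork(n_states, n_actions)
target_net.load_state_dict(policy_net.state_dict())


def soft_update(target: nn.Module, policy: nn.Module, tau: float = 0.005) -> None:
    """Soft update: target <- tau * policy + (1 - tau) * target."""
    for t_param, p_param in zip(target.parameters(), policy.parameters()):
        t_param.data.copy_(tau * p_param.data + (1.0 - tau) * t_param.data)


def hard_update(target: nn.Module, policy: nn.Module) -> None:
    """Hard update: replace the target weights with the policy network's weights."""
    target.load_state_dict(policy.state_dict())


# Inside the training loop, the hard update replaces the soft one,
# applied every `update_every` steps (hypothetical schedule).
update_every = 500
for step in range(1, 2001):
    # ... collect a transition and take a gradient step on policy_net here ...
    if step % update_every == 0:
        hard_update(target_net, policy_net)
```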
Figure: Second experiment, replacing the soft update with a hard target update