Reinforcement learning project.
The environment is Cliff Walking; see [A3.pdf] for the details.
The experiment shows that the Sarsa method tends to choose a safer path, while Q-learning tends to choose the optimal path.
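
The difference in behavior comes from how each method forms its update target. The sketch below is only an illustration, not the code in Sarsa.py or Qlearning.py: the function names, the dict-of-lists Q-table, and the default `alpha` and `gamma` values are all assumptions. Sarsa bootstraps from the action the agent actually takes next (including exploratory ones), while Q-learning bootstraps from the greedy action.

```python
# Hypothetical helpers for illustration; names and defaults are assumptions.
# Q maps each state to a list of action values, e.g. Q[state][action].

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0):
    """On-policy update: the target uses the action actually chosen in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=1.0):
    """Off-policy update: the target uses the best action value in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


if __name__ == "__main__":
    # Two states, two actions. In state 1, action 0 is bad (-10), action 1 is fine (-1).
    Q = {0: [0.0, 0.0], 1: [-10.0, -1.0]}
    # Sarsa: the agent explored and picked the bad action 0 in state 1.
    sarsa_update(Q, s=0, a=0, r=-1.0, s_next=1, a_next=0)
    # Q-learning: ignores which action is taken next and uses the max.
    q_learning_update(Q, s=0, a=1, r=-1.0, s_next=1)
    print(Q[0])
```

With an exploring policy, the Sarsa target is pulled down by exploratory steps that end badly, which is consistent with it preferring a path where a random step is less costly.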
To reproduce the results, run Qlearning.py or Sarsa.py. With a small modification to the Python file, you can also get a plotted figure.
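
One possible way to produce the figure is sketched here. It assumes that each script can be changed to collect the sum of rewards per episode in a list, and that matplotlib is installed; `moving_average`, `plot_rewards`, and the output file name are hypothetical and do not exist in this repository.

```python
# Hypothetical plotting helpers; not part of Qlearning.py or Sarsa.py.

def moving_average(values, window=10):
    """Trailing mean over the most recent `window` values (fewer at the start)."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def plot_rewards(curves, window=10, path="rewards.png"):
    """Plot smoothed per-episode rewards; `curves` maps a label to a reward list."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    for label, rewards in curves.items():
        plt.plot(moving_average(rewards, window), label=label)
    plt.xlabel("Episode")
    plt.ylabel("Sum of rewards per episode (smoothed)")
    plt.legend()
    plt.savefig(path)
    plt.close()
```

For example, after training both agents, calling `plot_rewards({"Sarsa": sarsa_rewards, "Q-learning": q_rewards})` with the two collected lists would write both curves to one image.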