This project was carried out as part of the CS-456 course, taught by Professor Wulfram Gerstner at EPFL.
It uses OpenAI's Gym library to model the environment and implements several variants of the REINFORCE policy gradient algorithm.