An implementation of the PPO (Proximal Policy Optimization) algorithm for Unity3D environments.
We use ml-agents to connect the Unity game environment to the learning algorithm, which runs in Python. The neural network is built with TensorFlow and is used for both training and inference.
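
For readers new to PPO, the clipped surrogate objective at its core can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not this repository's TensorFlow code: the function name `ppo_clip_loss`, its parameter names, and the default `clip_eps=0.2` are hypothetical and chosen only for the example.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Illustrative PPO clipped surrogate loss (a value to minimize).

    All names and the default clip_eps are assumptions for this sketch.
    """
    new_log_probs = np.asarray(new_log_probs, dtype=np.float64)
    old_log_probs = np.asarray(old_log_probs, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)

    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = np.exp(new_log_probs - old_log_probs)

    # Surrogate objective with and without clipping the ratio.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO takes the elementwise minimum (a pessimistic bound) and
    # maximizes its mean; negate so that the result is a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies assign the same log-probabilities, the ratio is 1 everywhere and the loss reduces to the negative mean advantage. In an actual training loop the same expression would be written with differentiable tensor operations so that gradients flow through `new_log_probs`.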