TODO:

Create a vectorized agent that takes a vectorized env and passes each environment's observation to the underlying agent one at a time. This still lets all environments simulate at the same time, although it does not vectorize the learning algorithm.
Get PPO working with n envs
Get comp agent working
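
Rough sketch for the first item (the vectorized agent wrapper). Everything here is an assumption for illustration: `ThresholdAgent`, `ToyVectorEnv`, `rollout`, and the `act`/`reset`/`step` signatures are hypothetical stand-ins, not the project's real agent or env API. It only shows the idea: the env steps as a batch, while the wrapped agent is called once per environment.

```python
from typing import Any, List, Sequence, Tuple


class ThresholdAgent:
    """Hypothetical single-env agent: sees one observation per call."""

    def __init__(self) -> None:
        self.calls = 0  # counts how many single observations were handled

    def act(self, obs: float) -> int:
        self.calls += 1
        return 1 if obs > 0.0 else 0


class VectorizedAgent:
    """Wraps a single-env agent so it accepts a batch of observations."""

    def __init__(self, agent: Any) -> None:
        self.agent = agent

    def act(self, batch_obs: Sequence[Any]) -> List[Any]:
        # The env is batched, the agent is not: one agent call per env.
        return [self.agent.act(obs) for obs in batch_obs]


class ToyVectorEnv:
    """Stand-in for a vectorized env: n independent counters stepped together."""

    def __init__(self, starts: Sequence[float]) -> None:
        self.states = list(starts)

    def reset(self) -> List[float]:
        return list(self.states)

    def step(self, actions: Sequence[int]) -> Tuple[List[float], List[float]]:
        # Action 1 moves a counter down by one, action 0 moves it up by one.
        self.states = [
            s - 1.0 if a == 1 else s + 1.0
            for s, a in zip(self.states, actions)
        ]
        rewards = [-abs(s) for s in self.states]
        return list(self.states), rewards


def rollout(env: ToyVectorEnv, agent: VectorizedAgent, steps: int) -> List[float]:
    """Step all envs together for a fixed number of steps."""
    obs = env.reset()
    for _ in range(steps):
        actions = agent.act(obs)      # per-env agent calls
        obs, _ = env.step(actions)    # one batched env step
    return obs
```

Any learning update would still run per environment in this layout, so it probably does not cover the "PPO with n envs" item by itself.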