- Create a vectorized agent that takes a vectorized env and feeds its observations to the underlying agent one environment at a time. All environments still simulate at the same time, so we keep that benefit, although the learning algorithm itself is not vectorized.
- Get PPO working with n envs
- Get comp agent working
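
A rough sketch of the vectorized agent from the first item, to make the idea concrete. It assumes the wrapped agent exposes `act(obs)` and `observe(obs, action, reward, next_obs, done)`; those method names, `VectorizedAgent`, and the `DoublingAgent` stand-in are hypothetical and not taken from the existing code.

```python
from typing import Any, List, Sequence


class DoublingAgent:
    """Hypothetical stand-in for a real single-env agent (e.g. PPO)."""

    def __init__(self) -> None:
        self.seen: List[tuple] = []

    def act(self, obs: Any) -> Any:
        # Placeholder policy: the action is just the observation doubled.
        return obs * 2

    def observe(self, obs, action, reward, next_obs, done) -> None:
        # A real agent would store the transition and learn from it.
        self.seen.append((obs, action, reward, next_obs, done))


class VectorizedAgent:
    """Drives a single-env agent from batched (vectorized) env data.

    The envs step in a batch, but the agent is called once per env,
    so the learning algorithm itself is not vectorized.
    """

    def __init__(self, agent: Any) -> None:
        self.agent = agent

    def act(self, observations: Sequence[Any]) -> List[Any]:
        # One agent call per environment, in env order.
        return [self.agent.act(obs) for obs in observations]

    def observe(
        self,
        observations: Sequence[Any],
        actions: Sequence[Any],
        rewards: Sequence[float],
        next_observations: Sequence[Any],
        dones: Sequence[bool],
    ) -> None:
        # Split the batch into per-env transitions and pass them on one at a time.
        for transition in zip(observations, actions, rewards, next_observations, dones):
            self.agent.observe(*transition)
```

With `n` envs, `act` receives `n` observations and returns `n` actions, so the env can still step everything at once. Batching the agent's own forward and update passes would be a separate change.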