Consolidation of work for the RL concept learning project, targeted at AAAI '23. This work is inspired by DeepMind's paper "Simulating Early Word Learning in Situated Connectionist Agents": https://cognitivesciencesociety.org/cogsci20/papers/0155/0155.pdf
We use vision-language models to learn various spatial relationships in a simulated RL environment. The environment is built in Unity 3D with several C# scripts written for this project. The Unity ML-Agents package provides the interface between the models and the agent in the Unity environment.
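
As a rough illustration of how the Python side can talk to a Unity build through ML-Agents, the sketch below steps an environment with random actions using the low-level `mlagents_envs` API. It is not this project's training code: the function name `run_random_episode`, the `build_path` argument, and the `max_steps` limit are hypothetical, and it assumes a single agent behavior and an installed `mlagents_envs` package.

```python
def run_random_episode(build_path, max_steps=100):
    """Step a Unity build with random actions; return the number of steps taken.

    build_path is a hypothetical path to a compiled Unity environment.
    """
    # Imported lazily so this module loads even without ML-Agents installed.
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name=build_path)
    try:
        env.reset()
        # Assumption: the scene exposes one behavior; take the first one.
        behavior_name = list(env.behavior_specs)[0]
        spec = env.behavior_specs[behavior_name]

        steps = 0
        for _ in range(max_steps):
            # decision_steps: agents waiting for an action.
            # terminal_steps: agents whose episode has just ended.
            decision_steps, terminal_steps = env.get_steps(behavior_name)
            if len(terminal_steps) > 0:
                break
            # A trained model would pick actions from the observations in
            # decision_steps.obs; here random actions stand in for it.
            action = spec.action_spec.random_action(len(decision_steps))
            env.set_actions(behavior_name, action)
            env.step()
            steps += 1
        return steps
    finally:
        # Always release the Unity process and its communication port.
        env.close()
```

Calling `run_random_episode("path/to/build")` with a real build path would launch the Unity executable, so the random policy is only a placeholder for wherever the vision-language model would choose actions.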