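Ground vision project

In this project, an innovative architecture with CLIP as its backbone is used to perform a visual grounding task on the RefCOCOg dataset: given an image and a referring expression, the model has to locate the image region that the expression describes.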
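As a rough illustration of what a grounding step can look like on top of an embedding backbone such as CLIP, the sketch below picks the candidate box whose embedding is most similar to the text embedding. It is not the architecture used in this project. The helper names `cosine` and `ground`, the candidate boxes, and the toy vectors are all hypothetical stand-ins for real CLIP image-crop and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground(text_emb, region_embs, boxes):
    """Return the candidate box whose embedding best matches the text embedding."""
    scores = [cosine(text_emb, r) for r in region_embs]
    best = max(range(len(boxes)), key=scores.__getitem__)
    return boxes[best], scores[best]


# Toy stand-ins for embeddings (assumed values, not real model output).
text_emb = [1.0, 0.0, 0.0]
region_embs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
# Candidate boxes as (x1, y1, x2, y2); assumed to come from some region proposer.
boxes = [(10, 10, 50, 50), (60, 20, 120, 90)]

best_box, best_score = ground(text_emb, region_embs, boxes)
```

In a real pipeline the vectors would come from the backbone's image and text encoders, and the candidate boxes from a detector or proposal stage; both are assumptions here.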