Add vision-language models #8435

trawler0 · 2024-05-20T19:08:14Z

🚀 The feature

Add support for vision-language models such as CLIP or LiT.

Motivation, pitch

Dear torchvision team,
I am sorry if I missed earlier discussions about this, or a specific reason why you have chosen not to implement vision-language models. The current trend in computer vision is drifting heavily towards vision-language models like CLIP.
It might be worth considering support for at least some of these models.

Alternatives

No response

Additional context

No response

NicolasHug · 2024-05-24T12:26:42Z

Hi @trawler0, thanks for the feature request. We certainly acknowledge the prevalence of vision-language models, but at this time we're not prioritizing the addition of new models in torchvision and are instead focusing on the lower parts of the stack, like preproc.

