🚀 The feature
Add support for vision-language models like CLIP or LiT.
Motivation, pitch
Dear torchvision team,
I am sorry if I missed discussions about this, or a specific reason why you have chosen not to implement vision-language models. The current trend in computer vision is drifting heavily towards vision-language models like CLIP.
It might be worth considering adding support for at least some of these models.
Alternatives
No response
Additional context
No response
Hi @trawler0, thanks for the feature request. We certainly acknowledge the prevalence of vision-language models, but at this time we're not prioritizing the addition of new models in torchvision and are instead focusing on the lower parts of the stack, like preprocessing.