Add vision-language models #8435

Open
trawler0 opened this issue May 20, 2024 · 1 comment

Comments

@trawler0

🚀 The feature

Add support for vision-language models like CLIP or LiT.

Motivation, pitch

Dear torchvision team,
I am sorry if I missed earlier discussions about this, or a specific reason why you have chosen not to implement vision-language models. The current trend in computer vision is drifting heavily towards vision-language models like CLIP.
It might be worth considering adding support for at least some of these models.

Alternatives

No response

Additional context

No response

@NicolasHug
Member

Hi @trawler0, thanks for the feature request. We certainly acknowledge the prevalence of vision-language models, but at this time we're not prioritizing the addition of new models in torchvision and are instead focusing on the lower parts of the stack, like preproc.
