A paper list on large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
Updated Jul 11, 2024
Extracts text from a video and saves it as notes in a .docx file.
GSoC - TV show segmentation