
Asking for a simple script to get text and video features #24

Open
yotammarton opened this issue Jun 19, 2023 · 8 comments

@yotammarton

First of all, amazing work on this one.

I'm getting a bit lost in the repo. May I request a simple few-line script that does something like the following:

model = CLIPViP("pretrain_clipvip_base_32.pt")
text_features = model.encode_text("This is a very cute cat")
video_features = model.encode_video("vid_file.mp4")
cosine(text_features, video_features)

[Extra] Preferably I would like to get the video features for a batch of mp4 files with different lengths.
The closest I found is CLIP-ViP/src/modeling/VidCLIP.py, but I couldn't find where this script is used.

Thank you :)
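
(For reference, the cosine step at the end is only a normalized dot product. Below is a minimal PyTorch sketch; random tensors stand in for the real CLIP-ViP embeddings, since the CLIPViP / encode_text / encode_video API above is wishful pseudocode rather than the repo's actual interface.)

import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the pooled embeddings that the CLIP-ViP text
# and video towers would produce (one row per caption / per video).
text_features = torch.randn(1, 512)
video_features = torch.randn(4, 512)

# L2-normalize, then a matrix product gives cosine similarities,
# which is how CLIP-style models score text against video.
text_features = F.normalize(text_features, dim=-1)
video_features = F.normalize(video_features, dim=-1)
similarity = text_features @ video_features.t()  # shape (1, 4)
print(similarity)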

@jingli18

Same question. I can download the videos, but without annotations. Where can I get the text (caption, annotation, transcription) data?
Thanks a lot.

@HellwayXue

In reply to @yotammarton's request for a simple script:

Hi, we are integrating CLIP-ViP into Hugging Face Transformers. I believe it will then be easier to call. Please keep an eye on it.

@HellwayXue

In reply to @jingli18's question about the text data:

Hi, for the ASR texts, please refer to #7. For the auxiliary captions, please download them from this link: Azure Blob Link

@jingli18

jingli18 commented Jul 3, 2023 via email

@Spark001

Spark001 commented Aug 10, 2023

Replying to @HellwayXue's pointer to the auxiliary captions above:

@HellwayXue Thanks for providing the auxiliary captions.
But how do I open the data.mdb files? I tried Access and Visual Studio, but they did not work...
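
(Note: the data.mdb / lock.mdb pair is the on-disk layout of an LMDB database rather than a Microsoft Access file. That is an assumption based on the file names, but if it holds, the captions can be read with the Python lmdb package. A minimal sketch:)

import lmdb  # pip install lmdb

# Open the directory that contains data.mdb, not the file itself.
env = lmdb.open("path/to/caption_db", readonly=True, lock=False)
with env.begin() as txn:
    for key, value in txn.cursor():
        # Keys are typically video ids; values may be raw text or a
        # serialized blob (e.g. JSON), depending on how the dump was written.
        print(key.decode("utf-8"), value[:80])
        break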

@MVPavan

MVPavan commented Sep 27, 2023

Referring to @HellwayXue's earlier reply about the Hugging Face integration:

Hi @HellwayXue, any update on the integration with Hugging Face? Thank you :)

@eisneim

eisneim commented Nov 16, 2023

@MVPavan @yotammarton I've created a simple example here: https://github.com/eisneim/clip-vip_video_search

@someshfengde

Hi @MVPavan, can you please suggest what GPU configuration is required to run this model (just for running inference with it)?
