This is the official codebase for "MAViC: Multimodal Active Learning for Video Captioning" (arXiv link to be added).
Our code builds on SwinBERT (https://github.com/microsoft/SwinBERT). Refer to the instructions there for details on downloading datasets and training models.
In this work we explore active learning for video captioning and introduce a novel method, MAViC, built on our proposed Semantically Aware Sequential Entropy (SASE) acquisition function, which discourages querying less-informative samples that exhibit high entropy only because their candidate captions are semantically similar. We further extend the approach to capture model uncertainty in the visual modality via feature perturbation (M-SASE-FP) and model perturbation (M-SASE-MP), and propose a multimodal extension of SASE, termed M-SASE.
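To make the intuition concrete, below is a minimal, stdlib-only Python toy of the SASE idea: entropy over sampled candidate captions, computed after merging the probability mass of semantically similar candidates so that paraphrase-induced uncertainty is discounted. Everything here is illustrative, not the paper's exact formulation: the function names are hypothetical, the surface-similarity matcher is a stand-in for a real semantic similarity model, and the actual method operates on the captioning model's sequential token probabilities.

```python
import math
from difflib import SequenceMatcher

def semantic_groups(captions, thresh=0.7):
    """Greedily cluster captions whose surface similarity exceeds thresh.
    (Toy stand-in for a proper semantic similarity model.)"""
    groups = []
    for cap in captions:
        for g in groups:
            if SequenceMatcher(None, cap, g[0]).ratio() >= thresh:
                g.append(cap)
                break
        else:
            groups.append([cap])
    return groups

def sase_like_score(captions, probs, thresh=0.7):
    """Entropy over candidate captions, with the probability mass of
    semantically similar captions merged first, so entropy that stems
    from paraphrases does not inflate the acquisition score."""
    groups = semantic_groups(captions, thresh)
    mass = [sum(p for c, p in zip(captions, probs) if c in g) for g in groups]
    z = sum(mass)
    return -sum((m / z) * math.log(m / z) for m in mass if m > 0)

# Three paraphrases plus one distinct caption: naive 4-way entropy is high,
# but the paraphrases collapse into one semantic group, so the score
# reflects genuine two-way uncertainty only.
caps = ["a man plays a guitar", "a man is playing a guitar",
        "a man plays the guitar", "a dog runs on grass"]
probs = [0.3, 0.3, 0.3, 0.1]
print(sase_like_score(caps, probs))  # ~0.33 vs. ~1.28 for naive entropy
```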
To run the active learning loop:

1. Choose an approach from {SE, M-SASE, M-SASE-MP, M-SASE-FP} and `cd MAViC/{approach}/src/tasks/`.
2. Run `run_caption_VidSwinBert.py` with the default parameters, training on an initial 5% of the data.
3. Run `run_caption_VidSwinBert_inference.py` with the best checkpoint to produce `top_selected_samples.pkl`, i.e. the indices of the queried samples from the unlabelled set.
4. Merge these indices with the previous train indices (see the sketch below) and repeat from step 2.
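A minimal sketch of the bookkeeping in step 4, assuming the labelled-set indices are also stored as a pickled list; only `top_selected_samples.pkl` is named by the pipeline above, and `train_indices.pkl` is a hypothetical file name:

```python
import pickle

# top_selected_samples.pkl is produced by the inference step (step 3);
# train_indices.pkl is a HYPOTHETICAL name for the current labelled-set indices.
with open("top_selected_samples.pkl", "rb") as f:
    selected = pickle.load(f)
with open("train_indices.pkl", "rb") as f:
    train = pickle.load(f)

# Union the newly queried unlabelled-set indices into the labelled pool,
# then relaunch training (step 2) on the enlarged set.
merged = sorted(set(train) | set(selected))
with open("train_indices.pkl", "wb") as f:
    pickle.dump(merged, f)
```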
[1] SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning, https://github.com/microsoft/SwinBERT
[2] VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation, https://github.com/VALUE-Leaderboard/EvaluationTools