
# Attributes and SVOs for Video Captioning

This implementation is based on "Syntax-Aware Action Targeting for Video Captioning" (code), which in turn builds on "Consensus-based Sequence Training for Video Captioning".

## Dependencies

- Python 3.6
- PyTorch 1.1
- CUDA 10.0

This repo includes an edited version (coco-caption) of the Python 3 COCO evaluation protocols, modified to load the CIDEr corpus.
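A minimal environment setup matching the versions above might look like the following. The environment name `svo-captioning` and the torchvision version are assumptions, not part of this repo; adjust the CUDA build to match your driver.

```shell
# Hypothetical setup sketch: pin Python 3.6 and PyTorch 1.1 per the dependency list.
conda create -n svo-captioning python=3.6 -y
conda activate svo-captioning

# torch 1.1.0 was distributed with CUDA 10.0 builds;
# torchvision 0.3.0 is the release paired with torch 1.1.0.
pip install torch==1.1.0 torchvision==0.3.0
```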

## Data

The datasets and their features, along with pre-trained models resulting from the experiments, can be downloaded from my Google Drive.

## Experiments

View my experiments and results

## Train

To train on MSVD:

```shell
python train.py --dataset msvd \
                --captioner_type lstm \
                --model_id lstm_1 \
                --batch_size 8 \
                --test_batch_size 8 \
                --max_epochs 100
```

To train on MSR-VTT:

```shell
python train.py --dataset msrvtt \
                --captioner_type lstm \
                --model_id lstm_1 \
                --batch_size 8 \
                --test_batch_size 4 \
                --max_epochs 200
```

## Test / Evaluate

Testing runs automatically at the end of training; to run it separately, use evaluate.py.

To evaluate on MSVD:

```shell
python evaluate.py --dataset msvd \
                   --captioner_type lstm \
                   --model_id lstm_1 \
                   --test_batch_size 8
```
To evaluate on MSR-VTT:

```shell
python evaluate.py --dataset msrvtt \
                   --captioner_type lstm \
                   --model_id lstm_1 \
                   --test_batch_size 4
```

## Acknowledgements

- PyTorch implementation of SAAT
- PyTorch implementation of CST
- PyTorch implementation of SCST