This implementation is based on "Syntax-Aware Action Targeting for Video Captioning" (code), which in turn builds on "Consensus-based Sequence Training for Video Captioning".
- Python 3.6
- PyTorch 1.1
- CUDA 10.0
This repo includes an edited copy (coco-caption) of the Python 3 coco evaluation protocols, modified to load the CIDEr corpus.
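For orientation, below is a minimal sketch of how such scorers are typically driven, assuming the bundled coco-caption keeps the standard pycocoevalcap interface; the module paths and example captions are illustrative rather than taken from this repo.

```python
# Sketch only: score generated captions against references with
# pycocoevalcap-style scorers (module paths may differ in the edited
# coco-caption bundled here).
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Captions keyed by video id: references (gts) vs. model output (res).
gts = {
    "vid1": ["a man is playing a guitar", "a person plays the guitar"],
    "vid2": ["a dog is running in the grass"],
}
res = {
    "vid1": ["a man plays a guitar"],
    "vid2": ["a dog runs on the grass"],
}

bleu_scores, _ = Bleu(4).compute_score(gts, res)  # BLEU-1 .. BLEU-4
cider_score, _ = Cider().compute_score(gts, res)
print("BLEU-4:", bleu_scores[3])
print("CIDEr:", cider_score)
```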
The datasets and their features, along with the pre-trained models resulting from the experiments, can be downloaded from my Google Drive:
View my experiments and results
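If you prefer to fetch the archives from the command line instead of the browser, here is a small sketch using the gdown package; the file ID and output path are placeholders, not real values from this repo, so substitute the actual ones from the Drive link above.

```python
# Placeholder sketch: fetch a shared archive from Google Drive with gdown
# (pip install gdown). <FILE_ID> and the output path are hypothetical;
# take the real values from the Drive link above.
import gdown

file_id = "<FILE_ID>"
gdown.download(f"https://drive.google.com/uc?id={file_id}",
               output="datasets/msvd_features.tar.gz",
               quiet=False)
```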
To train on MSVD:
```bash
python train.py --dataset msvd \
                --captioner_type lstm \
                --model_id lstm_1 \
                --batch_size 8 \
                --test_batch_size 8 \
                --max_epochs 100
```
To train on MSR-VTT:
```bash
python train.py --dataset msrvtt \
                --captioner_type lstm \
                --model_id lstm_1 \
                --batch_size 8 \
                --test_batch_size 4 \
                --max_epochs 200
```
Testing runs automatically at the end of training; if you would like to run it separately, use evaluate.py (a small driver sketch follows the commands below).
To evaluate on MSVD:
```bash
python evaluate.py --dataset msvd \
                   --captioner_type lstm \
                   --model_id lstm_1 \
                   --test_batch_size 8
```
To evaluate on MSR-VTT:
```bash
python evaluate.py --dataset msrvtt \
                   --captioner_type lstm \
                   --model_id lstm_1 \
                   --test_batch_size 4
```
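To evaluate a trained model on both datasets in one pass, the following is a hedged sketch that simply shells out to evaluate.py with the flags documented above; it assumes a checkpoint with the given model_id exists for each dataset.

```python
# Hypothetical convenience driver: run evaluate.py on both datasets in sequence.
# Only the flags documented above are used; a trained checkpoint with this
# model_id is assumed to exist for each dataset.
import subprocess

RUNS = [("msvd", "8"), ("msrvtt", "4")]  # (dataset, test_batch_size)

for dataset, test_bs in RUNS:
    subprocess.run(
        ["python", "evaluate.py",
         "--dataset", dataset,
         "--captioner_type", "lstm",
         "--model_id", "lstm_1",
         "--test_batch_size", test_bs],
        check=True,  # abort if an evaluation run fails
    )
```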