
Compositional Video Synthesis with Action Graphs (ICML 2021)

ag2vid

Back to Project Page.

Release

  • CATER training code and eval - DONE
  • Something-Something V2 training code and eval - TODO
  • Pretrained models - TODO

Installation

We recommend using Anaconda to create a conda environment:

conda create -n ag2vid python=3.7 pip

Then, activate the environment:

conda activate ag2vid

Install PyTorch and the remaining requirements:

conda install pytorch==1.4.0 torchvision==0.5.0 -c pytorch
pip install -r requirements.txt
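
Optionally, you can sanity-check the installation with the one-liner below (an illustrative check, not part of the original setup). It should print the pinned PyTorch and torchvision versions (1.4.0 and 0.5.0) and whether CUDA is available:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"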

Data

CATER

Download and extract CATER data:

cd <project_root>/data/CATER/max2action
wget https://cmu.box.com/shared/static/jgbch9enrcfvxtwkrqsdbitwvuwnopl0.zip && unzip jgbch9enrcfvxtwkrqsdbitwvuwnopl0.zip
wget https://cmu.box.com/shared/static/922x4qs3feynstjj42muecrlch1o7pmv.zip && unzip 922x4qs3feynstjj42muecrlch1o7pmv.zip
wget https://cmu.box.com/shared/static/7svgta3kqat1jhe9kp0zuptt3vrvarzw.zip && unzip 7svgta3kqat1jhe9kp0zuptt3vrvarzw.zip
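
Optionally, you can delete the downloaded archives after extraction to free disk space (these are the same zip files fetched above):

rm jgbch9enrcfvxtwkrqsdbitwvuwnopl0.zip 922x4qs3feynstjj42muecrlch1o7pmv.zip 7svgta3kqat1jhe9kp0zuptt3vrvarzw.zip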

Training

CATER

python -m scripts.train --checkpoint_every=5000 --batch_size=2 --dataset=cater --frames_per_action=4 --run_name=train_cater --image_size=256,256 --include_dummies=1 --gpu_ids=0

Note: during the first training epoch, images are cached in the CATER dataset folder. Training takes around a week on a single V100 GPU. If you have a smaller GPU, you can reduce the batch size and image resolution (e.g., use 128,128); see the example below.
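
For example, a lower-memory configuration could look like the following (the batch size and run name here are illustrative choices, not values prescribed by the authors):

python -m scripts.train --checkpoint_every=5000 --batch_size=1 --dataset=cater --frames_per_action=4 --run_name=train_cater_128 --image_size=128,128 --include_dummies=1 --gpu_ids=0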

Eval

A model with example validation outputs is saved every 5k iterations to the <code_root>/output/timestamp_<run_name> folder.

To evaluate a specific checkpoint:

python -m scripts.test --checkpoint <path/to/checkpoint.pt> --output_dir <save_dir> --save_actions 1

Note: this script assumes that the parent directory of the checkpoint file contains the run_args.json file, which stores training configuration such as the dataset; you can inspect it as shown below.
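
For example, this one-liner prints the run_args.json sitting next to a checkpoint before you evaluate it (the checkpoint path is a placeholder to substitute, as above):

cat $(dirname <path/to/checkpoint.pt>)/run_args.json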

Citation

@article{bar2020compositional,
  title={Compositional video synthesis with action graphs},
  author={Bar, Amir and Herzig, Roei and Wang, Xiaolong and Chechik, Gal and Darrell, Trevor and Globerson, Amir},
  journal={arXiv preprint arXiv:2006.15327},
  year={2020}
}

Related Works

If you liked this work, here are a few other related works you might be interested in: Compositional Video Prediction (ICCV 2019), HOI-GAN (ECCV 2020), and Semantic video prediction (preprint).

Acknowledgments

Our work builds on prior works such as SPADE, Vid2Vid, sg2im, and CanonicalSg2IM.