Universal Video Style Transfer via Crystallization, Separation, and Blending (CSBNet)

This is the official PyTorch implementation of "Universal Video Style Transfer via Crystallization, Separation, and Blending".

Abstract

Universal video style transfer aims to migrate arbitrary styles to input videos. However, how to maintain the temporal consistency of videos while achieving high-quality arbitrary style transfer is still a hard nut to crack. To resolve this dilemma, in this paper, we propose the CSBNet which involves three key modules: 1) the Crystallization (Cr) Module that generates several orthogonal crystal nuclei, representing hierarchical stability-aware content and style components, from raw VGG features; 2) the Separation (Sp) Module that separates these crystal nuclei to generate the stability-enhanced content and style features; 3) the Blending (Bd) Module to cross-blend these stability-enhanced content and style features, producing more stable and higher-quality stylized videos. Moreover, we also introduce a new pair of component enhancement losses to improve network performance. Extensive qualitative and quantitative experiments are conducted to demonstrate the effectiveness and superiority of our CSBNet. Compared with the state-of-the-art models, it not only produces temporally more consistent and stable results for arbitrary videos but also achieves higher-quality stylizations for arbitrary images.

[Paper] [Code]

Framework

[Figure: overview of the CSBNet framework]

Results

(Try it yourself in Testing!)

Preparation

Requirements

  • Python >=3.6

  • PyTorch >=1.7

  • torchvision

  • opencv-python

  • imageio-ffmpeg

  • scipy

  • numpy

Download the Pretrained Models

  • Download the pretrained encoder: VGGNet Google-Drive

  • (Optional) Download the pretrained model: CSBNet (KC=4, KS=-10) Google-Drive

You can put these two files in the folder "models". An example directory hierarchy is:

CSBNet
|--- models
      |--- {The pretrained model <VGGNet>.pth}
      |--- {The pretrained model <CSBNet>.pth}
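
Before running the test scripts, it can help to confirm that the checkpoints are where you expect them. The following is a minimal sketch and not part of this repository: the helper name `find_checkpoints` is hypothetical, and it assumes only that the weights are `.pth` files placed directly inside `models`, as in the hierarchy above.

```python
from pathlib import Path


def find_checkpoints(models_dir="models"):
    """Return the sorted names of all .pth files directly inside models_dir."""
    folder = Path(models_dir)
    if not folder.is_dir():
        # A missing folder simply means no checkpoints were found.
        return []
    return sorted(p.name for p in folder.glob("*.pth"))


if __name__ == "__main__":
    found = find_checkpoints()
    print(f"Found {len(found)} checkpoint(s) in 'models': {found}")
```

For inference you would expect to see two entries, the encoder and the CSBNet weights. For training only the encoder is needed, since the CSBNet download is optional.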
      

Testing

Image Test

Prepare one folder with N content images and another with M style images. Each content image is stylized with each style image, so you'll get N*M stylized images.

python test_image.py \
--content_dir <The path to a single image or a directory> \
--style_dir <The path to a single image or a directory> \
--KC 4 --KS -10 \
--output_dir <The path of the output directory> \
--vgg_path <The path of the pretrained vgg-net model> \
--csbnet_path <The path of the csbnet pretrained model>

You can also use the default configuration with the command below:

python test_image.py \
--content_dir <The path to a single image or a directory> \
--style_dir <The path to a single image or a directory>
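
To make the N*M count concrete, here is a small sketch of the content/style pairing. It is an illustration only: the file names are hypothetical, and the README does not say in what order `test_image.py` processes the pairs or how it names the outputs.

```python
from itertools import product


def pair_inputs(content_names, style_names):
    """Pair every content image with every style image (N * M pairs)."""
    return list(product(content_names, style_names))


# Hypothetical file names, used only for illustration.
contents = ["c1.jpg", "c2.jpg", "c3.jpg"]  # N = 3
styles = ["s1.jpg", "s2.jpg"]              # M = 2

pairs = pair_inputs(contents, styles)
print(len(pairs))  # 6
```

With 3 content images and 2 style images the script would therefore write 6 stylized images, so large folders can produce a lot of output.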

Video Test

python test_video.py \
--content_dir <The path to a single video or a directory> \
--style_dir <The path to a single image or a directory> \
--KC 4 --KS -10 \
--output_dir <The path of the output directory> \
--vgg_path <The path of the pretrained vgg-net model> \
--csbnet_path <The path of the csbnet pretrained model>

Similarly, you can use the default configuration with the command below:

python test_video.py \
--content_dir <The path to a single video or a directory> \
--style_dir <The path to a single image or a directory>
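
If you want to script several runs, the commands above can be assembled programmatically. The sketch below is an assumption-laden convenience and not part of the repository: `build_command` is a hypothetical helper, the paths are placeholders, and it uses only the flags listed in this README.

```python
import sys


def build_command(script, content_dir, style_dir, **options):
    """Assemble an argument list for test_image.py or test_video.py.

    `options` should contain only flags listed in this README
    (KC, KS, output_dir, vgg_path, csbnet_path). Flags that are
    omitted fall back to the script's own defaults.
    """
    cmd = [sys.executable, script,
           "--content_dir", str(content_dir),
           "--style_dir", str(style_dir)]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Hypothetical paths, used only for illustration.
cmd = build_command("test_video.py", "videos/", "styles/", KC=4, KS=-10)
print(" ".join(cmd[1:]))
# test_video.py --content_dir videos/ --style_dir styles/ --KC 4 --KS -10

# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.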

Training

python train.py \
--content_dir <The path to the content dataset> \
--style_dir <The path to the style dataset>

There are other options for custom training. Their meanings are as follows:

--KC <The value of KC>
--KS <The value of KS>
--lambda_<XXX> <Loss weight>
--vgg_path <The path of the pretrained vggnet>
--lr <Learning rate>
--lr_decay <Learning rate decay>
--max_iter <The number of training iterations>
--batch_size <The batch size>
--gpu_num <The number of GPUs used for training; multi-GPU training is supported, and you only need to specify how many GPUs to use>

For example, to train on four GPUs (ids=0,1,3,5) with a total batch size of 8, use the command below:

CUDA_VISIBLE_DEVICES=0,1,3,5 python train.py \
--gpu_num 4 \
--batch_size 8 \
--content_dir <The path to the content dataset> \
--style_dir <The path to the style dataset> \
--<other training options>
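
Two details of this example are easy to miss. First, `CUDA_VISIBLE_DEVICES` renumbers the selected GPUs, so inside the process they appear as devices 0 to 3. Second, the batch size of 8 is the total across GPUs. The sketch below illustrates both. It assumes the batch is split evenly across GPUs (as `torch.nn.DataParallel` does); the README does not state which mechanism `train.py` uses, and the helper names are hypothetical.

```python
def logical_gpu_ids(visible_devices):
    """Map each logical CUDA index to the physical id in CUDA_VISIBLE_DEVICES."""
    physical = [int(x) for x in visible_devices.split(",") if x.strip()]
    return dict(enumerate(physical))


def per_gpu_batch(total_batch_size, gpu_num):
    """Samples per GPU, assuming the total batch is split evenly (an assumption)."""
    if total_batch_size % gpu_num != 0:
        raise ValueError("batch size should be divisible by the number of GPUs")
    return total_batch_size // gpu_num


print(logical_gpu_ids("0,1,3,5"))  # {0: 0, 1: 1, 2: 3, 3: 5}
print(per_gpu_batch(8, 4))         # 2
```

Under that assumption, each of the four GPUs would process 2 samples per iteration.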

Citation

@inproceedings{lu2022universal,
  title={Universal Video Style Transfer via Crystallization, Separation, and Blending},
  author={Lu, Haofei and Wang, Zhizhong},
  booktitle={Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI)},
  pages={4957--4965},
  year={2022}
}

Acknowledgments

The code in this repository is based on MCCNet. We thank the authors for both their paper and their code.