Skip to content

[CVPR 2023] MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

License

Notifications You must be signed in to change notification settings

Meta-Portrait/MetaPortrait

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MetaPortrait

Teaser

This repo is the official implementation of "MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation" (CVPR 2023).

By Bowen Zhang*, Chenyang Qi*, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang and Fang Wen.

Paper | Project Page | Code

Abstract

In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects. First, as opposed to interpolating from sparse flow, we claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields. Second, inspired by face-swapping methods, we adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait. Although the proposed model surpasses prior generation fidelity on established benchmarks, to further make the talking head generation qualified for real usage, personalized fine-tuning is usually needed. However, this process is rather computationally demanding that is unaffordable to standard users. To solve this, we propose a fast adaptation model using a meta-learning approach. The learned model can be adapted to a high-quality personalized model as fast as 30 seconds. Last but not the least, a spatial-temporal enhancement module is proposed to improve the fine details while ensuring temporal coherency. Extensive experiments prove the significant superiority of our approach over the state of the arts in both one-shot and personalized settings.

Todo

  • Release the inference code of base model and temporal super-resolution model
  • Release the training code of base model
  • Release the training code of super-resolution model

Setup Environment

git clone https://github.com/Meta-Portrait/MetaPortrait.git
cd MetaPortrait
conda env create -f environment.yml
conda activate meta_portrait_base

# if you use GPU that only support cuda11, you may reinstall the torch build with cu11
# pip uninstall torch
# pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

Base Model

Inference Base Model

Download the checkpoint of base model and put it to base_model/checkpoint. We provide preprocessed example data for inference, you could download the data, unzip and put it to data. The directory structure should like this:

data
├── 0
│   ├── imgs
│   │   ├── 00000000.png
│   │   ├── ...
│   ├── ldmks
│   │   ├── 00000000_ldmk.npy
│   │   ├── ...
│   └── thetas
│       ├── 00000000_theta.npy
│       ├── ...
├── src_0_id.npy
├── src_0_ldmk.npy
├── src_0.png
├── src_0_theta.npy
└── src_map_dict.pkl

You could generate results of self reconstruction on 256x256 resolution by running:

cd base_model
python inference.py --save_dir result --config config/meta_portrait_256_eval.yaml --ckpt checkpoint/ckpt_base.pth.tar

Train Base Model from Scratch

Train the warping network first using the following command:

cd base_model
python main.py --config config/meta_portrait_256_pretrain_warp.yaml --fp16 --stage Warp --task Pretrain

Then, modify the path of warp_ckpt in config/meta_portrait_256_pretrain_full.yaml and joint train the warping network and refinement network using the following command:

python main.py --config config/meta_portrait_256_pretrain_full.yaml --fp16 --stage Full --task Pretrain

Meta Training for Faster Personalization of Base Model

You could start from the standard pretrained checkpoint and further optimize the personalized adaptation speed of the model by utilizing meta-learning using the following command:

python main.py --config config/meta_portrait_256_pretrain_meta_train.yaml --fp16 --stage Full --task Meta --remove_sn --ckpt /path/to/standard_pretrain_ckpt

Temporal Super-resolution Model

Set the root path to sr_model

Data and checkpoint

Download the checkpoint using the bash command

cd sr_model
bash download_sr.sh

Unzip the package and keep the file structure like

pretrained_ckpt
├── temporal_gfpgan.pth
├── GFPGANv1.3.pth
...
data
├── HDTF_warprefine
│   ├── gt
│   ├── lq
│   ├── ...
Basicsr
Experimental_root
options

Installation Bash command

# Install a modified basicsr - https://github.com/xinntao/BasicSR

cd Basicsr
pip install -r requirements.txt
python setup.py develop

# Install facexlib - https://github.com/xinntao/facexlib
# We use face detection and face restoration helper in the facexlib package
pip install facexlib
cd ..
pip install -r requirements.txt
# python setup.py develop

Quick Inference

ckpt for inference: pretrained_ckpt/temporal_gfpgan.pth

Enhance the result from our base model without calculating the metrics:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 Experimental_root/test.py -opt options/test/same_id_demo.yml --launcher pytorch

You may adjust the nproc_per_node to the number of GPUs on your own machine. Finally, check the result at results/temporal_gfpgan_same_id_temporal_super_resolution.

Demo training

In the paper result, we train on the training split of hdtf dataset. Here we first provide a demo training code to train on the small demo dataset

CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 Experimental_root/train.py -opt options/train/train_sr_hdtf.yml --launcher pytorch

The intermediate result can be check at /home/cqiaa/talkinghead/MetaPortrait/sr_model/experiments/train_sr_hdtf/visualization/00000001/hdtf_random

Citing MetaPortrait

@misc{zhang2022metaportrait,
      title={MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation}, 
      author={Bowen Zhang and Chenyang Qi and Pan Zhang and Bo Zhang and HsiangTao Wu and Dong Chen and Qifeng Chen and Yong Wang and Fang Wen},
      year={2022},
      eprint={2212.08062},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

This code borrows heavily from FOMM, thanks the authors for sharing their code and models.

Maintenance

This is the codebase for our research work. Please open a GitHub issue for any help. If you have any questions regarding the technical details, feel free to contact [email protected] or [email protected].