Realistic-Neural-Talking-Head-Models

Implementation of Few-Shot Adversarial Learning of Realistic Neural Talking Head Models (Egor Zakharov et al.). https://arxiv.org/abs/1905.08233

This repo is based on https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models

My changes to the original repo

Download the Caffe-trained version of VGG19 converted to PyTorch from https://github.com/jcjohnson/pytorch-vgg.

Because some layer names in the converted model do not match the names the code expects, set VGG19_caffe_weight_path in params.py to the path of the downloaded weights and run:

python change_vgg19_caffelayer_name.py
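The exact mapping used by change_vgg19_caffelayer_name.py is not reproduced here. As a rough sketch of what such a renaming step usually involves, the snippet below copies a checkpoint's state dict while renaming keys through a mapping. The rename_keys helper and the example key names are hypothetical illustrations, not taken from the repo.

```python
from collections import OrderedDict


def rename_keys(state_dict, mapping):
    """Return a copy of state_dict with keys renamed via mapping (hypothetical helper).

    Keys not present in mapping are kept unchanged. Values are not copied,
    only re-keyed. Raises ValueError if two keys end up with the same name.
    """
    renamed = OrderedDict()
    for key, value in state_dict.items():
        new_key = mapping.get(key, key)
        if new_key in renamed:
            raise ValueError(f"duplicate key after renaming: {new_key}")
        renamed[new_key] = value
    return renamed


# Hypothetical example: the converted checkpoint names a layer differently
# from the model definition that loads it. Strings stand in for tensors.
checkpoint = OrderedDict([
    ("features.0.weight", "w0"),
    ("classifier.1.weight", "w1"),
])
mapping = {"classifier.1.weight": "classifier.0.weight"}
fixed = rename_keys(checkpoint, mapping)
print(list(fixed.keys()))  # ['features.0.weight', 'classifier.0.weight']
```

With real weights, the same helper would sit between torch.load(path) and torch.save(renamed, path); the mapping itself has to come from comparing the checkpoint's keys with the model's state_dict().keys().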

Main code changes in loss_generator.py:

# Per-channel Caffe means, listed in RGB order and shaped 1 x 3 x 1 x 1 for broadcasting
self.vgg19_caffe_RGB_mean = torch.FloatTensor([123.68, 116.779, 103.939]).view(1, 3, 1, 1).to(device) # RGB order
self.vggface_caffe_RGB_mean = torch.FloatTensor([129.1863, 104.7624, 93.5940]).view(1, 3, 1, 1).to(device) # RGB order

# Scale [0, 1] inputs to [0, 255], subtract the mean, then reorder channels to BGR
x_vgg19 = x * 255 - self.vgg19_caffe_RGB_mean
x_vgg19 = x_vgg19[:,[2,1,0],:,:] # B RGB H W -> B BGR H W
x_hat_vgg19 = x_hat * 255 - self.vgg19_caffe_RGB_mean
x_hat_vgg19 = x_hat_vgg19[:,[2,1,0],:,:] # B RGB H W -> B BGR H W
x_vggface = x * 255 - self.vggface_caffe_RGB_mean
x_vggface = x_vggface[:,[2,1,0],:,:] # B RGB H W -> B BGR H W
x_hat_vggface = x_hat * 255 - self.vggface_caffe_RGB_mean
x_hat_vggface = x_hat_vggface[:,[2,1,0],:,:] # B RGB H W -> B BGR H W
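The four preprocessing steps above follow one pattern, so they could be factored into a single helper. The sketch below is a hypothetical refactoring (to_caffe_input is not a function in the repo). It is written against NumPy so it can be checked on its own; it uses only arithmetic, broadcasting, and list indexing, which behave the same way on torch tensors.

```python
import numpy as np

# Per-channel means from loss_generator.py, in RGB order, shaped (1, 3, 1, 1)
VGG19_CAFFE_RGB_MEAN = np.array([123.68, 116.779, 103.939]).reshape(1, 3, 1, 1)
VGGFACE_CAFFE_RGB_MEAN = np.array([129.1863, 104.7624, 93.5940]).reshape(1, 3, 1, 1)


def to_caffe_input(x, rgb_mean):
    """Map a B x 3 x H x W RGB batch in [0, 1] to Caffe-style input.

    Scales to [0, 255], subtracts the per-channel RGB mean, then
    reorders channels RGB -> BGR.
    """
    x = x * 255 - rgb_mean
    return x[:, [2, 1, 0], :, :]


batch = np.full((2, 3, 4, 4), 0.5)  # every pixel 0.5, i.e. 127.5 after scaling
out = to_caffe_input(batch, VGG19_CAFFE_RGB_MEAN)
print(out.shape)        # (2, 3, 4, 4)
print(out[0, :, 0, 0])  # approx. [23.561, 10.721, 3.82] in B, G, R order
```

Inside the loss module, each of the four pairs of lines would then become a single call, e.g. to_caffe_input(x, self.vgg19_caffe_RGB_mean).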

Explanations

The VGG19 and VGGFace losses described in the paper use Caffe-trained networks, which expect inputs in the range 0-255, with the per-channel mean subtracted, and in BGR channel order.

However, the original repo feeds VGG19 and VGGFace with RGB images in the range 0-1 while keeping the paper's loss weights (vgg19_weight=1.5e-1, vggface_weight=2.5e-2). As a result, these two losses are very small compared to the other loss terms.
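A rough way to see the size of the mismatch: a bias-free linear layer followed by ReLU is positively homogeneous, so scaling its input by 255 scales its output, and therefore an L1 feature distance, by exactly 255. Real VGG layers have biases and the Caffe models also subtract a mean, so for the actual networks this factor is only indicative. The toy layer below is a hypothetical stand-in, not part of the repo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bias-free "layer": ReLU(W @ x), which satisfies f(c * x) = c * f(x) for c > 0
W = rng.standard_normal((16, 8))


def features(x):
    return np.maximum(W @ x, 0.0)


def l1_feature_loss(a, b):
    # Mean absolute difference between the feature vectors of a and b
    return np.abs(features(a) - features(b)).mean()


x = rng.uniform(0.0, 1.0, size=8)      # toy "image" in [0, 1]
x_hat = rng.uniform(0.0, 1.0, size=8)  # toy "reconstruction" in [0, 1]

loss_01 = l1_feature_loss(x, x_hat)
loss_255 = l1_feature_loss(x * 255, x_hat * 255)
print(loss_255 / loss_01)  # 255.0 up to floating-point error
```

With the loss weights fixed at the paper's values, a perceptual term computed on 0-1 inputs therefore contributes on the order of a couple of hundred times less than intended, at least for this idealized layer.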

To balance the losses, either change their weights or switch to the Caffe-pretrained versions of the models.

For my changes, I downloaded the Caffe version of VGG19 from https://github.com/jcjohnson/pytorch-vgg and compute the VGG losses on inputs in the range 0-255 and in BGR order.

Results

The following results are generated for the same person (id_08696) using different driving videos.

Click the images to view the video results on YouTube.

1. Feed-forward without fine-tuning

2. Fine-tuning for 100 epochs

As we can see, an identity gap exists in the feed-forward results, but it can be bridged by fine-tuning.

3. More results:

Repository: Jarvisss/Realistic-Neural-Talking-Head-Models

Folders and files

Latest commit

History

Repository files navigation

Realistic-Neural-Talking-Head-Models

My changes to the original repo

Explanations

Results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages