[DeepLearning] Emotional Talking Head Generation Project

This project organizes and summarizes my study and work on Emotional Talking Head Generation.

"Emotions shape ourselves, determine who we are, and affect our daily behaviors"

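Overview

Personal understanding: Talking Head Generation aims to generate realistic face images or continuous, realistic face video frames, and it has already made considerable progress. However, methods that precisely control facial details such as face emotion, lip movement, talking style, blinking, and regional facial movement are still being explored, and this is what motivates Emotional Talking Head Generation. Current work mainly builds on 3DMM and GAN as the core, with techniques such as blending and alignment as auxiliaries, to explore new frameworks and optimization constraints.

Example from NED (CVPR 2022)

Update log - 2023.03.26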

1. Survey of the Field

Based on 2D image

GANmut (CVPR 2021)
- GANmut
- Abstract:
- Details:
ICface (WACV 2020)
- ICface
- Abstract:
- Details:

Based on 3DMM

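NED (CVPR 2022)
- NED
- Abstract:
- Details:
DSM (ECCV 2022)
- DSM
- Abstract:
- Details:
  - Expression editing: train a mapping network from VA (valence-arousal) values to 3DMM expression parameters, so that an emotion label is converted to VA values and then to 3DMM expression parameters. The emotion label can then be used to control the change in emotion.
  - Image generation:
    - 3D reconstruction: a 3DMM-based reconstruction that extracts the face landmark, camera, identity, and exp parameters and synthesizes the NMFC and eye video frames
    - Train a person-specific renderer so that images can be generated in a variety of emotional styles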

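2. Reproduction Work

DSM

Because the authors of this paper have not released their code, I can only try to reproduce the results as faithfully as possible from the details described in the paper. The reproduction code will be released soon.

Paper link

Overall network framework of the paper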

Framework

Person-specific dataset of facial expressions

Valence-arousal values

See the EmoNet paper.

3D expression coefficients

See Head2Head++.

Expression decoder network

As described in the original paper:

The network consists of 6 fully-connected layers with 4096, 2048, 1024, 512, 128 and 64 units per layer respectively and with Rectified Linear Units (RELU) to introduce non-linearities

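What the model does

The network learns a mapping from a 2D facial VA vector to a 3D expression vector exp50. (DECA is used here for face reconstruction, so the expression vector differs from the exp30 of the original paper, but the underlying idea is the same.) The mapping is person-specific and must be trained separately for each subject.

Network structure
- 6 fully connected layers: 4096, 2048, 1024, 512, 128, and 64 units
- Activation: ReLU
- Overfitting prevention: dropout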
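
The layer list above can be made concrete with a small, dependency-free sketch that enumerates the weight shapes and the parameter count. It assumes a 2D VA input and a final linear projection to the 50D DECA expression vector appended after the six hidden layers. The output layer, the use of biases, and the helper names `layer_shapes` and `count_parameters` are assumptions for illustration, not details confirmed by the paper or this repository.

```python
# Hidden widths quoted from the paper; input/output sizes follow this README
# (2D VA vector in, 50D DECA expression vector out). The extra output layer
# is an assumption made for this sketch.
HIDDEN_UNITS = [4096, 2048, 1024, 512, 128, 64]
VA_DIM = 2
EXP_DIM = 50


def layer_shapes(in_dim, hidden, out_dim):
    """Return (fan_in, fan_out) for every fully connected layer, in order."""
    dims = [in_dim] + list(hidden) + [out_dim]
    return list(zip(dims[:-1], dims[1:]))


def count_parameters(shapes):
    """Count weights plus one bias per output unit for each layer."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


shapes = layer_shapes(VA_DIM, HIDDEN_UNITS, EXP_DIM)
for fan_in, fan_out in shapes:
    print(f"Linear({fan_in} -> {fan_out})")
print("total parameters:", count_parameters(shapes))
```

Most of the parameters sit in the 4096 -> 2048 layer, which is one reason dropout is a sensible guard against overfitting on a small person-specific dataset.
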
Training details
- Learning rate: 1e-3
- Batch size: 32
- Optimizer: Adam
- Epochs: 1000
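
To make the training schedule concrete, here is a minimal sketch of how one epoch of shuffled mini-batches could be formed from paired (VA, expression) samples. `TrainConfig`, `iter_minibatches`, and the dataset size of 1000 are hypothetical names and values chosen for illustration; they are not taken from the training script in this repository.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values taken from the training details; the field names are illustrative.
    learning_rate: float = 1e-3
    batch_size: int = 32
    optimizer: str = "Adam"
    epochs: int = 1000


def iter_minibatches(num_samples, batch_size, rng):
    """Yield lists of sample indices covering one shuffled epoch."""
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]


cfg = TrainConfig()
num_samples = 1000  # hypothetical number of (VA, exp) pairs for one subject
batches = list(iter_minibatches(num_samples, cfg.batch_size, random.Random(0)))
steps_per_epoch = math.ceil(num_samples / cfg.batch_size)
print(len(batches), "batches per epoch,",
      steps_per_epoch * cfg.epochs, "optimizer steps in total")
```

Because the model is person-specific, this schedule would be repeated once per subject, each time on that subject's own pairs.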

Synthesis of photorealistic manipulated videos

Implement

This code is not an official implementation; I wrote it myself. Corrections and discussion are welcome.

Dataset Prepare

Valence-arousal values
```
./scripts/read_va_test_name.sh train
./scripts/emonet_test.sh train
```
The following script visualizes the VA space:
```
./scripts/show_va_space.sh
```
The VA-space visualization for each ID in the training set is shown below.

3D expression coefficients
```
./scripts/pre_all.sh
```

Train

Expression decoder network
```
./scripts/train_expDecoder.sh train
```
Renderer
```
./scripts/train_renderer.sh
```

Test

emotion label ---> VA value

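The following scripts randomly sample VA pairs in VA space for a given emotion label and then apply B-spline interpolation, so that each VA pair corresponds one-to-one to a frame in the time sequence: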
```
./scripts/show_va_space.sh
python ./EmoNet/caiyang.py
```
Random sampling result:

B-spline interpolation result:
```
This step is skipped for now
```
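
Because the B-spline step is currently skipped, the following is only a sketch of what label-conditioned sampling plus smoothing could look like, written in dependency-free Python. The `EMOTION_CENTERS` table, the square sampling region, and the helper names are hypothetical. A clamped uniform cubic B-spline is one common choice and is not confirmed to match `./EmoNet/caiyang.py` or the paper.

```python
import random

# Hypothetical VA centres per emotion label, for illustration only; real
# regions should come from the EmoNet predictions on the training set.
EMOTION_CENTERS = {"happy": (0.8, 0.4), "sad": (-0.7, -0.3)}


def sample_va(center, radius, count, rng):
    """Sample (valence, arousal) pairs in a square around center, clipped to [-1, 1]."""
    def clip(x):
        return max(-1.0, min(1.0, x))
    return [
        (clip(center[0] + rng.uniform(-radius, radius)),
         clip(center[1] + rng.uniform(-radius, radius)))
        for _ in range(count)
    ]


def bspline_frames(control, n_frames):
    """Evaluate a clamped uniform cubic B-spline at n_frames evenly spaced parameters."""
    # Repeating each end point three times makes the curve start and end on it.
    pts = [control[0]] * 2 + list(control) + [control[-1]] * 2
    n_seg = len(pts) - 3
    frames = []
    for i in range(n_frames):
        u = i / (n_frames - 1) * n_seg if n_frames > 1 else 0.0
        seg = min(int(u), n_seg - 1)
        t = u - seg
        # Uniform cubic B-spline basis functions; they sum to 1 for any t.
        weights = (
            (1 - t) ** 3 / 6,
            (3 * t ** 3 - 6 * t ** 2 + 4) / 6,
            (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6,
            t ** 3 / 6,
        )
        window = pts[seg:seg + 4]
        frames.append(tuple(sum(w * p[d] for w, p in zip(weights, window))
                            for d in range(2)))
    return frames


rng = random.Random(0)
control = sample_va(EMOTION_CENTERS["happy"], 0.15, 5, rng)
va_per_frame = bspline_frames(control, n_frames=100)
print(len(va_per_frame), va_per_frame[0], va_per_frame[-1])
```

Note that a B-spline passes near, not through, the interior control points, so the sampled VA pairs act as guides for a smooth trajectory with one VA pair per output frame.
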
2D VA vector ---> 50D expression vector

Use the trained person-specific expression decoder to infer the 50D expression vector:
```
./scripts/label2exp.sh
```
50D expression vector ---> generated frames

Use the renderer to map the NMFC and eye videos to frame images:
```
./scripts/postprocess.sh
```

GANmut

The authors of this paper released training and test code, so their code is used directly and adapted to our experimental setup.

Paper link

ICface

3. Knowledge Summary

Common machine learning metrics

Dataloader

Reference

Thanks to:

Repository: guohua-zhang/DeepLearning-Emotional-Talking-Head-Generation