Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework
Junkun Jiang and Jie Chen*, Hong Kong Baptist University
* Corresponding author
Paper | Project Page | BU-MCV lab | HKBU-VSComputing
The code is tested on Windows with
pytorch 1.10.2
torchvision 0.11.3
CUDA 11.3.1
We suggest using a virtual environment and an easy-to-use package/environment manager such as conda to maintain the project.
conda create -n icassp python=3.6
conda activate icassp
# install pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# install the rest of the dependencies
pip install -r requirements.txt
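Because the conda command above does not pin package versions, it can help to confirm what was actually installed. The following is a minimal sketch, not part of the repository: it compares the importable `torch` and `torchvision` versions against the tested versions listed above. The helper names (`release`, `installed_version`, `compare`) are hypothetical, and other versions may work just as well.

```python
import importlib

# Versions this README lists as tested; other versions may also work.
TESTED = {"torch": "1.10.2", "torchvision": "0.11.3"}


def release(version):
    """Strip a local build tag: '1.10.2+cu113' -> '1.10.2'."""
    return version.split("+")[0]


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare(installed, tested=TESTED):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in tested.items():
        found = installed.get(name)
        if found is None:
            report[name] = "missing"
        elif release(found) == wanted:
            report[name] = "ok"
        else:
            report[name] = "found {}, tested with {}".format(found, wanted)
    return report


if __name__ == "__main__":
    found = {name: installed_version(name) for name in TESTED}
    for name, status in compare(found).items():
        print("{}: {}".format(name, status))
```

A mismatch is only a hint that the environment differs from the tested one; it does not necessarily mean the code will fail.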
Following DiffPose, we use GMM-fitted pose data as input during training and testing. Please download the data from the link provided by DiffPose and put the npz files into the ./data directory.
Here are explanations of the input data:
./data/data_2d_h36m_cpn_ft_h36m_dbb_gmm.npz # 2D estimated poses sampled from a GMM
./data/data_2d_h36m_gt_gmm.npz # 2D ground-truth poses sampled from a GMM
./data/data_3d_h36m.npz # 3D ground-truth poses
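Before training or evaluation, it may be worth checking that all three files are in place. The sketch below only tests for the file names listed above; it does not open the archives or validate their contents. The function name `missing_data_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "data_2d_h36m_cpn_ft_h36m_dbb_gmm.npz",
    "data_2d_h36m_gt_gmm.npz",
    "data_3d_h36m.npz",
)


def missing_data_files(data_dir="./data"):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from ./data: " + ", ".join(missing))
    else:
        print("All expected data files found.")
```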
The pretrained 2D-to-3D lifting model can be downloaded from the following table. All weights come from DiffPose.
Name | Description | URL |
---|---|---|
gcn_xyz_cpn.pth | Trained on 2D estimated input | link |
gcn_xyz_gt.pth | Trained on 2D gt input | link |
Please put them in the ckpts folder.
To speed up the 2D sampling process, we provide a simple script that normalizes the sampled 2D poses to the UV space in advance. Please run the following command.
python prepare_2d_poses.py
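For readers unfamiliar with this kind of normalization, here is a rough sketch of what mapping pixel coordinates to a normalized space can look like. It assumes the convention used in VideoPose3D (x mapped to [-1, 1], y scaled by the same factor so the aspect ratio is preserved); the actual `prepare_2d_poses.py` may differ. The function name and the 1000x1000 frame size in the usage line are illustrative, and plain Python lists are used only to keep the example dependency-free.

```python
def normalize_screen_coordinates(points, width, height):
    """Map (x, y) pixel coordinates to a normalized space.

    x goes from [0, width] to [-1, 1]. y uses the same scale factor, so it
    lands in [-height / width, height / width] and the aspect ratio is kept.
    """
    scale = 2.0 / width
    return [(x * scale - 1.0, y * scale - height / width) for x, y in points]


# Illustrative usage: image corners and center of an assumed 1000x1000 frame.
corners_and_center = [(0, 0), (500, 500), (1000, 1000)]
print(normalize_screen_coordinates(corners_and_center, 1000, 1000))
```

Doing this once ahead of time avoids repeating the same arithmetic for every sampled pose during training.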
To train a diffusion model from scratch, run one of the following commands in your console after activating the icassp environment.
# --config: config file for the chosen 2D input
# --exp: experiment root path
# --doc: name of the folder for storing weights, config.yml, log, etc.

# 2D estimated pose input
python train.py \
--config cfgs/cfg_cpn.yml \
--exp exp \
--doc human36m_cpn

# 2D ground-truth pose input
python train.py \
--config cfgs/cfg_gt.yml \
--exp exp \
--doc human36m_gt
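The trailing backslash used for line continuation is a POSIX shell convention; the Windows command prompt uses `^` and PowerShell uses a backtick instead. Since the code is tested on Windows, a small Python launcher is one way to sidestep shell differences. The sketch below is not part of the repository: `RUNS`, `build_command`, and `run` are hypothetical names, and the flags are the ones shown in this README. It assumes `cfgs/cfg_gt.yml` is the ground-truth config, as in the evaluation command.

```python
import subprocess
import sys

# Config file and output folder for each input type, as used in this README.
RUNS = {
    "cpn": ("cfgs/cfg_cpn.yml", "human36m_cpn"),  # 2D estimated pose input
    "gt": ("cfgs/cfg_gt.yml", "human36m_gt"),     # 2D ground-truth pose input
}


def build_command(script, input_type, exp_root="exp"):
    """Build the argument list for train.py or eval.py."""
    config, doc = RUNS[input_type]
    return [sys.executable, script,
            "--config", config,
            "--exp", exp_root,
            "--doc", doc]


def run(script, input_type):
    """Run the script from the repository root; raises if it exits with an error."""
    subprocess.run(build_command(script, input_type), check=True)
```

For example, `run("train.py", "gt")` would start training on ground-truth 2D input, provided the current directory is the repository root and the icassp environment is active.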
The pretrained diffusion model can be downloaded from the following table.
Name | Description | URL |
---|---|---|
ckpt_cpn.pth | Trained on 2D estimated input | link |
ckpt_gt.pth | Trained on 2D gt input | link |
Similarly, please put them in the ckpts folder and run one of the following commands.
# --config: config file for the chosen 2D input
# --exp: experiment root path
# --doc: name of the folder for storing weights, config.yml, log, etc.

# 2D estimated pose input
python eval.py \
--config cfgs/cfg_cpn.yml \
--exp exp \
--doc human36m_cpn

# 2D ground-truth pose input
python eval.py \
--config cfgs/cfg_gt.yml \
--exp exp \
--doc human36m_gt
The results will be displayed in the console like:
===Action=== ==p#1 mm== =p#2 mm=
Directions 43.33 34.59
...
Average 49.40 39.05
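The two columns presumably correspond to Protocol #1 (MPJPE, mean per-joint position error) and Protocol #2 (the same error after rigidly aligning the prediction to the ground truth), as is common for Human3.6M and in the VideoPose3D evaluation code; this README does not spell it out, so treat that reading as an assumption. As a minimal sketch of the first metric only, written in plain Python rather than taken from the repository:

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding joints.

    Both poses are sequences of (x, y, z) joints; the result is in the same
    unit as the input (millimetres in the table above).
    """
    if len(predicted) != len(target):
        raise ValueError("poses must have the same number of joints")
    distances = [
        math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_joint, true_joint)))
        for pred_joint, true_joint in zip(predicted, target)
    ]
    return sum(distances) / len(distances)


# Toy two-joint pose: every predicted joint is offset by (3, 4, 0) mm.
truth = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
guess = [(3.0, 4.0, 0.0), (103.0, 4.0, 0.0)]
print(mpjpe(guess, truth))
```

The aligned variant would first remove global rotation, translation, and scale from the prediction, which is why the second column is typically lower than the first.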
If you use our code/models in your research, please cite our paper 🙌 :
@inproceedings{jiang2024diff,
title={Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework},
author={Jiang, Junkun and Chen, Jie},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7870-7874},
doi={10.1109/ICASSP48485.2024.10448487},
year={2024}
}
Many thanks to the following open-source repositories for their help in developing our project.
- DiffPose, a diffusion-based monocular 3D pose estimation method. We thank the authors for their great work ❤️. The main structure of our code is built on it.
- The GCN backbone Graformer.
- The evaluation code from VideoPose3D.
- The diffusion pipeline from DDIM.