Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework

Junkun Jiang and Jie Chen*, Hong Kong Baptist University

* Corresponding author

Paper | Project Page | BU-MCV lab | HKBU-VSComputing

How to deploy

Dependencies

The code was tested on Windows with:

pytorch                   1.10.2
torchvision               0.11.3
CUDA                      11.3.1

We suggest using a virtual environment with a package/environment manager such as conda to maintain the project.

conda create -n icassp python=3.6
conda activate icassp
# install pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# install the rest of the dependencies
pip install -r requirements.txt
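
After installation, it can help to confirm that the interpreter sees the expected PyTorch build. The snippet below is a minimal sketch, not part of this repository: report_environment is a hypothetical helper name, and it only reports what is installed rather than enforcing the versions listed above.

```python
import sys


def report_environment():
    """Collect interpreter and PyTorch details without failing if PyTorch is absent."""
    info = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        info["torch"] = None
        info["cuda_available"] = False
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print("{:>15}: {}".format(key, value))
```

If the torch entry is None or cuda_available is False, the icassp environment is probably not activated or a CPU-only build was installed.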

Dataset

Following DiffPose, we use GMM-fitted pose data as input for both training and testing. Please download the data from this link provided by DiffPose and put the npz files into the ./data directory.

Here are explanations of the input data:

./data/data_2d_h36m_cpn_ft_h36m_dbb_gmm.npz  # 2D estimated poses sampled from a GMM
./data/data_2d_h36m_gt_gmm.npz               # 2D ground-truth poses sampled from a GMM
./data/data_3d_h36m.npz                      # 3D ground-truth poses
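
To check that a download is intact, the archives can be opened with NumPy. The following is a minimal sketch, not part of this repository: summarize_npz is a hypothetical helper, and it makes no assumption about the key names inside each file; it simply lists whatever is stored. The allow_pickle=True setting is an assumption based on how Human3.6M archives are commonly saved.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {key: description} for every array stored in an .npz archive."""
    summary = {}
    # allow_pickle=True is an assumption: pose archives of this kind often
    # store nested dicts as pickled object arrays. Only load trusted files.
    with np.load(path, allow_pickle=True) as archive:
        for key in archive.files:
            array = archive[key]
            summary[key] = "dtype={}, shape={}".format(array.dtype, array.shape)
    return summary


if __name__ == "__main__":
    example = "./data/data_3d_h36m.npz"
    if os.path.exists(example):
        for key, description in summarize_npz(example).items():
            print(key, description)
```

Object-typed entries with an empty shape usually wrap a Python dict, which can be unpacked with .item() after loading.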

Prepare 2D-to-3D lifter

The pretrained 2D-to-3D lifting models can be downloaded from the following table. All weights come from DiffPose.

Name	Description	URL
gcn_xyz_cpn.pth	Trained on 2D estimated input	link
gcn_xyz_gt.pth	Trained on 2D gt input	link

Please put them in the folder ckpts.
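
Before moving on, a quick check that the data and lifter weights are in place can save a failed run. This is a minimal sketch, not part of this repository: missing_files is a hypothetical helper, and the file list is copied from the Dataset and lifter sections of this README, assuming the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the Dataset and 2D-to-3D lifter sections above.
EXPECTED_FILES = [
    "data/data_2d_h36m_cpn_ft_h36m_dbb_gmm.npz",
    "data/data_2d_h36m_gt_gmm.npz",
    "data/data_3d_h36m.npz",
    "ckpts/gcn_xyz_cpn.pth",
    "ckpts/gcn_xyz_gt.pth",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected files that are not present under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected files are in place.")
```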

Prepare 2D normalized poses

To speed up the 2D sampling process, we provide a simple script that normalizes the sampled 2D poses to the UV space in advance. Please run the following command.

python prepare_2d_poses.py
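
For intuition, the snippet below sketches one common way to normalize pixel coordinates, following the screen-coordinate convention used in VideoPose3D: x is mapped to [-1, 1] and y is scaled by the same factor so the aspect ratio is preserved. Whether prepare_2d_poses.py uses exactly this mapping is an assumption; the script itself is the reference. The image size in the usage lines is an illustrative value only.

```python
import numpy as np


def normalize_screen_coordinates(points, width, height):
    """Map pixel coordinates to [-1, 1] along x, preserving the aspect ratio."""
    points = np.asarray(points, dtype=np.float32)
    assert points.shape[-1] == 2, "expected (..., 2) pixel coordinates"
    # Divide by the image width, stretch to a range of 2, then shift so the
    # image centre lands at the origin.
    offset = np.array([1.0, height / width], dtype=np.float32)
    return points / width * 2.0 - offset


if __name__ == "__main__":
    # Illustrative 1000x1000 image: corners and centre.
    pixels = [[0, 0], [500, 500], [1000, 1000]]
    print(normalize_screen_coordinates(pixels, width=1000, height=1000))
```

Doing this once ahead of time avoids repeating the same arithmetic for every sampled pose during training.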

Training

To train a diffusion model from scratch, activate the icassp environment and run one of the following commands. The --exp option sets the experiment root path, and --doc names the folder that stores the weights, config.yml, logs, etc.

# config for 2D estimated pose input
python train.py --config cfgs/cfg_cpn.yml --exp exp --doc human36m_cpn

# config for 2D ground-truth pose input
python train.py --config cfgs/cfg_gt.yml --exp exp --doc human36m_gt

Evaluation

The pretrained diffusion models can be downloaded from the following table.

Name	Description	URL
ckpt_cpn.pth	Trained on 2D estimated input	link
ckpt_gt.pth	Trained on 2D gt input	link

Similarly, please put them in the folder ckpts and run one of the following commands. As in training, --exp sets the experiment root path, and --doc names the folder that stores the weights, config.yml, logs, etc.

# config for 2D estimated pose input
python eval.py --config cfgs/cfg_cpn.yml --exp exp --doc human36m_cpn

# config for 2D ground-truth pose input
python eval.py --config cfgs/cfg_gt.yml --exp exp --doc human36m_gt

The results are printed to the console in the following form:

===Action=== ==p#1 mm== =p#2 mm=
Directions    43.33      34.59
...
Average       49.40      39.05
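
On Human3.6M, the p#1 column conventionally denotes Protocol #1, the mean per-joint position error (MPJPE) in millimetres, and p#2 denotes Protocol #2, the same error after rigid (Procrustes) alignment. Assuming eval.py follows that convention, the Protocol #1 number can be sketched as below. This is an illustration, not the repository's evaluation code; the alignment step of Protocol #2 is not shown.

```python
import numpy as np


def mpjpe(predicted, target):
    """Mean per-joint position error, in the units of the inputs."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    assert predicted.shape == target.shape, "poses must have matching shapes"
    # Euclidean distance per joint (last axis holds x, y, z), then the mean
    # over all joints and all poses.
    return float(np.mean(np.linalg.norm(predicted - target, axis=-1)))


if __name__ == "__main__":
    # Two poses of 17 joints, each joint displaced by the vector (3, 4, 0).
    target = np.zeros((2, 17, 3))
    predicted = target + np.array([3.0, 4.0, 0.0])
    print(mpjpe(predicted, target))
```

Every joint in the usage example is displaced by a vector of length 5, so the reported error is 5 in the input units.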

Bibtex

If you use our code/models in your research, please cite our paper 🙌 :

@inproceedings{jiang2024diff,
  title={Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework},
  author={Jiang, Junkun and Chen, Jie},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7870-7874},
  doi={10.1109/ICASSP48485.2024.10448487},
  year={2024}
}

Acknowledgement

Many thanks to the following open-source repositories for their help in developing our project.

DiffPose, a diffusion-based monocular 3D pose estimation method. The main structure of our code is built on it, and we thank the authors for their great work ❤️.
The GCN backbone Graformer.
The evaluation code from VideoPose3D.
The diffusion pipeline from DDIM.
