
[ICASSP'24 Oral] Official code for "Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework"


Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework

Junkun Jiang and Jie Chen*, Hong Kong Baptist University

* Corresponding author

Paper | Project Page | BU-MCV lab | HKBU-VSComputing

How to deploy


The code is tested on Windows with

pytorch                   1.10.2
torchvision               0.11.3
CUDA                      11.3.1

We suggest working in a virtual environment managed by an easy-to-use package/environment manager such as conda.

conda create -n icassp python=3.6
conda activate icassp
# install pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# install the rest of the dependencies
pip install -r requirements.txt
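
After installation, it can help to confirm that the environment matches the tested versions listed above. The snippet below is a minimal sketch: `describe_torch` is a hypothetical helper name, not part of this repository, and it only reports what PyTorch itself exposes.

```python
def describe_torch():
    """Summarize the installed PyTorch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    return "torch {} | CUDA build {} | GPU available: {}".format(
        torch.__version__,          # tested with 1.10.2
        torch.version.cuda,         # CUDA version the wheel was built against
        torch.cuda.is_available(),  # True if a usable GPU is visible
    )


print(describe_torch())
```

If the reported versions differ from the tested ones, the code may still run, but that has not been verified here.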


Following DiffPose, we utilize the GMM-fitted pose data as input during training and testing. Please use this link provided by DiffPose to download the data. Please put those npz files into the ./data directory.

Here are explanations of the input data:

./data/data_2d_h36m_cpn_ft_h36m_dbb_gmm.npz  # 2D estimated poses sampled from a GMM
./data/data_2d_h36m_gt_gmm.npz               # 2D ground-truth poses sampled from a GMM
./data/data_3d_h36m.npz                      # 3D ground-truth poses
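
To confirm the files are in place before training, a small check like the one below may be useful. It is a sketch only: `check_data_dir` is a hypothetical helper, and it relies on the fact that an .npz file is a zip archive with one .npy member per stored array, so the array names can be listed without loading the data.

```python
import zipfile
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "data_2d_h36m_cpn_ft_h36m_dbb_gmm.npz",
    "data_2d_h36m_gt_gmm.npz",
    "data_3d_h36m.npz",
]


def check_data_dir(data_dir="./data", expected=EXPECTED_FILES):
    """Return {file name: list of array names, or None if the file is missing}."""
    report = {}
    for name in expected:
        path = Path(data_dir) / name
        if not path.is_file():
            report[name] = None
            continue
        # List the archive members and strip the ".npy" suffix to get array names.
        with zipfile.ZipFile(str(path)) as archive:
            report[name] = [
                member[:-4] if member.endswith(".npy") else member
                for member in archive.namelist()
            ]
    return report


if __name__ == "__main__":
    for name, arrays in check_data_dir().items():
        print(name, "->", "MISSING" if arrays is None else arrays)
```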

Prepare 2D-to-3D lifter

The pretrained 2D-to-3D lifting model can be downloaded from the following table. All weights come from DiffPose.

Name             Description                    URL
gcn_xyz_cpn.pth  Trained on 2D estimated input  link
gcn_xyz_gt.pth   Trained on 2D gt input         link

Please put them in the folder ckpts.

Prepare 2D normalized poses

To speed up the 2D sampling process, we prepare a simple script to normalize the sampled 2D poses to the UV space in advance. Please run the following command.
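
For intuition about what such a normalization involves, here is a minimal sketch of one common convention, the screen-coordinate normalization used in VideoPose3D: x is mapped from [0, width] to [-1, 1], and y is scaled by the same factor so the aspect ratio is preserved. Whether this repository's script does exactly this is an assumption; `normalize_to_uv` is a hypothetical name and the 1000x1000 resolution is illustrative.

```python
def normalize_to_uv(points, width, height):
    """Map pixel (x, y) pairs so that x in [0, width] lands in [-1, 1].

    y is scaled by the same factor as x, which preserves the aspect ratio.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    w = float(width)
    y_offset = height / w
    return [(x / w * 2.0 - 1.0, y / w * 2.0 - y_offset) for x, y in points]


# Illustrative values only: three pixel positions in a 1000x1000 image.
pixels = [(0, 0), (1000, 1000), (500, 500)]
print(normalize_to_uv(pixels, 1000, 1000))
# [(-1.0, -1.0), (1.0, 1.0), (0.0, 0.0)]
```

Doing this once ahead of time avoids repeating the same arithmetic for every sampled pose during training.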



To train a diffusion model from scratch, activate the icassp environment and run one of the following commands in your console.

python \
--config cfgs/cfg_cpn.yml \  # config for 2D estimated pose input
--exp exp \                  # experiment root path
--doc human36m_cpn           # the name of the folder for storing weights, config.yml, log, etc.
python \
--config cfgs/cfg_gt.yml \   # config for 2D ground-truth pose input
--exp exp \                  # experiment root path
--doc human36m_gt            # the name of the folder for storing weights, config.yml, log, etc.


The pretrained diffusion model can be downloaded from the following table.

Name          Description                    URL
ckpt_cpn.pth  Trained on 2D estimated input  link
ckpt_gt.pth   Trained on 2D gt input         link

Similarly, please put them in the folder ckpts and run the following command.

python \
--config cfgs/cfg_cpn.yml \  # config for 2D estimated pose input
--exp exp \                  # experiment root path
--doc human36m_cpn           # the name of the folder for storing weights, config.yml, log, etc.
python \
--config cfgs/cfg_gt.yml \   # config for 2D ground-truth pose input
--exp exp \                  # experiment root path
--doc human36m_gt            # the name of the folder for storing weights, config.yml, log, etc.

The results will be displayed in the console like:

===Action=== ==p#1 mm== =p#2 mm=
Directions    43.33      34.59
Average       49.40      39.05
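
The column labels are not spelled out here. Assuming they follow the usual Human3.6M convention (as in VideoPose3D, whose evaluation code this project borrows), p#1 is the mean per-joint position error (MPJPE) and p#2 is the same error after rigid Procrustes alignment (P-MPJPE), both in millimetres. The sketch below illustrates only the first metric on made-up numbers; `mpjpe` is a hypothetical helper, not the repository's evaluation function.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding joints, in the input unit."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty poses with the same number of joints")
    total = 0.0
    for p, t in zip(predicted, target):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))
    return total / len(predicted)


# Toy 2-joint pose in millimetres (made-up values): joint errors are 3 and 4.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(mpjpe(pred, gt))  # 3.5
```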


If you use our code/models in your research, please cite our paper 🙌 :

  title={Exploring Latent Cross-Channel Embedding for Accurate 3d Human Pose Reconstruction in a Diffusion Framework},
  author={Jiang, Junkun and Chen, Jie},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},


Many thanks to the following open-source repositories for their help in developing our project.

  • DiffPose, the diffusion-based monocular 3D pose estimation method on which our main structure is built. We thank the authors for their great work ❤️.
  • The GCN backbone Graformer.
  • The evaluation code from VideoPose3D.
  • The diffusion pipeline from DDIM.

