GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates

Project Page | Paper

World-Grounded Human Motion Recovery via Gravity-View Coordinates
Zehong Shen^*, Huaijin Pi^*, Yan Xia, Zhi Cen, Sida Peng^†, Zechen Hu, Hujun Bao, Ruizhen Hu, Xiaowei Zhou
SIGGRAPH Asia 2024

Setup

Please see installation for details.

Quick Start

Demo

Demo entries are provided in tools/demo. Use -s to skip visual odometry if you know the camera is static; otherwise the camera motion will be estimated by DPVO. We also provide a script, demo_folder.py, to run inference on an entire folder.

python tools/demo/demo.py --video=docs/example_video/tennis.mp4 -s
python tools/demo/demo_folder.py -f inputs/demo/folder_in -d outputs/demo/folder_out -s
python -m tools.demo.demo_multiperson --video=docs/example_video/two_persons.mp4 --output_root outputs/demo_mp --recreate_video

TODO:

Make the rendered videos use the same fps as the input video.
Check pp_static_joint_cam in ./hmr4d/model/gvhmr/utils/postprocess.py, which might be used for the -s option in the demo script.

Reproduce

Test: To reproduce the 3DPW, RICH, and EMDB results in a single run, use the following command:
```
python tools/train.py global/task=gvhmr/test_3dpw_emdb_rich exp=gvhmr/mixed/mixed ckpt_path=inputs/checkpoints/gvhmr/gvhmr_siga24_release.ckpt
```
To test individual datasets, change global/task to gvhmr/test_3dpw, gvhmr/test_rich, or gvhmr/test_emdb.
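
As a rough illustration of the per-dataset variant, the sketch below only builds and prints the three command lines; it does not run them. The helper name `build_test_command` is hypothetical and not part of the repository; the task names, experiment name, and checkpoint path are copied from the commands above.

```python
import shlex

# Task names and checkpoint path as given in this README.
TASKS = ["gvhmr/test_3dpw", "gvhmr/test_rich", "gvhmr/test_emdb"]
CKPT = "inputs/checkpoints/gvhmr/gvhmr_siga24_release.ckpt"


def build_test_command(task, ckpt_path=CKPT):
    """Return the argv list for testing one dataset (hypothetical helper)."""
    return [
        "python",
        "tools/train.py",
        f"global/task={task}",
        "exp=gvhmr/mixed/mixed",
        f"ckpt_path={ckpt_path}",
    ]


if __name__ == "__main__":
    for task in TASKS:
        # Print only; pass the list to subprocess.run(..., check=True) to execute it.
        print(shlex.join(build_test_command(task)))
```
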
Train: To train the model, use the following command:
```
# gvhmr_siga24_release.ckpt was trained with 2x4090 for 420 epochs. Different GPU settings may lead to different results.
python tools/train.py exp=gvhmr/mixed/mixed
```
Note that training does not apply the post-processing used in the test script, so the global metrics reported during training will differ from the test results (but should still be suitable for comparison with baseline methods).

Different from the original repo

This version of the repository includes modifications to support multi-person HMR:

Multi-person tracking:
- Updated the Tracker class to return bounding boxes for multiple people using get_all_tracks instead of get_one_track.
- Modified preprocessing to handle multiple person detections and features.
Multi-person pose estimation:
- Adapted the VitPoseExtractor to process multiple people simultaneously.
- Updated the feature extraction process to handle batches of multiple people.
Multi-person SMPL reconstruction:
- Modified the DemoPL class to predict SMPL parameters for multiple people.
- Updated the rendering process to handle multiple SMPL models in both in-camera and global coordinate systems.
Rendering improvements:
- Implemented merged faces creation for rendering multiple SMPL models simultaneously.
- Added support for retargeting global translations to better align with in-camera positions.
New demo script:
- Added demo_multiperson.py to showcase the multi-person reconstruction pipeline.
- Includes options for batch processing and verbose output for debugging.
Performance optimizations:
- Introduced batch processing for VitPose and feature extraction to improve efficiency.
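
The "merged faces" idea can be sketched as follows. This is a minimal NumPy illustration of the general technique (concatenating vertices and offsetting face indices), not the repository's actual code; the function name `merge_meshes` and the assumption that all people share one face topology are ours.

```python
import numpy as np


def merge_meshes(verts, faces):
    """Merge P meshes that share one face topology into a single mesh.

    verts: (P, V, 3) float array, one vertex set per person.
    faces: (F, 3) int array of vertex indices, shared by all people.
    Returns merged vertices (P * V, 3) and merged faces (P * F, 3).
    """
    verts = np.asarray(verts)
    faces = np.asarray(faces)
    num_people, num_verts, _ = verts.shape
    # Person p's vertices start at row p * V in the merged array,
    # so that person's face indices are shifted by the same offset.
    offsets = np.arange(num_people).reshape(-1, 1, 1) * num_verts
    merged_faces = (faces[None] + offsets).reshape(-1, 3)
    merged_verts = verts.reshape(-1, 3)
    return merged_verts, merged_faces
```

With two people and a single triangle `[[0, 1, 2]]`, the merged faces become `[[0, 1, 2], [3, 4, 5]]`, so one draw call can render both bodies.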

Results format

Preprocessing results

/preprocess/bbx.pt:
- Contains bounding box information for multiple people
- bbx_xyxy: Tensor of shape (P, L, 4), where P is the number of people and L is the number of frames
- bbx_xys: Tensor of shape (P, L, 3), containing center coordinates and scale for each bounding box
/preprocess/slam_results.pt:
- Camera pose estimation results (if not using static camera)
- NumPy array of shape (L, 7), where each row contains [x, y, z, qx, qy, qz, qw]
/preprocess/vitpose.pt:
- 2D pose estimation results
- Tensor of shape (P, L, 17, 3), where 17 is the number of keypoints and 3 represents [x, y, confidence]
/preprocess/vit_features.pt:
- Image features extracted from the video frames
- Tensor of shape (P, L, 1024), where 1024 is the feature dimension
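
For downstream use, each row of slam_results.pt is often easier to handle as a 4x4 rigid transform. The sketch below converts an (L, 7) array in the [x, y, z, qx, qy, qz, qw] layout described above using the standard unit-quaternion formula. The function name `poses_to_matrices` is hypothetical, and this README does not state whether the poses are camera-to-world or world-to-camera, so the sketch only changes the representation.

```python
import numpy as np


def poses_to_matrices(poses):
    """Convert (L, 7) rows [x, y, z, qx, qy, qz, qw] to (L, 4, 4) rigid transforms."""
    poses = np.asarray(poses, dtype=np.float64)
    t = poses[:, :3]
    q = poses[:, 3:]
    # Normalize so slightly non-unit quaternions still give valid rotations.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    x, y, z, w = q[:, 0], q[:, 1], q[:, 2], q[:, 3]

    mats = np.tile(np.eye(4), (len(poses), 1, 1))
    mats[:, 0, 0] = 1 - 2 * (y * y + z * z)
    mats[:, 0, 1] = 2 * (x * y - z * w)
    mats[:, 0, 2] = 2 * (x * z + y * w)
    mats[:, 1, 0] = 2 * (x * y + z * w)
    mats[:, 1, 1] = 1 - 2 * (x * x + z * z)
    mats[:, 1, 2] = 2 * (y * z - x * w)
    mats[:, 2, 0] = 2 * (x * z - y * w)
    mats[:, 2, 1] = 2 * (y * z + x * w)
    mats[:, 2, 2] = 1 - 2 * (x * x + y * y)
    mats[:, :3, 3] = t
    return mats
```
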

GVHMR reconstruction results

The main reconstruction results are stored in hmr4d_results.pt, which contains the following keys:

smpl_params_global and smpl_params_incam:
- SMPL parameters for global and in-camera coordinate systems
- Each contains:
  - body_pose: Tensor of shape (P, L, 63)
  - betas: Tensor of shape (P, L, 10)
  - global_orient: Tensor of shape (P, L, 3)
  - transl: Tensor of shape (P, L, 3)
K_fullimg:
- Camera intrinsic matrix
- Tensor of shape (L, 3, 3), same across all frames
net_outputs:
- Additional network outputs (not used for now)
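
To work with a single person, the (P, L, D) entries can be sliced along the first axis. The sketch below uses NumPy zero arrays as stand-ins for the real data; actual results would come from loading hmr4d_results.pt (for example with torch.load), which is not done here. The helper `select_person` is hypothetical, and viewing body_pose as 21 three-dimensional per-joint vectors is an assumption based only on 63 = 21 x 3.

```python
import numpy as np

# Expected trailing dimension per key, as listed above.
EXPECTED_DIMS = {"body_pose": 63, "betas": 10, "global_orient": 3, "transl": 3}


def select_person(smpl_params, person_idx):
    """Slice one person out of a dict of (P, L, D) arrays, giving (L, D) arrays."""
    out = {}
    for key, dim in EXPECTED_DIMS.items():
        arr = np.asarray(smpl_params[key])
        if arr.ndim != 3 or arr.shape[-1] != dim:
            raise ValueError(f"{key}: expected shape (P, L, {dim}), got {arr.shape}")
        out[key] = arr[person_idx]
    return out


# Stand-in data with P=2 people and L=5 frames (not real results).
P, L = 2, 5
fake_results = {
    "smpl_params_global": {k: np.zeros((P, L, d)) for k, d in EXPECTED_DIMS.items()},
}
person0 = select_person(fake_results["smpl_params_global"], 0)
# Assumption: 63 = 21 x 3, so body_pose is viewed as one 3-vector per joint.
joint_rotations = person0["body_pose"].reshape(L, 21, 3)
```
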

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{shen2024gvhmr,
  title={World-Grounded Human Motion Recovery via Gravity-View Coordinates},
  author={Shen, Zehong and Pi, Huaijin and Xia, Yan and Cen, Zhi and Peng, Sida and Hu, Zechen and Bao, Hujun and Hu, Ruizhen and Zhou, Xiaowei},
  booktitle={SIGGRAPH Asia Conference Proceedings},
  year={2024}
}

Acknowledgement

We thank the authors of WHAM, 4D-Humans, and ViTPose-Pytorch for their great work, without which our project and code would not be possible.
