This repo provides the code and models for GDRNPP_BOP2022, winner of most of the awards at the BOP Challenge 2022 at ECCV'22 [slides].
[18/03/2025] Our paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)! The paper is available at [arXiv].
Download the 6D pose datasets from the BOP website and VOC 2012 for background images. Please also download the test_bboxes from OneDrive (password: groupji) or BaiDuYunPan (password: vp58).
The structure of the datasets folder should look like below:

datasets/
├── BOP_DATASETS  # https://bop.felk.cvut.cz/datasets/
│   ├── tudl
│   ├── lmo
│   ├── ycbv
│   ├── icbin
│   ├── hb
│   ├── itodd
│   └── tless
└── VOCdevkit
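If the BOP datasets and VOC 2012 are already stored elsewhere on disk, a symlink layout such as the sketch below avoids copying them (the source paths are placeholders, not part of this repo):

```bash
# Link already-downloaded data into the expected layout (source paths are placeholders).
mkdir -p datasets/BOP_DATASETS
for d in tudl lmo ycbv icbin hb itodd tless; do
    ln -s /path/to/bop/$d datasets/BOP_DATASETS/$d
done
ln -s /path/to/VOCdevkit datasets/VOCdevkit
```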
Download the trained models at OneDrive (password: groupji) or BaiDuYunPan (password: 10t3) and put them in the folder ./output.
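A quick way to verify the checkpoints landed in the right place is to check for one of the paths used by the test scripts below, e.g.:

```bash
# The exact subfolder comes from the config name; this example matches the YCB-V config used later.
ls output/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv/model_final_wo_optim.pth
```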
- Ubuntu 18.04/20.04, CUDA 10.1/10.2/11.6, python >= 3.7, PyTorch >= 1.9, torchvision
- Install detectron2 from source, then install the remaining dependencies: sh scripts/install_deps.sh
- Compile the C++ extensions for:
  - farthest point sampling (fps)
  - flow
  - uncertainty pnp
  - ransac_voting
  - chamfer distance
  - egl renderer

  sh ./scripts/compile_all.sh
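Putting the steps together, a typical setup might look like the following sketch (the environment name and exact PyTorch install command are assumptions; pick the build that matches your CUDA version):

```bash
# Illustrative end-to-end environment setup (details are assumptions, adapt to your machine).
conda create -n gdrnpp python=3.9 -y
conda activate gdrnpp
pip install torch torchvision                                          # choose the wheel matching your CUDA version
pip install 'git+https://github.com/facebookresearch/detectron2.git'   # detectron2 from source
sh scripts/install_deps.sh                                             # remaining dependencies
sh ./scripts/compile_all.sh                                            # build the extensions listed above
```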
We adopt YOLOX as the detection method, trained with stronger data augmentation and the Ranger optimizer.
Download the pretrained model at OneDrive (password: groupji) or BaiDuYunPan (password: aw68) and put it in the folder pretrained_models/yolox. Then use the following commands to train and test the detector:
./det/yolox/tools/train_yolox.sh <config_path> <gpu_ids> (other args)
./det/yolox/tools/test_yolox.sh <config_path> <gpu_ids> <ckpt_path> (other args)
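For example, to train and evaluate a detector for YCB-V (the config and checkpoint paths below are illustrative placeholders; substitute the actual files shipped under configs/ and your own trained weights):

```bash
# Hypothetical paths for illustration only.
./det/yolox/tools/train_yolox.sh configs/yolox/ycbv/yolox_ycbv_pbr.py 0
./det/yolox/tools/test_yolox.sh configs/yolox/ycbv/yolox_ycbv_pbr.py 0 output/yolox/ycbv/yolox_ycbv_pbr/model_final.pth
```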
The main differences between this repo and GDR-Net (CVPR 2021) are:
- Domain Randomization: We used stronger domain randomization operations than the conference version during training.
- Network Architecture: We used a more powerful ConvNeXt backbone instead of ResNet-34, and two mask heads that predict the amodal mask and the visible mask separately.
- Other training details, such as the learning rate, weight decay, visible threshold, and bounding box type, have also been adjusted.
To train GDRNPP, run:
./core/gdrn_modeling/train_gdrn.sh <config_path> <gpu_ids> (other args)
For example:
./core/gdrn_modeling/train_gdrn.sh configs/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv.py 0
To test GDRNPP, run:
./core/gdrn_modeling/test_gdrn.sh <config_path> <gpu_ids> <ckpt_path> (other args)
For example:
./core/gdrn_modeling/test_gdrn.sh configs/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv.py 0 output/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv/model_final_wo_optim.pth
We utilize depth information to further refine the estimated pose. We provide two types of refinement: fast refinement and iterative refinement.
For fast refinement, we compare the rendered object depth with the observed depth to refine the translation. Run:
./core/gdrn_modeling/test_gdrn_depth_refine.sh <config_path> <gpu_ids> <ckpt_path> (other args)
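For example, reusing the YCB-V config and checkpoint from the GDRN test command above (assuming the same config drives the fast refinement script):

```bash
./core/gdrn_modeling/test_gdrn_depth_refine.sh configs/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv.py 0 output/gdrn/ycbv/convnext_a6_AugCosyAAEGray_BG05_mlL1_DMask_amodalClipBox_classAware_ycbv/model_final_wo_optim.pth
```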
For iterative refinement, please check out the pose_refine branch for details.
If you use GDRNPP in your research, please use the following BibTeX entries.
@article{liu2025gdrnpp,
  title   = {GDRNPP: A Geometry-guided and Fully Learning-based Object Pose Estimator},
  author  = {Liu, Xingyu and Zhang, Ruida and Zhang, Chenyangguang and Wang, Gu and Tang, Jiwen and Li, Zhigang and Ji, Xiangyang},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year    = {2025},
}

@InProceedings{Wang_2021_GDRN,
  title     = {{GDR-Net}: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation},
  author    = {Wang, Gu and Manhardt, Fabian and Tombari, Federico and Ji, Xiangyang},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2021},
  pages     = {16611--16621}
}