CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo*, Yinghao Wu*, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang
Tsinghua University and Tencent
- [2025/08/17] 🔥🔥🔥 The training code and checkpoints are released.
- [2025/06/28] 🔥🔥🔥 CoHD is accepted by ICCV 2025.
The newly proposed Generalized Referring Expression Segmentation (GRES) extends the formulation of classic RES to cover complex multiple-target and non-target scenarios. Recent approaches address GRES by directly extending well-adopted RES frameworks with object-existence identification. However, these approaches tend to encode multi-granularity object information into a single representation, which makes it difficult to precisely represent comprehensive objects of different granularity. Moreover, simple binary object-existence identification across all referent scenarios fails to capture their inherent differences, incurring ambiguity in object understanding. To tackle these issues, we propose a Counting-Aware Hierarchical Decoding framework (CoHD) for GRES. By decoupling the intricate referring semantics into different granularities with a visual-linguistic hierarchy, and dynamically aggregating them with intra- and inter-selection, CoHD boosts multi-granularity comprehension with the reciprocal benefit of the hierarchical nature. Furthermore, we incorporate counting ability by embodying multiple/single/non-target scenarios into count- and category-level supervision, facilitating comprehensive object perception. Experimental results on the gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO benchmarks demonstrate the effectiveness and rationality of CoHD, which outperforms state-of-the-art GRES methods by a remarkable margin.
Env: The code is trained using CUDA 11.3 with
- torch 1.12.1
- torchvision 0.13.1
- Python 3.8.8

(other versions may also work)
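The pins above can be collected into a requirements-style fragment (a sketch of the tested environment only; the repo's own `requirements.txt` is authoritative for the full dependency list):

```
# Tested environment from this README (other versions may also work)
# Python 3.8.8, CUDA 11.3
torch==1.12.1
torchvision==0.13.1
```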
Dependencies:
- Install Detectron2
- Run `sh make.sh` under `gres_model/modeling/pixel_decoder/ops`
- Install other required packages:
pip install -r requirements.txt
Dataset:
Please download the annotations from Dataset
```
dataset
├── grefcoco
│   ├── grefs(unc).json
│   ├── instances.json
│   ├── cateid2coco.json
│   └── cocoidtosuper.json
└── images
    └── train2014
        ├── COCO_train2014_xxxxxxxxxxxx.jpg
        ├── COCO_train2014_xxxxxxxxxxxx.jpg
        └── ...
```
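A quick way to sanity-check the layout above before training is to probe for the expected entries (a minimal sketch; `missing_entries` and the path list are taken from the tree in this README, and `root` is assumed to be the `dataset/` directory):

```python
from pathlib import Path

# Expected gRefCOCO entries, relative to the dataset root (per the tree above).
EXPECTED = [
    "grefcoco/grefs(unc).json",
    "grefcoco/instances.json",
    "grefcoco/cateid2coco.json",
    "grefcoco/cocoidtosuper.json",
    "images/train2014",
]

def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

Running `missing_entries("dataset")` should return an empty list once the annotations and COCO train2014 images are in place.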
Note: prepare the Swin-Base and Swin-Tiny pretrained models according to ReLA.
🚀 Training Scripts
- Swin-Base
bash scripts/grefcoco/train_base.sh
- Swin-Tiny
bash scripts/grefcoco/train_tiny.sh
🚀 Evaluation Scripts
- Swin-Base
bash scripts/grefcoco/eval_base.sh
- Swin-Tiny
bash scripts/grefcoco/eval_tiny.sh
- gRefCOCO Validation Set

| Method | Backbone | gIoU | cIoU | N-acc. | Checkpoint |
|---|---|---|---|---|---|
| CoHD | Swin-T | 65.89 | 62.95 | 60.96 | Model |
| CoHD | Swin-B | 68.42 | 65.17 | 63.38 | Model |
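For reference, the gIoU and cIoU metrics in the table can be illustrated on toy masks as below (a sketch only, using flat 0/1 Python lists; it assumes the common GRES convention that a no-target sample scores 1.0 in gIoU when the prediction is also empty, and the benchmark's official evaluation code should be used for real numbers):

```python
def sample_iou(pred, gt):
    """IoU of one sample; both masks are flat 0/1 lists of equal length."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    if union == 0:  # no-target sample with an empty prediction: counted correct
        return 1.0
    return inter / union

def g_ciou(preds, gts):
    """gIoU: mean of per-sample IoUs. cIoU: pooled intersection / pooled union."""
    gious = [sample_iou(p, g) for p, g in zip(preds, gts)]
    inter = sum(p & g for pr, gt in zip(preds, gts) for p, g in zip(pr, gt))
    union = sum(p | g for pr, gt in zip(preds, gts) for p, g in zip(pr, gt))
    giou = sum(gious) / len(gious)
    ciou = inter / union if union else 1.0
    return giou, ciou
```

Because cIoU pools pixels over the whole set, large objects dominate it, while gIoU weights every sample equally; this is why the two columns in the table can diverge.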
- Release Ref-ZOM training and evaluation script
- Release R-RefCOCO training and evaluation script
- Release RefCOCO training and evaluation script
Code in this repository is built upon several public repositories. Thanks for the wonderful work ReLA!

If you find it helpful, please cite:
```bibtex
@article{luo2024cohd,
  title={CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation},
  author={Luo, Zhuoyan and Wu, Yinghao and Cheng, Tianheng and Liu, Yong and Xiao, Yicheng and Wang, Hongfa and Zhang, Xiao-Ping and Yang, Yujiu},
  journal={arXiv preprint arXiv:2405.15658},
  year={2024}
}
```