This repository is the official PyTorch implementation of our zero-shot CoSOD framework. [arXiv]
Co-salient Object Detection (CoSOD) endeavors to replicate the human visual system's capacity to recognize common and salient objects within a collection of images. Despite recent advancements in deep learning models, these models still rely on training with well-annotated CoSOD datasets, and the exploration of training-free zero-shot CoSOD frameworks has been limited. In this paper, taking inspiration from the zero-shot transfer capabilities of foundational computer vision models, we introduce the first zero-shot CoSOD framework that harnesses these models without any training process. To achieve this, our framework comprises two novel components: the group prompt generation (GPG) module and the co-saliency map generation (CMP) module. We evaluate the framework's performance on widely used datasets and observe impressive results: our approach surpasses existing unsupervised methods and even outperforms fully supervised methods developed before 2020, while remaining competitive with some fully supervised methods developed before 2022.
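To make the two-module design above concrete, here is a minimal, purely illustrative sketch of the inference flow. Every name in it is a hypothetical stub rather than this repository's actual API; the real pipeline is driven by the shell scripts listed under "Test and evaluation" below.

```python
# Illustrative pseudocode of the zero-shot CoSOD flow described above.
# The GPG/CMP stand-ins are hypothetical stubs, not functions exported
# by this repository.

def group_prompt_generation(image_group):
    # GPG (hypothetical stub): mine a prompt shared across the group,
    # e.g. from foundation-model features and image captions.
    raise NotImplementedError

def co_saliency_map_generation(image, prompt):
    # CMP (hypothetical stub): turn the group prompt into a per-image
    # co-saliency map, e.g. by prompting a segmenter such as SAM.
    raise NotImplementedError

def zero_shot_cosod(image_group):
    prompt = group_prompt_generation(image_group)
    return [co_saliency_map_generation(img, prompt) for img in image_group]
```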
## Results

- Quantitative results
- Qualitative results
## Environment

```shell
conda create -n zscosod python=3.9
conda activate zscosod
pip install -e .
pip install -r requirements.txt
```
## Datasets preparation

Download all the test datasets from Google Drive or BaiduYun (fetch code: qwt8). The file directory structure is as follows:

```
+-- zs-cosod
|   +-- data
|   |   +-- CoSal2015 (Testing Dataset)
|   |   |   +-- img (Image Groups)
|   |   |   +-- gt (Ground Truth Groups)
|   |   |   +-- blip2-caption (Image Captions)
|   |   +-- CoCA (Testing Dataset)
|   |   +-- CoSOD3k (Testing Dataset)
|   |   ...
```
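For orientation, here is a small sketch of how one might walk this layout in Python. The helper and its file-name conventions (PNG ground truths mirroring the image names, one caption file per group) are assumptions for illustration; the repository's own scripts may organize I/O differently.

```python
# Hypothetical walker over the data/ layout shown above; not part of
# the repository's code. File-name conventions here are assumptions.
from pathlib import Path

def iter_groups(dataset_root="data/CoSal2015"):
    root = Path(dataset_root)
    for group_dir in sorted((root / "img").iterdir()):
        images = sorted(group_dir.iterdir())
        # Assumed: ground truths mirror the image names with a .png suffix.
        gts = [root / "gt" / group_dir.name / (p.stem + ".png") for p in images]
        # Assumed: one BLIP-2 caption file per image group.
        caption = root / "blip2-caption" / (group_dir.name + ".txt")
        yield group_dir.name, images, gts, caption

for name, images, gts, caption in iter_groups():
    print(name, len(images), "images")
```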
## Test and evaluation

Download the checkpoints of TSDN and SAM from Google Drive | BaiduYun (fetch code: be34). Place the ckpt folder in the main directory. Here is an example of testing our model (testing CoSal2015 with the ViT-Base backbone):

1. `sh sd-dino/extract_feat.sh` (feature extraction with Stable Diffusion 1.5 and DINOv2-Base; an illustrative sketch of this step follows the list)
2. `sh A2S-v2/inference_sod.sh` (saliency map generation with the unsupervised TSDN)
3. `sh inference_cosod.sh` (co-saliency map generation)
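As a point of reference for step 1, the sketch below shows what patch-feature extraction with the public DINOv2-Base hub model looks like. It is not the repository's `extract_feat.sh` implementation (which also uses Stable Diffusion 1.5 features), and the image path is only an example following the layout above.

```python
# Standalone illustration of DINOv2 patch-feature extraction; the
# repository's sd-dino scripts wrap their own implementation.
import torch
from PIL import Image
from torchvision import transforms

# Load the public DINOv2-Base backbone from torch.hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
model.eval()

# Standard ImageNet preprocessing; 518 is a multiple of the 14-px patch size.
preprocess = transforms.Compose([
    transforms.Resize((518, 518)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Example path following the data/ layout above (hypothetical file).
img = preprocess(Image.open("data/CoSal2015/img/some_group/some_image.jpg").convert("RGB"))
with torch.no_grad():
    out = model.forward_features(img.unsqueeze(0))
patch_feats = out["x_norm_patchtokens"]  # shape: (1, 37*37, 768)
print(patch_feats.shape)
```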
Run the following command to evaluate your prediction results; the metrics include max F-measure, S-measure, and MAE.
```shell
CUDA_VISIBLE_DEVICES=0 python evaluate.py --pred_root results --datasets CoSal2015
```
For more metrics, the CoSOD evaluation toolbox eval-co-sod is strongly recommended.
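For reference, the sketch below spells out two of the reported metrics (MAE and max F-measure) on a single prediction/ground-truth pair; S-measure is omitted for brevity. Treat `evaluate.py` and eval-co-sod as the authoritative implementations.

```python
# Minimal sketch of MAE and max F-measure, assuming pred and gt are
# grayscale saliency maps with values in [0, 1]. Illustrative only.
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a predicted map and the ground truth."""
    return np.abs(pred - gt).mean()

def max_f_measure(pred, gt, beta2=0.3):
    """Best F-score over 255 thresholds (beta^2 = 0.3, the SOD convention)."""
    gt_bin = gt > 0.5
    best = 0.0
    for t in np.linspace(0, 1, 255):
        p = pred >= t
        tp = np.logical_and(p, gt_bin).sum()
        precision = tp / (p.sum() + 1e-8)
        recall = tp / (gt_bin.sum() + 1e-8)
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
        best = max(best, f)
    return best

# Toy usage with random maps:
pred = np.random.rand(224, 224)
gt = (np.random.rand(224, 224) > 0.5).astype(float)
print(mae(pred, gt), max_f_measure(pred, gt))
```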
## Citation

```bibtex
@article{DBLP:journals/corr/abs-2309-05499,
  author  = {Haoke Xiao and
             Lv Tang and
             Bo Li and
             Zhiming Luo and
             Shaozi Li},
  title   = {Zero-Shot Co-salient Object Detection Framework},
  journal = {CoRR},
  volume  = {abs/2309.05499},
  year    = {2023}
}
```
## Acknowledgement

Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), SAM, and TSDN. Our heartfelt gratitude goes to the developers of these resources!
## Contact

Feel free to open issues here or send me an e-mail ([email protected]).