A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

News

Please stay tuned; this repo will be updated on a weekly basis.

27/06/2024: NeRF and 3DGS based 3D scene understanding is added.
05/06/2024: The second version of our manuscript has been accepted by TPAMI.

Bibtex

If you find our survey helpful, please consider citing our paper:

@article{survey-ovd-ovs,
  title={A survey on open-vocabulary detection and segmentation: Past, present, and future},
  author={Zhu, Chaoyang and Chen, Long},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

✨ PR is welcome!

Although we aim to cover every paper, some works may still be missing. We believe the repository should be maintained by the community. Peer review is welcome and highly appreciated: if you are an author and find an entry incorrect, don't hesitate to contact me or open a PR.

General Overview

In this survey, we cover two settings (zero-shot and open-vocabulary) and six tasks (object detection, semantic/instance/panoptic segmentation, 3D scene understanding, and video understanding). Our taxonomy pivots on two questions: whether weak supervision signals are permitted, and how they are used. This makes it applicable across these diverse settings and tasks. The weak supervision signals can be image-text pairs or large vision-language models. Below is a general overview of each methodology.

In the current literature, zero-shot and open-vocabulary are often used interchangeably. However, we highlight their subtle differences by tracing the evolution from the traditional zero-shot setting to the newly formulated open-vocabulary setting.
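As a rough illustration of what the two settings have in common, many of the methods listed below classify a region or pixel feature by comparing it with one embedding per class name, so the label set can be extended at inference time without retraining. The sketch below is a toy example, not any specific paper's method: the function names `cosine` and `classify_region` and the hand-made 3-dimensional embeddings are assumptions made purely for illustration. In practice the class embeddings would come from a word-embedding model or the text encoder of a vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_region(region_feature, class_embeddings):
    """Return the class name whose embedding is most similar to the feature."""
    return max(
        class_embeddings,
        key=lambda name: cosine(region_feature, class_embeddings[name]),
    )


# Hypothetical toy embeddings (illustrative values only).
base = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
novel = {"zebra": [0.0, 0.0, 1.0]}

# A hypothetical visual feature already mapped into the same space.
region = [0.1, 0.2, 0.9]

# Restricted to base classes, the feature is forced onto the closest seen class.
print(classify_region(region, base))
# Adding a novel class name at inference time changes the prediction.
print(classify_region(region, {**base, **novel}))
```

The classifier itself has no per-class weights, which is why new class names can be added simply by supplying their embeddings.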

Table of Contents

Zero-Shot Object Detection
- Visual-Semantic Space Mapping
- Novel Visual Feature Synthesis
Zero-Shot Segmentation
- Zero-Shot Semantic Segmentation
  - Visual-Semantic Space Mapping
  - Novel Visual Feature Synthesis
- Zero-Shot Instance Segmentation
Open-Vocabulary Object Detection
Open-Vocabulary Segmentation
Open-Vocabulary 3D Scene Understanding
Open-Vocabulary Video Understanding
- Open-Vocabulary Video Instance Segmentation
Acknowledgement

Zero-Shot Object Detection

Visual-Semantic Space Mapping

Venue	Paper Abbr	Paper Title	Project
ECCV'18	ZSDv1	Zero-Shot Object Detection	N/A
ACCV'18 & IJCV'20	ZSDv2	Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts	N/A
AAAI'20	CA-ZSR	Context-Aware Zero-Shot Recognition	Code
AAAI'19	ZSD-TD	Zero-Shot Object Detection with Textual Descriptions	N/A
ACCV'20	BLC	Background Learnable Cascade for Zero-Shot Object Detection	Code
ICCV'19	TL-ZSD	Transductive Learning for Zero-Shot Object Detection	N/A
arXiv'23	SSB	Frustratingly Simple but Effective Zero-shot Detection and Segmentation: Analysis and a Strong Baseline	N/A
WACV'20	MS-Zero	A Multi-Space Approach to Zero-Shot Object Detection	N/A
TCSVT'19	ZS-YOLO	Zero Shot Detection	N/A
AAAI'21	DPIF	Inference Fusion with Associative Semantics for Unseen Object Detection	Code
TPAMI'21	ContrastZSD	Semantics-Guided Contrastive Network for Zero-Shot Object Detection	N/A
IJCAI'20	ZSD-CNN	Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space	N/A

Novel Visual Feature Synthesis

Venue	Paper Abbr	Paper Title	Project
CVPR'20	DELO	Don't Even Look Once: Synthesizing Features for Zero-Shot Detection	N/A
ACCV'20	SU	Synthesizing the Unseen for Zero-shot Object Detection	Code
AAAI'20	GTNet	GTNet: Generative Transfer Network for Zero-Shot Object Detection	Code
CVPR'22	RRFS	Robust Region Feature Synthesizer for Zero-Shot Object Detection	Code

Zero-Shot Segmentation

Zero-Shot Semantic Segmentation

Visual-Semantic Space Mapping

Venue	Paper Abbr	Paper Title	Project
CVPR'20	SPNet	Semantic Projection Network for Zero- and Few-Label Semantic Segmentation	Code
NeurIPS'20	ULZSS	Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation	Code
ICCV'21	JoEm	Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation	Code
ICCVW'19	VM	Zero-Shot Semantic Segmentation via Variational Mapping	N/A
ICCV'21	PMOSR	Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation	N/A

Novel Visual Feature Synthesis

Venue	Paper Abbr	Paper Title	Project
NeurIPS'19	ZS3Net	Zero-Shot Semantic Segmentation	Code
NeurIPS'20	CSRL	Consistent Structural Relation Learning for Zero-Shot Segmentation	N/A
MM'20	CaGNet	Context-aware Feature Generation for Zero-shot Semantic Segmentation	Code
ICCV'21	SIGN	SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation	Code

Zero-Shot Instance Segmentation

Venue	Paper Abbr	Paper Title	Project
CVPR'21	ZSIS	Zero-Shot Instance Segmentation	Code

Open-Vocabulary Object Detection

Region-Aware Training

Venue	Paper Abbr	Paper Title	Project
CVPR'21	OVR-CNN	Open-Vocabulary Object Detection Using Captions	Code
GCPR'22	LocOv	Localized Vision-Language Matching for Open-vocabulary Object Detection	Code
arXiv'23	MMC-Det	Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection	N/A
NeurIPS'22	DetCLIP	DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection	N/A
CVPR'23	DetCLIPv2	DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment	N/A
CVPR'24	DetCLIPv3	DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection	N/A
AAAI'24	WSOVOD	Weakly Supervised Open-Vocabulary Object Detection	Code
CVPR'23	RO-ViT	Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers	N/A
ICCV'23	CFM-ViT	Contrastive Feature Masking Open-Vocabulary Vision Transformer	N/A
ICCV'23	DITO	Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection	Code
ICLR'23	VLDet	Learning Object-Language Alignments for Open-Vocabulary Object Detection	Code
ICCV'23	GOAT	Open-Vocabulary Object Detection With an Open Corpus	N/A
ECCV'22	OV-DETR	Open-Vocabulary DETR with Conditional Matching	Code
arXiv'23	Prompt-OVD	Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection	N/A
CVPR'23	CORA	CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching	N/A
ICCV'23	EdaDet	EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment	Code
ICCV'21	MDETR	MDETR: Modulated Detection for End-to-End Multi-Modal Understanding	Code
ECCV'22	MAVL	Class-agnostic Object Detection with Multi-modal Transformer	Code
NeurIPS'23	MQ-Det	Multi-modal Queried Object Detection in the Wild	Code
CVPR'24	YOLO-World	Real-Time Open-Vocabulary Object Detection	Code
MM'23	SGDN	Open-Vocabulary Object Detection via Scene Graph Discovery	N/A
CVPR'24	USE	USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation	N/A

Pseudo-Labeling

Venue	Paper Abbr	Paper Title	Project
CVPR'22	RegionCLIP	RegionCLIP: Region-based Language-Image Pretraining	Code
ECCV'22	VL-PLM	Exploiting Unlabeled Data with Vision and Language Models for Object Detection	Code
CVPR'22	GLIP	Grounded Language-Image Pre-training	Code
NeurIPS'22	GLIPv2	GLIPv2: Unifying Localization and VL Understanding	Code
arXiv'23	Grounding-DINO	Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	Code
ECCV'22	PromptDet	PromptDet: Towards Open-vocabulary Detection using Uncurated Images	Code
arXiv'23	SAS-Det	Taming Self-Training for Open-Vocabulary Object Detection	Code
ECCV'22	PB-OVD	Open Vocabulary Object Detection with Pseudo Bounding-Box Labels	Code
AAAI'24	CLIM	CLIM: Contrastive Language-Image Mosaic for Region Representation	Code
arXiv'22	VTP-OVD	Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection	N/A
AAAI'24	ProxyDet	ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection	Code
NeurIPS'23	CoDet	CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection	Code
ECCV'22	Detic	Detecting Twenty-thousand Classes using Image-level Supervision	Code
ICML'23	MMC	Multi-Modal Classifiers for Open-Vocabulary Object Detection	Code
arXiv'23	3Ways	Three ways to improve feature alignment for open vocabulary detection	N/A
arXiv'23	PLAC	Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection	N/A
arXiv'23	PCL	Open-Vocabulary Object Detection using Pseudo Caption Labels	N/A
NeurIPS'23	OWLv2	Scaling Open-Vocabulary Object Detection	Code

Knowledge Distillation

Venue	Paper Abbr	Paper Title	Project
ICLR'22	ViLD	Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	Code
ICDMW'22	ZSD-YOLO	Zero-shot Object Detection Through Vision-Language Embedding Alignment	Code
WACV'24	LP-OVOD	LP-OVOD: Open-Vocabulary Object Detection by Linear Probing	Code
arXiv'23	EZSD	Efficient Feature Distillation for Zero-shot Annotation Object Detection	Code
AAAI'24	SIC-CADS	Simple Image-level Classification Improves Open-vocabulary Object Detection	Code
CVPR'23	BARON	Aligning Bag of Regions for Open-Vocabulary Object Detection	Code
CVPR'23	OADP	Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	Code
arXiv'23	GridCLIP	GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning	N/A
NeurIPS'22	RKDWTF	Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection	Code
ICCV'23	DK-DETR	Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection	Code
CVPR'22	HierKD	Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation	Code
CVPR'22	DetPro	Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model	Code
arXiv'23	CLIPSelf	CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	Code
CVPR'24	SAMP	Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection	N/A
IJCV'24	OV-DAR	OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition	N/A
CVPR'24	LBP	Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection	N/A

Transfer Learning

Venue	Paper Abbr	Paper Title	Project
ECCV'22	OWL-ViT	Simple Open-Vocabulary Object Detection with Vision Transformers	Code
CVPR'23	UniDetector	Detecting Everything in the Open World: Towards Universal Object Detection	Code
ICLR'23	F-VLM	F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models	Code
CVPR'23	ScaleDet	ScaleDet: A Scalable Multi-Dataset Object Detector	N/A
ICCV'23	OpenSeed	A Simple Framework for Open-Vocabulary Segmentation and Detection	Code
arXiv'23	DRR	What Makes Good Open-Vocabulary Detector: A Disassembling Perspective	N/A
arXiv'23	Sambor	Boosting Segment Anything Model Towards Open-Vocabulary Learning	Code

Open-Vocabulary Segmentation

Open-Vocabulary Semantic Segmentation

Region-Aware Training

Venue	Paper Abbr	Paper Title	Project
ECCV'22	OpenSeg	Scaling Open-Vocabulary Image Segmentation with Image-Level Labels	N/A
arXiv'23	SILC	SILC: Improving Vision Language Pretraining with Self-Distillation	N/A
CVPR'22	GroupViT	GroupViT: Semantic Segmentation Emerges from Text Supervision	Code
ECCV'22	ViL-Seg	Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding	N/A
ICML'23	SegCLIP	SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation	Code
CVPR'23	OVSegmentor	Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision	Code
CVPR'23	PACL	Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning	N/A
CVPR'23	TCL	Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs	Code
ECCV'22	SimSeg	A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	Code

Pseudo-Labeling

Venue	Paper Abbr	Paper Title	Project
ECCV'22	TTD	Open-Vocabulary Semantic Segmentation Using Test-Time Distillation	N/A

Knowledge Distillation

Venue	Paper Abbr	Paper Title	Project
arXiv'23	GKC	Global Knowledge Calibration for Fast Open-Vocabulary Segmentation	N/A
arXiv'23	SAM-CLIP	SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding	N/A
ICCV'23	ZeroSeg	Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only	Code

Transfer Learning

Venue	Paper Abbr	Paper Title	Project
ICLR'22	LSeg	Language-driven Semantic Segmentation	Code
CVPR'23	SAZS	Delving Into Shape-Aware Zero-Shot Semantic Segmentation	Code
MM'23	CEL	Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation	N/A
CVPR'22	ZegFormer	Decoupling Zero-Shot Semantic Segmentation	Code
NeurIPS'22	ReCo	ReCo: Retrieve and Co-segment for Zero-shot Transfer	Project
arXiv'23	SCAN	Open-Vocabulary Segmentation with Semantic-Assisted Calibration	N/A
ECCV'22	ZSSeg	A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	Code
ECCV'22	MaskCLIP	Extract Free Dense Labels from CLIP	Code
arXiv'23	CLIP-DINOiser	CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation	Code
PRCV'23	MVP-SEG	MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation	N/A
arXiv'23	OVDiff	Diffusion Models for Zero-Shot Open-Vocabulary Segmentation	Project
WACV'24	FOSSIL	FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval	N/A
NeurIPS'23	POMP	Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition	Code
NeurIPS'23	AttrSeg	AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation	N/A
arXiv'23	PnP-OVSS	Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models	Code
arXiv'23	TagAlign	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification	Project
arXiv'23	SelfSeg	Auto-Vocabulary Semantic Segmentation	N/A
CVPR'22	DenseCLIP	DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting	Code
CVPR'23	OVSeg	Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	Code
arXiv'23	CAT-Seg	CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	Code
arXiv'23	SED	SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation	Code
NeurIPS'23	MAFT	Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	Code
arXiv'23	TagCLIP	TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation	N/A
CVPR'23	ZegCLIP	ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation	Code
CVPR'22	CLIPSeg	Image Segmentation Using Text and Image Prompts	Code
CVPR'23	SAN	Side Adapter Network for Open-Vocabulary Semantic Segmentation	Code
arXiv'23	CLIP Surgery	CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks	Code
arXiv'23	CaR	CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor	Project
arXiv'24	Cascade-CLIP	Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation	Code
arXiv'24	OpenDAS	OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation	Project
arXiv'24	H-CLIP	Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation	N/A

Open-Vocabulary Instance Segmentation

Region-Aware Training

Venue	Paper Abbr	Paper Title	Project
ICCV'23	CGG	Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation	Code
CVPR'23	D2Zero	Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation	Code

Pseudo-Labeling

Venue	Paper Abbr	Paper Title	Project
CVPR'23	XPM	Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling	Code
CVPR'23	Mask-free OVIS	Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations	Code
arXiv'23	MosaicFusion	MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation	Code

Knowledge Distillation

Venue	Paper Abbr	Paper Title	Project
arXiv'24	OV-SAM	Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively	Code

Open-Vocabulary Panoptic Segmentation

Region-Aware Training

Venue	Paper Abbr	Paper Title	Project
arXiv'24	Uni-OVSeg	Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision	Code
CVPR'23	X-Decoder	Generalized Decoding for Pixel, Image, and Language	Code
CVPR'24	APE	Aligning and Prompting Everything All at Once for Universal Visual Perception	Code

Knowledge Distillation

Venue	Paper Abbr	Paper Title	Project
CVPR'23	PADing	Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation	Code

Transfer Learning

Venue	Paper Abbr	Paper Title	Project
NeurIPS'23	FC-CLIP	Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	Code
CVPR'23	FreeSeg	FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation	Project
arXiv'24	PosSAM	PosSAM: Panoptic Open-vocabulary Segment Anything	Project
ICCV'23	MasQCLIP	MasQCLIP for Open-Vocabulary Universal Image Segmentation	Project
CVPR'24	OMG-Seg	OMG-Seg: Is One Model Good Enough For All Segmentation?	Code
arXiv'23	Semantic-SAM	Semantic-SAM: Segment and Recognize Anything at Any Granularity	Code
CVPR'23	ODISE	Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	Code
NeurIPS'23	HIPIE	Hierarchical Open-vocabulary Universal Image Segmentation	Code
ICML'23	MaskCLIP	Open-Vocabulary Universal Image Segmentation with MaskCLIP	Project
ICCV'23	OPSNet	Open-vocabulary Panoptic Segmentation with Embedding Modulation	N/A

Open-Vocabulary 3D Scene Understanding

Open-Vocabulary 3D Detection

Venue	Paper Abbr	Paper Title	Project
CVPR'23	OV-3DET	Open-Vocabulary Point-Cloud Object Detection without 3D Annotation	Code
AAAI'24	FM-OV3D	FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection	Code
arXiv'23	OpenSight	OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection	N/A
NeurIPS'23	CoDA	CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection	Code
arXiv'23	L3Det	Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection	N/A

Open-Vocabulary 3D Segmentation

Open-Vocabulary 3D Semantic Segmentation

Venue	Paper Abbr	Paper Title	Project
arXiv'21	SeCondPoint	Language-Level Semantics Conditioned 3D Point Cloud Segmentation	N/A
3DV'21	3DGenZ	Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds	Code
CVPR'23	OpenScene	OpenScene: 3D Scene Understanding with Open Vocabularies	Project
CVPR'23	PLA	PLA: Language-Driven Open-Vocabulary 3D Scene Understanding	Code
arXiv'23	RegionPLC	RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding	Project

Open-Vocabulary 3D Instance Segmentation

Venue	Paper Abbr	Paper Title	Project
NeurIPS'23	OpenMask3D	OpenMask3D: Open-Vocabulary 3D Instance Segmentation	Project
CVPR'24	MaskClustering	MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation	Project
arXiv'23	OpenIns3D	OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation	Project
arXiv'23	Open3DIS	Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance	Project
arXiv'24	OpenSU3D	OpenSU3D: Open World 3D Scene Understanding using Foundation Models	Project
arXiv'24	Search3D	Search3D: Hierarchical Open-Vocabulary 3D Segmentation	N/A

NeRF and 3DGS based

NeRF (Neural Radiance Fields) and 3DGS (3D Gaussian Splatting) are popular approaches to novel view synthesis for holistic scenes. Methods in this category exploit the multi-view consistency inherent in the 3D representation either to improve 2D image segmentation or to perform 3D semantic segmentation directly over the scene's primitives (voxels or Gaussians).

Venue	Paper Abbr	Paper Title	Project
ICCV'21	Semantic-NeRF	In-Place Scene Labelling and Understanding With Implicit Scene Representation	Code
NeurIPS'22	FFD	Decomposing NeRF for Editing via Feature Field Distillation	Code
arXiv'23	Gaussian Grouping	Gaussian Grouping: Segment and Edit Anything in 3D Scenes	Code
ICCV'23	LERF	LERF: Language Embedded Radiance Fields	Project
NeurIPS'23	3DOVS	Weakly Supervised 3D Open-vocabulary Segmentation	Code
arXiv'24	OpenGaussian	OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding	Project
arXiv'24	OV-NeRF	OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding	Code
arXiv'24	Semantic Gaussians	Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting	Project
arXiv'24	FMGS	FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding	Project
CVPR'24	LEGaussians	Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding	Code
CVPR'24	LangSplat	LangSplat: 3D Language Gaussian Splatting	Project
CVPR'24	Feature 3DGS	Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields	Code

Open-Vocabulary Video Understanding

Open-Vocabulary Video Instance Segmentation

Venue	Paper Abbr	Paper Title	Project
ICCV'23	OV2Seg	Towards Open-Vocabulary Video Instance Segmentation	Code
arXiv'23	OpenVIS	OpenVIS: Open-vocabulary Video Instance Segmentation	Code
arXiv'24	BriVIS	Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation	Code
