Stay tuned: this repo will be updated on a weekly basis.
- 27/06/2024: NeRF- and 3DGS-based 3D scene understanding papers are added.
- 05/06/2024: The second version of our manuscript has been accepted by TPAMI.
If you find our survey helpful, please consider citing our paper:
@article{survey-ovd-ovs,
title={A survey on open-vocabulary detection and segmentation: Past, present, and future},
author={Zhu, Chaoyang and Chen, Long},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2024}
}
Although we aim to cover every paper, some works may still be missing. We believe the repository should be maintained by the community. Peer review is welcome and highly appreciated: if you are an author and find our records incorrect, don't hesitate to contact me or open a PR.
In this survey, we cover two settings (zero-shot and open-vocabulary) and six tasks (object detection, semantic/instance/panoptic segmentation, 3D scene understanding, and video understanding). We build a taxonomy that is universal across these diverse settings and tasks, pivoting on whether weak supervision signals are permitted and, if so, how they are used. The weak supervision signals can be image-text pairs or large vision-language models. Below is a general overview of each methodology.
In the current literature, zero-shot and open-vocabulary are used interchangeably; however, we highlight their subtle differences by tracing the evolution from the traditional zero-shot setting to the newly formulated open-vocabulary setting.
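One common way image-text pairs act as weak supervision is to turn each caption into image-level labels by matching class names against the caption text; Detic, for instance, extracts image labels from captions with simple text matching. The sketch below is a minimal, hypothetical illustration of that idea, assuming a hand-written synonym dictionary and exact whole-phrase matching. `caption_to_image_labels` and the vocabulary format are our own and are not taken from any listed paper.

```python
import re
from typing import Dict, List, Set


def caption_to_image_labels(caption: str, vocabulary: Dict[str, List[str]]) -> Set[str]:
    """Return the classes whose synonyms appear as whole words/phrases in the caption.

    Hypothetical helper: lowercases, strips punctuation, and does exact phrase
    matching, so plurals like "dogs" will NOT match "dog".
    """
    # Keep only lowercase letters, digits and spaces, then pad with spaces so
    # " syn " matches whole words/phrases only ("cat" must not match "catches").
    text = re.sub(r"[^a-z0-9 ]+", " ", caption.lower())
    text = re.sub(r"\s+", " ", f" {text} ")
    labels = set()
    for cls, synonyms in vocabulary.items():
        if any(f" {syn.lower()} " in text for syn in synonyms):
            labels.add(cls)
    return labels


vocab = {"dog": ["dog", "puppy"], "frisbee": ["frisbee"], "cat": ["cat", "kitten"]}
print(sorted(caption_to_image_labels("A puppy catches a red Frisbee.", vocab)))
# ['dog', 'frisbee']
```

Such labels are noisy (no boxes, missed plurals, false matches), which is why the methods collected here differ mainly in how they turn these weak signals into region-level supervision.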
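Both settings share one mechanism: the classifier weights are embeddings of class names (word vectors in traditional zero-shot work, text-encoder outputs in CLIP-based open-vocabulary work), so the vocabulary can be changed at inference time without retraining. The sketch below illustrates this under stated assumptions: the 3-d "embeddings" are toy, hypothetical values, and the temperature of 0.01 is an illustrative CLIP-like scale, not a value from any listed paper.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def classify_region(
    region_emb: Sequence[float],
    text_embs: Dict[str, Sequence[float]],
    temperature: float = 0.01,
) -> Dict[str, float]:
    """Softmax over cosine similarities between one region and each class-name embedding."""
    r = l2_normalize(region_emb)
    names = list(text_embs)
    logits = [
        sum(a * b for a, b in zip(r, l2_normalize(text_embs[n]))) / temperature
        for n in names
    ]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return {n: e / z for n, e in zip(names, exps)}


# Hypothetical toy embeddings: "base" classes seen in training, "novel" added at test time.
base = {"dog": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0]}
novel = {"zebra": [0.0, 0.0, 1.0]}
region = [0.1, 0.2, 0.9]

base_probs = classify_region(region, base)
full_probs = classify_region(region, {**base, **novel})
print(max(base_probs, key=base_probs.get))  # cat
print(max(full_probs, key=full_probs.get))  # zebra
```

With only base names the region is forced onto the closest seen class; adding a novel name to the vocabulary lets the same region embedding be recognized correctly, which is exactly what evaluation on novel categories measures.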
- Zero-Shot Object Detection
- Zero-Shot Segmentation
- Open-Vocabulary Object Detection
- Open-Vocabulary Segmentation
- Open-Vocabulary 3D Scene Understanding
- Open-Vocabulary Video Understanding
- Acknowledgement
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ECCV'18 | ZSDv1 | Zero-Shot Object Detection | N/A |
ACCV'18 & IJCV'20 | ZSDv2 | Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts | N/A |
AAAI'20 | CA-ZSR | Context-Aware Zero-Shot Recognition | Code |
AAAI'19 | ZSD-TD | Zero-Shot Object Detection with Textual Descriptions | N/A |
ACCV'20 | BLC | Background Learnable Cascade for Zero-Shot Object Detection | Code |
ICCV'19 | TL-ZSD | Transductive Learning for Zero-Shot Object Detection | N/A |
arXiv'23 | SSB | Frustratingly Simple but Effective Zero-shot Detection and Segmentation: Analysis and a Strong Baseline | N/A |
WACV'20 | MS-Zero | A Multi-Space Approach to Zero-Shot Object Detection | N/A |
TCSVT'19 | ZS-YOLO | Zero Shot Detection | N/A |
AAAI'21 | DPIF | Inference Fusion with Associative Semantics for Unseen Object Detection | Code |
TPAMI'21 | ContrastZSD | Semantics-Guided Contrastive Network for Zero-Shot Object Detection | N/A |
IJCAI'20 | ZSD-CNN | Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'20 | DELO | Don't Even Look Once: Synthesizing Features for Zero-Shot Detection | N/A |
ACCV'20 | SU | Synthesizing the Unseen for Zero-shot Object Detection | Code |
AAAI'20 | GTNet | GTNet: Generative Transfer Network for Zero-Shot Object Detection | Code |
CVPR'22 | RRFS | Robust Region Feature Synthesizer for Zero-Shot Object Detection | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'20 | SPNet | Semantic Projection Network for Zero- and Few-Label Semantic Segmentation | Code |
NeurIPS'20 | ULZSS | Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation | Code |
ICCV'21 | JoEm | Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation | Code |
ICCVW'19 | VM | Zero-Shot Semantic Segmentation via Variational Mapping | N/A |
ICCV'21 | PMOSR | Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
NeurIPS'19 | ZS3Net | Zero-Shot Semantic Segmentation | Code |
NeurIPS'20 | CSRL | Consistent Structural Relation Learning for Zero-Shot Segmentation | N/A |
MM'20 | CaGNet | Context-aware Feature Generation for Zero-shot Semantic Segmentation | Code |
ICCV'21 | SIGN | SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'21 | ZSIS | Zero-Shot Instance Segmentation | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'21 | OVR-CNN | Open-Vocabulary Object Detection Using Captions | Code |
GCPR'22 | LocOv | Localized Vision-Language Matching for Open-vocabulary Object Detection | Code |
arXiv'23 | MMC-Det | Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection | N/A |
NeurIPS'22 | DetCLIP | DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection | N/A |
CVPR'23 | DetCLIPv2 | DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment | N/A |
CVPR'24 | DetCLIPv3 | DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | N/A |
AAAI'24 | WSOVOD | Weakly Supervised Open-Vocabulary Object Detection | Code |
CVPR'23 | RO-ViT | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | N/A |
ICCV'23 | CFM-ViT | Contrastive Feature Masking Open-Vocabulary Vision Transformer | N/A |
ICCV'23 | DITO | Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection | Code |
ICLR'23 | VLDet | Learning Object-Language Alignments for Open-Vocabulary Object Detection | Code |
ICCV'23 | GOAT | Open-Vocabulary Object Detection With an Open Corpus | N/A |
ECCV'22 | OV-DETR | Open-Vocabulary DETR with Conditional Matching | Code |
arXiv'23 | Prompt-OVD | Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection | N/A |
CVPR'23 | CORA | CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching | N/A |
ICCV'23 | EdaDet | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment | Code |
ICCV'21 | MDETR | MDETR: Modulated Detection for End-to-End Multi-Modal Understanding | Code |
ECCV'22 | MAVL | Class-agnostic Object Detection with Multi-modal Transformer | Code |
NeurIPS'23 | MQ-Det | Multi-modal Queried Object Detection in the Wild | Code |
CVPR'24 | YOLO-World | Real-Time Open-Vocabulary Object Detection | Code |
MM'23 | SGDN | Open-Vocabulary Object Detection via Scene Graph Discovery | N/A |
CVPR'24 | USE | USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'22 | RegionCLIP | RegionCLIP: Region-based Language-Image Pretraining | Code |
ECCV'22 | VL-PLM | Exploiting Unlabeled Data with Vision and Language Models for Object Detection | Code |
CVPR'22 | GLIP | Grounded Language-Image Pre-training | Code |
NeurIPS'22 | GLIPv2 | GLIPv2: Unifying Localization and Vision-Language Understanding | Code |
arXiv'23 | Grounding-DINO | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Code |
ECCV'22 | PromptDet | PromptDet: Towards Open-vocabulary Detection using Uncurated Images | Code |
arXiv'23 | SAS-Det | Taming Self-Training for Open-Vocabulary Object Detection | Code |
ECCV'22 | PB-OVD | Open Vocabulary Object Detection with Pseudo Bounding-Box Labels | Code |
AAAI'24 | CLIM | CLIM: Contrastive Language-Image Mosaic for Region Representation | Code |
arXiv'22 | VTP-OVD | Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection | N/A |
AAAI'24 | ProxyDet | ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection | Code |
NeurIPS'23 | CoDet | CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | Code |
ECCV'22 | Detic | Detecting Twenty-thousand Classes using Image-level Supervision | Code |
ICML'23 | MMC | Multi-Modal Classifiers for Open-Vocabulary Object Detection | Code |
arXiv'23 | 3Ways | Three ways to improve feature alignment for open vocabulary detection | N/A |
arXiv'23 | PLAC | Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | N/A |
arXiv'23 | PCL | Open-Vocabulary Object Detection using Pseudo Caption Labels | N/A |
NeurIPS'23 | OWLv2 | Scaling Open-Vocabulary Object Detection | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ICLR'22 | ViLD | Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | Code |
ICDMW'22 | ZSD-YOLO | Zero-shot Object Detection Through Vision-Language Embedding Alignment | Code |
WACV'24 | LP-OVOD | LP-OVOD: Open-Vocabulary Object Detection by Linear Probing | Code |
arXiv'23 | EZSD | Efficient Feature Distillation for Zero-shot Annotation Object Detection | Code |
AAAI'24 | SIC-CADS | Simple Image-level Classification Improves Open-vocabulary Object Detection | Code |
CVPR'23 | BARON | Aligning Bag of Regions for Open-Vocabulary Object Detection | Code |
CVPR'23 | OADP | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | Code |
arXiv'23 | GridCLIP | GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning | N/A |
NeurIPS'22 | RKDWTF | Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | Code |
ICCV'23 | DK-DETR | Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection | Code |
CVPR'22 | HierKD | Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation | Code |
CVPR'22 | DetPro | Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model | Code |
arXiv'23 | CLIPSelf | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code |
CVPR'24 | SAMP | Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection | N/A |
IJCV'24 | OV-DAR | OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition | N/A |
CVPR'24 | LBP | Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ECCV'22 | OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers | Code |
CVPR'23 | UniDetector | Detecting Everything in the Open World: Towards Universal Object Detection | Code |
ICLR'23 | F-VLM | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | Code |
CVPR'23 | ScaleDet | ScaleDet: A Scalable Multi-Dataset Object Detector | N/A |
ICCV'23 | OpenSeed | A Simple Framework for Open-Vocabulary Segmentation and Detection | Code |
arXiv'23 | DRR | What Makes Good Open-Vocabulary Detector: A Disassembling Perspective | N/A |
arXiv'23 | Sambor | Boosting Segment Anything Model Towards Open-Vocabulary Learning | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ECCV'22 | OpenSeg | Scaling Open-Vocabulary Image Segmentation with Image-Level Labels | N/A |
arXiv'23 | SILC | SILC: Improving Vision Language Pretraining with Self-Distillation | N/A |
CVPR'22 | GroupViT | GroupViT: Semantic Segmentation Emerges from Text Supervision | Code |
ECCV'22 | ViL-Seg | Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding | N/A |
ICML'23 | SegCLIP | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | Code |
CVPR'23 | OVSegmentor | Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision | Code |
CVPR'23 | PACL | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | N/A |
CVPR'23 | TCL | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | Code |
ECCV'22 | SimSeg | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ECCV'22 | TTD | Open-Vocabulary Semantic Segmentation Using Test-Time Distillation | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
arXiv'23 | GKC | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation | N/A |
arXiv'23 | SAM-CLIP | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | N/A |
ICCV'23 | ZeroSeg | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ICLR'22 | LSeg | Language-driven Semantic Segmentation | Code |
CVPR'23 | SAZS | Delving Into Shape-Aware Zero-Shot Semantic Segmentation | Code |
MM'23 | CEL | Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation | N/A |
CVPR'22 | ZegFormer | Decoupling Zero-Shot Semantic Segmentation | Code |
NeurIPS'22 | ReCo | ReCo: Retrieve and Co-segment for Zero-shot Transfer | Project |
arXiv'23 | SCAN | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | N/A |
ECCV'22 | ZSSeg | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | Code |
ECCV'22 | MaskCLIP | Extract Free Dense Labels from CLIP | Code |
arXiv'23 | CLIP-DINOiser | CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation | Code |
PRCV'23 | MVP-SEG | MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation | N/A |
arXiv'23 | OVDiff | Diffusion Models for Zero-Shot Open-Vocabulary Segmentation | Project |
WACV'24 | FOSSIL | FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval | N/A |
NeurIPS'23 | POMP | Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | Code |
NeurIPS'23 | AttrSeg | AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | N/A |
arXiv'23 | PnP-OVSS | Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | Code |
arXiv'23 | TagAlign | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | Project |
arXiv'23 | SelfSeg | Auto-Vocabulary Semantic Segmentation | N/A |
CVPR'22 | DenseCLIP | DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | Code |
CVPR'23 | OVSeg | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | Code |
arXiv'23 | CAT-Seg | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | Code |
arXiv'23 | SED | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | Code |
NeurIPS'23 | MAFT | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | Code |
arXiv'23 | TagCLIP | TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation | N/A |
CVPR'23 | ZegCLIP | ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation | Code |
CVPR'22 | CLIPSeg | Image Segmentation Using Text and Image Prompts | Code |
CVPR'23 | SAN | Side Adapter Network for Open-Vocabulary Semantic Segmentation | Code |
arXiv'23 | CLIP Surgery | CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks | Code |
arXiv'23 | CaR | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | Project |
arXiv'24 | Cascade-CLIP | Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation | Code |
arXiv'24 | OpenDAS | OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation | Project |
arXiv'24 | H-CLIP | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ICCV'23 | CGG | Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Code |
CVPR'23 | D2Zero | Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'23 | XPM | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling | Code |
CVPR'23 | Mask-free OVIS | Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations | Code |
arXiv'23 | MosaicFusion | MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
arXiv'24 | OV-SAM | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
arXiv'24 | Uni-OVSeg | Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | Code |
CVPR'23 | X-Decoder | Generalized Decoding for Pixel, Image, and Language | Code |
CVPR'24 | APE | Aligning and Prompting Everything All at Once for Universal Visual Perception | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'23 | PADing | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
NeurIPS'23 | FC-CLIP | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | Code |
CVPR'23 | FreeSeg | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | Project |
arXiv'24 | PosSAM | PosSAM: Panoptic Open-vocabulary Segment Anything | Project |
ICCV'23 | MasQCLIP | MasQCLIP for Open-Vocabulary Universal Image Segmentation | Project |
CVPR'24 | OMG-Seg | OMG-Seg: Is One Model Good Enough For All Segmentation? | Code |
arXiv'23 | Semantic-SAM | Semantic-SAM: Segment and Recognize Anything at Any Granularity | Code |
CVPR'23 | ODISE | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Code |
NeurIPS'23 | HIPIE | Hierarchical Open-vocabulary Universal Image Segmentation | Code |
ICML'23 | MaskCLIP | Open-Vocabulary Universal Image Segmentation with MaskCLIP | Project |
ICCV'23 | OPSNet | Open-vocabulary Panoptic Segmentation with Embedding Modulation | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
CVPR'23 | OV-3DET | Open-Vocabulary Point-Cloud Object Detection without 3D Annotation | Code |
AAAI'24 | FM-OV3D | FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection | Code |
arXiv'23 | OpenSight | OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection | N/A |
NeurIPS'23 | CoDA | CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection | Code |
arXiv'23 | L3Det | Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection | N/A |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
arXiv'21 | SeCondPoint | Language-Level Semantics Conditioned 3D Point Cloud Segmentation | N/A |
3DV'21 | 3DGenZ | Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds | Code |
CVPR'23 | OpenScene | OpenScene: 3D Scene Understanding with Open Vocabularies | Project |
CVPR'23 | PLA | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | Code |
arXiv'23 | RegionPLC | RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding | Project |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
NeurIPS'23 | OpenMask3D | OpenMask3D: Open-Vocabulary 3D Instance Segmentation | Project |
CVPR'24 | MaskClustering | MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation | Project |
arXiv'23 | OpenIns3D | OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | Project |
arXiv'23 | Open3DIS | Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance | Project |
arXiv'24 | OpenSU3D | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Project |
arXiv'24 | Search3D | Search3D: Hierarchical Open-Vocabulary 3D Segmentation | N/A |
NeRF (Neural Radiance Fields) and 3DGS (3D Gaussian Splatting) are popular approaches to novel view synthesis of a holistic scene. They leverage the multi-view consistency inherently imposed by the 3D representation either to assist 2D image segmentation or to directly perform 3D semantic segmentation over the scene's primitives (points, voxels, or Gaussians).
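NeRF and 3DGS enforce multi-view consistency through differentiable rendering. A much simpler, non-rendering way to see why multiple views help is back-projection: project each 3D point into every calibrated view and average the 2D features (for example, per-pixel embeddings from a 2D open-vocabulary segmenter) it lands on. The sketch below is that simpler technique, not NeRF/3DGS feature distillation. It assumes known pinhole intrinsics `K` (skew ignored), world-to-camera extrinsics `(R, t)`, nearest-pixel sampling, and no occlusion test; all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def project(point, K, R, t) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world point to pixel coords (u, v); None if behind the camera."""
    # Camera coordinates: Xc = R @ X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def fuse_point_features(points: Sequence, views: Sequence) -> List[Optional[List[float]]]:
    """Average the 2D features each 3D point projects onto, over all views that see it.

    views: list of (K, R, t, feature_map) with feature_map[row][col] a feature vector.
    Occlusion is NOT handled (no depth test); points seen by no view get None.
    """
    fused: List[Optional[List[float]]] = []
    for p in points:
        acc: Optional[List[float]] = None
        count = 0
        for K, R, t, fmap in views:
            uv = project(p, K, R, t)
            if uv is None:
                continue
            col, row = int(round(uv[0])), int(round(uv[1]))  # nearest pixel
            if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
                f = fmap[row][col]
                acc = list(f) if acc is None else [a + b for a, b in zip(acc, f)]
                count += 1
        fused.append([a / count for a in acc] if count else None)
    return fused


# Two toy 3x3 views with constant (hypothetical) 2-d features.
K = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fmap_a = [[[1.0, 0.0]] * 3 for _ in range(3)]
fmap_b = [[[0.0, 1.0]] * 3 for _ in range(3)]
views = [(K, I, [0.0, 0.0, 0.0], fmap_a), (K, I, [1.0, 0.0, 0.0], fmap_b)]
print(fuse_point_features([(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)], views))
# [[0.5, 0.5], None]
```

Averaging smooths out view-specific errors of the 2D predictor; real systems add a depth test for occlusion and often weight views, while NeRF/3DGS methods instead learn the 3D feature field by rendering it and matching the 2D features.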
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ICCV'21 | Semantic-NeRF | In-Place Scene Labelling and Understanding With Implicit Scene Representation | Code |
NeurIPS'22 | FFD | Decomposing NeRF for Editing via Feature Field Distillation | Code |
arXiv'23 | Gaussian Grouping | Gaussian Grouping: Segment and Edit Anything in 3D Scenes | Code |
ICCV'23 | LERF | LERF: Language Embedded Radiance Fields | Project |
NeurIPS'23 | 3DOVS | Weakly Supervised 3D Open-vocabulary Segmentation | Code |
arXiv'24 | OpenGaussian | OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding | Project |
arXiv'24 | OV-NeRF | OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding | Code |
arXiv'24 | Semantic Gaussians | Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting | Project |
arXiv'24 | FMGS | FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding | Project |
CVPR'24 | LEGaussians | Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding | Code |
CVPR'24 | LangSplat | LangSplat: 3D Language Gaussian Splatting | Project |
CVPR'24 | Feature 3DGS | Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields | Code |
Venue | Paper Abbr | Paper Title | Project |
---|---|---|---|
ICCV'23 | OV2Seg | Towards Open-Vocabulary Video Instance Segmentation | Code |
arXiv'23 | OpenVIS | OpenVIS: Open-vocabulary Video Instance Segmentation | Code |
arXiv'24 | BriVIS | Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | Code |