Chaoyang Zhu, Long Chen*
Stay tuned: this repo will be updated on a weekly basis.
27/06/2024: NeRF- and 3DGS-based 3D scene understanding methods have been added.
05/06/2024: The second version of our manuscript has been accepted by TPAMI.
If you find our survey helpful, please consider citing our paper:
@article{survey-ovd-ovs,
  title={A survey on open-vocabulary detection and segmentation: Past, present, and future},
  author={Zhu, Chaoyang and Chen, Long},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}
Although we aim to cover every paper, some works may still be missing. We believe the repository should be maintained by the community, and peer review is welcome and highly appreciated. If you are an author and find that our records are incorrect, don't hesitate to contact me or open a PR.
In this survey, we cover two settings (zero-shot and open-vocabulary) and six tasks (object detection, semantic/instance/panoptic segmentation, 3D scene understanding, and video understanding). We build a taxonomy that is universal across these diverse settings and tasks by pivoting on two questions: whether weak supervision signals are permitted, and how they are used. The weak supervision signals can be image-text pairs or large vision-language models. Below is a general overview of each methodology.
In the current literature, zero-shot and open-vocabulary are often used interchangeably. However, we highlight their subtle differences by tracing the evolution from the traditional zero-shot setting to the newly formulated open-vocabulary setting.
Zero-Shot Object Detection
Visual-Semantic Space Mapping
Venue
Paper Abbr
Paper Title
Project
ECCV'18
ZSDv1
Zero-Shot Object Detection
N/A
ACCV'18 & IJCV'20
ZSDv2
Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts
N/A
AAAI'20
CA-ZSR
Context-Aware Zero-Shot Recognition
Code
AAAI'19
ZSD-TD
Zero-Shot Object Detection with Textual Descriptions
N/A
ACCV'20
BLC
Background Learnable Cascade for Zero-Shot Object Detection
Code
ICCV'19
TL-ZSD
Transductive Learning for Zero-Shot Object Detection
N/A
arXiv'23
SSB
Frustratingly Simple but Effective Zero-shot Detection and Segmentation: Analysis and a Strong Baseline
N/A
WACV'20
MS-Zero
A Multi-Space Approach to Zero-Shot Object Detection
N/A
TCSVT'19
ZS-YOLO
Zero Shot Detection
N/A
AAAI'21
DPIF
Inference Fusion with Associative Semantics for Unseen Object Detection
Code
TPAMI'21
ContrastZSD
Semantics-Guided Contrastive Network for Zero-Shot Object Detection
N/A
IJCAI'20
ZSD-CNN
Zero-Shot Object Detection via Learning an Embedding from Semantic Space to Visual Space
N/A
Novel Visual Feature Synthesis
Venue
Paper Abbr
Paper Title
Project
CVPR'20
DELO
Don't Even Look Once: Synthesizing Features for Zero-Shot Detection
N/A
ACCV'20
SU
Synthesizing the Unseen for Zero-shot Object Detection
Code
AAAI'20
GTNet
GTNet: Generative Transfer Network for Zero-Shot Object Detection
Code
CVPR'22
RRFS
Robust Region Feature Synthesizer for Zero-Shot Object Detection
Code
Zero-Shot Semantic Segmentation
Visual-Semantic Space Mapping
Venue
Paper Abbr
Paper Title
Project
CVPR'19
SPNet
Semantic Projection Network for Zero- and Few-Label Semantic Segmentation
Code
NeurIPS'20
ULZSS
Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation
Code
ICCV'21
JoEm
Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
Code
ICCVW'19
VM
Zero-Shot Semantic Segmentation via Variational Mapping
N/A
ICCV'21
PMOSR
Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation
N/A
Novel Visual Feature Synthesis
Venue
Paper Abbr
Paper Title
Project
NeurIPS'19
ZS3Net
Zero-Shot Semantic Segmentation
Code
NeurIPS'20
CSRL
Consistent Structural Relation Learning for Zero-Shot Segmentation
N/A
MM'20
CaGNet
Context-aware Feature Generation for Zero-shot Semantic Segmentation
Code
ICCV'21
SIGN
SIGN: Spatial-information Incorporated Generative Network for Generalized Zero-shot Semantic Segmentation
Code
Zero-Shot Instance Segmentation
Venue
Paper Abbr
Paper Title
Project
CVPR'21
ZSIS
Zero-Shot Instance Segmentation
Code
Open-Vocabulary Object Detection
Venue
Paper Abbr
Paper Title
Project
CVPR'21
OVR-CNN
Open-Vocabulary Object Detection Using Captions
Code
GCPR'22
LocOv
Localized Vision-Language Matching for Open-vocabulary Object Detection
Code
arXiv'23
MMC-Det
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
N/A
NeurIPS'22
DetCLIP
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
N/A
CVPR'23
DetCLIPv2
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
N/A
CVPR'24
DetCLIPv3
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
N/A
AAAI'24
WSOVOD
Weakly Supervised Open-Vocabulary Object Detection
Code
CVPR'23
RO-ViT
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
N/A
ICCV'23
CFM-ViT
Contrastive Feature Masking Open-Vocabulary Vision Transformer
N/A
ICCV'23
DITO
Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection
Code
ICLR'23
VLDet
Learning Object-Language Alignments for Open-Vocabulary Object Detection
Code
ICCV'23
GOAT
Open-Vocabulary Object Detection With an Open Corpus
N/A
ECCV'22
OV-DETR
Open-Vocabulary DETR with Conditional Matching
Code
arXiv'23
Prompt-OVD
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection
N/A
CVPR'23
CORA
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
N/A
ICCV'23
EdaDet
EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
Code
ICCV'21
MDETR
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Code
ECCV'22
MAVL
Class-agnostic Object Detection with Multi-modal Transformer
Code
NeurIPS'23
MQ-Det
Multi-modal Queried Object Detection in the Wild
Code
CVPR'24
YOLO-World
YOLO-World: Real-Time Open-Vocabulary Object Detection
Code
MM'23
SGDN
Open-Vocabulary Object Detection via Scene Graph Discovery
N/A
CVPR'24
USE
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
N/A
Venue
Paper Abbr
Paper Title
Project
CVPR'22
RegionCLIP
RegionCLIP: Region-based Language-Image Pretraining
Code
ECCV'22
VL-PLM
Exploiting Unlabeled Data with Vision and Language Models for Object Detection
Code
CVPR'22
GLIP
Grounded Language-Image Pre-training
Code
NeurIPS'22
GLIPv2
GLIPv2: Unifying Localization and Vision-Language Understanding
Code
arXiv'23
Grounding-DINO
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Code
ECCV'22
PromptDet
PromptDet: Towards Open-vocabulary Detection using Uncurated Images
Code
arXiv'23
SAS-Det
Taming Self-Training for Open-Vocabulary Object Detection
Code
ECCV'22
PB-OVD
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
Code
AAAI'24
CLIM
CLIM: Contrastive Language-Image Mosaic for Region Representation
Code
arXiv'22
VTP-OVD
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
N/A
AAAI'24
ProxyDet
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection
Code
NeurIPS'23
CoDet
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Code
ECCV'22
Detic
Detecting Twenty-thousand Classes using Image-level Supervision
Code
ICML'23
MMC
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Code
arXiv'23
3Ways
Three ways to improve feature alignment for open vocabulary detection
N/A
arXiv'23
PLAC
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection
N/A
arXiv'23
PCL
Open-Vocabulary Object Detection using Pseudo Caption Labels
N/A
NeurIPS'23
OWLv2
Scaling Open-Vocabulary Object Detection
Code
Venue
Paper Abbr
Paper Title
Project
ICLR'22
ViLD
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Code
ICDMW'22
ZSD-YOLO
Zero-shot Object Detection Through Vision-Language Embedding Alignment
Code
WACV'24
LP-OVOD
LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
Code
arXiv'23
EZSD
Efficient Feature Distillation for Zero-shot Annotation Object Detection
Code
AAAI'24
SIC-CADS
Simple Image-level Classification Improves Open-vocabulary Object Detection
Code
CVPR'23
BARON
Aligning Bag of Regions for Open-Vocabulary Object Detection
Code
CVPR'23
OADP
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Code
arXiv'23
GridCLIP
GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning
N/A
NeurIPS'22
RKDWTF
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
Code
ICCV'23
DK-DETR
Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection
Code
CVPR'22
HierKD
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation
Code
CVPR'22
DetPro
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
Code
arXiv'23
CLIPSelf
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
Code
CVPR'24
SAMP
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
N/A
IJCV'24
OV-DAR
OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition
N/A
CVPR'24
LBP
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
N/A
Venue
Paper Abbr
Paper Title
Project
ECCV'22
OWL-ViT
Simple Open-Vocabulary Object Detection with Vision Transformers
Code
CVPR'23
UniDetector
Detecting Everything in the Open World: Towards Universal Object Detection
Code
ICLR'23
F-VLM
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Code
CVPR'23
ScaleDet
ScaleDet: A Scalable Multi-Dataset Object Detector
N/A
ICCV'23
OpenSeed
A Simple Framework for Open-Vocabulary Segmentation and Detection
Code
arXiv'23
DRR
What Makes Good Open-Vocabulary Detector: A Disassembling Perspective
N/A
arXiv'23
Sambor
Boosting Segment Anything Model Towards Open-Vocabulary Learning
Code
Open-Vocabulary Segmentation
Open-Vocabulary Semantic Segmentation
Venue
Paper Abbr
Paper Title
Project
ECCV'22
OpenSeg
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
N/A
arXiv'23
SILC
SILC: Improving Vision Language Pretraining with Self-Distillation
N/A
CVPR'22
GroupViT
GroupViT: Semantic Segmentation Emerges from Text Supervision
Code
ECCV'22
ViL-Seg
Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding
N/A
ICML'23
SegCLIP
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
Code
CVPR'23
OVSegmentor
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
Code
CVPR'23
PACL
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
N/A
CVPR'23
TCL
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
Code
ECCV'22
SimSeg
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
Code
Venue
Paper Abbr
Paper Title
Project
ECCV'22
TTD
Open-Vocabulary Semantic Segmentation Using Test-Time Distillation
N/A
Venue
Paper Abbr
Paper Title
Project
arXiv'23
GKC
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
N/A
arXiv'23
SAM-CLIP
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
N/A
ICCV'23
ZeroSeg
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only
Code
Venue
Paper Abbr
Paper Title
Project
ICLR'22
LSeg
Language-driven Semantic Segmentation
Code
CVPR'23
SAZS
Delving Into Shape-Aware Zero-Shot Semantic Segmentation
Code
MM'23
CEL
Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation
N/A
CVPR'22
ZegFormer
Decoupling Zero-Shot Semantic Segmentation
Code
NeurIPS'22
ReCo
ReCo: Retrieve and Co-segment for Zero-shot Transfer
Project
arXiv'23
SCAN
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
N/A
ECCV'22
ZSSeg
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
Code
ECCV'22
MaskCLIP
Extract Free Dense Labels from CLIP
Code
arXiv'23
CLIP-DINOiser
CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
Code
PRCV'23
MVP-SEG
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
N/A
arXiv'23
OVDiff
Diffusion Models for Zero-Shot Open-Vocabulary Segmentation
Project
WACV'24
FOSSIL
FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval
N/A
NeurIPS'23
POMP
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Code
NeurIPS'23
AttrSeg
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation
N/A
arXiv'23
PnP-OVSS
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Code
arXiv'23
TagAlign
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Project
arXiv'23
SelfSeg
Auto-Vocabulary Semantic Segmentation
N/A
CVPR'22
DenseCLIP
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Code
CVPR'23
OVSeg
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Code
arXiv'23
CAT-Seg
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Code
arXiv'23
SED
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Code
NeurIPS'23
MAFT
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
Code
arXiv'23
TagCLIP
TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
N/A
CVPR'23
ZegCLIP
ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation
Code
CVPR'22
CLIPSeg
Image Segmentation Using Text and Image Prompts
Code
CVPR'23
SAN
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Code
arXiv'23
CLIP Surgery
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
Code
arXiv'23
CaR
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Project
arXiv'24
Cascade-CLIP
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Code
arXiv'24
OpenDAS
OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation
Project
arXiv'24
H-CLIP
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
N/A
Open-Vocabulary Instance Segmentation
Venue
Paper Abbr
Paper Title
Project
ICCV'23
CGG
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Code
CVPR'23
D2Zero
Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation
Code
Venue
Paper Abbr
Paper Title
Project
CVPR'23
XPM
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Code
CVPR'23
Mask-free OVIS
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
Code
arXiv'23
MosaicFusion
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Code
Venue
Paper Abbr
Paper Title
Project
arXiv'24
OV-SAM
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Code
Open-Vocabulary Panoptic Segmentation
Venue
Paper Abbr
Paper Title
Project
arXiv'24
Uni-OVSeg
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
Code
CVPR'23
X-Decoder
Generalized Decoding for Pixel, Image, and Language
Code
CVPR'24
APE
Aligning and Prompting Everything All at Once for Universal Visual Perception
Code
Venue
Paper Abbr
Paper Title
Project
CVPR'23
PADing
Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation
Code
Venue
Paper Abbr
Paper Title
Project
NeurIPS'23
FC-CLIP
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Code
CVPR'23
FreeSeg
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
Project
arXiv'24
PosSAM
PosSAM: Panoptic Open-vocabulary Segment Anything
Project
ICCV'23
MasQCLIP
MasQCLIP for Open-Vocabulary Universal Image Segmentation
Project
CVPR'24
OMG-Seg
OMG-Seg: Is One Model Good Enough For All Segmentation?
Code
arXiv'23
Semantic-SAM
Semantic-SAM: Segment and Recognize Anything at Any Granularity
Code
CVPR'23
ODISE
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Code
NeurIPS'23
HIPIE
Hierarchical Open-vocabulary Universal Image Segmentation
Code
ICML'23
MaskCLIP
Open-Vocabulary Universal Image Segmentation with MaskCLIP
Project
ICCV'23
OPSNet
Open-vocabulary Panoptic Segmentation with Embedding Modulation
N/A
Open-Vocabulary 3D Scene Understanding
Open-Vocabulary 3D Detection
Venue
Paper Abbr
Paper Title
Project
CVPR'23
OV-3DET
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
Code
AAAI'24
FM-OV3D
FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection
Code
arXiv'23
OpenSight
OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
N/A
NeurIPS'23
CoDA
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Code
arXiv'23
L3Det
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection
N/A
Open-Vocabulary 3D Segmentation
Open-Vocabulary 3D Semantic Segmentation
Venue
Paper Abbr
Paper Title
Project
arXiv'21
SeCondPoint
Language-Level Semantics Conditioned 3D Point Cloud Segmentation
N/A
3DV'21
3DGenZ
Generative Zero-Shot Learning for Semantic Segmentation of 3D Point Clouds
Code
CVPR'23
OpenScene
OpenScene: 3D Scene Understanding with Open Vocabularies
Project
CVPR'23
PLA
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
Code
arXiv'23
RegionPLC
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Project
Open-Vocabulary 3D Instance Segmentation
Venue
Paper Abbr
Paper Title
Project
NeurIPS'23
OpenMask3D
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Project
CVPR'24
MaskClustering
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Project
arXiv'23
OpenIns3D
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Project
arXiv'23
Open3DIS
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Project
arXiv'24
OpenSU3D
OpenSU3D: Open World 3D Scene Understanding using Foundation Models
Project
arXiv'24
Search3D
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
N/A
NeRF (Neural Radiance Field) and 3DGS (3D Gaussian Splatting) are popular approaches to novel view synthesis of a holistic scene. These methods leverage the multi-view consistency inherently imposed by the 3D representation, either to improve 2D image segmentation or to perform 3D semantic segmentation directly over the scene's primitives (points, voxels, or Gaussians).
Venue
Paper Abbr
Paper Title
Project
ICCV'21
Semantic-NeRF
In-Place Scene Labelling and Understanding With Implicit Scene Representation
Code
NeurIPS'22
FFD
Decomposing NeRF for Editing via Feature Field Distillation
Code
arXiv'23
Gaussian Grouping
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Code
ICCV'23
LERF
LERF: Language Embedded Radiance Fields
Project
NeurIPS'23
3DOVS
Weakly Supervised 3D Open-vocabulary Segmentation
Code
arXiv'24
OpenGaussian
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
Project
arXiv'24
OV-NeRF
OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding
Code
arXiv'24
Semantic Gaussians
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting
Project
arXiv'24
FMGS
FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding
Project
CVPR'24
LEGaussians
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Code
CVPR'24
LangSplat
LangSplat: 3D Language Gaussian Splatting
Project
CVPR'24
Feature 3DGS
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Code
Open-Vocabulary Video Understanding
Open-Vocabulary Video Instance Segmentation
Venue
Paper Abbr
Paper Title
Project
ICCV'23
OV2Seg
Towards Open-Vocabulary Video Instance Segmentation
Code
arXiv'23
OpenVIS
OpenVIS: Open-vocabulary Video Instance Segmentation
Code
arXiv'24
BriVIS
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
Code