Awesome-3D-Understanding

Table of Contents

  • Awesome Papers
    • Open-Vocabulary Indoor Scene Understanding
    • 3D Scene Understanding
    • 3D Vision Grounding
    • 3D Multimodal LLMs
  • 3D-Dataset
    • Object-level
    • Scene-level
  • Survey

Awesome Papers

Open-Vocabulary Indoor Scene Understanding

| Title | Venue | Date | Code | Demo |
| --- | --- | --- | --- | --- |
| Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | ECCV | 2024-07-18 | Github | - |
| Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding | ECCV | 2024-07-13 | Github | - |

3D Scene Understanding

| Title | Venue | Date | Code | Demo |
| --- | --- | --- | --- | --- |
| A Unified Framework for 3D Scene Understanding | arXiv | 2024-07-03 | Github | - |
| Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding | CVPR | 2023 | - | - |

3D Vision Grounding

| Title | Venue | Date | Code | Demo |
| --- | --- | --- | --- | --- |
| Multi-branch Collaborative Learning Network for 3D Visual Grounding | ECCV | 2024-07-10 | Github | - |

3D Multimodal LLMs

| Title | Venue | Date | Code | Demo |
| --- | --- | --- | --- | --- |
| Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding | ECCV | 2024-04 | Github | - |

Object-level

  • PointLLM: Empowering Large Language Models to Understand Point Clouds [Paper] [Homepage] [Github]
  • Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following [Paper] [Demo] [Github]
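
The object-level models above share a common recipe: a pretrained point-cloud encoder turns the object into a small set of feature tokens, and a lightweight projector maps those tokens into the LLM's embedding space ahead of the text prompt. Below is a minimal sketch of that projection step; all module names and dimensions are hypothetical, not PointLLM's actual code.

```python
# Minimal, illustrative sketch of the projector used in many object-level
# 3D multimodal LLMs. Names and dimensions are placeholders.
import torch
import torch.nn as nn

class PointTokenProjector(nn.Module):
    def __init__(self, point_feat_dim=384, llm_embed_dim=4096):
        super().__init__()
        # Linear map from point-encoder features to LLM token embeddings.
        self.proj = nn.Linear(point_feat_dim, llm_embed_dim)

    def forward(self, point_feats):   # (B, num_point_tokens, point_feat_dim)
        return self.proj(point_feats)  # (B, num_point_tokens, llm_embed_dim)

# Usage: point_feats would come from a pretrained point-cloud backbone;
# text_embeds from the LLM's own embedding layer.
point_feats = torch.randn(1, 64, 384)           # hypothetical encoder output
point_tokens = PointTokenProjector()(point_feats)
text_embeds = torch.randn(1, 32, 4096)          # embedded prompt tokens
llm_inputs = torch.cat([point_tokens, text_embeds], dim=1)  # fed to the LLM
```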

Scene-level

3D With CLIP

  • ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [Paper] [Github]
  • ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding [Paper] [Github]
  • OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding [Paper] [Github] [Homepage]
  • CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [Paper] [Github]
  • CLIP Goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition [Paper] [Github]
  • CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training [Paper] [Github]
  • Uni3D: Exploring Unified 3D Representation at Scale [Paper] [Github]
  • MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation [Paper] [Github]
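
Most of the methods above follow the same basic alignment recipe: train a 3D encoder so its embeddings land in CLIP's joint image-text space, typically with a symmetric InfoNCE objective against frozen CLIP text and/or image embeddings. Here is a minimal sketch of that loss with placeholder encoders and dimensions, not any specific paper's setup.

```python
# Minimal sketch of CLIP-style contrastive alignment between a trainable
# 3D encoder and frozen CLIP text embeddings. Dimensions are placeholders.
import torch
import torch.nn.functional as F

def contrastive_loss(point_embeds, text_embeds, temperature=0.07):
    """Symmetric InfoNCE between L2-normalized point and text embeddings.

    point_embeds, text_embeds: (B, D), where row i of each is a matched pair.
    """
    point_embeds = F.normalize(point_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = point_embeds @ text_embeds.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))                 # matched pairs on diagonal
    # Average the point->text and text->point cross-entropies, as in CLIP.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Usage: point_embeds from the trainable 3D encoder, text_embeds from a
# frozen CLIP text tower; only the 3D encoder receives gradients.
loss = contrastive_loss(torch.randn(8, 512, requires_grad=True),
                        torch.randn(8, 512))
loss.backward()
```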

3D-Dataset

Object-level

  • OmniObject3D (CVPR 2023 Award Candidate): real-scanned 3D objects (6K), 190 classes [Paper] [Homepage]
  • Objaverse-XL: 3D objects (10M+) [Paper] [Homepage] [Dataset]
  • Cap3D: 3D-text pairs (660K) [Paper] [Download]
  • ULIP - Objaverse Triplets: 3D point clouds (800K), images (10M), language (100M) triplets [Download]
  • ULIP - ShapeNet Triplets: 3D point clouds (52.5K), images (3M), language (30M) triplets [Download]
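
Each ULIP triplet pairs a point cloud with rendered views and captions of the same object. A hypothetical sketch of one such record as a Python structure follows; the actual release format and field names may differ.

```python
# Hypothetical container for one point-image-text triplet, as in the ULIP
# releases above. Field shapes and the .npz layout are assumptions, not the
# actual on-disk format of the released data.
from dataclasses import dataclass
import numpy as np

@dataclass
class Triplet:
    points: np.ndarray   # (N, 3) xyz point cloud, or (N, 6) with rgb
    image: np.ndarray    # (H, W, 3) rendered view of the same object
    caption: str         # natural-language description of the object

def load_triplet(path: str) -> Triplet:
    """Load one triplet from a hypothetical .npz record."""
    record = np.load(path, allow_pickle=True)
    return Triplet(points=record["points"],
                   image=record["image"],
                   caption=str(record["caption"]))
```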

Scene-level

  • ScanRefer: 3D object localization in RGB-D scans using natural language
  • SQA3D: 650 scenes, 6.8K situations, 20.4K descriptions, and 33.4K diverse reasoning questions for these situations [Paper] [Homepage]

Survey

  • Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation [Paper]
  • JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues [Paper]
