Segment-Anything on 3D scene meshes. This project is still in its early stages; I will keep updating it with more features and better documentation.
Watch a video of the entire segmentation process on YouTube.
Segment Any Scene extends the capabilities of the Segment Anything Model, originally designed for 2D images, to work with 3D scene meshes.
The method can be summarized as:
- Rasterize the mesh triangles into each frame, assigning triangle face indices to frames.
- Create a list of trackers, one for each triangle, with attributes for voting.
- Update the trackers using SAM masks and triangle indices.
- Find the most common object ID within each mask, avoiding duplicates.
- Assign new object IDs if no previous ID is available for a mask.
- Incrementally update the voting system as new frames are processed.
In short, it is a voting system in which each triangle in the mesh receives votes from SAM masks, so that object IDs are assigned and updated consistently across frames; a minimal sketch of the update loop is given below.
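The following is a minimal, hedged sketch of one such voting step in Python. It is not the project's actual code: the names `update_trackers`, `face_votes`, `face_id_map`, and `next_object_id` are hypothetical, and `face_id_map` stands in for the rasterized per-pixel face-index image (e.g., the `pix_to_face` output of a PyTorch3D rasterizer).

```python
import numpy as np

def update_trackers(face_votes, face_id_map, sam_masks, next_object_id):
    """One incremental voting step for a single frame (illustrative sketch).

    face_votes:  dict mapping face index -> {object_id: vote_count}
    face_id_map: (H, W) int array from rasterization; -1 where no face is visible
    sam_masks:   list of (H, W) boolean SAM masks for this frame
    """
    used_ids = set()  # avoid assigning the same object ID to two masks in one frame
    for mask in sam_masks:
        faces = np.unique(face_id_map[mask])
        faces = faces[faces >= 0]  # drop background pixels
        if faces.size == 0:
            continue
        # Tally the existing object IDs previously voted for by the faces under this mask.
        id_counts = {}
        for f in faces:
            for oid, count in face_votes.setdefault(f, {}).items():
                id_counts[oid] = id_counts.get(oid, 0) + count
        # Pick the most common previous ID not already used in this frame...
        candidates = [oid for oid in sorted(id_counts, key=id_counts.get, reverse=True)
                      if oid not in used_ids]
        if candidates:
            oid = candidates[0]
        else:
            # ...or assign a fresh object ID if no previous ID is available.
            oid = next_object_id
            next_object_id += 1
        used_ids.add(oid)
        # Each face covered by the mask casts one vote for the chosen object ID.
        for f in faces:
            face_votes[f][oid] = face_votes[f].get(oid, 0) + 1
    return next_object_id
```

After all frames are processed, each face's final object ID can be taken as the argmax of its accumulated vote counts.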
- Add more details about the implementation
- Create a new conda environment and activate it.

```bash
conda create -n samscene python=3.10 -y
conda activate samscene
```
- Install Semantic-SAM by following their instructions here.

```bash
pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/UX-Decoder/Semantic-SAM.git
```
Note: you may encounter an error like the one below when running `import semantic_sam`:

```
ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
```

You can compile the `MultiScaleDeformableAttention` CUDA op with the following commands:

```bash
git clone [email protected]:facebookresearch/Mask2Former.git
cd Mask2Former/mask2former/modeling/pixel_decoder/ops
sh make.sh
```
- Install PyTorch3D. Please refer to their installation guide for more details.

```bash
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
```
- Install this package.

```bash
pip install -e .
```
- Add a standalone semantic-sam inference package for easier installation
- Add an installation script for easier installation of all packages
Download the MultiScan example dataset from HuggingFace.

```bash
huggingface-cli download --resume-download ysmao/multiscan_example --local-dir ./data/multiscan/scene_00021_00 --local-dir-use-symlinks False --repo-type dataset
```
To download the entire MultiScan dataset, please refer to the MultiScan Dataset.
- Add support for ScanNet dataset
- Add support for arbitrary 3D scene meshes with/without camera trajectories
Decode the MultiScan example data from mp4 video to images, and the jsonl camera trajectories to json files.

```bash
python examples/prepare_dataset.py output=outputs
```

The output will be saved to the `outputs/scene_00021_00` folder.
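As a rough illustration of the jsonl-to-json part of this conversion (the file names here are hypothetical; `prepare_dataset.py` handles the real paths and the mp4 decoding):

```python
import json

# Each line of the .jsonl camera trajectory holds one frame's camera record.
with open("data/multiscan/scene_00021_00/scene_00021_00.jsonl") as f:
    frames = [json.loads(line) for line in f if line.strip()]

with open("outputs/scene_00021_00/cameras.json", "w") as f:
    json.dump(frames, f, indent=2)
```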
Download the pre-trained Semantic-SAM model here to the `./models` folder.
```bash
python examples/prepare_segmentation.py
```

The output will be saved to the `outputs/scene_00021_00/semantic_sam` folder.
Note: this step will take a while to run, about an hour for 1916 frames (~2 s/frame) on my single-GPU device, since SAM's automatic mask sampling over the entire image is time-consuming. I will try out more efficient inference methods in the future, such as the methods listed here or the more recent SAM2.
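For intuition on the cost, automatic mask generation prompts the model with a dense grid of points on every frame. The snippet below uses the original SAM's `SamAutomaticMaskGenerator` only as an analogy (this project actually runs Semantic-SAM, whose interface differs), with a hypothetical frame path; a 32x32 grid already means 1024 prompts per frame:

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# "sam_vit_h_4b8939.pth" is the released ViT-H SAM checkpoint; adjust the path as needed.
sam = sam_model_registry["vit_h"](checkpoint="./models/sam_vit_h_4b8939.pth").to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32)  # 32*32 = 1024 point prompts

# Hypothetical frame path; masks[i]["segmentation"] is an HxW boolean array.
image = cv2.cvtColor(cv2.imread("outputs/scene_00021_00/frame_000000.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)
```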
```bash
python examples/prepare_masks3d.py
```

The output will be saved to the `outputs/scene_00021_00/masks3d` folder, which includes:

- `face_to_object.npy`: the mapping from each mesh face to its object ID.
- `colored_mesh.ply`: the mesh colored by object ID.
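A small, hedged example of inspecting these outputs, assuming `face_to_object.npy` is a per-face integer array and using `trimesh` (any PLY-capable mesh library works) to view the colored mesh:

```python
import numpy as np
import trimesh

face_to_object = np.load("outputs/scene_00021_00/masks3d/face_to_object.npy")
mesh = trimesh.load("outputs/scene_00021_00/masks3d/colored_mesh.ply")

# One object ID per mesh face.
print(f"{len(face_to_object)} faces, {len(np.unique(face_to_object))} object IDs")
mesh.show()  # opens an interactive viewer (requires pyglet)
```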
Many thanks to the great works that lay the foundation of this project:
We also encourage readers to check another great work for 3D scene segmentation with 2D SAM masks:
The main difference between this project and SegmentAnything3D is that we work on 3D meshes instead of point clouds, and we do not use Felzenszwalb and Huttenlocher's graph-based segmentation for post-processing.
If you find this project useful for your research, please consider citing:
```bibtex
@Misc{mao2024samscene,
  title={SegmentAnyScene Github Page},
  author={Yongsen Mao and Manolis Savva},
  year={2024}
}
```