Open source impl of **MV-DUSt3R+ Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds** from Meta Reality Labs. Project page https://mv-dust3rp.github.io/

facebookresearch/mvdust3r

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

Paper | Website | Video | Data | Checkpoints

Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan

TL;DR

Multi-view, pose-free, RGB-only 3D reconstruction in one step. Also supports novel view synthesis and relative pose estimation.

Please see more visual results and video on our website!

Update Logs

  • 2025-1-1: Released a Gradio demo, all checkpoints, training/evaluation code, and training/evaluation trajectories of ScanNet.
  • 2025-1-8: Improved demo view selection, giving better quality for multiple rooms.

Installation

We have only tested this on a Linux server with CUDA 12.4.

  1. Clone MV-DUSt3R+.
     git clone https://github.com/facebookresearch/mvdust3r.git
     cd mvdust3r

  2. Install the virtual environment under Anaconda.
     ./install.sh
     (Change the versions of PyTorch and PyTorch3D if you need a different CUDA version.)

  3. (Optional, for faster runtime) Compile the CUDA kernels for RoPE (the same as in DUSt3R and CroCo).
     cd croco/models/curope/
     python setup.py build_ext --inplace
     cd ../../../
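
Since the setup above was only tested with CUDA 12.4, it can help to confirm which CUDA version the installed PyTorch was built against. The following is a minimal sketch, not part of the repository; the helper name `report_cuda_build` is hypothetical, and it only relies on PyTorch's `torch.version.cuda` attribute.

```python
def report_cuda_build():
    """Return the CUDA version PyTorch was built with, or None if unavailable.

    Hypothetical helper for illustration; it is not part of mvdust3r.
    """
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return None
    # torch.version.cuda is a string such as "12.4", or None for CPU-only builds.
    return torch.version.cuda


if __name__ == "__main__":
    version = report_cuda_build()
    if version is None:
        print("PyTorch is missing or is a CPU-only build.")
    else:
        print(f"PyTorch was built against CUDA {version} (this README was tested with 12.4).")
```

If the reported version differs from 12.4, the PyTorch and PyTorch3D versions may need adjusting as noted in step 2.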

Checkpoints

Please download the checkpoints here into the folder checkpoints before running the demo or the evaluation.

| Name | Description |
| --- | --- |
| MVD.pth | MV-DUSt3R |
| MVDp_s1.pth | MV-DUSt3R+ trained on stage 1 (8 views) |
| MVDp_s2.pth | MV-DUSt3R+ trained on stage 1, then stage 2 (mixed 4~12 views) |
| DUSt3R_ViTLarge_BaseDecoder_224_linear.pth | The pretrained DUSt3R model. Our models are finetuned from it. |
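
Before launching the demo or evaluation, it may be useful to verify that the files listed above are in place. The sketch below is an illustration only: the file names come from the table, while the helper `missing_checkpoints` and the default relative folder `checkpoints` (resolved from the repository root) are assumptions.

```python
from pathlib import Path

# File names taken from the checkpoint table above.
EXPECTED_CHECKPOINTS = [
    "MVD.pth",
    "MVDp_s1.pth",
    "MVDp_s2.pth",
    "DUSt3R_ViTLarge_BaseDecoder_224_linear.pth",
]


def missing_checkpoints(folder="checkpoints", names=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint files that are not present in `folder`.

    Hypothetical helper; assumes it is run from the repository root.
    """
    root = Path(folder)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Only the checkpoint you plan to use is strictly needed for the demo, so a partial list of missing files is not necessarily a problem.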

Gradio Demo

python demo.py --weights ./checkpoints/{CHECKPOINT}

You will see the UI like this:

The input can be multiple images (a single image is not supported) or a video. You will see the point cloud along with the predicted camera poses (3DGS visualization is left as future work).

The confidence threshold controls how many low confidence points should be filtered. The No. of video frames is only valid when the input is a video and controls how many frames are uniformly selected from the video for reconstruction.
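
The two demo controls described above can be illustrated with a small sketch. This is not the demo's actual code: the function names are hypothetical, it assumes that points with confidence below the threshold are dropped, and it assumes "uniformly selected" means evenly spaced frame indices that include the first and last frame.

```python
def filter_by_confidence(points, confidences, threshold):
    """Keep only points whose confidence is at least `threshold` (assumed rule)."""
    if len(points) != len(confidences):
        raise ValueError("points and confidences must have the same length")
    return [p for p, c in zip(points, confidences) if c >= threshold]


def uniform_frame_indices(total_frames, num_selected):
    """Pick `num_selected` evenly spaced frame indices from a video.

    If fewer frames exist than requested, every frame is returned.
    """
    if total_frames <= 0 or num_selected <= 0:
        raise ValueError("total_frames and num_selected must be positive")
    num_selected = min(num_selected, total_frames)
    if num_selected == 1:
        return [0]
    # Integer arithmetic keeps the first and last frames and avoids float rounding.
    return [(i * (total_frames - 1)) // (num_selected - 1) for i in range(num_selected)]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
    conf = [0.2, 0.9, 0.6]
    print(filter_by_confidence(pts, conf, threshold=0.5))
    print(uniform_frame_indices(total_frames=10, num_selected=4))
```

Raising the threshold in this sketch removes more points, which mirrors how the demo slider trades completeness for cleaner geometry.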

Note that the demo's inference is slower than what is claimed in the paper due to the overhead of Gradio and model loading. If you need a faster runtime, please use our evaluation code.

Some tips to improve quality, especially for multiple rooms.

Data

We use five datasets for training and testing: ScanNet, ScanNet++, HM3D, Gibson, and MP3D. Please go to their websites to sign the license agreements, then download and extract them into the folder data. Here are more instructions.

We have currently released the ScanNet trajectories for evaluation. Please download them to the folder trajectories. More trajectories for training and more data will be released later.

Evaluation

The folder scripts contains the following scripts for evaluation on ScanNet:

| Name | Description |
| --- | --- |
| test_mvd.sh | MV-DUSt3R |
| test_mvdp_stage1.sh | MV-DUSt3R+ trained on stage 1 (8 views) |
| test_mvdp_stage2.sh | MV-DUSt3R+ trained on stage 1, then stage 2 (mixed 4~12 views) |

They should reproduce the paper's results on ScanNet (Tab. 2, 3, 4, S2, S3, and S5).
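
To keep track of which evaluation script goes with which checkpoint, a small lookup can be handy. The sketch below is hypothetical: the pairing is inferred from the matching descriptions in the two tables, and it assumes the scripts are run with bash from the repository root without extra arguments, which the scripts themselves may not guarantee.

```python
import subprocess
from pathlib import Path

# Pairing inferred from the matching descriptions in the checkpoint and script tables.
EVAL_SCRIPTS = {
    "MVD.pth": "test_mvd.sh",
    "MVDp_s1.pth": "test_mvdp_stage1.sh",
    "MVDp_s2.pth": "test_mvdp_stage2.sh",
}


def eval_command(checkpoint_name, scripts_dir="scripts"):
    """Build the command for the evaluation script matching a checkpoint."""
    script = EVAL_SCRIPTS[checkpoint_name]  # KeyError for unknown checkpoints
    return ["bash", str(Path(scripts_dir) / script)]


def run_eval(checkpoint_name, dry_run=True):
    """Print the command (dry run) or execute it from the current directory."""
    cmd = eval_command(checkpoint_name)
    if dry_run:
        print(" ".join(cmd))
        return None
    # Assumption: the script needs no arguments; check its contents first.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for name in EVAL_SCRIPTS:
        run_eval(name, dry_run=True)
```

The dry-run default only prints the command, so paths and settings inside each script can be reviewed before a full evaluation is started.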

Training

We are still preparing to release the training-data trajectories and the trajectory-generation code. The training scripts are also provided in the folder scripts and give more information about our training setup.

| Name | Description |
| --- | --- |
| train_mvd.sh | MV-DUSt3R, initialized from DUSt3R and finetuned |
| train_mvdp_stage1.sh | MV-DUSt3R+ stage 1 training (8 views), initialized from DUSt3R and finetuned |
| train_mvdp_stage2.sh | MV-DUSt3R+ stage 2 finetuning (mixed 4~12 views), starting from the stage 1 model |

See here for more hyperparameter explanations.

Citation

@article{tang2024mv,
  title={MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds},
  author={Tang, Zhenggang and Fan, Yuchen and Wang, Dilin and Xu, Hongyu and Ranjan, Rakesh and Schwing, Alexander and Yan, Zhicheng},
  journal={arXiv preprint arXiv:2412.06974},
  year={2024}
}

License

This project is released under the CC BY-NC 4.0 license.

Acknowledgement

Many thanks to:
