Skip to content

Latest commit

 

History

History
executable file
·
182 lines (113 loc) · 7.06 KB

README.md

File metadata and controls

executable file
·
182 lines (113 loc) · 7.06 KB

Unifying Flow, Stereo and Depth Estimation

Haofei Xu · Jing Zhang · Jianfei Cai · Hamid Rezatofighi · Fisher Yu · Dacheng Tao · Andreas Geiger

TPAMI 2023

Logo

A unified model for three motion and 3D perception tasks.

Logo

We achieve the 1st places on Sintel (clean), Middlebury (rms metric) and Argoverse benchmarks.

This project is developed based on our previous works:

Updates

  • 2025-01-04: Check out DepthSplat for a modern multi-view depth model, which leverages monocular depth (Depth Anything V2) to significantly improve the robustness of UniMatch.

  • 2025-01-04: The UniMatch depth model served as the foundational backbone of MVSplat (ECCV 2024, Oral) for sparse-view feed-forward 3DGS reconstruction.

Installation

Our code is developed based on pytorch 1.9.0, CUDA 10.2 and python 3.8. Higher version pytorch should also work well.

We recommend using conda for installation:

conda env create -f conda_environment.yml
conda activate unimatch

Alternatively, we also support installing with pip:

bash pip_install.sh

To use the depth models from DepthSplat, you need to create a new conda environment with higher version dependencies:

conda create -y -n depthsplat-depth python=3.10
conda activate depthsplat-depth
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
pip install tensorboard==2.9.1 einops opencv-python>=4.8.1.78 matplotlib

Model Zoo

A large number of pretrained models with different speed-accuracy trade-offs for flow, stereo and depth are available at MODEL_ZOO.md.

Check out DepthSplat's Model Zoo for better depth models.

We assume the downloaded weights are located under the pretrained directory.

Otherwise, you may need to change the corresponding paths in the scripts.

Demo

Given an image pair or a video sequence, our code supports generating prediction results of optical flow, disparity and depth.

Please refer to scripts/gmflow_demo.sh, scripts/gmstereo_demo.sh, scripts/gmdepth_demo.sh and scripts/depthsplat_depth_demo.sh for example usages.

kitti_demo.mp4

Datasets

The datasets used to train and evaluate our models for all three tasks are given in DATASETS.md

Evaluation

The evaluation scripts used to reproduce the numbers in our paper are given in scripts/gmflow_evaluate.sh, scripts/gmstereo_evaluate.sh and scripts/gmdepth_evaluate.sh.

For submission to KITTI, Sintel, Middlebury and ETH3D online test sets, you can run scripts/gmflow_submission.sh and scripts/gmstereo_submission.sh to generate the prediction results. The results can be submitted directly.

Training

All training scripts for different model variants on different datasets can be found in scripts/*_train.sh.

We support using tensorboard to monitor and visualize the training process. You can first start a tensorboard session with

tensorboard --logdir checkpoints

and then access http://localhost:6006 in your browser.

Citation

@article{xu2023unifying,
  title={Unifying Flow, Stereo and Depth Estimation},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

This work is a substantial extension of our previous conference paper GMFlow (CVPR 2022, Oral), please consider citing GMFlow as well if you found this work useful in your research.

@inproceedings{xu2022gmflow,
  title={GMFlow: Learning Optical Flow via Global Matching},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Tao, Dacheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8121-8130},
  year={2022}
}

Please consider citing DepthSplat if DepthSplat's depth model is used in your research.

@article{xu2024depthsplat,
      title   = {DepthSplat: Connecting Gaussian Splatting and Depth},
      author  = {Xu, Haofei and Peng, Songyou and Wang, Fangjinhua and Blum, Hermann and Barath, Daniel and Geiger, Andreas and Pollefeys, Marc},
      journal = {arXiv preprint arXiv:2410.13862},
      year    = {2024}
    }

Acknowledgements

This project would not have been possible without relying on some awesome repos: RAFT, LoFTR, DETR, Swin, mmdetection and Detectron2. We thank the original authors for their excellent work.