STF-Depth: Semantic and Temporal Fusion Depth Estimation

STF-Depth stands for Semantic and Temporal Fusion Depth Estimation. It is a pipeline designed to reduce the inaccuracies of single-image depth estimation.

This project leverages Temporal Fusion from the video domain and Semantic Fusion via segmentation to enhance inter-frame consistency and generate more realistic depth information.
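The fusion details are not spelled out in this README; purely as an illustration of the temporal side of the idea (not the project's actual method), inter-frame consistency can be encouraged by blending consecutive per-frame depth predictions, for example with an exponential moving average:

# Illustrative temporal smoothing of per-frame depth maps (not the fusion used by STF-Depth).
import numpy as np

def smooth_depth_sequence(depth_maps, alpha=0.8):
    """Blend each frame's depth map with the running estimate to reduce flicker.

    depth_maps: iterable of HxW arrays, one per video frame.
    alpha: weight of the current frame; smaller values favour temporal stability.
    """
    fused = None
    for depth in depth_maps:
        depth = np.asarray(depth, dtype=np.float32)
        fused = depth if fused is None else alpha * depth + (1.0 - alpha) * fused
        yield fused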

Overview

[Figure: System Overview, showing the overall structure of the STF-Depth pipeline]


Key Features

  • Multi-Model Pipeline: Utilizes state-of-the-art deep learning models to generate depth and segmentation maps for each video frame (a minimal loading sketch follows this list).
    • Depth Estimation: MiDaS (DPT-Large)
    • Semantic Segmentation: DeepLabV3
    • Panoptic Segmentation: OneFormer
  • Automated Processing: Automatically handles the entire process from frame extraction to model inference and result saving for specified input video folders.
  • Result Caching: Caches intermediate results (.pkl) for processed videos, enabling faster re-runs for visualization or further processing by skipping the inference step.
  • Visualization: Saves output results from each model as image files for intuitive inspection.
  • Evaluation: Includes tools for quantitative evaluation on standard datasets like NYU Depth V2 and KITTI.
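As referenced in the feature list, the three models are publicly available; below is a minimal loading sketch assuming the standard distribution channels (torch.hub for MiDaS, torchvision for DeepLabV3, and Hugging Face transformers for OneFormer). The exact checkpoints and loading code in run.py may differ.

# Minimal model-loading sketch (assumed entry points and checkpoints; run.py may differ).
import torch
import torchvision
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

device = "cuda" if torch.cuda.is_available() else "cpu"

# Depth estimation: MiDaS DPT-Large via torch.hub, with its matching input transform.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
midas_transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

# Semantic segmentation: DeepLabV3 with a ResNet-101 backbone from torchvision.
deeplab = torchvision.models.segmentation.deeplabv3_resnet101(weights="DEFAULT").to(device).eval()

# Panoptic segmentation: OneFormer from Hugging Face (checkpoint name is an assumption).
oneformer_ckpt = "shi-labs/oneformer_ade20k_swin_tiny"
oneformer_processor = OneFormerProcessor.from_pretrained(oneformer_ckpt)
oneformer = OneFormerForUniversalSegmentation.from_pretrained(oneformer_ckpt).to(device).eval()

Loading the models once and reusing them across all frames of a video keeps the per-frame cost down to inference only.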

Installation

All dependencies for this project are managed via a Conda virtual environment.

1. Create and Activate Conda Environment

Create and activate a Conda environment named stfdepth.

# Create environment from the provided yaml file
conda env create -f conda.yaml

# Activate the environment
conda activate stfdepth

Note: A conda.yaml file is provided in the repository.


Usage

1. Inference (run.py)

This script processes videos or images to estimate depth maps with semantic and temporal fusion.

  1. Prepare Input Data: Place your video files (.mp4, .avi, etc.) or image folders inside data/input/<dataset_name>/.

    • The default dataset name is vp_test, so place files in data/input/vp_test/.
  2. Run Script:

    # Activate Conda environment
    conda activate stfdepth
    
    # Run inference
    python run.py

Command Line Arguments

  • --input_dir: Directory containing input datasets (default: data/input)
  • --output_dir: Directory to save final results (default: data/output)
  • --working_dir: Directory for intermediate files (frames, .pkl, visualizations) (default: data/working)
  • --datasets: List of dataset names to process (default: ["vp_test"])
  • --visualize: Flag to enable saving visualization results (default: False)
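For reference, a minimal argparse sketch consistent with the flags above; the actual parser in run.py may define additional options or different help text.

# Parser sketch matching the documented flags (run.py's actual parser may differ).
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="STF-Depth inference")
    parser.add_argument("--input_dir", default="data/input", help="directory containing input datasets")
    parser.add_argument("--output_dir", default="data/output", help="directory for final results")
    parser.add_argument("--working_dir", default="data/working", help="directory for intermediate files")
    parser.add_argument("--datasets", nargs="+", default=["vp_test"], help="dataset names to process")
    parser.add_argument("--visualize", action="store_true", help="save visualization results")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())

For example, python run.py --datasets vp_test --visualize processes the default dataset and saves visualization images under the working directory.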

2. Evaluation (test.py)

This script evaluates the depth estimation performance against Ground Truth (GT).

  1. Prepare Data: Structure your data as follows:

    • Input Images: test/data/input/<dataset_name>/input/
    • Ground Truth: test/data/input/<dataset_name>/gt/

    See the Evaluation Datasets section below for details on preparing NYU and KITTI datasets.

  2. Run Script:

    python test.py --datasets nyu kitti

Command Line Arguments

  • --input_dir: Root directory for test data (default: ./test/data/input)
  • --output_dir: Directory to save evaluation results (default: ./test/data/output)
  • --datasets: List of datasets to evaluate (default: ["nyu", "kitti"])
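The README does not list which metrics test.py reports; as a reference, the sketch below shows the depth metrics most commonly reported on NYU Depth V2 and KITTI (absolute relative error, RMSE, and δ-threshold accuracy). Function and key names are illustrative.

# Common monocular-depth metrics (reference sketch; test.py may compute a different set).
import numpy as np

def depth_metrics(pred, gt):
    """Evaluate on valid pixels only (gt > 0 marks pixels with ground truth)."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)      # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))      # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                 # fraction of pixels within a 1.25x factor
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}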

Evaluation Datasets

This project uses standard datasets for quantitative evaluation. Helper scripts are provided in the test/ directory to convert raw datasets into the required format.

1. NYU Depth V2 (Indoor)

The NYU Depth V2 dataset consists of video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.

  • Download: You can download the raw dataset from the official website.

  • Preparation:

    1. Download the raw dataset (scene folders containing INDEX.txt and raw images).
    2. Use the test/convert_nyu.py script to synchronize RGB and Depth frames and convert them.
    # Edit 'original_dir' and 'converted_dir' in test/convert_nyu.py before running
    python test/convert_nyu.py
    • This script synchronizes frames based on timestamps, generates .mp4 videos for input, and saves stacked depth maps (.npy) for ground truth.
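    A minimal sketch of the timestamp-based pairing described above; the timestamp extraction and file naming are assumptions, and test/convert_nyu.py may implement the matching differently.

    # Pair each RGB frame with the depth frame closest in time (illustrative sketch).
    import bisect

    def match_frames(rgb_frames, depth_frames):
        """rgb_frames / depth_frames: lists of (timestamp_seconds, path), sorted by timestamp."""
        depth_ts = [t for t, _ in depth_frames]
        pairs = []
        for t_rgb, rgb_path in rgb_frames:
            i = bisect.bisect_left(depth_ts, t_rgb)
            # Compare the two neighbouring depth frames and keep the closer one.
            candidates = [j for j in (i - 1, i) if 0 <= j < len(depth_frames)]
            j = min(candidates, key=lambda k: abs(depth_ts[k] - t_rgb))
            pairs.append((rgb_path, depth_frames[j][1]))
        return pairs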

2. KITTI (Outdoor)

The KITTI dataset is a popular benchmark for autonomous driving, including depth prediction tasks.

  • Download: Download the "depth completion" or "depth prediction" dataset from the KITTI Vision Benchmark Suite.

  • Preparation:

    1. Download the validation set (e.g., val_selection_cropped).
    2. Use the test/convert_kitti.py script to format the data.
    # Edit 'original_dir' and 'converted_dir' in test/convert_kitti.py before running
    python test/convert_kitti.py
    • This script matches images with their corresponding ground truth depth maps and converts them to .png (input) and .npy (GT) formats.
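    For reference, KITTI ground-truth depth is stored as 16-bit PNGs in which depth in meters is the pixel value divided by 256 and a value of 0 means no ground truth; a minimal conversion sketch follows (convert_kitti.py may handle paths and file matching differently).

    # Convert one KITTI 16-bit depth PNG to a float32 depth map in meters and save it as .npy.
    import numpy as np
    from PIL import Image

    def kitti_depth_png_to_npy(png_path, npy_path):
        depth_png = np.array(Image.open(png_path), dtype=np.uint16)
        depth_m = depth_png.astype(np.float32) / 256.0   # KITTI convention: pixel value / 256 = meters
        depth_m[depth_png == 0] = 0.0                    # zero pixels carry no ground truth
        np.save(npy_path, depth_m)
        return depth_m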

📂 Project Structure

.
├── data
│   ├── input/                # Input data directory
│   │   └── vp_test/          # Default dataset folder
│   ├── working/              # Intermediate results (frames, .pkl, visualizations)
│   └── output/               # Final results
├── test
│   └── data/                 # Data for evaluation (input images and GT)
├── run.py                    # Main inference script
├── test.py                   # Evaluation script
├── conda.yaml                # Conda environment configuration
└── README.md                 # Project documentation
