This repository provides the baseline code for the MVSS task, i.e., MVNet.

The code requires python>=3.7, as well as pytorch>=1.9 and torchvision>=0.10. Please follow the instructions here to install the PyTorch and TorchVision dependencies; installing both with CUDA support is strongly recommended.
- Clone this repo.

```bash
$ git clone https://github.com/jiwei0921/MVSS-Baseline.git
$ cd MVSS-Baseline-main/MVNet
```
- Install dependencies.

```bash
$ conda create -n MVSS
$ conda activate MVSS
$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
$ pip install scipy tqdm visdom matplotlib
```
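After installing, a quick sanity check (a minimal sketch using PyTorch's standard API) confirms that the versions and CUDA availability meet the requirements above:

```python
# Minimal environment sanity check: verifies that the interpreter and
# library versions satisfy python>=3.7, pytorch>=1.9, torchvision>=0.10,
# and that CUDA is visible to PyTorch.
import sys
import torch
import torchvision

print("python     :", sys.version.split()[0])
print("pytorch    :", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
```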
First, download the MVSeg dataset and the pretrained model checkpoint (ckpt). Then training can be started with just a few adaptations:
- Set your MVSeg dataset path in `datasets/mvss_dataset.py` and put the ckpt in `filesave/*`.
- Train from scratch by running `bash train.sh` in two stages: the first stage is the warm-up, the second trains MVNet. Alternatively, start from our warm-up ckpt and run `bash train.sh` for the second stage only (see the sketch after this list).
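Concretely, the two-stage schedule amounts to two runs of the same script; the sketch below is illustrative only, as the actual stage switch (a flag or an edit inside `train.sh`) is defined by the script itself:

```bash
# Illustrative only -- the real interface is defined in train.sh.
bash train.sh    # stage 1: warm-up
# Point the second stage at the warm-up ckpt (ours, or the one produced
# above, kept under filesave/*), then run the script again:
bash train.sh    # stage 2: MVNet
```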
Meanwhile, segmentation maps can be generated by loading the pretrained model ckpt:
- Set your MVSeg dataset path in `datasets/mvss_dataset.py` and put the ckpt in `filesave/*`.
- Specify the test-set name in `test.sh`, e.g., `--split-mode test` or `--split-mode test_night` (see the parsing sketch after this list).
- Run inference with `bash test.sh`.
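For reference, here is a minimal sketch of how a `--split-mode` flag is typically consumed on the Python side with `argparse`; the repository's actual test script may define more options and splits:

```python
# Minimal sketch of parsing --split-mode; the repo's test script may differ.
import argparse

parser = argparse.ArgumentParser(description="MVNet inference")
parser.add_argument("--split-mode", type=str, default="test",
                    help="MVSeg split to evaluate, e.g. test or test_night")
args = parser.parse_args()
print("Evaluating split:", args.split_mode)  # argparse maps '-' to '_'
```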
Here is the reference palette for MVSeg dataset color visualization:
```python
[
    (0, 0, 0),        # 0: background (unlabeled)
    (0, 0, 142),      # 1: Car
    (0, 60, 100),     # 2: Bus
    (0, 0, 230),      # 3: Motorcycle
    (119, 11, 32),    # 4: Bicycle
    (255, 0, 0),      # 5: Pedestrian
    (0, 139, 139),    # 6: Motorcyclist
    (255, 165, 150),  # 7: Bicyclist
    (192, 64, 0),     # 8: Cart
    (211, 211, 211),  # 9: Bench
    (100, 33, 128),   # 10: Umbrella
    (117, 79, 86),    # 11: Box
    (153, 153, 153),  # 12: Pole
    (190, 122, 222),  # 13: Street_lamp
    (250, 170, 30),   # 14: Traffic_light
    (220, 220, 0),    # 15: Traffic_sign
    (222, 142, 35),   # 16: Car_stop
    (205, 155, 155),  # 17: Color_cone
    (70, 130, 180),   # 18: Sky
    (128, 64, 128),   # 19: Road
    (244, 35, 232),   # 20: Sidewalk
    (0, 0, 70),       # 21: Curb
    (107, 142, 35),   # 22: Vegetation
    (152, 251, 152),  # 23: Terrain
    (70, 70, 70),     # 24: Building
    (110, 80, 100),   # 25: Ground
]
```
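To render a predicted label map with this palette, a simple NumPy lookup works. The sketch below assumes `pred` is an H×W array of class indices 0-25 and that `numpy` and `Pillow` are installed:

```python
# Sketch: colorize an H x W label map of class indices 0-25 with the
# palette above and save it as a PNG.
import numpy as np
from PIL import Image

PALETTE = np.array([
    (0, 0, 0), (0, 0, 142), (0, 60, 100), (0, 0, 230), (119, 11, 32),
    (255, 0, 0), (0, 139, 139), (255, 165, 150), (192, 64, 0),
    (211, 211, 211), (100, 33, 128), (117, 79, 86), (153, 153, 153),
    (190, 122, 222), (250, 170, 30), (220, 220, 0), (222, 142, 35),
    (205, 155, 155), (70, 130, 180), (128, 64, 128), (244, 35, 232),
    (0, 0, 70), (107, 142, 35), (152, 251, 152), (70, 70, 70),
    (110, 80, 100),
], dtype=np.uint8)

def colorize(pred: np.ndarray) -> Image.Image:
    """Map each class index in `pred` to its RGB color."""
    return Image.fromarray(PALETTE[pred])

# Example with a random label map:
dummy = np.random.randint(0, 26, size=(480, 640))
colorize(dummy).save("vis.png")
```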
If you find this dataset or code useful, please cite:

```bibtex
@InProceedings{ji2023mvss,
    title     = {Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline},
    author    = {Ji, Wei and Li, Jingjing and Bian, Cheng and Zhou, Zongwei and Zhao, Jiaying and Yuille, Alan L. and Cheng, Li},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {1094-1104}
}
```
This repository was originally built upon LMANet, and was modified and extended to support our multispectral video setting.