This repository contains the original implementation of our paper:
Single-stage Semantic Segmentation from Image Labels
Nikita Araslanov and Stefan Roth
CVPR 2020. [pdf] [supp]
[arXiv]
Contact: Nikita Araslanov
We attain competitive results by training a single network model for segmentation in a self-supervised fashion using only image-level annotations (one run of 20 epochs on Pascal VOC).
- Minimum requirements. This project was originally developed with Python 3.6, PyTorch 1.0 and CUDA 9.0. Training requires at least two Titan X GPUs (12 GB memory each).
- Set up your Python environment. Please clone the repository and install the dependencies. We recommend using the Anaconda 3 distribution:
conda create -n <environment_name> --file requirements.txt
- Download and link to the dataset. We train our model on the original Pascal VOC 2012 augmented with the SBD data (10K images in total). Download the data and link to it:
ln -s <your_path_to_voc> <project>/data/voc
ln -s <your_path_to_sbd> <project>/data/sbd
Make sure that the first directory in data/voc is VOCdevkit; the first directory in data/sbd is benchmark_RELEASE.
- Download pre-trained models. Download the initial weights (pre-trained on ImageNet) for the backbones you are planning to use and place them into <project>/models/weights/.

Backbone | Initial Weights | Comment |
---|---|---|
WideResNet38 | ilsvrc-cls_rna-a1_cls1000_ep-0001.pth (402M) | Converted from MXNet |
VGG16 | vgg16_20M.pth (79M) | Converted from Caffe |
ResNet50 | resnet50-19c8e357.pth | PyTorch official |
ResNet101 | resnet101-5d3b4d8f.pth | PyTorch official |
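The directory layout described in the setup steps can be checked before launching a long training run. The following is a minimal sketch, not part of the repository: the function name `check_layout` and the rule "at least one .pth file" are assumptions made for illustration; only the directory names come from the instructions above.

```python
from pathlib import Path

# Directory names taken from the setup steps; everything else is hypothetical.
EXPECTED_DIRS = {
    "data/voc": "VOCdevkit",
    "data/sbd": "benchmark_RELEASE",
}


def check_layout(project_root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(project_root)
    problems = []
    for link, child in EXPECTED_DIRS.items():
        # is_dir() follows symlinks, so directories linked with `ln -s` pass.
        if not (root / link / child).is_dir():
            problems.append(f"{link} should contain a directory named {child}")
    weights = root / "models" / "weights"
    # Assumption: any .pth file counts; the sketch does not verify which backbone it is.
    if not weights.is_dir() or not any(weights.glob("*.pth")):
        problems.append("models/weights should contain at least one .pth file")
    return problems


if __name__ == "__main__":
    for problem in check_layout("."):
        print("WARNING:", problem)
```

Run it from the project root; no output means all three locations were found.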
The directory launch contains template bash scripts for training, inference and evaluation.
Training. For each run, you need to set two variables, for example:
EXP=baselines
RUN_ID=v01
Running bash ./launch/run_voc_resnet38.sh will create a directory ./logs/pascal_voc/baselines/v01 with tensorboard events and will save snapshots into ./snapshots/pascal_voc/baselines/v01.
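To make the naming scheme concrete, the sketch below composes the two output directories from EXP and RUN_ID. It only mirrors the paths quoted above; the helper `run_dirs` and its `dataset` and `root` parameters are hypothetical and do not reflect how the launch scripts are actually written.

```python
from pathlib import Path


def run_dirs(exp, run_id, dataset="pascal_voc", root="."):
    """Compose the log and snapshot directories for one training run."""
    base = Path(root)
    return {
        "logs": base / "logs" / dataset / exp / run_id,
        "snapshots": base / "snapshots" / dataset / exp / run_id,
    }


dirs = run_dirs("baselines", "v01")
print(dirs["logs"].as_posix())       # logs/pascal_voc/baselines/v01
print(dirs["snapshots"].as_posix())  # snapshots/pascal_voc/baselines/v01
```

Choosing a new RUN_ID for each experiment keeps the events and snapshots of different runs apart.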
Inference. To generate final masks, please use the script ./launch/infer_val.sh. You will need to specify:
- EXP and RUN_ID you used for training;
- OUTPUT_DIR, the path where to save the masks;
- FILELIST, which specifies the file with the data split;
- SNAPSHOT, which specifies the model suffix in the format e000Xs0.000, for example e020Xs0.928;
- (optionally) EXTRA_ARGS, which specifies additional arguments to the inference script.
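A mistyped SNAPSHOT value is easy to catch before inference starts. The sketch below validates the suffix format e000Xs0.000 given in the list above. Reading the two numbers as an epoch and a score is an assumption, and the file name pattern is inferred only from the released model model_enc_e020Xs0.928.pth; both function names are hypothetical.

```python
import re

# Three digits after "e", then "Xs", then a number of the form 0.000.
SNAPSHOT_RE = re.compile(r"^e(\d{3})Xs(\d\.\d{3})$")


def parse_snapshot(suffix):
    """Split a snapshot suffix into its two numeric fields.

    Interpreting them as (epoch, score) is an assumption of this sketch.
    """
    match = SNAPSHOT_RE.match(suffix)
    if match is None:
        raise ValueError(f"unexpected snapshot suffix: {suffix!r}")
    return int(match.group(1)), float(match.group(2))


def snapshot_filename(suffix, prefix="model_enc"):
    """Build a file name like the released model (pattern is an assumption)."""
    parse_snapshot(suffix)  # raises ValueError on a malformed suffix
    return f"{prefix}_{suffix}.pth"


print(parse_snapshot("e020Xs0.928"))     # (20, 0.928)
print(snapshot_filename("e020Xs0.928"))  # model_enc_e020Xs0.928.pth
```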
Evaluation. To compute IoU of the masks, please run ./launch/eval_seg.sh. You will need to specify SAVE_DIR that contains the masks and FILELIST specifying the split for evaluation.
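For readers unfamiliar with the metric, here is a minimal pure-Python sketch of mean IoU over a set of masks. It is not the code behind eval_seg.sh: the function name, the flat-list input format, skipping pixels labelled 255 (the usual Pascal VOC void label), and averaging only over classes that actually occur are all assumptions of this example.

```python
def mean_iou(predictions, targets, num_classes, ignore_label=255):
    """Mean intersection-over-union, accumulated over all images.

    predictions, targets: sequences of flat label lists, one list per image.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred, gt in zip(predictions, targets):
        for p, g in zip(pred, gt):
            if g == ignore_label:
                continue  # void pixels do not count towards any class
            if p == g:
                intersection[g] += 1
                union[g] += 1
            else:
                union[p] += 1  # false positive for the predicted class
                union[g] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)


# One image with four pixels; the last one is void and is skipped.
print(mean_iou([[0, 0, 1, 1]], [[0, 1, 1, 255]], num_classes=2))  # 0.5
```

In the example, each class has one correctly labelled pixel and a union of two pixels, so both per-class IoUs are 0.5.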
For testing, we provide our pre-trained WideResNet38 model:
Backbone | Val | Val (+ CRF) | Link |
---|---|---|---|
WideResNet38 | 59.7 | 62.7 | model_enc_e020Xs0.928.pth (527M) |
We also release the masks predicted by this model:
Split | IoU | IoU (+ CRF) | Link | Comment |
---|---|---|---|---|
train-clean (VOC+SBD) | 64.7 | 66.9 | train_results_clean.tgz (2.9G) | Reported IoU is for VOC |
val-clean | 63.4 | 65.3 | val_results_clean.tgz (423M) | |
val | 59.7 | 62.7 | val_results.tgz (427M) | |
test | 62.7 | 64.3 | test_results.tgz (368M) | |
The suffix -clean means we used ground-truth image-level labels to remove masks of the categories not present in the image.
These masks are commonly used as pseudo ground truth to train another segmentation model in a fully supervised regime.
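The -clean filtering can be illustrated with a short sketch. It assumes that pixels of a removed category are reassigned to background (label 0); the description above does not state what replaces them, so treat that choice, along with the function name `clean_mask`, as hypothetical.

```python
def clean_mask(mask, present_labels, background=0):
    """Remove predicted classes that are absent from the image-level labels.

    mask: 2D list of per-pixel class indices.
    present_labels: class indices known to be in the image.
    Assumption: removed pixels become background.
    """
    keep = set(present_labels) | {background}
    return [[c if c in keep else background for c in row] for row in mask]


predicted = [
    [0, 15, 15],
    [8, 8, 0],
]
# Only class 15 is listed in the image-level labels, so class 8 is removed.
print(clean_mask(predicted, present_labels=[15]))  # [[0, 15, 15], [0, 0, 0]]
```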
We thank the PyTorch team, and Jiwoon Ahn for releasing his code that helped in the early stages of this project.
We hope that you find this work useful. If you would like to acknowledge us, please use the following citation:
@InProceedings{Araslanov:2020:SSS,
author = {Araslanov, Nikita and Roth, Stefan},
title = {Single-Stage Semantic Segmentation From Image Labels},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
pages = {4253--4262},
year = {2020}
}