Scene Instance Segmentation requires segmenting an image into individual object instances. The task is pixel-wise classification, similar to scene parsing, but the algorithm must also separate the object instances within the image. The motivation of this task is twofold: 1) to push semantic segmentation research towards instance segmentation; 2) to create more synergy among object detection, semantic segmentation, and scene parsing. The data shares semantic categories with the scene parsing task, but comes with object instance annotations for 100 categories. The evaluation metric is Average Precision (AP) over all 100 semantic categories.
- We encourage all participants of this task to take part in the COCO instance segmentation challenge as well.
- Download the images here. Note that the images are the same for all three tasks in Places Challenge 2017.
- Download the instance segmentation annotations here. After untarring the data file, the directory structure should look like the following:
the training images:
images/training/ADE_train_00000001.jpg
images/training/ADE_train_00000002.jpg
...
the validation images:
images/validation/ADE_val_00000001.jpg
images/validation/ADE_val_00000002.jpg
...
the testing images:
images/testing/ADE_test_00000001.jpg
...
the corresponding instance annotation masks for the training images and validation images:
annotations_instance/training/ADE_train_00000001.png
annotations_instance/training/ADE_train_00000002.png
...
annotations_instance/validation/ADE_val_00000001.png
annotations_instance/validation/ADE_val_00000002.png
...
In the instance annotation masks, the R(ed) channel encodes the category ID and the G(reen) channel encodes the instance ID. Each object instance has a unique instance ID within its image, regardless of its category ID. All images in the dataset contain fewer than 256 object instances.
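As a minimal sketch of this encoding, the snippet below decodes a small synthetic mask array instead of a real annotation PNG (for a real file you would load the array with e.g. `numpy.array(PIL.Image.open(path))`); the category and instance IDs used here are made up for illustration:

```python
import numpy as np

# Synthetic 4x4 annotation mask with the R/G layout described above
# (hypothetical values; real masks come from annotations_instance/*.png).
mask = np.zeros((4, 4, 3), dtype=np.uint8)
mask[:2, :2, 0] = 17   # R channel: category ID 17
mask[:2, :2, 1] = 1    # G channel: instance ID 1
mask[2:, 2:, 0] = 17   # same category...
mask[2:, 2:, 1] = 2    # ...but a different instance

category = mask[:, :, 0]
instance = mask[:, :, 1]

# Collect a binary mask per instance (instance ID 0 is background).
instances = {}
for inst_id in np.unique(instance):
    if inst_id == 0:
        continue
    binary = instance == inst_id
    cat_id = int(category[binary][0])  # category is constant within an instance
    instances[int(inst_id)] = (cat_id, binary)
```

Because instance IDs live in a single 8-bit channel, they cap out at 255, which is consistent with every image having fewer than 256 instances.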
The submission file should be a single .json file containing all the predictions in RLE format:
[{
"image_id" : int, # IMPORTANT: image_id should match file_name according to imgCatIds.json
"category_id" : int,
"segmentation" : RLE,
"score" : float,
}]
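For submission you would normally produce the compressed RLE emitted by `pycocotools.mask.encode` (which expects a Fortran-ordered uint8 mask). To illustrate the idea without that dependency, here is a sketch of *uncompressed* COCO-style RLE: run lengths of 0s and 1s over the mask flattened in column-major order, starting with a run of zeros (`mask_to_rle` is a hypothetical helper, not part of the challenge code):

```python
import numpy as np

def mask_to_rle(binary_mask):
    """Uncompressed COCO-style RLE: column-major run lengths, zeros first."""
    flat = binary_mask.flatten(order="F").astype(np.uint8)
    counts = []
    prev, run = 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            counts.append(run)
            prev, run = v, 1
    counts.append(run)
    return {"size": list(binary_mask.shape), "counts": counts}

m = np.array([[0, 1],
              [0, 1]], dtype=np.uint8)
rle = mask_to_rle(m)
# column-major flat order is [0, 0, 1, 1], so counts == [2, 2]
```

A prediction entry then pairs such a `segmentation` dict with an `image_id`, `category_id`, and `score`, and the full list is dumped with `json.dump`.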
The performance of the instance segmentation algorithms will be evaluated by Average Precision (AP, or mAP), following the COCO evaluation metrics. For each image, we take at most 255 top-scoring instance masks across all categories. An instance mask prediction is counted as correct only when its IoU with the ground truth exceeds a given threshold. We use 10 IoU thresholds, 0.50:0.05:0.95, for evaluation. The final AP is averaged across the 10 IoU thresholds and 100 categories.
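The matching step above can be sketched with a toy mask IoU check against the 10 thresholds (`mask_iou` is a hypothetical helper for illustration, not the challenge's evaluation code):

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# The 10 COCO-style thresholds 0.50, 0.55, ..., 0.95
thresholds = np.arange(0.5, 1.0, 0.05)

pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt   = np.array([[1, 1, 1, 0]], dtype=bool)
iou = mask_iou(pred, gt)             # 2/3, about 0.667
matched = int((iou >= thresholds).sum())  # a true positive at 4 of the 10 thresholds
```

Averaging precision over all 10 thresholds rewards masks that overlap the ground truth tightly, not just above 0.5.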
You can refer to the COCO evaluation page for more explanation: http://mscoco.org/dataset/#detections-eval
For everyone's reference, Mask R-CNN with a ResNet-50-FPN backbone achieves 20.0 mAP on the validation set.
- Only the backbone is pretrained on ImageNet; the RPN, bounding-box, and mask heads are jointly trained; single-scale training and testing.
This baseline was produced by Hang Zhao during his internship at Facebook. We thank Kaiming He and Ross Girshick for the code pointers and suggestions.
To run the evaluation demo:
cd instancesegmentation/evaluation
- Convert the annotations of the validation set (*.png) into RLE format (.json):
python convert_anns_to_json_dataset.py
- Install COCO API: https://github.com/pdollar/coco
- Prepare your results in the submission format (.json)
- Run the evaluation:
python eval_main.py --dataset_json DATASET_JSON --preds_json PREDS_JSON