Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi
This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation.
Note:
† denotes backbones pretrained on ImageNet-22k with 384x384 resolution images.
Results on ADE20K:

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 512x512 | 42.06 | 43.36 | 35M | config | checkpoint |
SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | config | checkpoint |
SeMask-B FPN | SeMask Swin-B† | 512x512 | 49.35 | 50.98 | 96M | config | checkpoint |
SeMask-L FPN | SeMask Swin-L† | 640x640 | 51.89 | 53.52 | 211M | config | checkpoint |
SeMask-L MaskFormer | SeMask Swin-L† | 640x640 | 54.75 | 56.15 | 219M | config | checkpoint |
SeMask-L Mask2Former | SeMask Swin-L† | 640x640 | 56.41 | 57.52 | 222M | config | checkpoint |
SeMask-L Mask2Former FaPN | SeMask Swin-L† | 640x640 | 56.88 | 58.25 | 227M | config | checkpoint |
SeMask-L Mask2Former MSFaPN | SeMask Swin-L† | 640x640 | 57.00 | 58.25 | 224M | config | checkpoint |

Results on Cityscapes:

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 768x768 | 74.92 | 76.56 | 34M | config | checkpoint |
SeMask-S FPN | SeMask Swin-S | 768x768 | 77.13 | 79.14 | 56M | config | checkpoint |
SeMask-B FPN | SeMask Swin-B† | 768x768 | 77.70 | 79.73 | 96M | config | checkpoint |
SeMask-L FPN | SeMask Swin-L† | 768x768 | 78.53 | 80.39 | 211M | config | checkpoint |
SeMask-L Mask2Former | SeMask Swin-L† | 512x1024 | 83.97 | 84.98 | 222M | config | checkpoint |

Results on COCO-Stuff 10k:

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 512x512 | 37.53 | 38.88 | 35M | config | checkpoint |
SeMask-S FPN | SeMask Swin-S | 512x512 | 40.72 | 42.27 | 56M | config | checkpoint |
SeMask-B FPN | SeMask Swin-B† | 512x512 | 44.63 | 46.30 | 96M | config | checkpoint |
SeMask-L FPN | SeMask Swin-L† | 640x640 | 47.47 | 48.54 | 211M | config | checkpoint |
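
For readers unfamiliar with the metric, the mIoU columns in the tables above report mean intersection-over-union as a percentage. The snippet below is a minimal sketch of that computation on flattened label sequences, not the evaluation code used to produce the numbers in this repo. The function name `mean_iou`, its arguments, and the `ignore_index=255` default are assumptions made for illustration; the evaluation tools in the individual codebases may accumulate statistics differently.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label; check your dataset config
) -> float:
    """Return mean IoU (in percent) over classes present in pred or target."""
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # pixels without a valid label are not scored
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Toy example: six pixels, three classes, last pixel is ignored.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(f"mIoU: {mean_iou(pred, target, num_classes=3):.2f}")
```

To score a whole validation set with this sketch, the per-pixel labels of all images would be concatenated (or the three counters accumulated across images) before the per-class IoU is taken.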
We provide the codebase with SeMask incorporated into various models. Please check the setup instructions inside the corresponding folders:
- SeMask-FPN: Setup Instructions
- SeMask-MaskFormer: Setup Instructions
- SeMask-Mask2Former: Setup Instructions
- SeMask-FaPN: Setup Instructions
@inproceedings{jain2023semask,
title={SeMask: Semantically Masked Transformers for Semantic Segmentation},
author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
year={2023},
booktitle={ICCV Workshops 2023},
}
The code is based heavily on the following repositories: Swin-Transformer-Semantic-Segmentation, Mask2Former, MaskFormer, and FaPN-full.