For details see MetaFormer is Actually What You Need for Vision (CVPR 2022 Oral).
Please note that we simply follow the hyper-parameters of PVT, which may not be optimal for PoolFormer. Feel free to tune the hyper-parameters to get better performance.
Install MMDetection v2.19.0 from source code,
or
pip install mmdet==2.19.0 --user
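For the from-source route, the usual MMDetection procedure looks like the following sketch (the repository URL and tag are the standard open-mmlab ones; adjust paths to your setup):

```shell
# Clone MMDetection, check out the v2.19.0 tag, and install in editable mode
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v2.19.0
pip install -e . --user
```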
Apex (optional):
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user
If you would like to disable Apex, change the runner type to EpochBasedRunner
and comment out the following code block in the configuration files:
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
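With Apex disabled, the corresponding part of the config might instead look like this sketch (it assumes MMCV's built-in EpochBasedRunner and plain OptimizerHook; `max_epochs=12` matches the 1x schedule):

```python
# Sketch of the config with Apex/fp16 disabled (assumes MMCV's standard hooks)
runner = dict(type="EpochBasedRunner", max_epochs=12)
optimizer_config = dict(grad_clip=None)  # plain OptimizerHook, no fp16
```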
Note: the PoolFormer backbone code for detection and segmentation is kept in a single file, so both MMDetection v2.19.0 and MMSegmentation v0.19.0 are required. Please install MMSegmentation as well, or modify the backbone code.
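If you go the install route, a pip install of the matching MMSegmentation version should suffice (a sketch; the version is the one named above):

```shell
pip install mmsegmentation==0.19.0 --user
```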
Dockerfile_mmdetseg is the Dockerfile we use to set up the environment for detection and segmentation; you can also refer to it.
Prepare COCO according to the guidelines in MMDetection v2.19.0.
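By default, MMDetection expects the dataset under `data/coco`; the layout typically looks like the sketch below (directory names follow MMDetection's conventions):

```
data/
└── coco/
    ├── annotations/   # instances_train2017.json, instances_val2017.json
    ├── train2017/
    └── val2017/
```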
Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP | Config | Download |
---|---|---|---|---|---|---|---|---|
RetinaNet | PoolFormer-S12 | ImageNet-1K | 1x | No | 36.2 | - | config | log & model |
RetinaNet | PoolFormer-S24 | ImageNet-1K | 1x | No | 38.9 | - | config | log & model |
RetinaNet | PoolFormer-S36 | ImageNet-1K | 1x | No | 39.5 | - | config | log & model |
Mask R-CNN | PoolFormer-S12 | ImageNet-1K | 1x | No | 37.3 | 34.6 | config | log & model |
Mask R-CNN | PoolFormer-S24 | ImageNet-1K | 1x | No | 40.1 | 37.0 | config | log & model |
Mask R-CNN | PoolFormer-S36 | ImageNet-1K | 1x | No | 41.0 | 37.7 | config | log & model |
All the models can also be downloaded by BaiDu Yun (password: esac).
To evaluate PoolFormer-S12 + RetinaNet on COCO val2017 on a single node with 8 GPUs, run:
FORK_LAST3=1 dist_test.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox
To evaluate PoolFormer-S12 + Mask R-CNN on COCO val2017, run:
dist_test.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox segm
To train PoolFormer-S12 + RetinaNet on COCO train2017 on a single node with 8 GPUs for 12 epochs, run:
FORK_LAST3=1 dist_train.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py 8
To train PoolFormer-S12 + Mask R-CNN on COCO train2017:
dist_train.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py 8
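If fewer GPUs are available, MMDetection's single-GPU entry point can be used instead of the distributed launcher (a sketch; `tools/train.py` is MMDetection's standard training script, and learning rate may need rescaling for a smaller batch):

```shell
python tools/train.py configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py
```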
@article{yu2021metaformer,
title={MetaFormer is Actually What You Need for Vision},
author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
journal={arXiv preprint arXiv:2111.11418},
year={2021}
}
Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful works.