Merge pull request #8 from airctic/mmdet-2.25.0

Update to mmdet 2.25.0
airctic · Jun 3, 2022 · a20b9b3
2 parents edceb5d + b78b844
commit a20b9b3

Showing 206 changed files with 7,841 additions and 2,399 deletions.
diff --git a/configs/_base_/datasets/openimages_detection.py b/configs/_base_/datasets/openimages_detection.py
@@ -0,0 +1,65 @@
+# dataset settings
+dataset_type = 'OpenImagesDataset'
+data_root = 'data/OpenImages/'
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations', with_bbox=True, denorm_bbox=True),
+    dict(type='Resize', img_scale=(1024, 800), keep_ratio=True),
+    dict(type='RandomFlip', flip_ratio=0.5),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size_divisor=32),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(1024, 800),
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='Pad', size_divisor=32),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ],
+    ),
+]
+data = dict(
+    samples_per_gpu=2,
+    workers_per_gpu=0,  # workers_per_gpu > 0 may cause out-of-memory errors
+    train=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/oidv6-train-annotations-bbox.csv',
+        img_prefix=data_root + 'OpenImages/train/',
+        label_file=data_root + 'annotations/class-descriptions-boxable.csv',
+        hierarchy_file=data_root +
+        'annotations/bbox_labels_600_hierarchy.json',
+        pipeline=train_pipeline),
+    val=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/validation-annotations-bbox.csv',
+        img_prefix=data_root + 'OpenImages/validation/',
+        label_file=data_root + 'annotations/class-descriptions-boxable.csv',
+        hierarchy_file=data_root +
+        'annotations/bbox_labels_600_hierarchy.json',
+        meta_file=data_root + 'annotations/validation-image-metas.pkl',
+        image_level_ann_file=data_root +
+        'annotations/validation-annotations-human-imagelabels-boxable.csv',
+        pipeline=test_pipeline),
+    test=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/validation-annotations-bbox.csv',
+        img_prefix=data_root + 'OpenImages/validation/',
+        label_file=data_root + 'annotations/class-descriptions-boxable.csv',
+        hierarchy_file=data_root +
+        'annotations/bbox_labels_600_hierarchy.json',
+        meta_file=data_root + 'annotations/validation-image-metas.pkl',
+        image_level_ann_file=data_root +
+        'annotations/validation-annotations-human-imagelabels-boxable.csv',
+        pipeline=test_pipeline))
+evaluation = dict(interval=1, metric='mAP')
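Note on `denorm_bbox=True` in the train pipeline above: Open Images stores box coordinates as fractions of the image size, so they have to be converted to pixels before `Resize` and the later transforms can use them. The following is a minimal sketch of that conversion, not the mmdet implementation. The function name `denormalize_bboxes` and the `[x1, y1, x2, y2]` ordering are assumptions made for illustration.

```python
from typing import List, Sequence


def denormalize_bboxes(
    bboxes: Sequence[Sequence[float]], img_width: int, img_height: int
) -> List[List[float]]:
    """Convert [x1, y1, x2, y2] boxes from the [0, 1] range to pixels.

    Hypothetical helper: the box ordering is an assumption of this sketch.
    """
    return [
        # x coordinates scale with the width, y coordinates with the height.
        [x1 * img_width, y1 * img_height, x2 * img_width, y2 * img_height]
        for x1, y1, x2, y2 in bboxes
    ]


boxes = denormalize_bboxes([[0.25, 0.5, 0.75, 1.0]], img_width=1024, img_height=800)
print(boxes)
```

The conversion needs the original image size, which is probably why it is done at annotation-loading time, right after `LoadImageFromFile`.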
diff --git a/configs/_base_/default_runtime.py b/configs/_base_/default_runtime.py
@@ -14,3 +14,14 @@
 load_from = None
 resume_from = None
 workflow = [('train', 1)]
+
+# disable opencv multithreading to avoid system being overloaded
+opencv_num_threads = 0
+# set multi-process start method as `fork` to speed up the training
+mp_start_method = 'fork'
+
+# Default setting for scaling LR automatically
+#   - `enable` controls whether the LR is scaled automatically;
+#       it is disabled by default.
+#   - `base_batch_size` = (8 GPUs) x (2 samples per GPU).
+auto_scale_lr = dict(enable=False, base_batch_size=16)
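Note on `auto_scale_lr`: the comment ties `base_batch_size=16` to 8 GPUs x 2 samples per GPU. The sketch below shows how a linear scaling rule would use that value, on the assumption that the scaled LR is proportional to the actual total batch size. The helper `scale_lr` is hypothetical and is not an mmdet API.

```python
def scale_lr(
    base_lr: float,
    num_gpus: int,
    samples_per_gpu: int,
    base_batch_size: int = 16,
) -> float:
    """Linear scaling rule: the LR grows in proportion to the total batch size.

    Hypothetical helper; `base_batch_size` mirrors the config default above.
    """
    if base_batch_size <= 0:
        raise ValueError("base_batch_size must be positive")
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# 4 GPUs x 2 samples per GPU gives a batch of 8, half the base of 16,
# so the LR is halved as well.
print(scale_lr(0.02, num_gpus=4, samples_per_gpu=2))
```

With `enable=False`, as in this config, no such scaling would be applied and the LR written in the config is used unchanged.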
diff --git a/configs/_base_/models/faster_rcnn_r50_caffe_c4.py b/configs/_base_/models/faster_rcnn_r50_caffe_c4.py
@@ -42,7 +42,10 @@
             dilation=1,
             style='caffe',
             norm_cfg=norm_cfg,
-            norm_eval=True),
+            norm_eval=True,
+            init_cfg=dict(
+                type='Pretrained',
+                checkpoint='open-mmlab://detectron2/resnet50_caffe')),
         bbox_roi_extractor=dict(
             type='SingleRoIExtractor',
             roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
@@ -78,7 +81,7 @@
                 pos_fraction=0.5,
                 neg_pos_ub=-1,
                 add_gt_as_proposals=False),
-            allowed_border=0,
+            allowed_border=-1,
             pos_weight=-1,
             debug=False),
         rpn_proposal=dict(
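
Note on the `allowed_border` change from `0` to `-1`: as I understand it, a non-negative value discards anchors that extend past the image by more than that many pixels, while a negative value skips the check and keeps every anchor. The sketch below illustrates that reading in plain Python. The function `anchors_inside` is a hypothetical illustration, not the mmdet implementation.

```python
from typing import List, Sequence, Tuple


def anchors_inside(
    anchors: Sequence[Tuple[float, float, float, float]],
    img_width: int,
    img_height: int,
    allowed_border: int,
) -> List[bool]:
    """Flag anchors (x1, y1, x2, y2) that stay within `allowed_border` pixels
    of the image bounds. Hypothetical helper for illustration only."""
    if allowed_border < 0:
        # Negative value: skip the border check and keep every anchor.
        return [True] * len(anchors)
    return [
        x1 >= -allowed_border
        and y1 >= -allowed_border
        and x2 < img_width + allowed_border
        and y2 < img_height + allowed_border
        for x1, y1, x2, y2 in anchors
    ]


anchors = [(-8.0, -8.0, 24.0, 24.0), (10.0, 10.0, 50.0, 50.0)]
print(anchors_inside(anchors, 100, 100, allowed_border=0))
print(anchors_inside(anchors, 100, 100, allowed_border=-1))
```

Under this reading, the change means anchors that cross the image border now take part in RPN target assignment for the C4 model.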

diff --git a/configs/albu_example/README.md b/configs/albu_example/README.md
@@ -1,24 +1,26 @@
 # Albu Example
 
-## Abstract
+> [Albumentations: fast and flexible image augmentations](https://arxiv.org/abs/1809.06839)
+
+<!-- [OTHERS] -->
 
-<!-- [ABSTRACT] -->
+## Abstract
 
 Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, the image processing speed varies in existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with many various image transform operations available, that is also an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on the most of commonly used image transformations.
 
-<!-- [IMAGE] -->
 <div align=center>
 <img src="https://user-images.githubusercontent.com/40661020/143870703-74f3ea3f-ae23-4035-9856-746bc3f88464.png" height="400" />
 </div>
 
-<!-- [PAPER_TITLE: Albumentations: fast and flexible image augmentations] -->
-<!-- [PAPER_URL: https://arxiv.org/abs/1809.06839] -->
+## Results and Models
 
-## Citation
+| Backbone |  Style  | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP |                                                         Config                                                         |                                                                                                                                                        Download                                                                                                                                                         |
+| :------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :--------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|   R-50   | pytorch |   1x    |   4.4    |      16.6      |  38.0  |  34.5   | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/albu_example/mask_rcnn_r50_fpn_albu_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/albu_example/mask_rcnn_r50_fpn_albu_1x_coco/mask_rcnn_r50_fpn_albu_1x_coco_20200208-ab203bcd.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/albu_example/mask_rcnn_r50_fpn_albu_1x_coco/mask_rcnn_r50_fpn_albu_1x_coco_20200208_225520.log.json) |
 
-<!-- [OTHERS] -->
+## Citation
 
-```
+```latex
 @article{2018arXiv180906839B,
   author = {A. Buslaev, A. Parinov, E. Khvedchenya, V.~I. Iglovikov and A.~A. Kalinin},
   title = "{Albumentations: fast and flexible image augmentations}",
@@ -27,9 +29,3 @@ Data augmentation is a commonly used technique for increasing both the size and
   year = 2018
 }
 ```
-
-## Results and Models
-
-| Backbone  | Style   | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download |
-|:---------:|:-------:|:-------:|:--------:|:--------------:|:------:|:-------:|:------:|:--------:|
-| R-50      | pytorch | 1x      | 4.4      | 16.6           |  38.0  | 34.5    |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/albu_example/mask_rcnn_r50_fpn_albu_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/albu_example/mask_rcnn_r50_fpn_albu_1x_coco/mask_rcnn_r50_fpn_albu_1x_coco_20200208-ab203bcd.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/albu_example/mask_rcnn_r50_fpn_albu_1x_coco/mask_rcnn_r50_fpn_albu_1x_coco_20200208_225520.log.json) |
diff --git a/configs/atss/README.md b/configs/atss/README.md
@@ -1,22 +1,25 @@
-# Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
+# ATSS
 
-## Abstract
+> [Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection](https://arxiv.org/abs/1912.02424)
+
+<!-- [ALGORITHM] -->
 
-<!-- [ABSTRACT] -->
+## Abstract
 
 Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of object. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to 50.7% AP without introducing any overhead.
 
-<!-- [IMAGE] -->
 <div align=center>
 <img src="https://user-images.githubusercontent.com/40661020/143870776-c81168f5-e8b2-44ee-978b-509e4372c5c9.png"/>
 </div>
 
-<!-- [PAPER_TITLE: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection] -->
-<!-- [PAPER_URL: https://arxiv.org/abs/1912.02424] -->
+## Results and Models
 
-## Citation
+| Backbone |  Style  | Lr schd | Mem (GB) | Inf time (fps) | box AP |                                                Config                                                 |                                                                                                                            Download                                                                                                                             |
+| :------: | :-----: | :-----: | :------: | :------------: | :----: | :---------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|   R-50   | pytorch |   1x    |   3.7    |      19.7      |  39.4  | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/atss/atss_r50_fpn_1x_coco.py)  | [model](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r50_fpn_1x_coco/atss_r50_fpn_1x_coco_20200209-985f7bd0.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r50_fpn_1x_coco/atss_r50_fpn_1x_coco_20200209_102539.log.json) |
+|  R-101   | pytorch |   1x    |   5.6    |      12.3      |  41.5  | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/atss/atss_r101_fpn_1x_coco.py) |   [model](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r101_fpn_1x_coco/atss_r101_fpn_1x_20200825-dfcadd6f.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r101_fpn_1x_coco/atss_r101_fpn_1x_20200825-dfcadd6f.log.json)   |
 
-<!-- [ALGORITHM] -->
+## Citation
 
 ```latex
 @article{zhang2019bridging,
@@ -26,10 +29,3 @@ Object detection has been dominated by anchor-based detectors for several years.
   year    =  {2019}
 }
 ```
-
-## Results and Models
-
-| Backbone  | Style   | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download |
-|:---------:|:-------:|:-------:|:--------:|:--------------:|:------:|:------:|:--------:|
-| R-50      | pytorch | 1x      | 3.7      | 19.7           |  39.4  | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/atss/atss_r50_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r50_fpn_1x_coco/atss_r50_fpn_1x_coco_20200209-985f7bd0.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r50_fpn_1x_coco/atss_r50_fpn_1x_coco_20200209_102539.log.json) |
-| R-101     | pytorch | 1x      | 5.6      | 12.3           |  41.5  | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/atss/atss_r101_fpn_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r101_fpn_1x_coco/atss_r101_fpn_1x_20200825-dfcadd6f.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/atss/atss_r101_fpn_1x_coco/atss_r101_fpn_1x_20200825-dfcadd6f.log.json) |
diff --git a/configs/autoassign/README.md b/configs/autoassign/README.md
@@ -1,39 +1,35 @@
-# AutoAssign: Differentiable Label Assignment for Dense Object Detection
+# AutoAssign
 
-## Abstract
+> [AutoAssign: Differentiable Label Assignment for Dense Object Detection](https://arxiv.org/abs/2007.03496)
+
+<!-- [ALGORITHM] -->
 
-<!-- [ABSTRACT] -->
+## Abstract
 
 Determining positive/negative samples for object detection is known as label assignment. Here we present an anchor-free detector named AutoAssign. It requires little human knowledge and achieves appearance-aware through a fully differentiable weighting mechanism. During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions. To adapt to object appearances, Confidence Weighting is proposed to adjust the specific assign strategy of each instance. The two weighting modules are then combined to generate positive and negative weights to adjust each location's confidence. Extensive experiments on the MS COCO show that our method steadily surpasses other best sampling strategies by large margins with various backbones. Moreover, our best model achieves 52.1% AP, outperforming all existing one-stage detectors. Besides, experiments on other datasets, e.g., PASCAL VOC, Objects365, and WiderFace, demonstrate the broad applicability of AutoAssign.
 
-<!-- [IMAGE] -->
 <div align=center>
 <img src="https://user-images.githubusercontent.com/40661020/143870875-33567e44-0584-4470-9a90-0df0fb6c1fe2.png"/>
 </div>
 
-<!-- [PAPER_TITLE: AutoAssign: Differentiable Label Assignment for Dense Object Detection] -->
-<!-- [PAPER_URL: https://arxiv.org/abs/2007.03496] -->
+## Results and Models
 
-## Citation
+| Backbone | Style | Lr schd | Mem (GB) | box AP |                                                        Config                                                        |                                                                                                                                                        Download                                                                                                                                                         |
+| :------: | :---: | :-----: | :------: | :----: | :------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+|   R-50   | caffe |   1x    |   4.08   |  40.4  | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/autoassign/autoassign_r50_fpn_8x2_1x_coco.py) | [model](https://download.openmmlab.com/mmdetection/v2.0/autoassign/auto_assign_r50_fpn_1x_coco/auto_assign_r50_fpn_1x_coco_20210413_115540-5e17991f.pth) \| [log](https://download.openmmlab.com/mmdetection/v2.0/autoassign/auto_assign_r50_fpn_1x_coco/auto_assign_r50_fpn_1x_coco_20210413_115540-5e17991f.log.json) |
 
-<!-- [ALGORITHM] -->
+**Note**:
 
-```
+1. We find that the performance is unstable with 1x setting and may fluctuate by about 0.3 mAP. mAP 40.3 ~ 40.6 is acceptable. Such fluctuation can also be found in the original implementation.
+2. You can get more stable results (about mAP 40.6) with a 13-epoch schedule in which the learning rate is divided by 10 at the 10th and 13th epochs.
+
+## Citation
+
+```latex
 @article{zhu2020autoassign,
   title={AutoAssign: Differentiable Label Assignment for Dense Object Detection},
   author={Zhu, Benjin and Wang, Jianfeng and Jiang, Zhengkai and Zong, Fuhang and Liu, Songtao and Li, Zeming and Sun, Jian},
   journal={arXiv preprint arXiv:2007.03496},
   year={2020}
 }
 ```
-
-## Results and Models
-
-| Backbone  | Style   | Lr schd | Mem (GB) |   box AP | Config | Download |
-|:---------:|:-------:|:-------:|:--------:|:------:|:------:|:--------:|
-| R-50     | caffe | 1x      | 4.08      |   40.4  | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/autoassign/autoassign_r50_fpn_8x2_1x_coco.py)       |[model](https://download.openmmlab.com/mmdetection/v2.0/autoassign/auto_assign_r50_fpn_1x_coco/auto_assign_r50_fpn_1x_coco_20210413_115540-5e17991f.pth) &#124; [log](https://download.openmmlab.com/mmdetection/v2.0/autoassign/auto_assign_r50_fpn_1x_coco/auto_assign_r50_fpn_1x_coco_20210413_115540-5e17991f.log.json) |
-
-**Note**:
-
-1. We find that the performance is unstable with 1x setting and may fluctuate by about 0.3 mAP. mAP 40.3 ~ 40.6 is acceptable. Such fluctuation can also be found in the original implementation.
-2. You can get a more stable results ~ mAP 40.6 with a schedule total 13 epoch, and learning rate is divided by 10 at 10th and 13th epoch.