From 9e72ec473b885e661c32f6b915d84fb8009b97e7 Mon Sep 17 00:00:00 2001
From: Xin Li <7219519+xin-li-67@users.noreply.github.com>
Date: Wed, 28 Jun 2023 17:33:54 +0800
Subject: [PATCH] [MMSIG-176] Add GLIP demo to Inference.md (#10472)

---
 configs/glip/README.md              |  7 ++-
 docs/en/user_guides/inference.md    | 98 +++++++++++++++++++++++++++++
 docs/zh_cn/user_guides/inference.md | 98 +++++++++++++++++++++++++++++
 3 files changed, 201 insertions(+), 2 deletions(-)

diff --git a/configs/glip/README.md b/configs/glip/README.md
index b6dec71bdf1..5f7c8d3ccb7 100644
--- a/configs/glip/README.md
+++ b/configs/glip/README.md
@@ -27,9 +27,12 @@ mim install mmdet[multimodal]
 ```shell
 cd $MMDETROOT
-python demo/multimodal_demo.py demo/demo.jpg "bench . car . " \
+wget https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth
+
+python demo/image_demo.py demo/demo.jpg \
 configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py \
-https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth
+--weights glip_tiny_a_mmdet-b3654169.pth \
+--texts 'bench . car .'
 ```
diff --git a/docs/en/user_guides/inference.md b/docs/en/user_guides/inference.md
index 33257ed5ed4..8eeed39af44 100644
--- a/docs/en/user_guides/inference.md
+++ b/docs/en/user_guides/inference.md
@@ -186,3 +186,101 @@ python demo/video_gpuaccel_demo.py demo/demo.mp4 \
 checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth \
 --nvdecode --out result.mp4
 ```
+
+## Multi-modal algorithm inference demo and evaluation
+
+As multimodal vision algorithms continue to evolve, MMDetection has added support for them as well. This section uses the GLIP algorithm and its models as an example to demonstrate how to run the demo and evaluation scripts for multimodal algorithms. MMDetection also ships a [gradio_demo project](../../../projects/gradio_demo/), which allows developers to quickly try all of the image-input tasks supported by MMDetection on their local devices. Check the [document](../../../projects/gradio_demo/README.md) for more details.
+
+### Preparation
+
+Please first make sure that you have the correct dependencies installed:
+
+```shell
+# if installed from source
+pip install -r requirements/multimodal.txt
+
+# if installed as a wheel
+mim install mmdet[multimodal]
+```
+
+MMDetection has already implemented the GLIP algorithm and provides pretrained weights, which you can download directly:
+
+```shell
+cd mmdetection
+wget https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth
+```
+
+### Inference
+
+Once the model has been downloaded, you can use the `demo/image_demo.py` script to run inference:
+
+```shell
+python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts bench
+```
+
+The demo result will be similar to this:
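+
+The same inference can also be run from Python. The snippet below is a minimal sketch using `DetInferencer`; it assumes a recent MMDetection version in which the inferencer forwards a `texts` keyword to multimodal models such as GLIP, so check your version's API if the call signature differs.
+
+```python
+from mmdet.apis import DetInferencer
+
+# build the inferencer from the GLIP config and the downloaded weights
+inferencer = DetInferencer(
+    model='configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py',
+    weights='glip_tiny_a_mmdet-b3654169.pth')
+
+# 'texts' carries the open-vocabulary prompt (assumed keyword, see note above)
+result = inferencer('demo/demo.jpg', texts='bench')
+```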
+
+If you would like to detect multiple targets, declare them after `--texts` in the format `xx . xx .`:
+
+```shell
+python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts 'bench . car .'
+```
+
+The result will look like this:
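+
+To work with the detections programmatically, you can read them out of the `DetInferencer` result. The sketch below assumes the output layout of recent MMDetection versions, a `predictions` list whose per-image entries hold `labels`, `scores`, and `bboxes`; verify the exact keys in your version.
+
+```python
+# 'inferencer' is the DetInferencer built earlier
+result = inferencer('demo/demo.jpg', texts='bench . car .')
+
+# one prediction dict per input image
+pred = result['predictions'][0]
+for label, score, bbox in zip(pred['labels'], pred['scores'], pred['bboxes']):
+    # labels index into the phrases parsed from the text prompt
+    print(label, f'{score:.2f}', bbox)
+```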
+
+You can also pass a full sentence as the `--texts` prompt, for example:
+
+```shell
+python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts 'There are a lot of cars here.'
+```
+
+The result will be similar to this:
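+
+To trim low-confidence boxes from the visualization when prompting with free-form text, you can raise the score threshold. This is a minimal sketch assuming your `DetInferencer` version exposes a `pred_score_thr` argument on the call; the parameter name may differ across versions.
+
+```python
+# keep only fairly confident detections for the sentence prompt
+result = inferencer(
+    'demo/demo.jpg',
+    texts='There are a lot of cars here.',
+    pred_score_thr=0.5)  # assumed keyword; the default is usually 0.3
+```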
+
+### Evaluation
+
+The GLIP implementation in MMDetection shows no performance degradation compared with the official one; our benchmark is as follows:
+
+| Model                   | official mAP | mmdet mAP |
+| ----------------------- | :----------: | :-------: |
+| glip_A_Swin_T_O365.yaml |     42.9     |   43.0    |
+| glip_Swin_T_O365.yaml   |     44.9     |   44.9    |
+| glip_Swin_L.yaml        |     51.4     |   51.3    |
+
+You can also run the evaluation with the provided test script. Here is a basic example:
+
+```shell
+# single GPU
+python tools/test.py configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py glip_tiny_a_mmdet-b3654169.pth
+
+# 8 GPUs
+./tools/dist_test.sh configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py glip_tiny_a_mmdet-b3654169.pth 8
+```
+
+The result will be similar to this:
+
+```shell
+Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.428
+Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.594
+Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.466
+Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.300
+Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.477
+Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.534
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.634
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.634
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.634
+Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.473
+Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.690
+Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.789
+```
diff --git a/docs/zh_cn/user_guides/inference.md b/docs/zh_cn/user_guides/inference.md
index 1f504cc69e2..788d9eec2f2 100644
--- a/docs/zh_cn/user_guides/inference.md
+++ b/docs/zh_cn/user_guides/inference.md
@@ -185,3 +185,101 @@ python demo/video_gpuaccel_demo.py demo/demo.mp4 \
 checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth \
 --nvdecode --out result.mp4
 ```
+
+## Inference and evaluation of multimodal algorithms
+
+As multimodal vision algorithms keep evolving, MMDetection now supports them as well. This section uses the GLIP algorithm and its models to demonstrate how to run the demo and evaluation scripts for multimodal algorithms. MMDetection also provides a [gradio_demo project](../../../projects/gradio_demo/) under `projects`; following its [document](../../../projects/gradio_demo/README.md), users can quickly try all of the image-input tasks supported by MMDetection on their local machines.
+
+### Model preparation
+
+First, install the multimodal dependencies:
+
+```shell
+# if installed from source
+pip install -r requirements/multimodal.txt
+
+# if installed as a wheel
+mim install mmdet[multimodal]
+```
+
+MMDetection has already integrated the GLIP algorithm and models; the weights can be downloaded directly from the link below:
+
+```shell
+cd mmdetection
+wget https://download.openmmlab.com/mmdetection/v3.0/glip/glip_tiny_a_mmdet-b3654169.pth
+```
+
+### Inference demo
+
+Once the download is complete, we can run inference with the multimodal inference script under `demo`:
+
+```shell
+python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts bench
+```
+
+The demo result is shown below:
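+
+Inference can also be scripted through the Python API. The sketch below runs the same model over a whole folder of images and saves the visualizations; it assumes a recent MMDetection version where `DetInferencer` accepts a directory path as input and forwards a `texts` keyword to GLIP, so check your version's API if the call differs.
+
+```python
+from mmdet.apis import DetInferencer
+
+inferencer = DetInferencer(
+    model='configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py',
+    weights='glip_tiny_a_mmdet-b3654169.pth')
+
+# 'demo/' is scanned for images; results are written to out_dir (assumed keyword)
+inferencer('demo/', texts='bench .', out_dir='outputs')
+```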
+
+To detect multiple categories, declare the targets after `--texts` in the format `xx . xx .`:
+
+```shell
+python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts 'bench . car .'
+```
+
+The result is shown below:
+
+The inference script also accepts a full sentence as the `--texts` input:
+
+```shell
+python demo/image_demo.py demo/demo.jpg glip_tiny_a_mmdet-b3654169.pth --texts 'There are a lot of cars here.'
+```
+
+The result looks like this:
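+
+If you also want the rendered prediction back in Python rather than only on disk, `DetInferencer` can return it. This is a sketch assuming the `return_vis` flag and `visualization` output key of recent versions:
+
+```python
+# ask the inferencer for the rendered image as well (assumed 'return_vis' flag)
+result = inferencer(
+    'demo/demo.jpg',
+    texts='There are a lot of cars here.',
+    return_vis=True)
+
+vis = result['visualization'][0]  # drawn image as a numpy array
+```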
+
+### Evaluation demo
+
+The GLIP algorithm as supported in MMDetection has no loss of accuracy compared with the official version; the benchmark is as follows:
+
+| Model                   | official mAP | mmdet mAP |
+| ----------------------- | :----------: | :-------: |
+| glip_A_Swin_T_O365.yaml |     42.9     |   43.0    |
+| glip_Swin_T_O365.yaml   |     44.9     |   44.9    |
+| glip_Swin_L.yaml        |     51.4     |   51.3    |
+
+You can verify the model's accuracy with the `test.py` script, as shown below:
+
+```shell
+# single GPU
+python tools/test.py configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py glip_tiny_a_mmdet-b3654169.pth
+
+# 8 GPUs
+./tools/dist_test.sh configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py glip_tiny_a_mmdet-b3654169.pth 8
+```
+
+The evaluation result will be roughly as follows:
+
+```shell
+Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.428
+Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.594
+Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.466
+Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.300
+Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.477
+Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.534
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.634
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.634
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.634
+Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.473
+Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.690
+Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.789
+```
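+
+For programmatic evaluation, the test run can also be launched from Python through MMEngine's `Runner`. This is a minimal sketch assuming the standard MMEngine API; `work_dirs/glip_eval` is just a hypothetical output directory.
+
+```python
+from mmengine.config import Config
+from mmengine.runner import Runner
+
+cfg = Config.fromfile('configs/glip/glip_atss_swin-t_a_fpn_dyhead_pretrain_obj365.py')
+cfg.load_from = 'glip_tiny_a_mmdet-b3654169.pth'  # checkpoint to evaluate
+cfg.work_dir = 'work_dirs/glip_eval'              # hypothetical output dir
+
+runner = Runner.from_cfg(cfg)
+runner.test()  # runs the test loop and prints the COCO-style metrics
+```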