commit 232f497 (0 parents)
Showing 97 changed files with 58,936 additions and 0 deletions.
@@ -0,0 +1,191 @@
# AiATrack

The official PyTorch implementation of our **ECCV 2022** paper:

**AiATrack: Attention in Attention for Transformer Visual Tracking**

[Shenyuan Gao](https://github.com/Little-Podi), [Chunluan Zhou](https://www.sites.google.com/view/chunluanzhou/), [Chao Ma](https://vision.sjtu.edu.cn/), [Xinggang Wang](https://xinggangw.info/), [Junsong Yuan](https://cse.buffalo.edu/~jsyuan/)

[[PDF on arXiv](https://arxiv.org/abs/2207.09603)] [[Trained Models](todo)] [[Raw Results](todo)]

## Highlight

![](AiA.png)

### :bookmark:Brief Introduction

Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose **an attention in attention module** (named AiA), which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose **a streamlined Transformer tracking framework** (dubbed AiATrack), by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on several tracking benchmarks while running at a real-time speed.

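As a rough illustration of the idea described above, the AiA module can be read as adding a residual correction to the raw correlation map before the softmax. The notation below is a simplified sketch rather than the exact formulation: the symbols $W_q, W_k, W_v, W_o$, the channel dimension $C$, and the name $\mathrm{InnerAttn}$ are assumptions for illustration, and normalization and positional encodings are omitted. Please refer to the paper for the precise definition.

```latex
% Simplified sketch; symbols are illustrative assumptions, see the paper for details.
\begin{aligned}
M &= \frac{\bar{Q}\,\bar{K}^{\top}}{\sqrt{C}},
  \qquad \bar{Q} = Q W_q,\quad \bar{K} = K W_k,\quad \bar{V} = V W_v \\
\mathrm{Attn}(Q, K, V) &= \mathrm{Softmax}(M)\,\bar{V}\,W_o \\
\mathrm{AiA}(Q, K, V) &= \mathrm{Softmax}\bigl(M + \mathrm{InnerAttn}(M)\bigr)\,\bar{V}\,W_o
\end{aligned}
```

Here $\mathrm{InnerAttn}$ stands for an inner attention block that treats the correlation vectors in $M$ as its input sequence, so that each correlation is adjusted according to its agreement with the others.
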
### :bookmark:Strong Performance

The proposed AiATrack sets state-of-the-art results on 8 widely used benchmarks.

| Benchmark (Metrics)                | AiATrack              |
| ---------------------------------- | --------------------- |
| LaSOT (AUC / Norm P / P)           | 69.0 / 79.4 / 73.8    |
| LaSOT Extension (AUC / Norm P / P) | 47.7 / 55.6 / 55.4    |
| TrackingNet (AUC / Norm P / P)     | 82.7 / 87.8 / 80.4    |
| GOT-10k (AO / SR 0.75 / SR 0.5)    | 69.6 / 63.2 / 80.0    |
| NfS30 (AUC)                        | 67.9                  |
| OTB100 (AUC)                       | 69.6                  |
| UAV123 (AUC)                       | 70.6                  |
| VOT2020 (EAO / A / R)              | 0.530 / 0.764 / 0.827 |

### :bookmark:Inference Speed

The proposed AiATrack runs at 38 fps (frames per second) on a single NVIDIA GeForce RTX 2080 Ti.

### :bookmark:Training Cost

Training our model takes nearly two days on 8 NVIDIA GeForce RTX 2080 Ti GPUs (each with 11 GB of GPU memory).

### :bookmark:Model Complexity

The proposed AiATrack has 15.79M (million) model parameters.

## Release

**Trained Models** (containing the model we trained on four datasets and the model we trained on GOT-10k only) [[download zip file](todo)]

**Raw Results** (containing raw tracking results on the datasets we benchmarked in the paper) [[download zip file](todo)]

Download both zip files and unzip them under the AiATrack project path. Our code can then use them directly.

## Let's Get Started

- ### Environment

  Our experiments are conducted with Ubuntu 18.04 and CUDA 10.1.

- ### Preparation

  - Clone our repository to your local project directory.

  - Download the training datasets ([LaSOT](http://vision.cs.stonybrook.edu/~lasot/download.html), [TrackingNet](https://github.com/SilvioGiancola/TrackingNet-devkit), [GOT-10k](http://got-10k.aitestunion.com/downloads), [COCO2017](https://cocodataset.org/#download)) and testing datasets ([NfS](http://ci2cv.net/nfs/index.html), [OTB](http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html), [UAV123](https://cemse.kaust.edu.sa/ivul/uav123)) to your disk. The organized directory should look like:

    ```
    --LaSOT/
        |--airplane
        |...
        |--zebra
    --TrackingNet/
        |--TRAIN_0
        |...
        |--TEST
    --GOT10k/
        |--test
        |--train
        |--val
    --COCO/
        |--annotations
        |--images
    --NFS30/
        |--anno
        |--sequences
    --OTB100/
        |--Basketball
        |...
        |--Woman
    --UAV123/
        |--anno
        |--data_seq
    ```

  - Edit the **PATH** to the proper absolute path in ```lib/test/evaluation/local.py``` and ```lib/train/admin/local.py```.

- ### Installation

  We use conda to manage the environment.

  ```
  conda create --name aiatrack python=3.6
  conda activate aiatrack
  sudo apt-get install ninja-build
  sudo apt-get install libturbojpeg
  bash install.sh
  ```

- ### Training

  - Multi-GPU training with DDP (assuming you have 8 GPUs):

    ```
    python tracking/train.py --mode multiple --nproc 8
    ```

  - Single-GPU debugging (too slow, not recommended for training):

    ```
    python tracking/train.py
    ```

  - To train the model used for GOT-10k evaluation, remember to set ```--config baseline_got```.

- ### Evaluation

  - Make sure you have prepared the trained model.

  - On large-scale benchmarks:

    - LaSOT

      ```
      python tracking/test.py --dataset lasot
      python tracking/test.py --dataset lasot_ext
      ```

      Then evaluate the raw results using the [official MATLAB toolkit](https://github.com/HengLan/LaSOT_Evaluation_Toolkit).

    - TrackingNet

      ```
      python tracking/test.py --dataset trackingnet
      python lib/test/utils/transform_trackingnet.py --tracker_name aiatrack --cfg_name baseline
      ```

      Then upload ```test/tracking_results/aiatrack/baseline/trackingnet_submit.zip``` to the [online evaluation server](https://eval.ai/web/challenges/challenge-page/1805/overview).

    - GOT-10k

      ```
      python tracking/test.py --param baseline_got --dataset got10k_test
      python lib/test/utils/transform_got10k.py --tracker_name aiatrack --cfg_name baseline_got
      ```

      Then upload ```test/tracking_results/aiatrack/baseline_got10k/got10k_submit.zip``` to the [online evaluation server](http://got-10k.aitestunion.com/submit_instructions).

  - On small-scale benchmarks:

    - NfS30, OTB100, UAV123

      ```
      python tracking/test.py --dataset nfs
      python tracking/test.py --dataset otb
      python tracking/test.py --dataset uav
      python tracking/analysis_results.py
      ```

      Frames in which the target object is absent are excluded from the analysis.

  - For multi-threaded inference, add ```--threads 40``` after ```tracking/test.py``` (assuming you want to use 40 threads in total).

  - To show the prediction results live during inference, set ```settings.show_result = True``` in ```lib/test/evaluation/local.py``` (this may be buggy on a remote server).

  - Please refer to [STARK](https://github.com/researchmm/Stark/blob/main/external/AR/README.md) for VOT integration and [DETR](https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_attention.ipynb) for correlation map visualization.

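The analysis step above reports AUC scores and skips frames in which the target is absent. The following is a minimal, standalone sketch of how such a success AUC can be computed. It is not the code in ```tracking/analysis_results.py```: the function names, the `(x, y, w, h)` box format, the use of `None` to mark absent targets, and the 21 overlap thresholds from 0 to 1 are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), an assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(preds: Sequence[Box],
                gts: Sequence[Optional[Box]],
                num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced overlap thresholds in [0, 1].

    Frames whose ground truth is None (target absent) are excluded.
    """
    ious = [iou(p, g) for p, g in zip(preds, gts) if g is not None]
    if not ious:
        return 0.0
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at each threshold: fraction of frames with IoU above it.
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10), (50, 50, 5, 5)]
    gts = [(0, 0, 10, 10), (5, 0, 10, 10), None]  # target absent in the last frame
    print(f"success AUC: {success_auc(preds, gts):.3f}")
```

Because the last frame has no ground truth, it does not affect the score, which mirrors the exclusion rule stated above.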
## Acknowledgement

:heart::heart::heart:Our idea is implemented based on the following projects. We really appreciate their wonderful open-source work!

- [STARK](https://github.com/researchmm/Stark) [[related paper](https://arxiv.org/abs/2103.17154)]
- [PyTracking](https://github.com/visionml/pytracking) [[related paper](https://arxiv.org/abs/1811.07628)]
- [DETR](https://github.com/facebookresearch/detr) [[related paper](https://arxiv.org/abs/2005.12872)]
- [PreciseRoIPooling](https://github.com/vacancy/PreciseRoIPooling) [[related paper](https://arxiv.org/abs/1807.11590)]

## Citation

If any part of our paper or code helps your research, please consider citing us and giving a star to our repository.

## Contact

If you have any questions or concerns, feel free to open an issue or contact me directly through the channels listed on my GitHub homepage. Suggestions and collaborations are also very welcome!
@@ -0,0 +1,88 @@
DATA:
  MAX_SAMPLE_INTERVAL: 200
  MEAN:
  - 0.485
  - 0.456
  - 0.406
  SEARCH:
    CENTER_JITTER: 4.5
    FACTOR: 5.0
    SCALE_JITTER: 0.5
    SIZE: 320
  STD:
  - 0.229
  - 0.224
  - 0.225
  TEMPLATE:
    CENTER_JITTER: 0
    SCALE_JITTER: 0
  TRAIN:
    DATASETS_NAME:
    - LASOT
    - GOT10K_vot_train
    - COCO17
    - TRACKINGNET
    DATASETS_RATIO:
    - 1
    - 1
    - 1
    - 1
    SAMPLE_PER_EPOCH: 60000
MODEL:
  BACKBONE:
    DILATION: false
    OUTPUT_LAYERS:
    - layer3
    TYPE: resnet50
  HEAD_TYPE: CORNER
  HIDDEN_DIM: 256
  NUM_OBJECT_QUERIES: 1
  POSITION_EMBEDDING: sine
  PREDICT_MASK: false
  TRANSFORMER:
    DEC_LAYERS: 1
    DIM_FEEDFORWARD: 1024
    DIVIDE_NORM: false
    DROPOUT: 0.1
    ENC_LAYERS: 3
    NHEADS: 4
    PRE_NORM: false
  AIA:
    USE_AIA: true
    MATCH_DIM: 64
    FEAT_SIZE: 400
TRAIN:
  BACKBONE_MULTIPLIER: 0.1
  BATCH_SIZE: 15
  DEEP_SUPERVISION: false
  EPOCH: 500
  FREEZE_BACKBONE_BN: true
  FREEZE_LAYERS:
  - conv1
  - layer1
  GIOU_WEIGHT: 2.0
  L1_WEIGHT: 5.0
  IOU_WEIGHT: 2.0
  GRAD_CLIP_NORM: 0.1
  LR: 0.0001
  LR_DROP_EPOCH: 400
  NUM_WORKER: 16
  OPTIMIZER: ADAMW
  PRINT_INTERVAL: 50
  SCHEDULER:
    TYPE: step
    DECAY_RATE: 0.1
  VAL_EPOCH_INTERVAL: 10
  WEIGHT_DECAY: 0.0001
TEST:
  EPOCH: 500
  SEARCH_FACTOR: 5.0
  SEARCH_SIZE: 320
  HYPER:
    DEFAULT: [100, 3, 0.7]
    LASOT: [100, 4, 0.8]
    LASOT_EXT: [100, 6, 0.8]
    TRACKINGNET: [100, 6, 0.7]
    NFS: [80, 3, 0.6]
    OTB: [100, 3, 0.7]
    UAV: [100, 3, 0.7]
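
The training keys above describe a step learning-rate schedule (`LR`, `LR_DROP_EPOCH`, `SCHEDULER.DECAY_RATE`) with a reduced rate for the backbone (`BACKBONE_MULTIPLIER`). The sketch below works through the resulting values. It is an illustration only, not the repository's training code: the function names are hypothetical, epochs are assumed to be 0-based, and `BATCH_SIZE` is assumed to be per GPU with 8 GPUs as in the README.

```python
import math


def learning_rates(epoch: int,
                   base_lr: float = 1e-4,             # TRAIN.LR
                   backbone_multiplier: float = 0.1,  # TRAIN.BACKBONE_MULTIPLIER
                   drop_epoch: int = 400,             # TRAIN.LR_DROP_EPOCH
                   decay_rate: float = 0.1):          # TRAIN.SCHEDULER.DECAY_RATE
    """Return (head_lr, backbone_lr) at a 0-based epoch under a step schedule."""
    factor = decay_rate ** (epoch // drop_epoch)
    head_lr = base_lr * factor
    return head_lr, head_lr * backbone_multiplier


def iterations_per_epoch(samples: int = 60000,   # DATA.TRAIN.SAMPLE_PER_EPOCH
                         batch_per_gpu: int = 15,  # TRAIN.BATCH_SIZE (assumed per GPU)
                         gpus: int = 8) -> int:
    """Number of optimizer steps per epoch under the assumptions above."""
    return samples // (batch_per_gpu * gpus)


if __name__ == "__main__":
    for epoch in (0, 399, 400, 499):
        head, backbone = learning_rates(epoch)
        print(f"epoch {epoch}: head lr {head:.1e}, backbone lr {backbone:.1e}")
    print("iterations per epoch:", iterations_per_epoch())
```

With `EPOCH: 500` and `LR_DROP_EPOCH: 400`, the schedule drops the rate only once, for the final 100 epochs.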
@@ -0,0 +1,77 @@
DATA:
  MAX_SAMPLE_INTERVAL: 200
  MEAN:
  - 0.485
  - 0.456
  - 0.406
  SEARCH:
    CENTER_JITTER: 4.5
    FACTOR: 5.0
    SCALE_JITTER: 0.5
    SIZE: 320
  STD:
  - 0.229
  - 0.224
  - 0.225
  TEMPLATE:
    CENTER_JITTER: 0
    SCALE_JITTER: 0
  TRAIN:
    DATASETS_NAME:
    - GOT10K_train
    DATASETS_RATIO:
    - 1
    SAMPLE_PER_EPOCH: 60000
MODEL:
  BACKBONE:
    DILATION: false
    OUTPUT_LAYERS:
    - layer3
    TYPE: resnet50
  HEAD_TYPE: CORNER
  HIDDEN_DIM: 256
  NUM_OBJECT_QUERIES: 1
  POSITION_EMBEDDING: sine
  PREDICT_MASK: false
  TRANSFORMER:
    DEC_LAYERS: 1
    DIM_FEEDFORWARD: 1024
    DIVIDE_NORM: false
    DROPOUT: 0.1
    ENC_LAYERS: 3
    NHEADS: 4
    PRE_NORM: false
  AIA:
    USE_AIA: true
    MATCH_DIM: 64
    FEAT_SIZE: 400
TRAIN:
  BACKBONE_MULTIPLIER: 0.1
  BATCH_SIZE: 15
  DEEP_SUPERVISION: false
  EPOCH: 500
  FREEZE_BACKBONE_BN: true
  FREEZE_LAYERS:
  - conv1
  - layer1
  GIOU_WEIGHT: 2.0
  L1_WEIGHT: 5.0
  IOU_WEIGHT: 2.0
  GRAD_CLIP_NORM: 0.1
  LR: 0.0001
  LR_DROP_EPOCH: 400
  NUM_WORKER: 16
  OPTIMIZER: ADAMW
  PRINT_INTERVAL: 50
  SCHEDULER:
    TYPE: step
    DECAY_RATE: 0.1
  VAL_EPOCH_INTERVAL: 10
  WEIGHT_DECAY: 0.0001
TEST:
  EPOCH: 500
  SEARCH_FACTOR: 5.0
  SEARCH_SIZE: 320
  HYPER:
    DEFAULT: [100, 3, 0.7]
    GOT10K_TEST: [80, 4, 0.7]
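
Several values in this config appear to be tied together: a 320-pixel search region passed through ResNet-50 `layer3` (stride 16 without dilation) gives a 20 x 20 feature map, that is, 400 positions, which matches `FEAT_SIZE: 400`. The sketch below checks this arithmetic. Reading `FEAT_SIZE` as the number of feature positions, and computing the crop side as the square root of the target area times the search factor (the convention used in PyTracking-style trackers), are assumptions for illustration. The function names are hypothetical.

```python
import math


def feature_positions(search_size: int = 320, stride: int = 16) -> int:
    """Number of spatial positions in the backbone feature map.

    stride=16 is the total stride of ResNet-50 layer3 without dilation.
    """
    side = search_size // stride
    return side * side


def search_crop_side(target_w: float, target_h: float,
                     search_factor: float = 5.0) -> int:
    """Side length (in pixels) of the square search crop, under the
    assumed sqrt(area) * factor convention."""
    return math.ceil(math.sqrt(target_w * target_h) * search_factor)


if __name__ == "__main__":
    print("feature positions:", feature_positions())
    print("crop side for a 40x10 target:", search_crop_side(40, 10))
```

Under these assumptions, changing `SEARCH.SIZE` would also require changing `FEAT_SIZE` so that the two stay consistent.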
@@ -0,0 +1,2 @@
*.o
/_prroi_pooling
@@ -0,0 +1,12 @@
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# File   : __init__.py
# Author : Jiayuan Mao, Tete Xiao
# Email  : [email protected], [email protected]
# Date   : 07/13/2018
#
# This file is part of PreciseRoIPooling.
# Distributed under terms of the MIT license.
# Copyright (c) 2017 Megvii Technology Limited.

from .prroi_pool import *