Today’s deep learning methods focus on designing the most appropriate objective functions so that the prediction results of the model can be as close as possible to the ground truth. Meanwhile, an appropriate architecture that facilitates the acquisition of enough information for prediction has to be designed. Existing methods ignore the fact that when input data undergoes layer-by-layer feature extraction and spatial transformation, a large amount of information is lost. This paper delves into the important issue of data loss when data is transmitted through deep networks, namely the information bottleneck and reversible functions. We propose the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. PGI can provide complete input information for the target task to calculate the objective function, so that reliable gradient information can be obtained to update network weights. In addition, a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN), is designed based on gradient path planning. GELAN’s architecture confirms that PGI obtains superior results on lightweight models. We verified the proposed GELAN and PGI on MS COCO dataset-based object detection. The results show that GELAN uses only conventional convolution operators yet achieves better parameter utilization than state-of-the-art methods developed on depth-wise convolution. PGI can be used for a variety of models, from lightweight to large, to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained on large datasets.
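The abstract only names GELAN at a high level. As a rough illustration of the "layer aggregation" idea, below is a minimal MindSpore sketch of a GELAN-style block: the input is split into two branches, one branch is refined by a chain of conventional convolution sub-blocks, and every intermediate output is concatenated before a final transition convolution. This is an illustrative assumption, not the block used in this repo or the official YOLOv9 code; the names `GELANBlockSketch`, `conv_bn_act`, `hidden`, and `n_subblocks` are made up for the example.

```python
import mindspore.nn as nn
import mindspore.ops as ops


def conv_bn_act(cin, cout, k=3):
    """Plain convolution + batch norm + SiLU; only conventional conv operators."""
    return nn.SequentialCell([
        nn.Conv2d(cin, cout, k, pad_mode="same", has_bias=False),
        nn.BatchNorm2d(cout),
        nn.SiLU(),
    ])


class GELANBlockSketch(nn.Cell):
    """Split the input, refine one branch through a chain of sub-blocks,
    and aggregate every intermediate feature before a 1x1 transition."""

    def __init__(self, cin, cout, hidden, n_subblocks=2):
        super().__init__()
        self.stem = conv_bn_act(cin, 2 * hidden, k=1)
        self.subblocks = nn.CellList(
            [conv_bn_act(hidden, hidden) for _ in range(n_subblocks)]
        )
        # the transition sees both halves of the split plus every sub-block output
        self.transition = conv_bn_act((2 + n_subblocks) * hidden, cout, k=1)

    def construct(self, x):
        y = self.stem(x)
        a, b = ops.split(y, y.shape[1] // 2, axis=1)  # two branches of `hidden` channels
        feats = [a, b]
        for blk in self.subblocks:
            b = blk(b)
            feats.append(b)  # keep every intermediate feature for aggregation
        return self.transition(ops.cat(feats, axis=1))
```

Aggregating every intermediate output keeps multiple gradient paths through the block, which echoes the paper's goal of preserving reliable gradient information.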
| mindspore | ascend driver | firmware | cann toolkit/kernel |
| --- | --- | --- | --- |
| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
Please refer to the GETTING_STARTED in MindYOLO for details.
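Before launching training, a quick sanity check can confirm that the installed MindSpore matches the table above. The snippet uses only standard MindSpore APIs; setting the device target to Ascend is an assumption that matches the configurations listed in this page.

```python
# Optional sanity check of the environment listed above.
import mindspore as ms

print(ms.__version__)                    # expect 2.3.1 for the configuration in the table
ms.set_context(device_target="Ascend")   # or "GPU"/"CPU" for other backends
ms.run_check()                           # built-in self-test: runs a tiny computation
```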
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov9_log python train.py --config ./configs/yolov9/yolov9-t.yaml --device_target Ascend --is_parallel True
```
Similarly, you can train the model on multiple GPU devices with the above msrun command. Note: For more information about msrun configuration, please refer to here.
For detailed illustration of all hyper-parameters, please refer to config.py.
Note: As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended either to keep the global batch size unchanged for reproduction or to scale the learning rate linearly to the new global batch size.
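As a small illustration of this linear scaling rule (this helper is not part of the repo, and the base learning rate of 0.01 is only a placeholder; the actual value lives in the config yaml):

```python
def scale_lr(base_lr: float, base_global_bs: int, new_global_bs: int) -> float:
    """Linearly rescale the learning rate when the global batch size changes."""
    return base_lr * new_global_bs / base_global_bs

# The reference recipe uses 8 devices x 16 images = 128 global batch size.
# Training on 4 devices with the same per-device batch size halves the global
# batch size, so the learning rate is halved as well.
print(scale_lr(base_lr=0.01, base_global_bs=8 * 16, new_global_bs=4 * 16))  # -> 0.005
```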
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config ./configs/yolov9/yolov9-t.yaml --device_target Ascend
```
To validate the accuracy of the trained model, you can use test.py and pass the checkpoint path with --weight:
```shell
python test.py --config ./configs/yolov9/yolov9-t.yaml --device_target Ascend --weight /PATH/TO/WEIGHT.ckpt --ms_amp_level O2
```
Experiments were run on Ascend 910* with MindSpore 2.3.1 in graph mode.
| model name | scale | cards | batch size | resolution | jit level | graph compile | ms/step | img/s | mAP | recipe | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv9 | T | 8 | 16 | 640x640 | O2 | 1316.4s | 350 | 365.71 | 37.3% | yaml | weights |
| YOLOv9 | S | 8 | 16 | 640x640 | O2 | 1337.1s | 377 | 339.52 | 46.3% | yaml | weights |
| YOLOv9 | M | 8 | 16 | 640x640 | O2 | 897.32s | 499 | 256.51 | 51.4% | yaml | weights |
| YOLOv9 | C | 8 | 16 | 640x640 | O2 | 1017.9s | 627 | 204.15 | 52.6% | yaml | weights |
| YOLOv9 | E | 8 | 16 | 640x640 | O2 | 1927.8s | 826 | 154.96 | 55.1% | yaml | weights |
- mAP: mean average precision reported on the validation set.
- We refer to the official YOLOv9 implementation to reproduce the P5-series models.
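The img/s column is consistent with reading "batch size" as the per-device value: throughput is cards x batch size x 1000 / (ms/step). A quick check for the YOLOv9-T row (the numbers below are taken from the table; the interpretation of batch size as per-device is an inference from them):

```python
cards, per_device_bs, ms_per_step = 8, 16, 350           # YOLOv9-T row above
img_per_s = cards * per_device_bs * 1000 / ms_per_step
print(round(img_per_s, 2))                               # 365.71, matching the table
```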
[1] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv preprint arXiv:2402.13616v2, 2024.