This is the third-place solution for Task 2 of the CCKS-2022 Digital Business Knowledge Map Assessment Competition.
📃Paper: "Multi-Modal Representation Learning with Self-Adaptive Thresholds for Commodity Verification"
- Training is conducted only on the official training set; no external training data or test data are used.
- When constructing the validation set, we remove items that also appear in the training set, so the training and validation sets do not overlap. The final ratio of training set to validation set is about 5.6:1.
- We resize all images to 384 x 384.
- For text, besides the title, we pick the 10 most frequent pvs and skus: `["颜色分类", "货号", "型号", "品牌", "尺寸", "口味", "品名", "批准文号", "系列", "尺码"]` (color category, item number, model, brand, dimensions, flavor, product name, approval number, series, size).
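The overlap-free split can be sketched as follows. This is illustrative code, not the repo's actual preprocessing; the function and variable names are assumptions:

```python
import random

def split_pairs(pairs, val_ratio=0.15, seed=0):
    """Randomly split item pairs, then drop any validation pair that
    shares an item with the training split, so the two sets do not
    overlap (the final train/val ratio reported here is about 5.6:1)."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    n_val = int(len(pairs) * val_ratio)
    val, train = pairs[:n_val], pairs[n_val:]
    train_items = {item for pair in train for item in pair}
    # Keep only validation pairs with no item seen in training.
    val = [p for p in val if not train_items.intersection(p)]
    return train, val
```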
- For image, we use Swin Transformer Large pre-trained on ImageNet-22k.
- For text, we use RoBERTa Base pre-trained on EXT data.
- Both pre-trained models are from Hugging Face.
- We do not ensemble models and all results are from a single model.
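The exact way the two encoders' outputs are combined is defined in the paper and the training code; purely as an illustration of the overall idea (all names here are hypothetical), the image and text embeddings of an item can be fused into one vector and two items compared by cosine similarity:

```python
import math

def fuse(img_emb, txt_emb):
    # Hypothetical fusion: concatenate the image-encoder and
    # text-encoder outputs, then L2-normalize the joint vector.
    joint = list(img_emb) + list(txt_emb)
    norm = math.sqrt(sum(x * x for x in joint)) or 1.0
    return [x / norm for x in joint]

def cosine(u, v):
    # Outputs of fuse() are unit-norm, so the dot product
    # equals the cosine similarity.
    return sum(a * b for a, b in zip(u, v))
```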
| GPU | NVIDIA A100-SXM4-80GB × 2 |
| --- | --- |
| Python | 3.8.8 |
| PyTorch | 1.8.1 |
| CUDA | 11.1 |
| cuDNN | 8 |
| Stage | Time | GPU memory |
| --- | --- | --- |
| Train | Full run: 100k iters, ~23 hours<br>Peak performance: 64k iters, ~15 hours | ~42 GB |
| Inference | ~7 minutes | ~16 GB |
Train with FP16: FP16-version
- Add emojis
- Docker image
- Pre-trained models
- Logs
- Results
- Figure
- FP16
- Emoji
| Model | Threshold | Val F1 / P / R | Test A F1 / P / R | Test B F1 / P / R | Training Log | YAML |
| --- | --- | --- | --- | --- | --- | --- |
| 63_grad_clip_norm_0.5_net_64000.pth | 0 | 0.8834 / 0.8909 / 0.8761 | 0.8888 / 0.8762 / 0.9017 | 0.8909 / 0.8790 / 0.9031 | log | yaml |
| | 1.65 | - | - | 0.8936 / 0.8970 / 0.8902 | | |
| 64_grad_clip_norm_0.1_net_60000.pth | 0 | 0.8753 / 0.9002 / 0.8517 | 0.8910 / 0.8901 / 0.8919 | 0.8933 / 0.8933 / 0.8933 | log | yaml |
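The Threshold column is the decision boundary on the matching score. In the paper the thresholds are self-adaptive, i.e. learned during training; as a simplified post-hoc illustration (function names are assumptions, not the repo's API), one can sweep candidate thresholds over validation scores and keep the one that maximizes F1:

```python
def f1_at_threshold(scores, labels, thr):
    # Predict "same commodity" when the matching score reaches thr.
    preds = [s >= thr for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def pick_threshold(scores, labels):
    # Sweep every observed score as a candidate threshold and keep
    # the one with the highest validation F1.
    return max(set(scores), key=lambda t: f1_at_threshold(scores, labels, t))
```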
- We recommend using our pre-built Docker image, which also includes our preprocessed data.

```shell
docker pull registry.cn-hangzhou.aliyuncs.com/ccks-2022/ccks-2022:v1.0
```
- Please install PyTorch first, following About Runtime Environment.
- Then install the other dependencies with `pip`:

```shell
pip install -r requirements.txt
```
- Our Docker image includes our preprocessed data, which is smaller and easier to download.

```shell
docker pull registry.cn-hangzhou.aliyuncs.com/ccks-2022/ccks-2022:v1.0
export REPO_DIR=$PWD
mkdir /data
cd /data
bash $REPO_DIR/scripts/download_data.sh
cat item_train_images.zip.part* > item_train_images.zip
cd $REPO_DIR
bash scripts/resize_img.sh
bash scripts/prepare_data.sh
```

Then start training:

```shell
bash train.sh
```
Due to the file size limit of GitHub Releases, we had to split the checkpoint. Please download `63_grad_clip_norm_0.5_net_64000.pth.partaa` and `63_grad_clip_norm_0.5_net_64000.pth.partab` into this repo and run:

```shell
cat 63_grad_clip_norm_0.5_net_64000.pth.part* > 63_grad_clip_norm_0.5_net_64000.pth
bash predict.sh
```
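If `cat` is unavailable (e.g. on Windows), the reassembly can be mirrored in Python. A minimal sketch; it relies on the fact that concatenating the parts in sorted name order (`partaa`, `partab`, …) reproduces the original file:

```python
import glob

def join_parts(pattern, out_path):
    # Concatenate split checkpoint parts in name order; same result
    # as `cat <prefix>.part* > <prefix>` in the shell.
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(pattern)):
            with open(part, "rb") as f:
                out.write(f.read())
```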
If this work helps your research, please consider citing our paper. The following is a BibTeX reference.
@misc{https://doi.org/10.48550/arxiv.2208.11064,
doi = {10.48550/ARXIV.2208.11064},
url = {https://arxiv.org/abs/2208.11064},
author = {Han, Chenchen and Jia, Heng},
keywords = {Machine Learning (cs.LG), FOS: Computer and information sciences},
title = {Multi-Modal Representation Learning with Self-Adaptive Thresholds for Commodity Verification},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}