A PaddlePaddle reimplementation of facebookresearch's MoCo v3, released with the paper *An Empirical Study of Training Self-Supervised Vision Transformers*.
PaddlePaddle 2.4 is required for some of the features used here. For installation instructions, refer to installation.md.
Arrange the ImageNet (ILSVRC2012) dataset in the following directory structure:

```
dataset/
└── ILSVRC2012
    ├── train
    └── val
```
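Each of `train` and `val` is expected to contain one subdirectory per class, in the standard ImageNet layout. A quick sanity check along these lines can catch layout mistakes early (the 1000-class count is an assumption based on the standard ILSVRC2012 split):

```python
# Quick layout check; assumes the standard ImageNet-1k folder structure
# with one subdirectory per class under both train/ and val/.
import os

root = 'dataset/ILSVRC2012'
for split in ('train', 'val'):
    classes = [d for d in os.listdir(os.path.join(root, split))
               if os.path.isdir(os.path.join(root, split, d))]
    print(split, len(classes), 'class folders')  # expect 1000 for each split
```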
With a batch size of 4096, ViT-Base is pre-trained on 4 nodes (8 GPUs each, 32 GPUs in total):
```bash
# Note: set the following environment variables,
# then run this script on each of the 4 nodes.
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=4
export PADDLE_MASTER="xxx.xxx.xxx.xxx:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export FLAGS_stop_check_timeout=3600

IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_moco.py \
    -a moco_vit_base \
    --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
    --epochs=300 --warmup-epochs=40 \
    --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
    ${IMAGENET_DIR}
```
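For orientation, `--moco-t` sets the InfoNCE temperature and `--moco-m-cos` switches the momentum-encoder coefficient to a cosine ramp toward 1.0, as in the paper. Below is a minimal sketch of the symmetrized objective and the schedule; it is illustrative, not the repo's code, and the tensor shapes, variable names, and `m_base=0.99` default are assumptions taken from the paper:

```python
# Minimal sketch of the MoCo v3 objective controlled by the flags above:
# --moco-t is the InfoNCE temperature, --moco-m-cos the momentum schedule.
import math
import paddle
import paddle.nn.functional as F

def contrastive_loss(q, k, t=0.2):
    # InfoNCE over in-batch negatives; positives are the diagonal pairs.
    q = F.normalize(q, axis=1)
    k = F.normalize(k, axis=1)
    logits = paddle.matmul(q, k, transpose_y=True) / t
    labels = paddle.arange(q.shape[0], dtype='int64')
    # MoCo v3 scales the loss by 2 * temperature.
    return F.cross_entropy(logits, labels) * (2 * t)

def moco_momentum(epoch, total_epochs, m_base=0.99):
    # --moco-m-cos: ramp m from m_base toward 1.0 with a half-cosine schedule.
    return 1.0 - (1.0 - m_base) * (math.cos(math.pi * epoch / total_epochs) + 1) / 2

# Symmetrized objective over two augmented views: q1, q2 come from the base
# encoder's predictor, k1, k2 from the momentum encoder.
q1, q2, k1, k2 = (paddle.randn([8, 256]) for _ in range(4))
loss = contrastive_loss(q1, k2) + contrastive_loss(q2, k1)
print(float(loss))
```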
By default, linear classification on frozen features/weights uses SGD with momentum and a batch size of 1024; this runs on a single 8-GPU node.
```bash
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export FLAGS_stop_check_timeout=3600

IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_lincls.py \
    -a moco_vit_base \
    --lr=3 \
    --pretrained pretrained/checkpoint_0299.pd \
    ${IMAGENET_DIR}
```
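Under the hood, a linear probe trains only a classifier on top of frozen features. A toy illustration of the freezing mechanism in PaddlePaddle follows; the `backbone`/`head` names and dimensions are placeholders, not the repo's actual model:

```python
# Toy stand-in for "frozen backbone + trainable linear head"; the names
# 'backbone'/'head' and the layer sizes are placeholders.
import paddle
import paddle.nn as nn

model = nn.Sequential(
    ('backbone', nn.Linear(768, 768)),
    ('head', nn.Linear(768, 1000)),
)
for name, p in model.named_parameters():
    if not name.startswith('head'):
        p.stop_gradient = True  # frozen: no gradients flow to the backbone

opt = paddle.optimizer.Momentum(
    learning_rate=3.0, momentum=0.9,  # matches --lr=3 with momentum SGD
    parameters=[p for p in model.parameters() if not p.stop_gradient])
```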
To perform end-to-end fine-tuning of the ViT, first use our script to convert the pre-trained checkpoint to the PLSC DeiT format:
```bash
python extract_weight.py \
    --input pretrained/checkpoint_0299.pd \
    --output pretrained/moco_vit_base.pdparams
```
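Conceptually, the conversion loads the MoCo checkpoint and keeps only the base-encoder weights under names PLSC can consume. A rough sketch of that idea is below; the `state_dict` wrapper and the `base_encoder.` key prefix are assumptions about the checkpoint layout, not necessarily what extract_weight.py does:

```python
# Rough sketch of checkpoint extraction; the 'state_dict' wrapper and the
# 'base_encoder.' key prefix are assumptions about the checkpoint layout.
import paddle

ckpt = paddle.load('pretrained/checkpoint_0299.pd')
state = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
backbone = {k[len('base_encoder.'):]: v
            for k, v in state.items()
            if k.startswith('base_encoder.')}
paddle.save(backbone, 'pretrained/moco_vit_base.pdparams')
```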
Then run training from the converted PLSC-format checkpoint:
```bash
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export FLAGS_stop_check_timeout=3600

python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    plsc-train \
    -c ./configs/DeiT_base_patch16_224_in1k_1n8c_dp_fp16o1.yaml \
    -o Global.epochs=150 \
    -o Global.pretrained_model=pretrained/moco_vit_base \
    -o Global.finetune=True
```
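The `-o` options override entries of the YAML config at launch time. Before launching, a quick look at the converted weights can save a failed run; the check below is a hedged sketch that assumes the `.pdparams` file is a flat name-to-tensor dict as produced by the conversion step:

```python
# Optional sanity check on the converted weights; assumes a flat
# {parameter_name: tensor} dict saved by extract_weight.py.
import numpy as np
import paddle

state = paddle.load('pretrained/moco_vit_base.pdparams')
n_params = sum(int(np.prod(v.shape)) for v in state.values())
print(f'{len(state)} tensors, {n_params / 1e6:.1f}M parameters')  # ViT-Base is ~86M
```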
| Model | Phase | Dataset | Configs | GPUs | Epochs | Top1 Acc | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| moco_vit_base | pretrain | ImageNet2012 | - | A100*N4C32 | 300 | - | download |
| moco_vit_base | linear probe | ImageNet2012 | - | A100*N1C8 | 90 | 0.7662 | |
| moco_vit_base | finetune | ImageNet2012 | config | A100*N1C8 | 150 | 0.8288 | |
```bibtex
@Article{chen2021mocov3,
  author  = {Xinlei Chen* and Saining Xie* and Kaiming He},
  title   = {An Empirical Study of Training Self-Supervised Vision Transformers},
  journal = {arXiv preprint arXiv:2104.02057},
  year    = {2021},
}
```