yan-roo/SceneClassifier

SceneClassifier

This repository covers greyscale scene image classification for an in-class Kaggle challenge and the NCTU Computer Vision homework (CV HW).
The two datasets differ slightly:

  • Kaggle challenge: 3859 grey images with 13 categories (train:2819, test:1040)
  • CV HW: 1650 grey images with 15 categories (train:1500, test:150)

Previous work

  1. VGG16 (imagenet pretrain) + 2*FC layers & Dropout
  2. ResNet50 (imagenet pretrain) on Keras 2.2.4, where freezing BatchNorm layers is broken
  3. Image Size: 224, VGG16 preprocess_input + horizontal_flip (on-the-fly data augmentation)
  4. Train on a split training set (losing some of the training data)
  5. Ensemble prediction: 0.899 accuracy on Kaggle

New method

  1. ResNet50 (imagenet pretrain) on TF2.2 classification_models
  2. CosineAnnealingScheduler
  3. Image Size: 256, + horizontal_flip + brightness + zoom + rotation (on-the-fly data augmentation)
  4. Train on whole training set
  5. Single-model prediction: 0.98 accuracy on CV HW
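
The README names a CosineAnnealingScheduler but does not give its settings. The following is a minimal sketch of a single-cycle cosine annealing schedule in plain Python. The helper name `make_cosine_schedule` and the values of `total_epochs`, `eta_max`, and `eta_min` are assumptions for illustration, not the settings used in this repository.

```python
import math


def make_cosine_schedule(total_epochs=50, eta_max=1e-3, eta_min=1e-6):
    """Build a schedule(epoch, lr=None) function for one cosine half-cycle.

    All three defaults are illustrative assumptions.
    """

    def schedule(epoch, lr=None):
        # `lr` is accepted but ignored, so the function also works with
        # callbacks that pass the current learning rate as a second argument.
        progress = min(epoch, total_epochs) / total_epochs
        cosine = 1.0 + math.cos(math.pi * progress)
        return eta_min + 0.5 * (eta_max - eta_min) * cosine

    return schedule


if __name__ == "__main__":
    schedule = make_cosine_schedule()
    for epoch in (0, 10, 25, 40, 50):
        print(epoch, schedule(epoch))
```

The learning rate starts at `eta_max`, falls along a half cosine, and reaches `eta_min` at `total_epochs`. In a Keras training loop, a function of this shape could be passed to `tf.keras.callbacks.LearningRateScheduler`.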

Experiment

EfficientNet

Model           Batch_size  Accuracy  Extra
EfficientNetB0  64          0.92
EfficientNetB0  64          0.906     noisy-student pretrain
EfficientNetB1  64          0.926
EfficientNetB1  64          0.906     noisy-student pretrain
EfficientNetB4  16          0.92
EfficientNetB4  32          0.95
EfficientNetB4  32          0.89      Freeze 1st Block(Conv+BN+Activation)
EfficientNetB4  32          0.9       Freeze 1~2 Blocks(Conv+BN+Activation)
EfficientNetB5  16          0.926
EfficientNetB6  16          0.9       Freeze 1st Block(Conv+BN+Activation)
EfficientNetB6  16          0.926     Freeze 1~2 Blocks(Conv+BN+Activation)
EfficientNetB6  16          0.94      Freeze 1~3 Blocks(Conv+BN+Activation)
EfficientNetB6  16          0.85      Freeze 1~4 Blocks(Conv+BN+Activation)

ResNet50

Freeze the first 12 layers (layers 0~47 in the implementation)
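
A minimal sketch of freezing a leading range of layers follows. It assumes that "0~47" refers to indices into a Keras-style `model.layers` list. The README does not confirm this, and the helper name `freeze_first_layers` is hypothetical.

```python
from types import SimpleNamespace


def freeze_first_layers(model, n_layers=48):
    """Mark model.layers[0:n_layers] as non-trainable and return the count.

    n_layers=48 assumes "0~47" means list indices into model.layers.
    """
    frozen = 0
    for layer in model.layers[:n_layers]:
        layer.trainable = False
        frozen += 1
    return frozen


if __name__ == "__main__":
    # Stand-in object with the same .layers / .trainable shape as a Keras
    # model, so the sketch runs without TensorFlow installed.
    toy = SimpleNamespace(
        layers=[SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(60)]
    )
    print(freeze_first_layers(toy), "layers frozen")
```

With a real Keras model, changes to `trainable` take effect once the model is compiled afterwards. In TF 2.x, a BatchNormalization layer with `trainable = False` also runs in inference mode.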

Model Batch_size Accuracy Extra
ResNet50 64 0.953 Generate New Data
ResNet50 64 0.966 on-the-fly
ResNet50 64 0.946 on-the-fly + constrast_pil
ResNet50 64 0.98 on-the-fly + rotation 5
ResNet50 64 0.96 on-the-fly + rotation 7
ResNet50 64 0.953 on-the-fly + rotation 10
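
The best row above (on-the-fly + rotation 5) could be set up roughly as follows with the Keras `ImageDataGenerator` from TF 2.x. This is a sketch under assumptions: the brightness and zoom ranges are illustrative, `train_dir` is a hypothetical directory with one subfolder per class, and the model-specific input preprocessing is left out.

```python
# Random transforms applied to every batch as it is drawn (on-the-fly).
AUGMENTATION = {
    "horizontal_flip": True,
    "rotation_range": 5,             # degrees, as in the "rotation 5" row
    "brightness_range": (0.8, 1.2),  # assumed range, not from the README
    "zoom_range": 0.1,               # assumed range, not from the README
}


def make_train_iterator(train_dir, image_size=256, batch_size=64):
    """Return a batch iterator over train_dir with random augmentation."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION)
    # The default color_mode="rgb" loads grey images as 3 channels, which
    # matches the input shape of ImageNet-pretrained backbones.
    return datagen.flow_from_directory(
        train_dir,
        target_size=(image_size, image_size),
        class_mode="sparse",  # integer labels
        batch_size=batch_size,
        shuffle=True,
    )
```

Because the transforms are sampled anew for each batch, the model sees a different variant of each image every epoch, and no augmented copies need to be written to disk.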

Big Transfer (BiT)

  • BiT-M (pre-trained on ImageNet-21k), on-the-fly

Model   Batch_size  Accuracy  Extra
R50x1   64          0.966
R50x3   64          0.96
R101x1  64          0.96
R101x3  64          0.953

Conclusion

  • Use ResNet50 with imagenet pretrain and freeze the first 12 layers
  • A larger batch size might help
  • Use on-the-fly (random) data augmentation instead of generating new data
  • Use Brightness, Zoom and Rotation instead of Equalize and RandomResizedCropped
  • Use TF2 if you want to freeze BN layers
  • Sparse labels might improve accuracy (Dense without softmax, class_mode='sparse', loss=SparseCategoricalCrossentropy)
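
To illustrate the sparse-label setup in the last point, here is a plain-Python sketch of what a sparse categorical cross-entropy computes when the Dense head has no softmax, so the loss receives raw logits and integer class labels. The function name is hypothetical. In Keras this would presumably correspond to `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`; the README does not state whether `from_logits` was set.

```python
import math


def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels and unnormalized class scores.

    logits: list of per-sample score lists (no softmax applied).
    labels: list of integer class indices, one per sample.
    """
    total = 0.0
    for row, label in zip(logits, labels):
        peak = max(row)  # subtract the max before exp for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[label]  # equals -log softmax(row)[label]
    return total / len(labels)


if __name__ == "__main__":
    # One sample, 15 classes, all scores equal: the loss is log(15).
    print(sparse_categorical_crossentropy_from_logits([[0.0] * 15], [3]))
```

Integer labels avoid building one-hot vectors, and folding the softmax into the loss lets it be computed in the numerically stable log-sum-exp form shown here.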