February 2020

tl;dr: Conventional ImageNet classification pipelines have a train/test resolution discrepancy (a form of domain shift).

Overall impression

Scale invariance/equivariance is not guaranteed in CNNs (only approximate shift invariance is). The same model fed test-time inputs at a different resolution yields very different statistics: the distribution of activations changes at test time, and the values fall outside the range that the final classification layers were trained for.
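A minimal sketch of this effect (assuming PyTorch/torchvision; random tensors stand in for real images, so this only illustrates the probing procedure, not the exact drift measured in the paper):

```python
import torch
from torchvision import models

# Hedged sketch: probe how the pooled activation statistics of a fixed,
# pretrained CNN change with input resolution.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(model.children())[:-1])  # drop the final fc
backbone.eval()

with torch.no_grad():
    for res in (128, 224, 448):
        x = torch.randn(4, 3, res, res)      # stand-in batch at this resolution
        feats = backbone(x).flatten(1)       # (B, 2048) globally pooled features
        print(f"res={res}: mean={feats.mean():.3f}, std={feats.std():.3f}")
```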

In ImageNet training, the conventional pipeline augments with random resized crops at train time, while test time always uses a central crop (or a 10-crop: center, four corners, and their mirrors). This makes the apparent object sizes, and hence the activation statistics, differ between training and testing.
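For concreteness, a sketch of the two standard pipelines (torchvision-style; the sizes are the usual ImageNet defaults, not values from this note):

```python
from torchvision import transforms

# Conventional train-time pipeline: random scale/aspect crop as augmentation.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),   # samples 8%-100% of the image area
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Conventional test-time pipeline: deterministic resize + central crop.
test_tf = transforms.Compose([
    transforms.Resize(256),              # resize the shorter side
    transforms.CenterCrop(224),          # fixed central crop
    transforms.ToTensor(),
])
# Because RandomResizedCrop often zooms in, objects appear larger on average
# at train time than under the test-time center crop.
```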

Simple solution: as the final stage of training, fine-tune the last layers (the classifier, and the batch-norm statistics) at the test-time scale and resolution.
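A hedged sketch of this final stage (PyTorch; `train_loader_at_test_res` is a placeholder for a loader that serves crops at the test resolution, and the hyperparameters are illustrative, not from the paper):

```python
import torch
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the backbone; only the classifier receives gradients.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

# Keep the network in train mode so batch-norm running statistics
# re-adapt to the new (test-time) resolution.
model.train()
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in train_loader_at_test_res:  # placeholder loader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```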

Key ideas

Technical details

  • Larger test crops yield better results.
  • A similar work is MultiGrain, where the exponent p of its p-pooling is adjusted at test time to match the train-time statistics.
  • GeM (generalized mean) p-pooling is a generalization of average pooling (p = 1) and max pooling (p → ∞); see the sketch after this list.
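A short sketch of GeM p-pooling (plain PyTorch; the function name `gem_pool` is mine, not from either paper):

```python
import torch

def gem_pool(x: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    # Generalized mean (GeM) over the spatial dims of a (B, C, H, W) map:
    #   gem(x) = ((1/HW) * sum_i x_i^p)^(1/p)
    # p = 1 recovers average pooling; p -> infinity approaches max pooling.
    return x.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)

feat = torch.rand(2, 512, 7, 7) + 0.5        # positive activations, as after ReLU
print(torch.allclose(gem_pool(feat, p=1.0), feat.mean(dim=(-2, -1))))  # True
print(gem_pool(feat, p=100.0).max(), feat.amax(dim=(-2, -1)).max())    # close to max
```

Since p interpolates between average and max pooling, MultiGrain can tune this single scalar at test time to compensate for the statistics shift, instead of fine-tuning layers as FixRes does.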

Notes

  • Questions and notes on how to improve/revise the current work