
PyTorch ResNet50 Validation Accuracy #3

Open
ankmathur96 opened this issue Apr 11, 2019 · 8 comments

ankmathur96 commented Apr 11, 2019

Hey there!

I came across your project from Jeremy Howard's Twitter. I think it's great to be benchmarking these numbers and keeping them in a single place!

I've tried running your script and ran into a problem I was hoping you could help diagnose.
I ran python imagenet_pytorch_get_predictions.py -m resnet50 -g 0 -b 64 ~/imagenet/ and got:

resnet50 completed: 100.00%
resnet50: acc1: 0.10%, acc5: 0.27%

I'm using Python 3.7 and PyTorch 1.0.1.post2 and didn't change any of your code except for changing the argparse parameter for batch_size to type=int.
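(For reference, that one-line change looks roughly like the sketch below; the flag name is taken from the command above, the rest is illustrative rather than the repo's exact code.)

```python
# Sketch of the argparse change mentioned above. Without type=int,
# argparse keeps the value as a string, and PyTorch's DataLoader
# rejects a string batch_size.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-b', '--batch-size', default=64, type=int)

args = parser.parse_args(['-b', '64'])
assert isinstance(args.batch_size, int)
```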

I work pretty regularly with PyTorch and ResNet-50 and was surprised to see ResNet-50 reach only 75.02% validation accuracy. When I evaluate the pretrained ResNet-50 using the code here, I get 76.138% top-1 and 92.864% top-5 accuracy. Specifically, I run:

python main.py -a resnet50 -e -b 64 -j 8 --pretrained ~/imagenet/

I'm using CUDA 9.2 and cuDNN 7.4.1, running inference on an NVIDIA V100 on a Google Cloud instance with Ubuntu 16.04.

I'm curious what might be going wrong here and why our results differ. To start with: what CUDA/cuDNN versions did your results originate from?


rwightman commented Apr 12, 2019

There are a lot of factors at play for a given result: PyTorch version, CUDA, PIL, etc. Even changing the image scaling between bicubic and bilinear can have a notable impact. I default to bicubic, but bilinear works better for some models, likely depending on what they were originally trained with.
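For concreteness, a minimal sketch of switching that filter in a standard torchvision eval transform (illustrative, not necessarily this repo's exact pipeline):

```python
# The only difference between the two pipelines is the resampling filter
# passed to Resize; torchvision defaults to bilinear.
from PIL import Image
from torchvision import transforms

def eval_transform(interpolation):
    return transforms.Compose([
        transforms.Resize(256, interpolation=interpolation),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

bilinear_tf = eval_transform(Image.BILINEAR)  # torchvision default
bicubic_tf = eval_transform(Image.BICUBIC)    # alternative filter
```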

I have noticed accuracy changes for many models between measurements I made over a year ago and now (same weights).

My ResNet50 number with PyTorch 1.0.1.post2 and CUDA 10: Prec@1 75.868, Prec@5 92.872

My old ResNet50 numbers with PyTorch (0.2.0.post1) and CUDA 9.x?: Prec@1 76.130, Prec@5 92.862

A table with some of my old measurements here: https://github.com/rwightman/pytorch-dpn-pretrained

@rwightman

ResNet50 on PyTorch 1.0.1.post2 and CUDA 10 w/ bilinear instead of bicubic, Prec@1 76.138, Prec@5 92.864 ... matches your numbers @ankmathur96


ankmathur96 commented Apr 12, 2019

Interesting! I should mention that I am using PIL version 5.3.0.post0.

I believe that bilinear is the default in PyTorch transforms (https://github.com/pytorch/vision/blob/master/torchvision/transforms/transforms.py#L182) and it seems this repository is using the default (https://github.com/cgnorthcutt/benchmarking-keras-pytorch/blob/master/imagenet_pytorch_get_predictions.py#L95). It's interesting to note the difference when using bicubic though.

I've also seen variation with different CUDA versions and other setup differences, similar to what you're describing. I've seen, for example, a full percentage point drop when using OpenCV's bilinear resizing implementation as compared to PIL's. I was unaware, though, that there could be a full percentage point drop from such setup differences in this kind of more constrained setting (using PyTorch/CUDA/PIL). This seems especially worth highlighting since this repo's evaluation is off by enough that densenet169 performs worse than ResNet-50 in my setup.
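As a quick illustration of that library-level difference (a hedged sketch, not the evaluation code itself): resizing the same array through PIL and OpenCV with nominally the same bilinear filter generally does not produce identical pixels.

```python
# Compare PIL and OpenCV bilinear resizing on one array. The two
# libraries use different sampling conventions, so outputs differ.
import cv2
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

pil_out = np.asarray(Image.fromarray(img).resize((224, 224), Image.BILINEAR))
cv_out = cv2.resize(img, (224, 224), interpolation=cv2.INTER_LINEAR)

# A nonzero mean absolute difference here is what propagates into the
# small accuracy shifts discussed above.
print(np.abs(pil_out.astype(int) - cv_out.astype(int)).mean())
```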

Edit: It's worth noting that many such differences due to subtle changes in preprocessing implementations can be eliminated (if need be, for a production use case) by fine-tuning with a low learning rate for several epochs.
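A minimal sketch of that fine-tuning idea (the data loader and all hyperparameters are illustrative assumptions, not a recipe from this thread):

```python
# A few epochs of low-LR SGD to re-adapt pretrained weights to a changed
# preprocessing pipeline. Hyperparameters here are illustrative.
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

def finetune(train_loader, epochs=3):
    # train_loader is assumed to apply the *new* preprocessing
    model.train()
    for _ in range(epochs):
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
```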

@rwightman

@ankmathur96 yeah, I noticed when I was doing my benchmarking in the past that most of the resnet/densenet models in torchvision were better with the default bilinear, but a number of the ported models (Inception variants, DPN, etc.) did better with bicubic.

Fine-tuning can definitely help with these sorts of issues if/when it matters. It's also worth noting that many of the default pretrained weights can pretty easily be surpassed by around 1% or more using different training schedules and augmentation techniques.

FWIW, my densenet169 numbers are very close to this repo's: lower than my ResNet50 numbers at top-1 but better at top-5.

I'm using Pillow-SIMD 5.3.0.post0

@cgnorthcutt

@ankmathur96 @rwightman Thanks for finding this. I agree it's likely a PyTorch version / CUDA version incompatibility. Did either of you find a fix? Feel free to send a pull request on https://github.com/cgnorthcutt/benchmarking-keras-pytorch/blob/master/imagenet_pytorch_get_predictions.py

@ozabluda

@ankmathur96

I get 76.138% top-1 accuracy.

@rwightman

My ResNet50 number with PyTorch 1.0.1.post2 and CUDA 10: Prec@1 75.868, Prec@5 92.872
My old ResNet50 numbers with PyTorch (0.2.0.post1) and CUDA 9.x?: Prec@1 76.130, Prec@5 92.862

The difference between 75.868% and 76.130% (a 0.262% gap) is not statistically significant with only 50,000 validation samples: the standard deviation of a binomial proportion with p=0.76 and n=50,000 is sqrt(0.76*(1-0.76)/50000)*100 ≈ 0.19%, so the gap is only about 1.4 standard deviations.
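(That back-of-the-envelope number checks out; a two-line verification, for anyone following along:)

```python
# Standard error of a binomial proportion with p = 0.76, n = 50,000.
import math

stderr = math.sqrt(0.76 * (1 - 0.76) / 50_000) * 100
print(f"{stderr:.3f}%")  # ~0.191%, so a 0.262% gap is only ~1.4 sigma
```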

@ozabluda

ozabluda commented Apr 19, 2019

@ankmathur96

a full percentage point drop when using OpenCV's bilinear resizing implementation as compared to PIL's.

See these two URLs for the differences in bilinear resizing across libraries, or even within the same library and same function under different padding options:

https://stackoverflow.com/questions/18104609/interpolating-1-dimensional-array-using-opencv
https://stackoverflow.com/questions/43598373/opencv-resize-result-is-wrong

also see
https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35

TF v2 now follows Pillow, not OpenCV, if there is a difference between the two:
tensorflow/tensorflow#6720

...which doesn't seem to be the case:
chainer/onnx-chainer#147

@ozabluda

@calebrob6 Caleb Robinson | How to reproduce ImageNet validation results
http://calebrob.com/ml/imagenet/ilsvrc2012/2018/10/22/imagenet-benchmarking.html

For every image in the validation set we need to apply the following process:

  1. Load the image data in a floating point format.
  2. Resize the smallest side of the image to 256 pixels using bicubic interpolation over a 4x4 pixel neighborhood (using OpenCV's resize method with the “INTER_CUBIC” interpolation flag). The larger side should be resized to maintain the original aspect ratio of the image.
  3. Crop the central 224x224 window from the resized image.
  4. Save the image in RGB format.
    [...]
    All the steps above are shown in the notebooks from the accompanying GitHub repository
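A hedged sketch of those four steps using OpenCV, as in the blog post (function and variable names here are illustrative):

```python
# Sketch of the quoted preprocessing: load as float, resize the short side
# to 256 with bicubic interpolation, center-crop 224x224, convert to RGB.
import cv2
import numpy as np

def preprocess(path):
    img = cv2.imread(path).astype(np.float32)        # 1. load (BGR)
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)                        # 2. short side -> 256
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))),
                     interpolation=cv2.INTER_CUBIC)
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2       # 3. central 224x224 crop
    img = img[top:top + 224, left:left + 224]
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)      # 4. BGR -> RGB
```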
