In this directory, we aim to implement the AlexNet architecture for a Convolutional Neural Network (CNN) used for image classification, to be tested with the ImageNet dataset.
Description | Library | Notebook |
---|---|---|
Using Pylearn2/Keras LRN | Keras | |
Using TF.NN.LRN | Keras | |
Version | Description | Library | Notebook |
---|---|---|---|
v1 | Basic implementation | Keras | |
Implementation notes for ImageNet v1:
- We haven't yet trained or tested this network (work in progress).
Our implementation is based on the following paper:
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NeurIPS 2012). Curran Associates Inc., Red Hook, NY, USA, 1097–1105.
The paper is available via:
See also:
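Per the AlexNet paper, the Local Response Normalization layer computes the following function:
$$b^i_{x,y} = \frac{a^i_{x,y}}{\left(k + \alpha \sum_{j=\max(0,\, i - n/2)}^{\min(N-1,\, i + n/2)} \left(a^j_{x,y}\right)^2\right)^{\beta}}$$
where $a^i_{x,y}$ is the activation of kernel $i$ at position $(x, y)$, $N$ is the total number of kernels in the layer, and the sum runs over $n$ adjacent kernel maps at the same spatial position.
The paper authors chose $k = 2$, $n = 5$, $\alpha = 10^{-4}$, and $\beta = 0.75$.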
We provide two implementations of the Local Response Normalization layer:
- one in this directory, `local_response_normalization.py`, as a very light wrapper around the `tensorflow.nn.local_response_normalization` layer, which was written as a result of this paper
- another one in `third_party/pylearn2/local_response_normalization.py`, which is based on the Pylearn2 implementation and adapted in the Keras project
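
For intuition about what a Local Response Normalization layer computes, here is a minimal pure-Python sketch of the across-channel normalization from the AlexNet paper, applied to the channel activations at a single spatial position. It is not the code in either file listed above. The function name `local_response_normalization` and its signature are hypothetical, the defaults are the paper's hyperparameters ($k = 2$, $n = 5$, $\alpha = 10^{-4}$, $\beta = 0.75$), and rounding $n/2$ down is an assumption.

```python
def local_response_normalization(activations, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize one pixel's activations across channels (AlexNet-style LRN).

    `activations` holds the activation of every channel at a single (x, y)
    position. This helper is illustrative only, not a library API.
    """
    num_channels = len(activations)
    half_window = n // 2  # assumption: n/2 from the paper is rounded down
    normalized = []
    for i, value in enumerate(activations):
        # Neighbouring channels, clipped at the first and last channel.
        low = max(0, i - half_window)
        high = min(num_channels - 1, i + half_window)
        squared_sum = sum(a * a for a in activations[low:high + 1])
        normalized.append(value / (k + alpha * squared_sum) ** beta)
    return normalized


channel_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(local_response_normalization(channel_values))
```

As far as we understand the TensorFlow documentation, `tf.nn.local_response_normalization` expresses the window as `depth_radius` (the number of channels on each side), so $n = 5$ would correspond to `depth_radius=2`, with `bias` playing the role of $k$.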
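Note that the original AlexNet paper refers to inputs as $224 \times 224 \times 3$, but the layer dimensions only work out if the inputs are $227 \times 227 \times 3$.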
Here are several references which agree on this analysis of input shape:
1. Classic Networks lecture by Andrew Ng as part of the Deep Learning specialization:
   > If you read the paper, the paper refers to $224 \times 224 \times 3$ images, but if you look at the numbers, the numbers only make sense if they are $227 \times 227 \times 3$.
2. Stanford CS231n class notes:
   > As a fun aside, if you read the actual paper it claims that the input images were 224x224, which is surely incorrect because (224 - 11)/4 + 1 is quite clearly not an integer. This has confused many people in the history of ConvNets and little is known about what happened. My own best guess is that Alex used zero-padding of 3 extra pixels that he does not mention in the paper.
3. One answer also quotes Andrew Ng's lecture in (1) above. Another answer demonstrates via calculation that the $224 \times 224$ input shape would result in a non-integral output shape, and hence must be an error, similarly to the Stanford CS231n class notes in (2) above.
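
The arithmetic behind this argument can be checked with a short sketch. The helper `conv_output_size` is a hypothetical name introduced here for illustration. It applies the standard output-size formula for a convolution along one dimension, using the $11 \times 11$ kernel and stride of 4 from the calculation quoted above, and assumes no padding.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution along one dimension.

    Returns a float so that a non-integral (invalid) result stays visible.
    """
    return (input_size + 2 * padding - kernel_size) / stride + 1


for input_size in (224, 227):
    size = conv_output_size(input_size, kernel_size=11, stride=4)
    status = "valid" if size == int(size) else "not an integer"
    print(f"{input_size}x{input_size} input -> {size} ({status})")
```

With a 224-pixel input the result is 54.25, which cannot be a feature-map size, while a 227-pixel input gives exactly 55.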