
adding missing dropout layers to VGG16 and VGG19 #43

Merged: 3 commits into Lasagne:master on Feb 1, 2016

Conversation

@webeng (Contributor) commented Jan 28, 2016

According to the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition", there are two dropout layers (ratio 0.5) for the first two fully-connected layers in VGG16 and VGG19.

I added the two dropout layers, which drastically reduces overfitting.

@f0k (Member) commented Jan 28, 2016

Good spot! I think this wasn't done (or noticed) because the main goal was to replicate the predictions, but when fine-tuning for another task, it shouldn't hurt to faithfully replicate the architecture originally used in training.
My only complaint would be that the existing layer names should probably stay the same as they're referenced in other notebooks. Maybe the dropout layers should be called fc6_dropout and fc7_dropout, respectively.
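
(For reference, a minimal sketch of what the renamed layers might look like, assuming the standard Lasagne DenseLayer/DropoutLayer API; the variable names, layer sizes, and truncated stack are purely illustrative. The dropout layers are shown preceding fc6 and fc7 here, as the PR placed them at this stage; whether they should precede or follow the dense layers is discussed further below.)

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer

# Hypothetical fragment: the convolution/pooling stack of the recipe is omitted,
# so the dense layers are fed straight from the input for brevity.
net = {}
net['input'] = InputLayer((None, 3, 224, 224))
net['fc6_dropout'] = DropoutLayer(net['input'], p=0.5)   # dropout ratio 0.5, as in the paper
net['fc6'] = DenseLayer(net['fc6_dropout'], num_units=4096)
net['fc7_dropout'] = DropoutLayer(net['fc6'], p=0.5)
net['fc7'] = DenseLayer(net['fc7_dropout'], num_units=4096)
net['fc8'] = DenseLayer(net['fc7'], num_units=1000)       # no dropout in front of the output layer
```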

@webeng (Contributor, Author) commented Jan 28, 2016

I just renamed the dropout layers based on your suggestions.

@f0k (Member) commented Jan 28, 2016

Thanks! I'll leave it to @ebenolson to have a look and/or merge.

@ebenolson (Member)

Yeah, I was focused on inference/feature extraction, so I thought it would be simpler to leave these out; similarly, the GoogLeNet model is missing the auxiliary classifier arms. I also didn't want to confuse people with the need to set deterministic=True in get_output.

It's true though that these would be important for fine-tuning so I'm fine with the change. However, I'm pretty sure the dropout layers should follow fc6 and fc7, not precede them?

@webeng (Contributor, Author) commented Jan 29, 2016

You're right, the dropout layers should follow, not precede, the dense layers. I've updated it.

I found Lasagne's recipes very useful as a base for feature extraction and for training new models. Perhaps we should add a comment at the top saying: "If you want to build your own model, you should use the dropout layers to reduce overfitting. Otherwise, comment them out." Or add deterministic=True in get_output.

What do you think?

@f0k (Member) commented Jan 29, 2016

> However, I'm pretty sure the dropout layers should follow fc6 and fc7, not precede them?

Are you sure? Section 3.1 says:

> The training was regularised by weight decay (the L2 penalty multiplier set to 5e-4) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5).

I'd say dropout comes before the fully-connected layers, not after. The biggest layer (in terms of inputs and params) is fc6, so that's where dropping inputs makes the most sense. And it sometimes helps to not drop any inputs of the final output layer, i.e., not place a dropout layer before fc8. So I think dff8363 was correct. Is there any training code to compare to?

> Perhaps we should add a comment at the top saying: "If you want to build your own model, you should use the dropout layers to reduce overfitting. Otherwise, comment them out."

Or maybe add a boolean parameter to build_model(), something like for_training=False. If set to True, it will include the dropout layers, and maybe eventually also the auxiliary classifiers for GoogLeNet.
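
(A minimal sketch of what such a flag could look like; the for_training name comes from the suggestion above, while the function body, including the truncated conv/pool stack, is hypothetical and not part of the actual recipe.)

```python
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer

def build_model(for_training=False):
    """Hypothetical sketch: include the dropout layers only when training."""
    net = {}
    net['input'] = InputLayer((None, 3, 224, 224))
    # The full convolution/pooling stack of the recipe would go here; for
    # brevity this sketch feeds the dense layers straight from the input.
    incoming = net['input']
    net['fc6'] = incoming = DenseLayer(incoming, num_units=4096)
    if for_training:
        net['fc6_dropout'] = incoming = DropoutLayer(incoming, p=0.5)
    net['fc7'] = incoming = DenseLayer(incoming, num_units=4096)
    if for_training:
        net['fc7_dropout'] = incoming = DropoutLayer(incoming, p=0.5)
    net['fc8'] = DenseLayer(incoming, num_units=1000)
    return net
```

Calling build_model(for_training=True) would then give the dropout-regularised variant for fine-tuning, while the default keeps the inference-oriented graph unchanged.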

@ebenolson (Member)

I would agree with your interpretation, but the Caffe prototxt indicates dropout after fc6 and fc7.

I like the idea of an argument for build_model; @webeng, would you mind adding that?

@benanne (Member) commented Jan 30, 2016

On the one hand, I agree it makes the most sense to put the dropout before the layers with the most parameters; on the other hand, it's kind of weird not to have one before the output layer as well in that case (where have you seen this before?).

Also, even if the first dropout layer comes after the first dense layer, that layer's parameters will still get regularized somewhat, because dropout affects all activations coming after it and all gradients in the network (the effect is global). So it's not unlikely that the Caffe interpretation is right.

@webeng (Contributor, Author) commented Jan 30, 2016

Yes, ImageNet Pretrained Network (VGG_S).ipynb also has the dropout layers after the dense layers.

I think it's more in line with Lasagne to use deterministic=True when you call get_output, as it will already ignore the dropout layers, don't you think? This is how it's done in the previous example.
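
(For illustration, a minimal sketch of the arrangement being discussed, with dropout following fc6 and fc7, and of the deterministic=True call; the conv/pool stack is again omitted and all names and sizes are illustrative.)

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer, get_output

net = {}
net['input'] = InputLayer((None, 3, 224, 224))
net['fc6'] = DenseLayer(net['input'], num_units=4096)
net['fc6_dropout'] = DropoutLayer(net['fc6'], p=0.5)
net['fc7'] = DenseLayer(net['fc6_dropout'], num_units=4096)
net['fc7_dropout'] = DropoutLayer(net['fc7'], p=0.5)
net['fc8'] = DenseLayer(net['fc7_dropout'], num_units=1000,
                        nonlinearity=lasagne.nonlinearities.softmax)

# Training expression: dropout is active (deterministic defaults to False).
train_out = get_output(net['fc8'])
# Inference / feature extraction: deterministic=True makes the dropout layers
# pass their input through unchanged, so no further code changes are needed.
test_out = get_output(net['fc8'], deterministic=True)
```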

@ebenolson (Member)

Ok, I'm alright with the current version also. Perhaps when I get around to #18 we will revisit the idea of a for_training parameter. I will merge tonight if there are no further comments.

@ebenolson (Member)

Merging, thank you for contributing.

ebenolson added a commit that referenced this pull request on Feb 1, 2016: adding missing dropout layers to VGG16 and VGG19
@ebenolson merged commit 0ccd547 into Lasagne:master on Feb 1, 2016
@f0k (Member) commented Feb 1, 2016

> where have you seen this before?

Phew, not sure... but I think I've seen instances where they don't drop out the inputs of the final classification layer. Probably that was with a lot fewer hidden units and a lot fewer classes, though (not an ImageNet model).
