
Deep nets with stochastic depth #66

Open
christopher-beckham opened this issue May 18, 2016 · 2 comments

@christopher-beckham

I hope this is the appropriate venue to post this. I don't have an implementation yet, but maybe this ticket could encourage some work.

I am currently interested in this stochastic depth paper:

http://arxiv.org/pdf/1603.09382v2.pdf

I was going to have a go at implementing this, but I was a bit stumped as to how one would go about the identity transform mentioned in equation (2). As you can see, if the next layer and the current layer have different output shapes, you need to linearly project the output of the current layer so that it matches the dimensions of the output of the following layer. I'm not clear on how this is done and am afraid it's blatantly obvious... is your "projection matrix" (or whatever it's called) a matrix (of some appropriate shape) consisting solely of ones? Furthermore, how would we do this for convolutional networks?

It seems like that's the only roadblock for me -- the binomial mask is easy to do.

Let me know what you think.

PS: Interestingly, I found a post asking how to go about implementing this, but it seems to omit the identity transform:

https://www.reddit.com/r/MachineLearning/comments/4dr998/askreddit_has_anyone_implemented_resnets_with/

@f0k
Member

f0k commented May 18, 2016

> As you can see, if the next layer and the current layer have different output shapes, you need to linearly project the output of the current layer so that it matches the dimensions of the output of the following layer.

The identity transform in Eq. 2 is the same as in Eq. 1, which is just the shortcut connection of the ResNet. For shortcuts changing the spatial size (dashed arrows in http://arxiv.org/abs/1512.03385, Fig 3), there are two options, explained in the first paragraph of page 4 of http://arxiv.org/abs/1512.03385. An implementation of that paper is given in https://github.com/Lasagne/Recipes/blob/master/papers/deep_residual_learning/Deep_Residual_Learning_CIFAR-10.py, including these two options.
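Roughly, the two options look like this in Lasagne (untested sketch in the spirit of that Recipe; the helper name `shortcut` and its arguments are mine, not from the Recipe):

```python
from lasagne.layers import Conv2DLayer, ExpressionLayer, PadLayer

def shortcut(incoming, num_filters, stride=2, projection=False):
    if projection:
        # Option B: learned 1x1 convolution projection (adds parameters).
        return Conv2DLayer(incoming, num_filters=num_filters, filter_size=1,
                           stride=stride, nonlinearity=None, b=None)
    # Option A: parameter-free identity -- subsample spatially, then
    # zero-pad the missing feature maps on the channel axis.
    sub = ExpressionLayer(incoming,
                          lambda X: X[:, :, ::stride, ::stride],
                          lambda s: (s[0], s[1], s[2] // stride, s[3] // stride))
    n_extra = num_filters - incoming.output_shape[1]
    return PadLayer(sub, width=[n_extra // 2, 0, 0], batch_ndim=1)
```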

On page 8 of the stochastic depth paper, they mention that for blocks changing the number of filters and spatial dimension, they "replace the identity connections in these blocks by an average pooling layer followed by zero paddings to match the dimensions." This is neither of the two options in the ResNet paper, but it's easy enough to modify the existing Lasagne Recipe to do so.
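Something along these lines should do it (again an untested sketch with my own names):

```python
from lasagne.layers import Pool2DLayer, PadLayer

def avgpool_pad_shortcut(incoming, num_filters, stride=2):
    # Average pooling to reduce the spatial size...
    pooled = Pool2DLayer(incoming, pool_size=stride, stride=stride,
                         mode='average_exc_pad')
    # ...followed by zero padding on the channel axis to match the number
    # of feature maps of the residual branch.
    n_extra = num_filters - incoming.output_shape[1]
    return PadLayer(pooled, width=[n_extra // 2, 0, 0], batch_ndim=1)
```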

If in doubt about what they did in the stochastic depth paper, refer to the source code at https://github.com/yueatsprograms/Stochastic_Depth.

/edit: If you manage to reproduce the results of the stochastic depth paper (CIFAR-10 will be the easiest target), we'd appreciate a PR to this repository.
For additional fun, note that there's also a second ResNet paper (https://arxiv.org/abs/1603.05027) which was done concurrently with the stochastic depth paper. It's possible that combining the two would yield even better results. A Lasagne implementation is here: https://github.com/FlorianMuellerklein/Identity-Mapping-ResNet-Lasagne.
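In case it helps getting started, here is a very rough, untested sketch of how the per-block "survival" gate itself could look as a Lasagne layer (class name, seed and the `survival_p` default are mine). The idea is the same train/test asymmetry as dropout: sample a Bernoulli gate at training time, scale by the survival probability at test time:

```python
import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams
from lasagne.layers import Layer

class SurvivalGateLayer(Layer):
    """Multiplies its input by a Bernoulli(survival_p) sample at training
    time and by survival_p at test time (deterministic=True)."""
    def __init__(self, incoming, survival_p=0.5, **kwargs):
        super(SurvivalGateLayer, self).__init__(incoming, **kwargs)
        self.survival_p = survival_p
        self._srng = MRG_RandomStreams(seed=42)

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic:
            # Test time: use the expected value of the gate.
            return self.survival_p * input
        # Training time: keep or drop the whole residual branch.
        gate = self._srng.binomial((1,), p=self.survival_p,
                                   dtype=theano.config.floatX)[0]
        return gate * input
```

You would insert this on the residual branch just before the ElemwiseSumLayer that merges it with the shortcut, with survival_p decaying linearly with depth as in the paper.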

@christopher-beckham
Author

Ah, thank you! How did I not notice the implementation detail section in the paper... I'll take a crack at this and see what I can come up with!
