Softmax in routing algorithm incorrect? #12
Comments
Have you tried to implement and test with dim=2? The implementation here (https://github.com/gram-ai/capsule-networks) is similar to the code in this repo, i.e. it takes the softmax over dim=1. But when I implemented it with the appropriate dim as you mentioned, the network did not learn.
Hi, yes, my implementation applies the softmax over the dimension that corresponds to the number of digit capsules (10) and appears to work well, giving a best test accuracy of 99.68% (not as good as reported in the paper, but I have yet to see another implementation that matches their results). If you check the code in the naturomics TensorFlow implementation that you reference, you will see that it also applies the softmax over the dimension that has the number of digit capsules (i.e. 10). In the paper, equation (3) shows the softmax operation, and the summation in the divisor is over k in b_ik, giving c_ij, where c_ij is the coupling coefficient between capsule i and each of the capsules in the layer above; for a given i, these coefficients sum to one. Thus the softmax should be over the dimension of size 10. See also the implementation by the author of the paper at https://github.com/Sarasra/models/blob/master/research/capsules/models/layers/layers.py line 110. I am not sure why your network does not learn if you change the softmax; possibly there is another problem somewhere else?
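The point about equation (3) can be sketched in PyTorch. This is a hedged illustration, not the repo's exact code: the tensor shape `(batch, 1152, 10)` for the routing logits is assumed for the MNIST architecture (1152 primary capsules, 10 digit capsules).

```python
import torch
import torch.nn.functional as F

# Routing logits b_ij: (batch, input capsules, digit capsules).
# Shapes are illustrative, assuming the MNIST CapsNet layout.
b_ij = torch.randn(1, 1152, 10)

# Softmax over the digit-capsule dimension (size 10), as in Eq. (3):
# for each input capsule i, the coupling coefficients c_ij over all
# capsules j in the layer above sum to 1.
c_ij = F.softmax(b_ij, dim=2)

# Each input capsule's 10 coupling coefficients sum to 1.
print(torch.allclose(c_ij.sum(dim=2), torch.ones(1, 1152)))  # True
```

Summing over dim=2 and checking against ones makes the "sum to one over the layer above" property of equation (3) concrete.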
Hi, I also noticed the softmax problem, and I think you are right that the softmax should be applied over dim=2.
Hi,
I think the softmax in the routing algorithm is being calculated over the wrong dimension.
Currently the code calls F.softmax without passing the dim parameter, so it falls back to dim=1 and computes the softmax over the self.in_channels dimension (1152 here). Instead, the softmax should be computed so that the c_ij between each input capsule and all the capsules in the next layer sum to 1. Thus the call should pass the dimension of the next layer's capsules explicitly.
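To make the reported bug concrete, here is a hedged sketch of the two choices of dim. It is not the repo's exact code; the shape `(batch, 1152, 10)` and the name `logits` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# (batch, input capsules self.in_channels, digit capsules)
logits = torch.randn(1, 1152, 10)

# What the issue says the repo effectively computes: softmax over
# dim=1, i.e. over the 1152 input capsules, so the coefficients for
# each digit capsule sum to 1 over the *input* capsules.
over_inputs = F.softmax(logits, dim=1)

# The suggested fix: softmax over the digit-capsule dimension, so the
# coupling coefficients for each input capsule sum to 1 over the
# capsules in the layer above.
over_outputs = F.softmax(logits, dim=2)

print(torch.allclose(over_inputs.sum(dim=1), torch.ones(1, 10)))     # True
print(torch.allclose(over_outputs.sum(dim=2), torch.ones(1, 1152)))  # True
```

Both calls produce valid probability distributions; they just normalize over different axes, which is why the choice of dim changes the semantics of the routing coefficients.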