Softmax in routing algorithm incorrect? #12

Open
geefer opened this issue May 17, 2018 · 3 comments
geefer commented May 17, 2018

Hi,
I think the softmax in the routing algorithm is being calculated over the wrong dimension.

Currently the code has:

        # Initialize routing logits to zero.
        b_ij = Variable(torch.zeros(1, self.in_channels, self.num_units, 1)).cuda()

        # Iterative routing.
        num_iterations = 3
        for iteration in range(num_iterations):
            # Convert routing logits to softmax.
            # (batch, features, num_units, 1, 1)
            c_ij = F.softmax(b_ij)

Since the dim parameter is not passed to the F.softmax call, it defaults to dim=1 and computes the softmax over the self.in_channels dimension (1152 here). Instead, the softmax should be computed so that the coupling coefficients c_ij between each input capsule i and all the capsules j in the next layer sum to 1.

Thus the correct call should be:

           c_ij = F.softmax(b_ij, dim=2)
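
To make the difference concrete, here is a minimal standalone check (not from this repo; it assumes the same tensor layout as b_ij above, and uses random logits instead of zeros so the two normalizations are visibly different):

    import torch
    import torch.nn.functional as F

    # Routing logits with the same layout as b_ij above:
    # (batch=1, in_channels=1152 input capsules, num_units=10 digit capsules, 1)
    b_ij = torch.randn(1, 1152, 10, 1)

    # Softmax over dim=2 (the digit-capsule dimension): for each input capsule i,
    # the coefficients over the 10 output capsules sum to 1.
    c_ij = F.softmax(b_ij, dim=2)
    print(c_ij.sum(dim=2)[0, :3, 0])      # tensor([1., 1., 1.])

    # Softmax over dim=1 (the 1152 input capsules), which is what the current
    # code effectively does: it is the sum over input capsules that equals 1.
    c_wrong = F.softmax(b_ij, dim=1)
    print(c_wrong.sum(dim=1)[0, :3, 0])   # tensor([1., 1., 1.]), but along the wrong axis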

InnovArul commented Jun 13, 2018

Have you tried to implement and test with dim=2?

The implementation here (https://github.com/gram-ai/capsule-networks) is similar to the code in this repo, i.e. it also takes the softmax over dim=1. But when I implemented it with the dim you suggested, the network does not learn.


geefer commented Jun 14, 2018

Hi,

Yes, my implementation applies the softmax over the dimension that corresponds to the number of digit capsules (10) and it appears to work well, giving a best test accuracy of 99.68% (not as good as reported in the paper, but I have yet to see another implementation that matches their results).

If you check the code in the naturomics TensorFlow implementation that you reference, you will see that it also applies the softmax over the dimension that holds the number of digit capsules (i.e. 10).

In the paper, equation (3) defines the softmax as c_ij = exp(b_ij) / Σ_k exp(b_ik). The summation in the denominator is over k in b_ik, so the coupling coefficients c_ij between capsule i and all the capsules j in the layer above sum to one. Thus the softmax should be taken over the dimension of size 10.

See also the implementation by the author of the paper at https://github.com/Sarasra/models/blob/master/research/capsules/models/layers/layers.py line 110.

I am not sure why your network does not learn when you change the softmax; possibly there is another problem somewhere else? For reference, a rough sketch of a routing iteration with the softmax over the digit-capsule dimension is below.
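
This is my own sketch, not this repo's code: the shapes, the variable name u_hat, and the squash helper are assumptions based on the paper, but it shows where dim=2 enters the loop:

    import torch
    import torch.nn.functional as F

    def squash(s, dim=-1, eps=1e-8):
        # Squashing nonlinearity from the paper: shrinks short vectors toward 0,
        # long vectors to just under unit length.
        sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
        return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

    def route(u_hat, num_iterations=3):
        # u_hat: prediction vectors, shape (batch, in_caps=1152, out_caps=10, out_dim=16)
        b_ij = torch.zeros(u_hat.size(0), u_hat.size(1), u_hat.size(2), 1,
                           device=u_hat.device)
        for _ in range(num_iterations):
            # Softmax over the out_caps dimension, so that for each input capsule i
            # the coupling coefficients c_ij over all output capsules j sum to 1.
            c_ij = F.softmax(b_ij, dim=2)
            # Weighted sum of predictions over input capsules, then squash.
            s_j = (c_ij * u_hat).sum(dim=1, keepdim=True)   # (batch, 1, 10, 16)
            v_j = squash(s_j, dim=-1)                       # (batch, 1, 10, 16)
            # Agreement update: dot product between predictions and outputs.
            b_ij = b_ij + (u_hat * v_j).sum(dim=-1, keepdim=True)
        return v_j.squeeze(1)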


lcwy220 commented Apr 1, 2020

Hi, I also noticed the softmax problem, and I agree that the softmax should be applied over dim=2.
