
something wrong in your code #8

Open
selous123 opened this issue Nov 17, 2017 · 3 comments

@selous123 commented Nov 17, 2017

First of all, thank you for your code.

But as I read your source code, I think there may be errors in your squash function.
Problem 1:
From your README, I read the TensorFlow source code:

Squashing function corresponding to Eq. 1
    Args:
        vector: A tensor with shape [batch_size, 1, num_caps, vec_len, 1] or [batch_size, num_caps, vec_len, 1].
    Returns:
        A tensor with the same shape as vector but squashed in 'vec_len' dimension.

According to that comment, the squashing is done along the vec_len dimension. But in your code:

def squash(s):
    # This is equation 1 from the paper.
    mag_sq = torch.sum(s**2, dim=2, keepdim=True)
    mag = torch.sqrt(mag_sq)
    s = (mag_sq / (1.0 + mag_sq)) * (s / mag)
    return s

Because you have not written a shape comment there, we only see this:

# Flatten to (batch, unit, output).
u = u.view(x.size(0), self.num_units, -1)
# Return squashed outputs.
return CapsuleLayer.squash(u)

It is easy to see that the squashing should be done along dim=1, not dim=2.
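
For what it's worth, here is a quick numeric check of the difference (illustrative only, not a patch against the repo; the shape follows the view above):

import torch

def squash(s, dim):
    # Eq. 1 from the paper, applied along the chosen dimension.
    mag_sq = torch.sum(s ** 2, dim=dim, keepdim=True)
    return (mag_sq / (1.0 + mag_sq)) * (s / torch.sqrt(mag_sq))

u = torch.randn(4, 8, 1152)                       # (batch, vec_len=8, num_caps=1152)
print(squash(u, dim=1).norm(dim=1).max().item())  # < 1: every 8-D capsule vector is bounded
print(squash(u, dim=2).norm(dim=2).max().item())  # also < 1, but for the 1152-long rows, not the capsule vectors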

Problem 2:

# (batch, features, in_units) -> (batch, features, num_units, in_units, 1)
x = torch.stack([x] * self.num_units, dim=2).unsqueeze(4)

# (batch, features, in_units, unit_size, num_units)
W = torch.cat([self.W] * batch_size, dim=0)

# Transform inputs by weight matrix.
# (batch_size, features, num_units, unit_size, 1)
u_hat = torch.matmul(W, x)

How can x with shape (batch, features, num_units, in_units, 1) and W with shape (batch, features, in_units, unit_size, num_units) be multiplied with matmul?
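
For reference, torch.matmul on tensors with more than two dimensions treats the last two dimensions as matrices and broadcasts the leading ones, so for this product to be defined, W's trailing dimensions would have to be (unit_size, in_units). A minimal shape check (sizes chosen only for illustration, not taken from the repo):

import torch

batch, features, num_units, in_units, unit_size = 2, 1152, 10, 8, 16

x = torch.randn(batch, features, num_units, in_units, 1)
# For torch.matmul, W's last two dims must be (unit_size, in_units) so that
# (unit_size, in_units) @ (in_units, 1) -> (unit_size, 1) per capsule.
W = torch.randn(batch, features, num_units, unit_size, in_units)

u_hat = torch.matmul(W, x)
print(u_hat.shape)  # torch.Size([2, 1152, 10, 16, 1])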

I could not run your code successfully because of the data, so I do not know whether it is right.

Best Wishes!!

selous123 changed the title from "something run in your code" to "something wrong in your code" on Nov 17, 2017

motokimura commented Nov 25, 2017

@timomernick, first of all, thank you for sharing your code.

After playing around with it, I've found something (maybe) wrong in your squash() function, which @selous123 mentioned as Problem 1.

According to the paper, we should squash the vectors along the vector_len dimension (PrimaryCaps: vector_len=8, DigitCaps: vector_len=16). However, in your implementation, the vectors are always squashed along the capsule_num dimension (PrimaryCaps: capsule_num=1152, DigitCaps: capsule_num=10).

### Your implementation for PrimaryCaps in capsule_layer.py

# Flatten to (batch, unit, output).
u = u.view(x.size(0), self.num_units, -1)

# u: (batch, vector_len=8, capsule_num=1152)

# Return squashed outputs.
return CapsuleLayer.squash(u)
### Your implementation for DigitCaps in capsule_layer.py

# Apply routing (c_ij) to weighted inputs (u_hat).
# (batch_size, 1, num_units, unit_size, 1)
s_j = (c_ij * u_hat).sum(dim=1, keepdim=True)

# s_j: (batch_size, 1, capsule_num=10, vector_len=16, 1)

v_j = CapsuleLayer.squash(s_j)
### Your implementation for squashing in capsule_layer.py

def squash(s):
    # This is equation 1 from the paper
    mag_sq = torch.sum(s**2, dim=2, keepdim=True)
    mag = torch.sqrt(mag_sq)
    s = (mag_sq / (1.0 + mag_sq)) * (s / mag)
    return s

For the correct implementation of squash(), I guess it should be dim=1 for PrimaryCaps, as @selous123 mentioned, and dim=3 for DigitCaps.
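
To make that concrete, here is a small check under the shapes quoted above (an illustrative sketch, not a patch against the repo):

import torch

def squash(s, dim):
    # Eq. 1, applied along the capsule-vector dimension.
    mag_sq = torch.sum(s ** 2, dim=dim, keepdim=True)
    return (mag_sq / (1.0 + mag_sq)) * (s / torch.sqrt(mag_sq))

u = torch.randn(4, 8, 1152)         # PrimaryCaps: (batch, vector_len=8, capsule_num=1152)
s_j = torch.randn(4, 1, 10, 16, 1)  # DigitCaps:   (batch, 1, capsule_num=10, vector_len=16, 1)

print(squash(u, dim=1).norm(dim=1).max().item())    # every 8-D primary capsule vector has length < 1
print(squash(s_j, dim=3).norm(dim=3).max().item())  # every 16-D digit capsule vector has length < 1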

When I ran the code with this modification in my environment, model convergence looked much faster and the reconstructed images also looked okay, although I only trained the model for 2 epochs.

I hope this helps you. Thank you again for sharing the code!

@timomernick (Owner)

If either of you wants to submit a pull request with a measurement of before & after accuracy, I would love to take your fixes!


danielhavir commented Jan 25, 2018

Hello Timo,
Hats off for the implementation; you have the u_hat 5D multiplication done right, unlike other implementations. It's well-written and efficient code. However, @motokimura is right. The paper says:

"We want the length of the output vector of a capsule to represent the probability that the entity represented by the capsule is present in the current input. We therefore use a non-linear "squashing" function to ensure that short vectors get shrunk to almost zero length and long vectors get shrunk to a length slightly below 1."
and also:
"In total PrimaryCapsules has [32×6×6] capsule outputs (each output is an 8D vector)"
(therefore there are 1152 capsules in the primary layer and 10 capsules in DigitCaps; also mentioned: "u_i, i ∈ (1, 32×6×6) in PrimaryCapsules and v_j, j ∈ (1, 10)")

The output of the primary capsules in your case has shape (batch_size, 8, 1152), and therefore the squashing non-linearity should be applied along dim=1. Similarly for s_j, which has shape (batch_size, 1, 10, 16, 1): the output vector has length 16 (quote: "The final Layer (DigitCaps) has one 16D capsule per digit class") and should therefore be "squashed" along dim=3.

At this point, the model is so complex that both options work and train well. However, the correct application of the squashing non-linearity is described in the paper.

Also: if you compute the primary capsules (without routing) so that the output shape is (batch_size, 1152, 8), like this:

u = [self.units[i](x) for i in range(self.num_units)]
u = torch.stack(u, dim=-1)                  # changed: stack along the last dimension
u = u.view(x.size(0), -1, self.num_units)   # changed: (batch, 1152, 8)

you could get rid of the x input transposing on line 83 in the routing algorithm in capsule_layer.py.
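
A standalone sketch of that ordering (hypothetical layer sizes, chosen only to reproduce the 1152×8 numbers quoted in this thread, not the repo's actual class):

import torch
import torch.nn as nn

num_units, in_channels, out_channels = 8, 256, 32

# Eight parallel conv "units", as in the snippet above.
units = nn.ModuleList(
    [nn.Conv2d(in_channels, out_channels, kernel_size=9, stride=2) for _ in range(num_units)]
)

x = torch.randn(4, in_channels, 20, 20)
u = torch.stack([unit(x) for unit in units], dim=-1)  # (4, 32, 6, 6, 8)
u = u.view(x.size(0), -1, num_units)                  # (4, 1152, 8): capsules first, vector last
print(u.shape)
# With this layout, squashing (and routing) can work on the last dimension,
# i.e. squash(u, dim=-1), with no transpose needed.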

Thanks for your implementation!
