
something wrong in your code #8

Open
selous123 opened this issue Nov 17, 2017 · 3 comments

@selous123 commented Nov 17, 2017

First of all, thank you for your code.

But as I read your source code, I think there may be errors in your squash function.
Problem 1:
From your README, I read the TensorFlow source code:

Squashing function corresponding to Eq. 1
    Args:
        vector: A tensor with shape [batch_size, 1, num_caps, vec_len, 1] or [batch_size, num_caps, vec_len, 1].
    Returns:
        A tensor with the same shape as vector but squashed in 'vec_len' dimension.

According to that comment, the squashing is done along the vec_len dimension. But in your code:

def squash(s):
    # This is equation 1 from the paper.
    mag_sq = torch.sum(s**2, dim=2, keepdim=True)
    mag = torch.sqrt(mag_sq)
    s = (mag_sq / (1.0 + mag_sq)) * (s / mag)
    return s

Because you have not written a shape comment there, we only see this:

# Flatten to (batch, unit, output).
u = u.view(x.size(0), self.num_units, -1)
# Return squashed outputs.
return CapsuleLayer.squash(u)

It is easy to see that the squashing should be done along dim=1, not dim=2.
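
For what it's worth, here is a quick numeric check of the difference (illustrative only, not a patch against the repo; the shape follows the view above):

import torch

def squash(s, dim):
    # Eq. 1 from the paper, applied along the chosen dimension.
    mag_sq = torch.sum(s ** 2, dim=dim, keepdim=True)
    return (mag_sq / (1.0 + mag_sq)) * (s / torch.sqrt(mag_sq))

u = torch.randn(4, 8, 1152)                       # (batch, vec_len=8, num_caps=1152)
print(squash(u, dim=1).norm(dim=1).max().item())  # < 1: every 8-D capsule vector is bounded
print(squash(u, dim=2).norm(dim=2).max().item())  # also < 1, but for the 1152-long rows, not the capsule vectors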

Problem 2:

# (batch, features, in_units) -> (batch, features, num_units, in_units, 1)
x = torch.stack([x] * self.num_units, dim=2).unsqueeze(4)

# (batch, features, in_units, unit_size, num_units)
W = torch.cat([self.W] * batch_size, dim=0)

# Transform inputs by weight matrix.
# (batch_size, features, num_units, unit_size, 1)
u_hat = torch.matmul(W, x)

How can x with shape (batch, features, num_units, in_units, 1) and W with shape (batch, features, in_units, unit_size, num_units) be multiplied with matmul?
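
For reference, torch.matmul on tensors with more than two dimensions treats the last two dimensions as matrices and broadcasts the leading ones, so for this product to be defined, W's trailing dimensions would have to be (unit_size, in_units). A minimal shape check (sizes chosen only for illustration, not taken from the repo):

import torch

batch, features, num_units, in_units, unit_size = 2, 1152, 10, 8, 16

x = torch.randn(batch, features, num_units, in_units, 1)
# For torch.matmul, W's last two dims must be (unit_size, in_units) so that
# (unit_size, in_units) @ (in_units, 1) -> (unit_size, 1) per capsule.
W = torch.randn(batch, features, num_units, unit_size, in_units)

u_hat = torch.matmul(W, x)
print(u_hat.shape)  # torch.Size([2, 1152, 10, 16, 1])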

I could not run your code successfully because of the data, so I do not know whether it is right.

Best Wishes!!

selous123 changed the title from "something run in your code" to "something wrong in your code" on Nov 17, 2017

motokimura commented Nov 25, 2017

@timomernick, first of all, thank you for sharing your code.

After playing around with it, I've found something (maybe) wrong in your squash() function, which @selous123 mentioned as Problem 1.

According to the paper, we should squash the vectors along the vector_len dimension (PrimaryCaps: vector_len=8, DigitCaps: vector_len=16). However, in your implementation, the vectors are always squashed along the capsule_num dimension (PrimaryCaps: capsule_num=1152, DigitCaps: capsule_num=10).

### Your implementation for PrimaryCaps in capsule_layer.py

# Flatten to (batch, unit, output).
u = u.view(x.size(0), self.num_units, -1)

# u: (batch, vector_len=8, capsule_num=1152)

# Return squashed outputs.
return CapsuleLayer.squash(u)
### Your implementation for DigitCaps in capsule_layer.py

# Apply routing (c_ij) to weighted inputs (u_hat).
# (batch_size, 1, num_units, unit_size, 1)
s_j = (c_ij * u_hat).sum(dim=1, keepdim=True)

# s_j: (batch_size, 1, capsule_num=10, vector_len=16, 1)

v_j = CapsuleLayer.squash(s_j)
### Your implementation for squashing in capsule_layer.py

def squash(s):
    # This is equation 1 from the paper
    mag_sq = torch.sum(s**2, dim=2, keepdim=True)
    mag = torch.sqrt(mag_sq)
    s = (mag_sq / (1.0 + mag_sq)) * (s / mag)
    return s

For the correct implementation of squash(), I guess it should be dim=1 for PrimaryCaps, as @selous123 mentioned, and dim=3 for DigitCaps.
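
To make that concrete, here is a small check under the shapes quoted above (an illustrative sketch, not a patch against the repo):

import torch

def squash(s, dim):
    # Eq. 1, applied along the capsule-vector dimension.
    mag_sq = torch.sum(s ** 2, dim=dim, keepdim=True)
    return (mag_sq / (1.0 + mag_sq)) * (s / torch.sqrt(mag_sq))

u = torch.randn(4, 8, 1152)         # PrimaryCaps: (batch, vector_len=8, capsule_num=1152)
s_j = torch.randn(4, 1, 10, 16, 1)  # DigitCaps:   (batch, 1, capsule_num=10, vector_len=16, 1)

print(squash(u, dim=1).norm(dim=1).max().item())    # every 8-D primary capsule vector has length < 1
print(squash(s_j, dim=3).norm(dim=3).max().item())  # every 16-D digit capsule vector has length < 1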

When I ran the code with this modification in my environment, model convergence looked much faster and the reconstructed images also looked okay, although I only trained the model for 2 epochs.

I hope this helps you. Thank you again for sharing the code!

@timomernick (Owner)

If either of you wants to submit a pull request with a measurement of before & after accuracy, I would love to take your fixes!


danielhavir commented Jan 25, 2018

Hello Timo,
Hats off for the implementation; you have the u_hat 5D multiplication done right, unlike other implementations. It's well-written and efficient code. However, @motokimura is right. The paper says:

"We want the length of the output vector of a capsule to represent the probability that the entity represented by the capsule is present in the current input. We therefore use a non-linear "squashing" function to ensure that short vectors get shrunk to almost zero length and long vectors get shrunk to a length slightly below 1."
and also:
"In total PrimaryCapsules has [32×6×6] capsule outputs (each output is an 8D vector)"
(therefore there are 1152 capsules in the primary layer and 10 capsules in DigitCaps; also mentioned: "u_i, i ∈ (1, 32×6×6) in PrimaryCapsules and v_j, j ∈ (1, 10)")

The output of the primary capsules in your case has shape (batch_size, 8, 1152), and therefore the squashing non-linearity should be applied along dim=1. Similarly for s_j, which has shape (batch_size, 1, 10, 16, 1): the output vector has length 16 (quote: "The final Layer (DigitCaps) has one 16D capsule per digit class") and should therefore be "squashed" along dim=3.

At this point, the model is so complex that both options work and train well. However, the correct application of the squashing non-linearity is described in the paper.

Also: if you compute the primary capsules (without routing) so that the output shape is (batch_size, 1152, 8), like this:

u = [self.units[i](x) for i in range(self.num_units)]
u = torch.stack(u, dim=-1)                  # changed: stack along the last dimension
u = u.view(x.size(0), -1, self.num_units)   # changed: (batch, 1152, 8)

you could get rid of the x input transposing on line 83 in the routing algorithm in capsule_layer.py.
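
A standalone sketch of that ordering (hypothetical layer sizes, chosen only to reproduce the 1152×8 numbers quoted in this thread, not the repo's actual class):

import torch
import torch.nn as nn

num_units, in_channels, out_channels = 8, 256, 32

# Eight parallel conv "units", as in the snippet above.
units = nn.ModuleList(
    [nn.Conv2d(in_channels, out_channels, kernel_size=9, stride=2) for _ in range(num_units)]
)

x = torch.randn(4, in_channels, 20, 20)
u = torch.stack([unit(x) for unit in units], dim=-1)  # (4, 32, 6, 6, 8)
u = u.view(x.size(0), -1, num_units)                  # (4, 1152, 8): capsules first, vector last
print(u.shape)
# With this layout, squashing (and routing) can work on the last dimension,
# i.e. squash(u, dim=-1), with no transpose needed.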

Thanks for your implementation!
