Bugfix for the reconstruction part, the squash function, and the margin & reconstruction loss functions. #4

Open · wants to merge 4 commits into base: master
Conversation

JaveyWang commented Nov 8, 2017

#2
Getting 98% accuracy in the 1st epoch now.
The original code is great and neat, thanks a lot :)
But I found some bugs:

fixbug1: Use the correct (target) capsule to reconstruct the input image rather than the longest capsule.
fixbug2: Use the squash function on the right dimension (unit_size 8) in the primary capsule layer; it has a great impact on model accuracy: CapsuleLayer.squash(u, dim=1). (The squash in the digit-capsule part also seems wrong, and the weirdest thing is that if I squash the right dim there, which is the capsule size 16, the model can't be trained correctly.)
fixbug3: The margin loss function lacks a square term (see the sketch after this list).
fixbug4: The reconstruction_loss function should minimize the sum of squared differences instead of the mean squared difference (as the capsule paper says).
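
For reference, a minimal sketch of the margin loss with both squares in place, following Eq. 4 of the capsule paper (m+ = 0.9, m− = 0.1, λ = 0.5); the function and variable names here are illustrative, not the ones in this repo:

```python
import torch
import torch.nn.functional as F

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """lengths: (batch, 10) digit-capsule lengths; targets: (batch, 10) one-hot."""
    # Both max(0, .) terms are squared -- the square the original code was missing.
    pos = targets * torch.clamp(m_pos - lengths, min=0.0) ** 2
    neg = lam * (1.0 - targets) * torch.clamp(lengths - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=1).mean()

lengths = torch.rand(8, 10)                                  # dummy capsule lengths
targets = F.one_hot(torch.randint(0, 10, (8,)), 10).float()  # dummy one-hot labels
print(margin_loss(lengths, targets))
```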

timomernick (Owner) left a comment:
Would you mind splitting things up into small patches? For example, the missing squares in the loss term and the mean/sum change I can accept immediately, but the reconstruction change and the transposing of unit size/count need some more investigation and thought.

```diff
@@ -19,7 +19,7 @@ def __init__(self, in_channels, out_channels):
                        out_channels=out_channels,
                        kernel_size=9, # fixme constant
                        stride=1,
-                       bias=False)
+                       bias=True)
```
timomernick (Owner) commented:

Good find! I was experimenting with this myself locally but didn't want to commit either way. Did you measure an accuracy improvement on MNIST with this?

JaveyWang (Author) replied:

Oh, I just think that if there is no BN layer in the network, the bias should be added. This strategy works in traditional convnets, but I don't know whether it is also useful in CapsNet; I will test it once the model can get better accuracy. It always gets stuck at about 99.4% accuracy in my local version.

```python
self.reconstruct0 = nn.Linear(num_output_units*output_unit_size, (reconstruction_size * 2) / 3)
self.reconstruct1 = nn.Linear((reconstruction_size * 2) / 3, (reconstruction_size * 3) / 2)
self.reconstruct2 = nn.Linear((reconstruction_size * 3) / 2, reconstruction_size)
# self.reconstruct0 = nn.Linear(num_output_units*output_unit_size, (reconstruction_size * 2) / 3)
```
timomernick (Owner) commented on Nov 8, 2017:
Are you sure this is correct? The paper (and other implementations I've seen) seems to say that the reconstruction input is all capsules, but with the inactive capsules masked out.

JaveyWang (Author) replied:

Yes, you're right. Now I can get 98.9% in the first epoch after fixing a few bugs. But I still wonder why softmaxing the wrong dim and squashing the wrong dim in the routing part can still give such a great result.
And the weirdest thing is that if I make the dims correct, the model sometimes crashes and gets bad accuracy... Hope you can fix it. Thanks.
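
For context on the routing-dim question: in the paper, each input capsule's coupling coefficients are softmaxed over the output capsules, so they sum to 1 across the layer above. A minimal sketch, with shapes assumed for the MNIST CapsNet (1152 primary capsules routing to 10 digit capsules):

```python
import torch
import torch.nn.functional as F

batch_size, num_in, num_out = 4, 1152, 10        # 1152 primary -> 10 digit capsules
b_ij = torch.zeros(batch_size, num_in, num_out)  # routing logits

# Softmax over the *output*-capsule dim: each input capsule's couplings sum to 1.
c_ij = F.softmax(b_ij, dim=2)
print(c_ij.sum(dim=2))  # all ones
```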

```diff
         # Multiplied by a small number so it doesn't dominate the margin (class) loss.
         error = (output - images).view(output.size(0), -1)
         error = error**2
-        error = torch.mean(error, dim=1) * 0.0005
+        error = torch.sum(error, dim=1) * 0.0005
```
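
To see why mean vs. sum matters here: for 28×28 MNIST images the flattened error has 784 elements, so the mean is 784× smaller than the sum, and with the 0.0005 scale the mean variant makes the reconstruction term essentially vanish next to the margin loss. A minimal sketch, with shapes assumed for MNIST:

```python
import torch

output = torch.rand(8, 1, 28, 28)   # reconstructed images (assumed shape)
images = torch.rand(8, 1, 28, 28)   # ground-truth images

error = ((output - images).view(output.size(0), -1)) ** 2  # (8, 784)
mean_loss = torch.mean(error, dim=1) * 0.0005  # old: mean over 784 pixels
sum_loss = torch.sum(error, dim=1) * 0.0005    # fixed: sum, as in the paper

# The old variant was 784x too small, effectively silencing the regularizer.
print(sum_loss / mean_loss)  # ~784 for every sample
```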
timomernick (Owner) commented:

Good find!

JaveyWang (Author) replied:

Thanks. Your code is very neat and has been helpful to me.

```diff
-num_primary_units = 8
-primary_unit_size = 32 * 6 * 6 # fixme get from conv2d
+num_primary_units = 32 * 6 * 6
+primary_unit_size = 8 # fixme get from conv2d
```
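
A minimal sketch of what this swap implies, assuming the paper's MNIST layout: the PrimaryCaps conv output of shape (batch, 256, 6, 6) groups into 32 × 6 × 6 = 1152 capsules of 8 dimensions each, and squash should normalize each 8-vector, i.e. run along the unit-size dimension:

```python
import torch

def squash(s, dim=-1):
    # Squash from the paper: short vectors shrink toward 0, long ones toward unit length.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + 1e-8)

conv_out = torch.randn(4, 256, 6, 6)                     # PrimaryCaps conv output (assumed)
u = conv_out.view(4, 32, 8, 6, 6)                        # 32 capsule types, 8 dims each
u = u.permute(0, 1, 3, 4, 2).reshape(4, 32 * 6 * 6, 8)   # (batch, 1152 capsules, 8)
v = squash(u, dim=2)                                     # normalize each 8-D capsule vector
print(v.norm(dim=2).max())                               # every capsule length < 1
```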
timomernick (Owner) commented:

I think you're right; I did this part wrong. Let me re-read the paper and consider your change carefully.

```python
# (batch_size, num_output_units, 1)
target = torch.unsqueeze(target, 2)
# (batch_size, output_unit_size, 1)
masked = torch.matmul(input.transpose(2,1), target)
```
timomernick (Owner) commented:

Hmm, I don't quite understand this part. What does `masked` look like for a typical sample after this matmul? Also see my earlier comment: I thought the input was all capsules with all but one masked out; otherwise the reconstruction would be confused by different classes of digits.

JaveyWang (Author) replied:

First, there is no doubt we should use the target for the reconstruction. But my implementation may be wrong, now that I rethink this question. I thought the reconstruction network needed just 16 values to reconstruct the 10 digits, but every digit capsule represents one entity (digit), so we should use all 160 values for the reconstruction. This also explains how the original paper reconstructs two overlapping MNIST digits at a time (although that could be done by finding the two longest digit capsules, reconstructing twice, and summing, it is not as neat as the masked version). So yes, I think you're right. :)
Just change the masked part to `masked = input * target` and change the number of first-layer units to `num_output_units*output_unit_size`, then it will work (see the sketch below).
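
A minimal sketch of that masking, assuming 10 digit capsules of 16 dimensions; the tensor names are illustrative, not the repo's:

```python
import torch
import torch.nn.functional as F

batch_size = 4
capsules = torch.randn(batch_size, 10, 16)          # digit-capsule outputs (assumed shape)
labels = torch.tensor([3, 1, 4, 1])
target = F.one_hot(labels, num_classes=10).float()  # (batch, 10) one-hot

# Zero out every capsule except the target one, then flatten so the decoder
# always sees all 10 * 16 = 160 inputs, with the inactive ones masked to zero.
masked = capsules * target.unsqueeze(2)             # broadcast over the 16 dims
decoder_input = masked.view(batch_size, -1)         # (batch, 160)
```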

timomernick (Owner) commented:
@JaveyWang I took a couple of your small fixes and pushed them in d2e9aa8.

I'm still thinking about your more substantial changes and will leave this PR open. It would help if you could resolve the conflicts in this PR and reduce it to only the proposed squash and capsule size/count changes. Thanks!

ghost commented Mar 11, 2018:

@JaveyWang I am also confused by your fixbug2. When I correct the squash dim, the training result goes bad. Have you made any progress with this?
