Bugfix for the reconstruction part, the squash function, and the margin & reconstruction loss functions. #4

Open · wants to merge 4 commits into base: master
Conversation

JaveyWang commented Nov 8, 2017

#2
Getting 98% accuracy in the 1st epoch now.
The original code is great and neat, thanks a lot :)
But I found some bugs:

fixbug1: Use the correct (target) capsule to reconstruct the input image rather than the longest capsule.
fixbug2: Use the squash function on the right dimension (unit_size 8) in the primary capsule layer; it has a great impact on model accuracy: CapsuleLayer.squash(u, dim=1). (The squash in the digit-capsule part also seems wrong, and the weirdest thing is that if I squash the right dim there, which is the capsule size 16, the model can't be trained correctly.)
fixbug3: The margin loss function lacks a square term (see the sketch after this list).
fixbug4: The reconstruction_loss function should minimize the sum of squared differences instead of the mean squared difference (as the capsule paper says).
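
For reference, a minimal sketch of the margin loss with both squares in place, following Eq. 4 of the capsule paper (m+ = 0.9, m− = 0.1, λ = 0.5); the function and variable names here are illustrative, not the ones in this repo:

```python
import torch
import torch.nn.functional as F

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """lengths: (batch, 10) digit-capsule lengths; targets: (batch, 10) one-hot."""
    # Both max(0, .) terms are squared -- the square the original code was missing.
    pos = targets * torch.clamp(m_pos - lengths, min=0.0) ** 2
    neg = lam * (1.0 - targets) * torch.clamp(lengths - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=1).mean()

lengths = torch.rand(8, 10)                                  # dummy capsule lengths
targets = F.one_hot(torch.randint(0, 10, (8,)), 10).float()  # dummy one-hot labels
print(margin_loss(lengths, targets))
```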

timomernick (Owner) left a comment:
Would you mind splitting things up into small patches? For example, the missing squares in the loss term and the mean/sum change I can accept immediately, but the reconstruction change and the transposing of unit size/count need some more investigation and thought.

```diff
@@ -19,7 +19,7 @@ def __init__(self, in_channels, out_channels):
                        out_channels=out_channels,
                        kernel_size=9, # fixme constant
                        stride=1,
-                       bias=False)
+                       bias=True)
```
timomernick (Owner) commented:

Good find! I was experimenting with this myself locally but didn't want to commit either way. Did you measure an accuracy improvement on MNIST with this?

JaveyWang (Author) replied:

Oh, I just think that if there is no BN layer in the network, the bias should be added. This strategy works in traditional convnets, but I don't know whether it is also useful in CapsNet; I will test it once the model can get better accuracy. It always gets stuck at about 99.4% accuracy in my local version.

```python
self.reconstruct0 = nn.Linear(num_output_units*output_unit_size, (reconstruction_size * 2) / 3)
self.reconstruct1 = nn.Linear((reconstruction_size * 2) / 3, (reconstruction_size * 3) / 2)
self.reconstruct2 = nn.Linear((reconstruction_size * 3) / 2, reconstruction_size)
# self.reconstruct0 = nn.Linear(num_output_units*output_unit_size, (reconstruction_size * 2) / 3)
```
timomernick (Owner) commented on Nov 8, 2017:
Are you sure this is correct? The paper (and other implementations I've seen) seems to say that the reconstruction input is all capsules, but with the inactive capsules masked out.

JaveyWang (Author) replied:

Yes, you're right. Now I can get 98.9% in the first epoch after fixing a few bugs. But I still wonder why softmaxing the wrong dim and squashing the wrong dim in the routing part can still give such a great result.
And the weirdest thing is that if I make the dims correct, the model sometimes crashes and gets bad accuracy... Hope you can fix it. Thanks.
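
For context on the routing-dim question: in the paper, each input capsule's coupling coefficients are softmaxed over the output capsules, so they sum to 1 across the layer above. A minimal sketch, with shapes assumed for the MNIST CapsNet (1152 primary capsules routing to 10 digit capsules):

```python
import torch
import torch.nn.functional as F

batch_size, num_in, num_out = 4, 1152, 10        # 1152 primary -> 10 digit capsules
b_ij = torch.zeros(batch_size, num_in, num_out)  # routing logits

# Softmax over the *output*-capsule dim: each input capsule's couplings sum to 1.
c_ij = F.softmax(b_ij, dim=2)
print(c_ij.sum(dim=2))  # all ones
```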

```diff
         # Multiplied by a small number so it doesn't dominate the margin (class) loss.
         error = (output - images).view(output.size(0), -1)
         error = error**2
-        error = torch.mean(error, dim=1) * 0.0005
+        error = torch.sum(error, dim=1) * 0.0005
```
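
To see why mean vs. sum matters here: for 28×28 MNIST images the flattened error has 784 elements, so the mean is 784× smaller than the sum, and with the 0.0005 scale the mean variant makes the reconstruction term essentially vanish next to the margin loss. A minimal sketch, with shapes assumed for MNIST:

```python
import torch

output = torch.rand(8, 1, 28, 28)   # reconstructed images (assumed shape)
images = torch.rand(8, 1, 28, 28)   # ground-truth images

error = ((output - images).view(output.size(0), -1)) ** 2  # (8, 784)
mean_loss = torch.mean(error, dim=1) * 0.0005  # old: mean over 784 pixels
sum_loss = torch.sum(error, dim=1) * 0.0005    # fixed: sum, as in the paper

# The old variant was 784x too small, effectively silencing the regularizer.
print(sum_loss / mean_loss)  # ~784 for every sample
```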
timomernick (Owner) commented:

Good find!

JaveyWang (Author) replied:

Thanks. Your code is very neat and has been helpful to me.

```diff
-num_primary_units = 8
-primary_unit_size = 32 * 6 * 6 # fixme get from conv2d
+num_primary_units = 32 * 6 * 6
+primary_unit_size = 8 # fixme get from conv2d
```
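
A minimal sketch of what this swap implies, assuming the paper's MNIST layout: the PrimaryCaps conv output of shape (batch, 256, 6, 6) groups into 32 × 6 × 6 = 1152 capsules of 8 dimensions each, and squash should normalize each 8-vector, i.e. run along the unit-size dimension:

```python
import torch

def squash(s, dim=-1):
    # Squash from the paper: short vectors shrink toward 0, long ones toward unit length.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + 1e-8)

conv_out = torch.randn(4, 256, 6, 6)                     # PrimaryCaps conv output (assumed)
u = conv_out.view(4, 32, 8, 6, 6)                        # 32 capsule types, 8 dims each
u = u.permute(0, 1, 3, 4, 2).reshape(4, 32 * 6 * 6, 8)   # (batch, 1152 capsules, 8)
v = squash(u, dim=2)                                     # normalize each 8-D capsule vector
print(v.norm(dim=2).max())                               # every capsule length < 1
```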
timomernick (Owner) commented:

I think you're right; I did this part wrong. Let me re-read the paper and consider your change carefully.

```python
# (batch_size, num_output_units, 1)
target = torch.unsqueeze(target, 2)
# (batch_size, output_unit_size, 1)
masked = torch.matmul(input.transpose(2,1), target)
```
timomernick (Owner) commented:

Hmm, I don't quite understand this part. What does `masked` look like for a typical sample after this matmul? Also see my earlier comment: I thought the input was all capsules with all but one masked out; otherwise the reconstruction would be confused by different classes of digits.

JaveyWang (Author) replied:

First, there is no doubt we should use the target for the reconstruction. But my implementation may be wrong, now that I rethink this question. I thought the reconstruction network needed just 16 values to reconstruct the 10 digits, but every digit capsule represents one entity (digit), so we should use all 160 values for the reconstruction. This also explains how the original paper reconstructs two overlapping MNIST digits at a time (although that could be done by finding the two longest digit capsules, reconstructing twice, and summing, it is not as neat as the masked version). So yes, I think you're right. :)
Just change the masked part to `masked = input * target` and change the number of first-layer units to `num_output_units*output_unit_size`, then it will work (see the sketch below).
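
A minimal sketch of that masking, assuming 10 digit capsules of 16 dimensions; the tensor names are illustrative, not the repo's:

```python
import torch
import torch.nn.functional as F

batch_size = 4
capsules = torch.randn(batch_size, 10, 16)          # digit-capsule outputs (assumed shape)
labels = torch.tensor([3, 1, 4, 1])
target = F.one_hot(labels, num_classes=10).float()  # (batch, 10) one-hot

# Zero out every capsule except the target one, then flatten so the decoder
# always sees all 10 * 16 = 160 inputs, with the inactive ones masked to zero.
masked = capsules * target.unsqueeze(2)             # broadcast over the 16 dims
decoder_input = masked.view(batch_size, -1)         # (batch, 160)
```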

timomernick (Owner) commented:
@JaveyWang I took a couple of your small fixes and pushed them in d2e9aa8.

I'm still thinking about your more substantial changes and will leave this PR open. It would help if you could resolve the conflicts in this PR and reduce it to only the proposed squash and capsule size/count changes. Thanks!

ghost commented Mar 11, 2018:

@JaveyWang I am also confused by your fixbug2. When I correct the squash dim, the training result goes bad. Have you made any progress with this?
