share weights in 6x6x8 grids #47

Open
yaxinshen opened this issue Dec 12, 2017 · 2 comments

Comments

@yaxinshen

In the paper, the capsules in each [6 × 6] grid share their
weights with each other. Does your code miss this point?

@veshboo

veshboo commented Dec 29, 2017

@yaxinshen +1. I tried to imagine how to share weights that way. How about introducing for _ in range(1152 // 36): around the tf.matmul involving W in the routing function? Is there another idea that doesn't lose vectorization?

EDIT
Oh, I found that the commented-out tf.scan was there for sharing the weights. The author preferred tf.tile to tf.scan for performance!

EDIT2
This issue seems to be a duplicate of a previous issue,
"questions about the weight matrix Wij between ui and vj", which cleared it up for me.
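
To make the loop-versus-tile question above concrete, here is a minimal NumPy sketch. It is a stand-in for the TensorFlow ops, not the repository's code. It assumes the paper's shapes: 6 x 6 = 36 grid positions, 1152 / 36 = 32 capsule types with 8-D outputs, and 10 digit capsules with 16-D outputs. The names `u`, `W`, `predict_loop` and `predict_tiled` are hypothetical. One weight set per capsule type is shared across all 36 positions, either by looping over the 32 types or by tiling `W` along the position axis and doing a single batched matmul.

```python
import numpy as np

N_POS, N_TYPES, D_IN = 36, 1152 // 36, 8   # 6x6 grid, 32 capsule types, 8-D capsules
N_OUT, D_OUT = 10, 16                      # 10 digit capsules, 16-D each

rng = np.random.default_rng(0)
u = rng.normal(size=(N_POS, N_TYPES, D_IN))         # primary capsule outputs
W = rng.normal(size=(N_TYPES, N_OUT, D_IN, D_OUT))  # one weight set per type (shared over positions)


def predict_loop(u, W):
    """Loop over the 32 capsule types; each type reuses W[t] at all 36 positions."""
    u_hat = np.empty((u.shape[0], W.shape[0], W.shape[1], W.shape[3]))
    for t in range(W.shape[0]):
        # u[:, t] is (36, 8), W[t] is (10, 8, 16) -> (36, 10, 16)
        u_hat[:, t] = np.einsum("pi,jio->pjo", u[:, t], W[t])
    return u_hat


def predict_tiled(u, W):
    """Tile the shared weights along the position axis, then one batched matmul."""
    W_tiled = np.tile(W[None], (u.shape[0], 1, 1, 1, 1))  # (36, 32, 10, 8, 16)
    u_exp = u[:, :, None, None, :]                        # (36, 32, 1, 1, 8)
    # Batched (1, 8) @ (8, 16) products -> (36, 32, 10, 1, 16); drop the singleton row axis.
    return np.matmul(u_exp, W_tiled)[..., 0, :]


u_hat_loop = predict_loop(u, W)
u_hat_tiled = predict_tiled(u, W)

shared_params = W.size                                # 32 * 10 * 8 * 16
unshared_params = N_POS * N_TYPES * N_OUT * D_IN * D_OUT  # one weight set per capsule
```

Under these assumptions both paths produce the same predictions. Tiling spends extra memory on the copies of `W` but keeps a single vectorized matmul; in NumPy, plain broadcasting with `W[None]` would avoid the copies as well.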

@tonyzhao6

@yaxinshen,

Version 1 (i.e., the computationally expensive approach) has 8 distinct sets of weights, one for each 6 x 6 x 32 tensor. This is what the paper does.

Version 2 technically has 1 distinct set of weights for the entire 6 x 6 x 256 block and then reshapes the output to the correct shape.

I don't know if this actually matters in practice: the network will eventually learn the correct weights, whether there are 8 distinct sets or 1.
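
One way to probe whether the two versions differ is a small NumPy sketch like the one below. It is not the repository's code. It assumes a plain 9 x 9, stride-2 convolution with no bias or activation, a 20 x 20 input, and only 4 input channels to keep it light (the real layer would have 256). The helper `conv2d_valid` and all variable names are hypothetical. Version 1 is modeled as 8 separate convolutions with 32 filters each, and Version 2 as one convolution with 256 filters whose output is reshaped.

```python
import numpy as np


def conv2d_valid(x, k, stride):
    """Naive valid convolution. x: (H, W, C_in); k: (kh, kw, C_in, C_out)."""
    kh, kw, _, c_out = k.shape
    h_out = (x.shape[0] - kh) // stride + 1
    w_out = (x.shape[1] - kw) // stride + 1
    out = np.empty((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # Contract the patch with every filter at once -> (C_out,)
            out[i, j] = np.tensordot(patch, k, axes=([0, 1, 2], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 20, 4))  # assumed input; 4 channels instead of 256

# Version 1: 8 separate convolutions, each giving a 6 x 6 x 32 tensor.
kernels_v1 = [rng.normal(size=(9, 9, 4, 32)) for _ in range(8)]
caps_v1 = np.stack([conv2d_valid(x, k, 2) for k in kernels_v1], axis=-1)  # (6, 6, 32, 8)

# Version 2: one convolution with 256 filters, then a reshape into capsules.
kernel_v2 = np.concatenate(kernels_v1, axis=-1)  # (9, 9, 4, 256), same numbers as Version 1
out_v2 = conv2d_valid(x, kernel_v2, 2)           # (6, 6, 256)
# Channel g * 32 + c came from conv g, filter c, so split as (8, 32) and swap axes.
caps_v2 = out_v2.reshape(6, 6, 8, 32).transpose(0, 1, 3, 2)  # (6, 6, 32, 8)

params_v1 = sum(k.size for k in kernels_v1)
params_v2 = kernel_v2.size
```

Under these assumptions the two versions have the same number of parameters and compute the same capsules once the 256 output channels are grouped consistently. A direct reshape to (6, 6, 32, 8) would instead put consecutive channels into one capsule, which only relabels which filter feeds which capsule component.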

3 participants