Parameter count formula for RWKV #176
Unanswered
Triang-jyed-driung
asked this question in
Q&A
Replies: 1 comment
I guess the other $2D$ comes from ln0.
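To make that guess concrete, here is a rough tally of the $D$-sized vectors. The parameter names below are my assumption about how an RWKV-4 checkpoint is laid out, not something quoted in this thread (in the checkpoint, ln0 usually sits inside the first block), so treat it as a sketch rather than a definitive listing.

```python
# Sketch only: the names are an ASSUMED RWKV-4 layout, used to illustrate the count.
L, D = 12, 768  # values from the question (RWKV-4-world 0.1B)

# Assumed D-sized vectors inside every block: 11 per block.
per_block_vectors = [
    "ln1.weight", "ln1.bias",
    "ln2.weight", "ln2.bias",
    "att.time_decay", "att.time_first",
    "att.time_mix_k", "att.time_mix_v", "att.time_mix_r",
    "ffn.time_mix_k", "ffn.time_mix_r",
]

# Assumed D-sized vectors that appear once per model: 4 in total.
global_vectors = ["ln0.weight", "ln0.bias", "ln_out.weight", "ln_out.bias"]

# D * (11L + 4), the last term of the formula.
vector_params = D * (len(per_block_vectors) * L + len(global_vectors))

# What you get if ln0 is overlooked: only 2D of the 4D is accounted for.
without_ln0 = vector_params - 2 * D

print(vector_params, without_ln0)
```

Under these assumptions, counting only ln_out gives $2D$, and ln0 (a LayerNorm with a weight and a bias) supplies the remaining $2D$.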
According to the formula in the arXiv paper, the parameter count of the RWKV-4-world 0.1B model ($L=12$, $D=768$, $V=65536$) would be:
$$2VD + 13D^2 L + D(11L+4)$$
which yields
But the output is
So, $2VD$ comes from
However, where is the $4D$? These account for only $2D$:
Am I missing something?
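For anyone who wants to check the arithmetic, the formula above can be evaluated term by term. This is just a small sketch: `rwkv4_param_count` is a hypothetical helper name, and it only evaluates the formula as written, it does not inspect a real checkpoint.

```python
def rwkv4_param_count(L, D, V):
    """Evaluate 2VD + 13 D^2 L + D(11L + 4) and return the three terms."""
    term_2vd = 2 * V * D            # the 2VD term
    term_matrices = 13 * D * D * L  # the 13 D^2 L term
    term_vectors = D * (11 * L + 4) # the D(11L + 4) term
    return term_2vd, term_matrices, term_vectors


# Values from the question: RWKV-4-world 0.1B.
terms = rwkv4_param_count(L=12, D=768, V=65536)
total = sum(terms)

print(terms)
print(total)
```

Comparing each term separately against a per-tensor count makes it easier to see which part of the formula a discrepancy of a few $D$ belongs to.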