The current SwiGLU implementation uses projection matrix names that differ from those in the original paper (https://arxiv.org/pdf/2002.05202).
We should stick to the paper's names W, V, and W_2. The projection name c_proj in SwiGLU is also used for a projection in the GeLU MLP, which has already led to side effects in weight initialisation (see comments in PR #168).
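To illustrate the kind of side effect meant here, below is a minimal sketch in plain Python. It assumes that weight initialisation selects parameters by name suffix; that rule, the regular expression, the helper `select_for_special_init`, and all parameter names in the lists are hypothetical and may not match the actual code in the repository or in PR #168.

```python
import re

# Hypothetical parameter names; the real module paths in the repository may differ.
GELU_MLP_PARAMS = ["mlp.c_fc.weight", "mlp.c_proj.weight"]
SWIGLU_CURRENT_PARAMS = ["mlp.c_fc.weight", "mlp.c_proj.weight", "mlp.out_proj.weight"]
SWIGLU_PAPER_PARAMS = ["mlp.W.weight", "mlp.V.weight", "mlp.W_2.weight"]

# Assumed name-based rule: a special init is applied to every parameter
# whose name ends in "c_proj.weight".
SPECIAL_INIT_PATTERN = re.compile(r"c_proj\.weight$")


def select_for_special_init(param_names):
    """Return the parameter names that the assumed name-based rule would match."""
    return [name for name in param_names if SPECIAL_INIT_PATTERN.search(name)]


gelu_hits = select_for_special_init(GELU_MLP_PARAMS)
swiglu_current_hits = select_for_special_init(SWIGLU_CURRENT_PARAMS)
swiglu_paper_hits = select_for_special_init(SWIGLU_PAPER_PARAMS)

print("GeLU MLP:", gelu_hits)
print("SwiGLU (current names):", swiglu_current_hits)
print("SwiGLU (paper names):", swiglu_paper_hits)
```

Under these assumptions, a rule written with the GeLU MLP in mind also matches the SwiGLU projection that shares the name c_proj, while the paper names W, V, W_2 are not matched by accident and would need an explicit rule of their own.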
https://github.com/Modalities/modalities/blob/f810fcce978e2f4fc577edf337835b6f4afa8aa9/src/modalities/models/model.py#L30C6-L45C10