I double- and triple-checked the code and the paper (at first I thought it was a mistake in the figure, but it matches their code), but couldn't find any mention of this architectural detail.
As I'm currently replicating the paper, I'd like to test this difference too. In a quick run on MPE envs it didn't even matter that I swapped x and context in the current x-transformers implementation, but the environment might just be too simple, and I still have some bugs to squash and implementation details to finish.
@MaxWolf-01 yea, it is strange for sure; unless they have some figure showing one way is better than the other, prob best to stick with the traditional design
As far as I can tell, having K/V come from the self-attention output and Q come from the context, which is what MAT does, isn't currently supported:
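Roughly, the decoder block there looks like this (a paraphrased sketch in plain PyTorch rather than their actual code; `rep_enc` stands in for the encoder's per-agent observation representation, and causal masking and the MLP are omitted for brevity):

```python
import torch.nn as nn

class DecodeBlockSketch(nn.Module):
    """Paraphrase of the MAT-style decoder block: the cross-attention takes its
    queries from the encoder representation and its keys/values from the
    self-attention output, the reverse of the usual decoder wiring."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x, rep_enc):
        # self-attention over the (shifted) action embeddings
        sa, _ = self.self_attn(x, x, x)
        x = self.ln1(x + sa)
        # cross-attention: Q = encoder output, K/V = self-attention output
        ca, _ = self.cross_attn(rep_enc, x, x)
        return self.ln2(rep_enc + ca)
```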
Is it possible to support this cleanly, or would it be better to just use the lower-level attention blocks directly?
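For reference, one way to get this with the lower-level blocks would be something like the following: a standalone cross-attention with the arguments swapped, run after a separate self-attention step. This is just a sketch, assuming `CrossAttender` computes queries from `x` and keys/values from `context`; the shapes and names (`obs_rep`, `act_emb`) are made up:

```python
import torch
from x_transformers import CrossAttender

obs_rep = torch.randn(1, 3, 64)  # encoder output (per-agent observation representation)
act_emb = torch.randn(1, 3, 64)  # output of a separate self-attention over action embeddings

cross = CrossAttender(dim = 64, depth = 1)

# usual decoder ordering: Q from act_emb, K/V from obs_rep
usual = cross(act_emb, context = obs_rep)

# MAT-style ordering, obtained by swapping the arguments:
# Q from obs_rep, K/V from act_emb
mat_style = cross(obs_rep, context = act_emb)
```

The catch is that the self-attention and feedforward then have to be wired up outside the usual `Decoder` stack, which is why built-in support would be nicer.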