
Feature Request: Support Q from Encoder, KV from Decoder Cross-Attention Pattern #308

Open
MaxWolf-01 opened this issue Jan 24, 2025 · 3 comments

Comments

@MaxWolf-01
Contributor

As far as I can tell, having the keys/values come from the self-attention stream and the queries come from the context isn't currently supported. MAT does this in its decoder cross-attention: [figure from the MAT paper: Q taken from the encoder output, K/V from the decoder stream].

Is it possible to support this cleanly, or would it be better to just use the lower level attention blocks directly?
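
For concreteness, here is a rough sketch of the pattern using plain PyTorch attention (the tensor names are placeholders, not taken from the MAT code):

```python
import torch
import torch.nn as nn

dim, heads = 64, 4
attn = nn.MultiheadAttention(embed_dim = dim, num_heads = heads, batch_first = True)

agent_tokens = torch.randn(2, 8, dim)  # decoder / self-attention stream
obs_tokens   = torch.randn(2, 8, dim)  # encoder output, i.e. the context

# conventional cross-attention: Q from the decoder stream, K/V from the context
usual_out, _ = attn(query = agent_tokens, key = obs_tokens, value = obs_tokens)

# the pattern in the MAT figure: Q from the context, K/V from the decoder stream
swapped_out, _ = attn(query = obs_tokens, key = agent_tokens, value = agent_tokens)
```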

@lucidrains
Owner

@MaxWolf-01 this is really interesting! i've never seen such a design

did they compare it to having the observations as keys / values ? wondering if this could really make that big of a difference

@MaxWolf-01
Contributor Author

I double- and triple-checked the code and paper (at first I thought it was a mistake in the figure, but it matches their code), but couldn't find any mention of this architectural detail.
Since I'm currently replicating the paper, I'd like to test this difference too. In a quick run on MPE environments it didn't even matter that I swapped x and context in the current x-transformers implementation, but the environment might just be too simple, and I still have some bugs to squash and implementation details to finish.
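
Concretely, the "swap" was just exchanging which tensor goes in as `x` and which as `context`, roughly like this (using `CrossAttender` as in the README; the dimensions are illustrative):

```python
import torch
from x_transformers import CrossAttender

cross_attn = CrossAttender(dim = 512, depth = 2)

dec_tokens = torch.randn(1, 8, 512)  # decoder / self-attention stream
enc_tokens = torch.randn(1, 8, 512)  # encoder output (observation representations)

# conventional: queries from the decoder stream, keys/values from the encoder output
out_conventional = cross_attn(dec_tokens, context = enc_tokens)

# MAT-figure variant: queries from the encoder output, keys/values from the decoder stream
out_swapped = cross_attn(enc_tokens, context = dec_tokens)
```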

@lucidrains
Owner

@MaxWolf-01 yea, it is strange for sure. unless they have some figure showing one way is better than the other, prob best to stick with the traditional design
