
Feature Request: Support Q from Encoder, KV from Decoder Cross-Attention Pattern #308

Open
MaxWolf-01 opened this issue Jan 24, 2025 · 3 comments

Comments

@MaxWolf-01
Contributor

As far as I can tell, having the keys/values come from the self-attention stream and the queries come from the context isn't currently supported. MAT does this in its decoder cross-attention: [figure from the MAT paper: Q taken from the encoder output, K/V from the decoder stream].

Is it possible to support this cleanly, or would it be better to just use the lower level attention blocks directly?
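
For concreteness, here is a rough sketch of the pattern using plain PyTorch attention (the tensor names are placeholders, not taken from the MAT code):

```python
import torch
import torch.nn as nn

dim, heads = 64, 4
attn = nn.MultiheadAttention(embed_dim = dim, num_heads = heads, batch_first = True)

agent_tokens = torch.randn(2, 8, dim)  # decoder / self-attention stream
obs_tokens   = torch.randn(2, 8, dim)  # encoder output, i.e. the context

# conventional cross-attention: Q from the decoder stream, K/V from the context
usual_out, _ = attn(query = agent_tokens, key = obs_tokens, value = obs_tokens)

# the pattern in the MAT figure: Q from the context, K/V from the decoder stream
swapped_out, _ = attn(query = obs_tokens, key = agent_tokens, value = agent_tokens)
```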

@lucidrains
Owner

@MaxWolf-01 this is really interesting! i've never seen such a design

did they compare it to having the observations as keys / values ? wondering if this could really make that big of a difference

@MaxWolf-01
Contributor Author

I double- and triple-checked the code and paper (at first I thought it was a mistake in the figure, but it matches their code), but couldn't find any mention of this architectural detail.
Since I'm currently replicating the paper, I'd like to test this difference too. In a quick run on MPE environments it didn't even matter that I swapped x and context in the current x-transformers implementation, but the environment might just be too simple, and I still have some bugs to squash and implementation details to finish.
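
Concretely, the "swap" was just exchanging which tensor goes in as `x` and which as `context`, roughly like this (using `CrossAttender` as in the README; the dimensions are illustrative):

```python
import torch
from x_transformers import CrossAttender

cross_attn = CrossAttender(dim = 512, depth = 2)

dec_tokens = torch.randn(1, 8, 512)  # decoder / self-attention stream
enc_tokens = torch.randn(1, 8, 512)  # encoder output (observation representations)

# conventional: queries from the decoder stream, keys/values from the encoder output
out_conventional = cross_attn(dec_tokens, context = enc_tokens)

# MAT-figure variant: queries from the encoder output, keys/values from the decoder stream
out_swapped = cross_attn(enc_tokens, context = dec_tokens)
```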

@lucidrains
Owner

@MaxWolf-01 yea, it is strange for sure. unless they have some figure showing one way is better than the other, prob best to stick with the traditional design
