```python
ass_mask = torch.ones(q_size2*q_size1, 1, 1, q_size0).cuda()  # [31*128, 1, 1, 11]
x, self.attn_asset = attention(ass_query, ass_key, ass_value, mask=None,
                               dropout=self.dropout)
```
Within `MultiHeadedAttention`, `ass_mask` is constructed but never passed into the `attention` call above, so it appears to be unused. If I understand correctly, an attention mask is necessary to prevent look-ahead bias: it should mask off future positions when the attention weights are computed.

If this mask is unused, what was its intent? Where is attention actually being masked, and how should the mask be applied?
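For reference, here is a minimal sketch of how a causal mask is typically applied in a scaled dot-product `attention` function of this style (the repo's actual implementation isn't shown in this issue, so the function bodies and the `subsequent_mask` helper below are illustrative assumptions, not the project's code):

```python
import torch

def attention(query, key, value, mask=None, dropout=None):
    # Scaled dot-product attention: scores are masked *before* the softmax,
    # so masked positions receive ~zero attention weight.
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = scores.softmax(dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn

def subsequent_mask(size):
    # Lower-triangular causal mask: position i may attend only to
    # positions <= i, which is what prevents look-ahead bias.
    return torch.tril(torch.ones(1, size, size)).bool()

# Hypothetical shapes for illustration: [batch, heads, seq_len, d_k]
q = k = v = torch.randn(2, 4, 11, 8)
mask = subsequent_mask(11)  # broadcasts over batch and head dims
out, attn = attention(q, k, v, mask=mask)
```

With this, `attn[..., i, j]` is (numerically) zero for every future position `j > i`. A mask of all ones, like the `ass_mask` above, would mask nothing even if it were passed in, which makes its intent here unclear.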