Using scatter_reduce instead of scatter and max #1364

lsrock1 · 2025-02-10T02:36:06Z

Thank you for sharing your outstanding work.

Using scatter_reduce instead of scatter followed by max lets you allocate a tensor of shape (bs, vocab_size) rather than (bs, length, vocab_size). The intermediate tensor shrinks by a factor of length, which reduces memory usage and makes a larger batch size possible. How about using scatter_reduce?
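To give a rough sense of the saving, here is a small back-of-the-envelope calculation. The batch size, sequence length, vocabulary size, and float32 element size below are illustrative assumptions, not values taken from the repository.

```python
def dense_bytes(*shape, bytes_per_elem=4):
    """Bytes needed for a dense tensor of the given shape (float32 assumed)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem


# Hypothetical sizes, chosen only for illustration.
bs, length, vocab_size = 8, 512, 250_002

three_d = dense_bytes(bs, length, vocab_size)  # (bs, length, vocab_size)
two_d = dense_bytes(bs, vocab_size)            # (bs, vocab_size)

print(f"3-D intermediate: {three_d / 2**30:.2f} GiB")
print(f"2-D output only:  {two_d / 2**20:.2f} MiB")
print(f"ratio: {three_d // two_d}x (equal to length)")
```

Whatever sizes are plugged in, the ratio between the two allocations is the sequence length.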

FlagEmbedding/research/BGE_M3/modeling.py

Line 106 in fcdf889

    
           sparse_embedding = torch.zeros(input_ids.size(0), input_ids.size(1), self.vocab_size,

https://pytorch.org/docs/stable/generated/torch.Tensor.scatter_reduce_.html#torch.Tensor.scatter_reduce_

sparse_embedding = torch.zeros(input_ids.size(0), self.vocab_size,
                               dtype=token_weights.dtype,
                               device=token_weights.device)
sparse_embedding = sparse_embedding.scatter_reduce(dim=-1, index=input_ids,
                                                   src=token_weights.squeeze(-1),
                                                   reduce='amax')
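A minimal standalone sketch to check that the two approaches agree. The helper names, the toy sizes, and the use of relu to produce non-negative weights are my assumptions for illustration; the first function is only my reading of the existing scatter + max path, not a copy of modeling.py.

```python
import torch


def sparse_via_scatter_max(input_ids, token_weights, vocab_size):
    # Scatter + max path: allocates a (bs, length, vocab_size) intermediate.
    dense = torch.zeros(input_ids.size(0), input_ids.size(1), vocab_size,
                        dtype=token_weights.dtype, device=token_weights.device)
    dense = torch.scatter(dense, dim=-1, index=input_ids.unsqueeze(-1),
                          src=token_weights)
    return torch.max(dense, dim=1).values


def sparse_via_scatter_reduce(input_ids, token_weights, vocab_size):
    # Proposed path: only a (bs, vocab_size) tensor is allocated.
    out = torch.zeros(input_ids.size(0), vocab_size,
                      dtype=token_weights.dtype, device=token_weights.device)
    # include_self defaults to True, so the initial zeros take part in the max.
    return out.scatter_reduce(dim=-1, index=input_ids,
                              src=token_weights.squeeze(-1), reduce="amax")


torch.manual_seed(0)
bs, length, vocab_size = 4, 16, 50  # toy sizes, assumed for the demo
input_ids = torch.randint(0, vocab_size, (bs, length))
# Assumption: weights are non-negative, e.g. the output of a relu.
token_weights = torch.relu(torch.randn(bs, length, 1))

a = sparse_via_scatter_max(input_ids, token_weights, vocab_size)
b = sparse_via_scatter_reduce(input_ids, token_weights, vocab_size)
matches = torch.equal(a, b)
print("same result:", matches, "| output shape:", tuple(b.shape))
```

Repeated token ids within a sequence are handled by the amax reduction, which keeps the largest weight per vocabulary entry, the same thing the max over the length dimension does.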
