[Question]: Does vertical_slash_sparse_attention support concatenating all batches into a single row, like flash_attn_2_cuda.varlen_fwd? #46

Amanda-Barbara · 2024-07-17T09:56:17Z

Describe the issue

Do vertical_slash_sparse_attention, block_sparse_attention, and streaming_forward support concatenating all batches into a single row, as flash_attn_2_cuda.varlen_fwd does?

iofu728 · 2024-07-18T01:44:41Z

Hi @Amanda-Barbara, thanks for your question.

Our three kernels support multi-batch input, but they do not currently support variable-length sequences. However, this is something that can be implemented.
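For readers unfamiliar with the layout being asked about: variable-length interfaces such as flash-attn's varlen functions take all sequences concatenated into one row, plus a list of cumulative sequence lengths (`cu_seqlens`) marking where each sequence starts and ends. The sketch below only illustrates that packing in plain Python. The helper names (`pack_varlen`, `unpack_varlen`, `apply_per_sequence`) are hypothetical and not part of this repository, and the per-sequence loop is an assumed workaround for kernels without varlen support, not something the maintainers have confirmed.

```python
from itertools import accumulate


def pack_varlen(seqs):
    """Concatenate variable-length sequences into one row plus offsets.

    Returns (packed, cu_seqlens, max_seqlen), where cu_seqlens[i] is the
    start offset of sequence i and cu_seqlens[-1] is the total length.
    """
    lengths = [len(s) for s in seqs]
    cu_seqlens = [0] + list(accumulate(lengths))
    packed = [tok for s in seqs for tok in s]
    return packed, cu_seqlens, max(lengths, default=0)


def unpack_varlen(packed, cu_seqlens):
    """Split a packed row back into the original sequences."""
    return [packed[s:e] for s, e in zip(cu_seqlens, cu_seqlens[1:])]


def apply_per_sequence(packed, cu_seqlens, kernel):
    """Hypothetical workaround: run a fixed-length kernel on each
    sequence separately, then re-concatenate the outputs."""
    out = []
    for seq in unpack_varlen(packed, cu_seqlens):
        out.extend(kernel(seq))
    return out


seqs = [[1, 2, 3], [4], [5, 6]]
packed, cu_seqlens, max_seqlen = pack_varlen(seqs)
doubled = apply_per_sequence(packed, cu_seqlens, lambda s: [x * 2 for x in s])
```

Looping per sequence gives up the batching benefit of a true varlen kernel, so it is at best a stopgap until native support exists.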

Amanda-Barbara added the question label Jul 17, 2024

iofu728 assigned Starmys Jul 18, 2024

polarispw mentioned this issue Jul 23, 2024

Shape of slash mismatch when input batchsize > 1 #53

Open
