
New op: iree_gpu.coalesced_gather_dma #21784

@lialan


Request description

This op is meant to be used inside the in_parallel region of an scf.forall or a similar construct. It takes three inputs: a tensor to gather from, a thread-level tile of indices to gather at, and a subgroup-level output tensor to gather into. It returns no results, because that subgroup-level output tile is a shared out of the enclosing loop.
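To make the intended semantics concrete, here is a minimal Python reference model. It is a sketch under assumptions the request does not state: the source is a 2D tensor whose rows are gathered, every thread holds the same number of row indices, and thread t writes its rows into its own contiguous slice of the subgroup-level destination. The name coalesced_gather_reference is hypothetical. Mutating dest in place stands in for the shared out, which is why the function returns nothing.

```python
from typing import List, Sequence


def coalesced_gather_reference(
    source: Sequence[Sequence[float]],
    thread_indices: Sequence[Sequence[int]],
    dest: List[List[float]],
) -> None:
    # Hypothetical model: thread_indices[t] is thread t's tile of row indices.
    # Assumption: every thread gathers the same number of rows.
    rows_per_thread = len(thread_indices[0])
    if any(len(idx) != rows_per_thread for idx in thread_indices):
        raise ValueError("all threads must gather the same number of rows")
    if len(dest) != rows_per_thread * len(thread_indices):
        raise ValueError("dest must hold one row per gathered index")
    for tid, indices in enumerate(thread_indices):
        for i, src_row in enumerate(indices):
            # Thread `tid` owns dest rows [tid * rows_per_thread, (tid + 1) * rows_per_thread).
            dest[tid * rows_per_thread + i] = list(source[src_row])
```

For example, with source = [[0.0, 1.0], [10.0, 11.0], [20.0, 21.0], [30.0, 31.0]] and two threads holding indices [3, 0] and [2, 2], a four-row dest ends up as [[30.0, 31.0], [0.0, 1.0], [20.0, 21.0], [20.0, 21.0]]. Because each thread writes only its own slice, no synchronization is needed between threads within the model.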

An example of the use of the op is as follows:

  %0 = scf.forall (%flat_thread_id) shared_outs(%shared_dest_slice = %dest_slice) {
    %m_id, %k_id = affine.delinearize_index %flat_thread_id into (mTile, kTile)
    %indices_thread_slice = tensor.extract_slice %indices_slice [%m_id, %k_id] [m, k] [1, 1]
    scf.forall.in_parallel {
      iree_gpu.coalesced_gather_dma (%indices_thread_slice, %Idx) -> %shared_dest_slice
    }
  }
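The mapping from a flat thread id to that thread's index tile can be sketched in Python as follows. It assumes row-major delinearization, which is what affine.delinearize_index produces for in-range ids. It also assumes the extract_slice offsets are meant to be scaled by the tile sizes m and k so that tiles do not overlap; the example above passes the raw ids. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def delinearize(flat_id: int, basis: Tuple[int, int]) -> Tuple[int, int]:
    # Row-major split of a flat id into (m_id, k_id) for ids in [0, m_tiles * k_tiles).
    m_tiles, k_tiles = basis
    if not 0 <= flat_id < m_tiles * k_tiles:
        raise ValueError("flat id out of range")
    return divmod(flat_id, k_tiles)


def thread_tile_offsets(m_tiles: int, k_tiles: int, m: int, k: int) -> Dict[int, Tuple[int, int]]:
    # Assumption: each thread's slice offset is (m_id * m, k_id * k).
    offsets = {}
    for tid in range(m_tiles * k_tiles):
        m_id, k_id = delinearize(tid, (m_tiles, k_tiles))
        offsets[tid] = (m_id * m, k_id * k)
    return offsets


def covered_cells(offsets: Dict[int, Tuple[int, int]], m: int, k: int) -> List[Tuple[int, int]]:
    # Every (row, col) of the indices tensor touched by some thread's m x k slice.
    cells = []
    for r0, c0 in offsets.values():
        cells.extend((r0 + r, c0 + c) for r in range(m) for c in range(k))
    return cells
```

With a 2 x 4 grid of 2 x 3 tiles (8 threads over a 4 x 12 indices tensor), thread 5 delinearizes to (1, 1) and reads the slice at offset (2, 3). The 48 covered cells are all distinct, so every index is read by exactly one thread.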

What component(s) does this issue relate to?

No response

Additional context

No response
