Open
Labels
enhancement ➕ New feature or request
Description
Request description
This op is meant to go inside the in_parallel region of an scf.forall or the like. It takes as input a tensor to gather from, a thread-level tile of indices to gather at, and a subgroup-level output tensor to gather into. It returns no results, because that subgroup-level tile is a shared out.
An example of the use of the op is as follows:
%0 = scf.forall (%flat_thread_id) shared_outs(%shared_dest_slice = %dest_slice) {
  %m_id, %k_id = affine.delinearize_index %flat_thread_id into (mTile, kTile)
  %indices_thread_slice = tensor.extract_slice %indices_slice [%m_id, %k_id] [m, k]
  scf.forall.in_parallel {
    iree_gpu.coalesced_gather_dma (%indices_thread_slice, %Idx) -> %shared_dest_slice
  }
}
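To make the intended data movement concrete, here is a rough reference model in plain Python. It is only a sketch under assumptions the request does not spell out: the source is treated as a 1-D list, the gather is element-wise (dest[i][j] = source[indices[i][j]]), and each thread's tile offset is its delinearized id multiplied by the tile size. The helper names delinearize and gather_tile and all sizes are hypothetical, chosen for illustration.

```python
def delinearize(flat_id, basis):
    """Split a flat id into per-dimension ids, outermost dimension first."""
    ids = []
    for size in reversed(basis):
        ids.append(flat_id % size)
        flat_id //= size
    return tuple(reversed(ids))


def gather_tile(source, indices, dest, m_id, k_id, m, k):
    """One thread's share of the gather over its m x k tile of indices.

    Assumed semantics: dest[i][j] = source[indices[i][j]].
    """
    for i in range(m_id * m, (m_id + 1) * m):
        for j in range(k_id * k, (k_id + 1) * k):
            dest[i][j] = source[indices[i][j]]


# Hypothetical data: a 1-D source and a 2 x 4 grid of indices into it.
source = [100, 101, 102, 103, 104, 105]
indices = [[5, 0, 2, 2],
           [1, 4, 3, 0]]

m_tiles, k_tiles = 2, 2  # thread grid, playing the role of (mTile, kTile)
m, k = 1, 2              # per-thread tile of indices

# The shared output that every thread writes a disjoint tile of.
dest = [[None] * 4 for _ in range(2)]

# Sequential stand-in for the parallel scf.forall over flat thread ids.
for flat_thread_id in range(m_tiles * k_tiles):
    m_id, k_id = delinearize(flat_thread_id, (m_tiles, k_tiles))
    gather_tile(source, indices, dest, m_id, k_id, m, k)
```

Because each thread writes a disjoint tile of dest, the loop order does not affect the result, which is what allows the writes to sit in the in_parallel region.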
What component(s) does this issue relate to?
No response
Additional context
No response