-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GlobalISel] Match G_SHUFFLE_VECTORs representing sub-vector extracts #224
base: aie-public
Are you sure you want to change the base?
[GlobalISel] Match G_SHUFFLE_VECTORs representing sub-vector extracts #224
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good. Just a few nits. No comments on the tests yet because of the draft status. I have 2 questions:
- Is the goal to match only half vector extract patterns or Sub-Vector extracts in general? (quarter extracts being also relevant for AIE's API)
- How about the missing canonicalization combine for shufflevector in GlobalISel? I think this is needed if we want to support extracts from the second vector input with the current combine implementation in this PR.
const LLT Src1Ty = MRI.getType(Src1Reg); | ||
|
||
if (!DstTy.isVector() || !Src1Ty.isVector()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? G_SHUFFLE_VECTOR
only operates on vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, G_SHUFFLE_VECTOR
can have scalar inputs. The opcode is a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this misses the following edge case:
%0:_(<2 x s32>) = COPY $x0
%1:_(<2 x s32>) = UNDEF
%2:_(s32) = G_SHUFFLE_VECTOR %0, %1, shufflemask(0)
In this case, you could unmerge the first array into the destination. Since the target is not a vector, this would skip it. How useful this is? Who knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that case could even be matched to a G_EXTRACT_VECTOR_ELT
. It might not be very useful for AIE's API since element extracts would be implemented using extractelement
instead of shufflevector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is a bit obscure and the overhead it would cause is minimal.
assert(MI.getOpcode() == TargetOpcode::G_SHUFFLE_VECTOR); | ||
const Register DstReg = MI.getOperand(0).getReg(); | ||
const Register Src1Reg = MI.getOperand(1).getReg(); | ||
ArrayRef<int> Mask = MI.getOperand(3).getShuffleMask(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ShuffleMask
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Konstantin, thanks for taking my design and generally unjunior/internifying. This looks very clean, great job. I am not able to run it a.t.m., but I will look at that whenever the PR is complete.
There are two small worries:
This check is really expensive, this replaces the basic design but makes it hard to add the optimizations that might be needed to merge upstream.
Having separate optimizer causes it to be missed by the G_EXTRACT->G_INSERT->G_BUILD
sequences that might be valid representations. Upstream doesn't, for sensible reasons, canonize those into shufflevectors so you need to make sure that this optimizer is on those sequences as well.
See MR linked in: #41 (comment)
def combine_shuffle_to_extract_vector : GICombineRule< | ||
(defs root:$root, build_fn_matchinfo:$matchinfo), | ||
(match (wip_match_opcode G_SHUFFLE_VECTOR):$root, | ||
[{ return Helper.matchShuffleToExtractSubvector(*${root}, ${matchinfo}); }]), | ||
(apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using wip_match_opcode
, upstream really doesn't like this since it causes imprecise patterns and slows down the compilation. In this case, it should be very little difference compared to just matching SHUFFLE_VECTOR directly.
https://llvm.org/docs/GlobalISel/MIRPatterns.html#gallery
See MR linked in: #41 (comment)
const LLT Src1Ty = MRI.getType(Src1Reg); | ||
|
||
if (!DstTy.isVector() || !Src1Ty.isVector()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, G_SHUFFLE_VECTOR
can have scalar inputs. The opcode is a bit weird.
@@ -1642,7 +1652,7 @@ def all_combines : GICombineGroup<[trivial_combines, vector_ops_combines, | |||
sub_add_reg, select_to_minmax, redundant_binop_in_equality, | |||
fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors, | |||
combine_concat_vector, double_icmp_zero_and_or_combine, match_addos, | |||
combine_shuffle_concat]>; | |||
combine_shuffle_concat, combine_shuffle_to_extract_vector]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is a bit of an expensive combiner to run since it iterates through a decent chunk of elements twice. Previously, this was an explicitly opt in combiner that a target enabled by calling a specific function. This will now run it on all targets, this is probably worth it. Also, upstream will probably be running those checks now anyways, so it doesn't matter as much.
return Helper.tryCombineShuffleVector(MI); |
See MR linked in: #41 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What upstream checks are you referring to? I think the current tryCombineShuffleVector
only matches merge-like patterns, but we could definitely move our combine_shuffle_to_extract_vector
combine there, would that make it less expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, checking elements to the shuffle vector is just an O(n²) operations unless you use a hashmap. At the moment, this is done in tryCombineShuffleVector
which a backend can enable. This is now run always in all backends. I don't know exactly why someone made the decision to have that combiner be run explicitly rather than using tablegen like the others
const LLT Src1Ty = MRI.getType(Src1Reg); | ||
|
||
if (!DstTy.isVector() || !Src1Ty.isVector()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this misses the following edge case:
%0:_(<2 x s32>) = COPY $x0
%1:_(<2 x s32>) = UNDEF
%2:_(s32) = G_SHUFFLE_VECTOR %0, %1, shufflemask(0)
In this case, you could unmerge the first array into the destination. Since the target is not a vector, this would skip it. How useful this is? Who knows.
auto CheckExtractMask = [=](unsigned Start, unsigned NumElems) -> bool { | ||
auto ExtractMask = createSequentialMask(Start, NumElems, 0); | ||
|
||
for (unsigned I = 0; I < NumDstElems; I++) { | ||
if (Mask[I] == -1) | ||
continue; | ||
|
||
if (Mask[I] != ExtractMask[I]) | ||
return false; | ||
} | ||
|
||
return true; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this is a lot, this is great.
if (NumDstElems * 2 != NumSrc1Elems) | ||
return false; | ||
|
||
auto CheckExtractMask = [=](unsigned Start, unsigned NumElems) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can use by-reference capture and automatic return type deduction.
No description provided.