Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GlobalISel] Match G_SHUFFLE_VECTORs representing sub-vector extracts #224

Draft
wants to merge 1 commit into
base: aie-public
Choose a base branch
from

Conversation

konstantinschwarz
Copy link
Collaborator

No description provided.

Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good. Just a few nits. No comments on the tests yet because of the draft status. I have 2 questions:

  • Is the goal to match only half vector extract patterns or Sub-Vector extracts in general? (quarter extracts being also relevant for AIE's API)
  • How about the missing canonicalization combine for shufflevector in GlobalISel? I think this is needed if we want to support extracts from the second vector input with the current combine implementation in this PR.

const LLT Src1Ty = MRI.getType(Src1Reg);

if (!DstTy.isVector() || !Src1Ty.isVector())
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? G_SHUFFLE_VECTOR only operates on vectors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, G_SHUFFLE_VECTOR can have scalar inputs. The opcode is a bit weird.

https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shuffle-vector.mir#L119

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this misses the following edge case:

%0:_(<2 x s32>) = COPY $x0
%1:_(<2 x s32>) = UNDEF
%2:_(s32) = G_SHUFFLE_VECTOR %0, %1, shufflemask(0)

In this case, you could unmerge the first array into the destination. Since the target is not a vector, this would skip it. How useful this is? Who knows.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that case could even be matched to a G_EXTRACT_VECTOR_ELT. It might not be very useful for AIE's API since element extracts would be implemented using extractelement instead of shufflevector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is a bit obscure and the overhead it would cause is minimal.

assert(MI.getOpcode() == TargetOpcode::G_SHUFFLE_VECTOR);
const Register DstReg = MI.getOperand(0).getReg();
const Register Src1Reg = MI.getOperand(1).getReg();
ArrayRef<int> Mask = MI.getOperand(3).getShuffleMask();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ShuffleMask?

Copy link
Collaborator

@ValentijnvdBeek ValentijnvdBeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Konstantin, thanks for taking my design and generally unjunior/internifying. This looks very clean, great job. I am not able to run it a.t.m., but I will look at that whenever the PR is complete.

There are two small worries:

This check is really expensive, this replaces the basic design but makes it hard to add the optimizations that might be needed to merge upstream.

Having separate optimizer causes it to be missed by the G_EXTRACT->G_INSERT->G_BUILD sequences that might be valid representations. Upstream doesn't, for sensible reasons, canonize those into shufflevectors so you need to make sure that this optimizer is on those sequences as well.

See MR linked in: #41 (comment)

Comment on lines +1534 to +1538
def combine_shuffle_to_extract_vector : GICombineRule<
(defs root:$root, build_fn_matchinfo:$matchinfo),
(match (wip_match_opcode G_SHUFFLE_VECTOR):$root,
[{ return Helper.matchShuffleToExtractSubvector(*${root}, ${matchinfo}); }]),
(apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using wip_match_opcode, upstream really doesn't like this since it causes imprecise patterns and slows down the compilation. In this case, it should be very little difference compared to just matching SHUFFLE_VECTOR directly.

https://llvm.org/docs/GlobalISel/MIRPatterns.html#gallery
See MR linked in: #41 (comment)

const LLT Src1Ty = MRI.getType(Src1Reg);

if (!DstTy.isVector() || !Src1Ty.isVector())
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, G_SHUFFLE_VECTOR can have scalar inputs. The opcode is a bit weird.

https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/test/CodeGen/AArch64/GlobalISel/legalize-shuffle-vector.mir#L119

@@ -1642,7 +1652,7 @@ def all_combines : GICombineGroup<[trivial_combines, vector_ops_combines,
sub_add_reg, select_to_minmax, redundant_binop_in_equality,
fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
combine_concat_vector, double_icmp_zero_and_or_combine, match_addos,
combine_shuffle_concat]>;
combine_shuffle_concat, combine_shuffle_to_extract_vector]>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is a bit of an expensive combiner to run since it iterates through a decent chunk of elements twice. Previously, this was an explicitly opt in combiner that a target enabled by calling a specific function. This will now run it on all targets, this is probably worth it. Also, upstream will probably be running those checks now anyways, so it doesn't matter as much.

return Helper.tryCombineShuffleVector(MI);

See MR linked in: #41 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What upstream checks are you referring to? I think the current tryCombineShuffleVector only matches merge-like patterns, but we could definitely move our combine_shuffle_to_extract_vector combine there, would that make it less expensive?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, checking elements to the shuffle vector is just an O(n²) operations unless you use a hashmap. At the moment, this is done in tryCombineShuffleVector which a backend can enable. This is now run always in all backends. I don't know exactly why someone made the decision to have that combiner be run explicitly rather than using tablegen like the others

const LLT Src1Ty = MRI.getType(Src1Reg);

if (!DstTy.isVector() || !Src1Ty.isVector())
return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this misses the following edge case:

%0:_(<2 x s32>) = COPY $x0
%1:_(<2 x s32>) = UNDEF
%2:_(s32) = G_SHUFFLE_VECTOR %0, %1, shufflemask(0)

In this case, you could unmerge the first array into the destination. Since the target is not a vector, this would skip it. How useful this is? Who knows.

Comment on lines +7278 to +7290
auto CheckExtractMask = [=](unsigned Start, unsigned NumElems) -> bool {
auto ExtractMask = createSequentialMask(Start, NumElems, 0);

for (unsigned I = 0; I < NumDstElems; I++) {
if (Mask[I] == -1)
continue;

if (Mask[I] != ExtractMask[I])
return false;
}

return true;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this is a lot, this is great.

if (NumDstElems * 2 != NumSrc1Elems)
return false;

auto CheckExtractMask = [=](unsigned Start, unsigned NumElems) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can use by-reference capture and automatic return type deduction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants