[GlobalISel] Remove references to rhs of shufflevector if rhs is undef #115076

konstantinschwarz

No description provided.

@llvmbot

llvmbot commented Nov 5, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Konstantin Schwarz (konstantinschwarz)

Changes

Patch is 31.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115076.diff

8 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+4)
  • (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+9-1)
  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+29)
  • (modified) llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp (+7-7)
  • (added) llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-shuffle-vector-undef-rhs.mir (+40)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll (+1-9)
  • (modified) llvm/test/CodeGen/AArch64/neon-perm.ll (+105-255)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/vni8-across-blocks.ll (+22-50)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index b09981eaef506e..cd2022e88a0df1 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -870,6 +870,10 @@ class CombinerHelper {
   /// register and different indices.
   bool matchExtractVectorElementWithDifferentIndices(const MachineOperand &MO,
                                                      BuildFnTy &MatchInfo);
+
+  /// Remove references to rhs if it is undef
+  bool matchShuffleUndefRHS(MachineInstr &MI, BuildFnTy &MatchInfo);
+
   /// Use a function which takes in a MachineIRBuilder to perform a combine.
   /// By default, it erases the instruction def'd on \p MO from the function.
   void applyBuildFnMO(const MachineOperand &MO, BuildFnTy &MatchInfo);
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 80a22c35ebceff..9a0945b903aa52 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1576,6 +1576,14 @@ def expand_const_fpowi : GICombineRule<
           [{ return Helper.matchFPowIExpansion(*${root}, ${imm}.getCImm()->getSExtValue()); }]),
    (apply [{ Helper.applyExpandFPowI(*${root}, ${imm}.getCImm()->getSExtValue()); }])>;
 
+def combine_shuffle_undef_rhs : GICombineRule<
+  (defs root:$root, build_fn_matchinfo:$matchinfo),
+  (match (G_IMPLICIT_DEF $undef),
+         (G_SHUFFLE_VECTOR $root, $src1, $undef, $mask):$root,
+        [{ return Helper.matchShuffleUndefRHS(*${root}, ${matchinfo}); }]),
+  (apply [{ Helper.applyBuildFn(*${root}, ${matchinfo}); }])
+>;
+
 // match_extract_of_element and insert_vector_elt_oob must be the first!
 def vector_ops_combines: GICombineGroup<[
 match_extract_of_element_undef_vector,
@@ -1948,7 +1956,7 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
     fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
     combine_concat_vector, match_addos,
     sext_trunc, zext_trunc, prefer_sign_combines, combine_shuffle_concat,
-    combine_use_vector_truncate, merge_combines]>;
+    combine_use_vector_truncate, merge_combines, combine_shuffle_undef_rhs]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index ede8d82fc1a35e..3f163f429c1b47 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -7721,3 +7721,32 @@ bool CombinerHelper::matchUnmergeValuesAnyExtBuildVector(const MachineInstr &MI,
 
   return false;
 }
+
+bool CombinerHelper::matchShuffleUndefRHS(MachineInstr &MI,
+                                          BuildFnTy &MatchInfo) {
+
+  bool Changed = false;
+  ArrayRef<int> OrigMask = MI.getOperand(3).getShuffleMask();
+  SmallVector<int, 8> NewMask;
+  const LLT SrcTy = MRI.getType(MI.getOperand(1).getReg());
+  const unsigned NumSrcElems = SrcTy.isVector() ? SrcTy.getNumElements() : 1;
+  const unsigned NumDstElts = OrigMask.size();
+  for (unsigned i = 0; i != NumDstElts; ++i) {
+    int Idx = OrigMask[i];
+    if (Idx >= (int)NumSrcElems) {
+      Idx = -1;
+      Changed = true;
+    }
+    NewMask.push_back(Idx);
+  }
+
+  if (!Changed)
+    return false;
+
+  MatchInfo = [&, NewMask](MachineIRBuilder &B) {
+    B.buildShuffleVector(MI.getOperand(0), MI.getOperand(2), MI.getOperand(1),
+                         NewMask);
+  };
+
+  return true;
+}
diff --git a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
index 15b9164247846c..02dbe781babdba 100644
--- a/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
@@ -772,13 +772,13 @@ MachineInstrBuilder MachineIRBuilder::buildShuffleVector(const DstOp &Res,
   LLT DstTy = Res.getLLTTy(*getMRI());
   LLT Src1Ty = Src1.getLLTTy(*getMRI());
   LLT Src2Ty = Src2.getLLTTy(*getMRI());
-  assert((size_t)(Src1Ty.getNumElements() + Src2Ty.getNumElements()) >=
-         Mask.size());
-  assert(DstTy.getElementType() == Src1Ty.getElementType() &&
-         DstTy.getElementType() == Src2Ty.getElementType());
-  (void)DstTy;
-  (void)Src1Ty;
-  (void)Src2Ty;
+  const LLT DstElemTy = DstTy.isVector() ? DstTy.getElementType() : DstTy;
+  const LLT ElemTy1 = Src1Ty.isVector() ? Src1Ty.getElementType() : Src1Ty;
+  const LLT ElemTy2 = Src2Ty.isVector() ? Src2Ty.getElementType() : Src2Ty;
+  assert(DstElemTy == ElemTy1 && DstElemTy == ElemTy2);
+  (void)DstElemTy;
+  (void)ElemTy1;
+  (void)ElemTy2;
   ArrayRef<int> MaskAlloc = getMF().allocateShuffleMask(Mask);
   return buildInstr(TargetOpcode::G_SHUFFLE_VECTOR, {Res}, {Src1, Src2})
       .addShuffleMask(MaskAlloc);
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-shuffle-vector-undef-rhs.mir b/llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-shuffle-vector-undef-rhs.mir
new file mode 100644
index 00000000000000..d40b4e22fbe8bc
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-shuffle-vector-undef-rhs.mir
@@ -0,0 +1,40 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple aarch64  -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: shuffle_vector_undef_rhs
+tracksRegLiveness: true
+body:             |
+  bb.1:
+    liveins: $d0
+
+    ; CHECK-LABEL: name: shuffle_vector_undef_rhs
+    ; CHECK: liveins: $d0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $d0
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<2 x s32>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[SHUF:%[0-9]+]]:_(<4 x s32>) = G_SHUFFLE_VECTOR [[DEF]](<2 x s32>), [[COPY]], shufflemask(0, undef, 1, undef)
+    ; CHECK-NEXT: RET_ReallyLR implicit [[SHUF]](<4 x s32>)
+    %0:_(<2 x s32>) = COPY $d0
+    %1:_(<2 x s32>) = G_IMPLICIT_DEF
+    %2:_(<4 x s32>) = G_SHUFFLE_VECTOR %0(<2 x s32>), %1(<2 x s32>), shufflemask(0, 2, 1, 3)
+    RET_ReallyLR implicit %2
+...
+
+---
+name: shuffle_vector_undef_rhs_scalar
+tracksRegLiveness: true
+body:             |
+  bb.1:
+    liveins: $x0
+
+    ; CHECK-LABEL: name: shuffle_vector_undef_rhs_scalar
+    ; CHECK: liveins: $x0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<2 x s64>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: RET_ReallyLR implicit [[DEF]](<2 x s64>)
+    %0:_(s64) = COPY $x0
+    %1:_(s64) = G_IMPLICIT_DEF
+    %2:_(<2 x s64>) = G_SHUFFLE_VECTOR %0(s64), %1(s64), shufflemask(0, 1)
+    RET_ReallyLR implicit %2
+...
diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
index a39c2b5d14dddd..22bdbcfcdaacc9 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
@@ -322,17 +322,9 @@ define void @typei1_orig(i64 %a, ptr %p, ptr %q) {
 ;
 ; CHECK-GI-LABEL: typei1_orig:
 ; CHECK-GI:       // %bb.0:
-; CHECK-GI-NEXT:    ldr q1, [x2]
-; CHECK-GI-NEXT:    cmp x0, #0
 ; CHECK-GI-NEXT:    movi v0.2d, #0xffffffffffffffff
-; CHECK-GI-NEXT:    cset w8, gt
-; CHECK-GI-NEXT:    neg v1.8h, v1.8h
-; CHECK-GI-NEXT:    dup v2.8h, w8
 ; CHECK-GI-NEXT:    mvn v0.16b, v0.16b
-; CHECK-GI-NEXT:    mul v1.8h, v1.8h, v2.8h
-; CHECK-GI-NEXT:    cmeq v1.8h, v1.8h, #0
-; CHECK-GI-NEXT:    mvn v1.16b, v1.16b
-; CHECK-GI-NEXT:    uzp1 v0.16b, v1.16b, v0.16b
+; CHECK-GI-NEXT:    uzp1 v0.16b, v0.16b, v0.16b
 ; CHECK-GI-NEXT:    shl v0.16b, v0.16b, #7
 ; CHECK-GI-NEXT:    sshr v0.16b, v0.16b, #7
 ; CHECK-GI-NEXT:    str q0, [x1]
diff --git a/llvm/test/CodeGen/AArch64/neon-perm.ll b/llvm/test/CodeGen/AArch64/neon-perm.ll
index 7b85924ce1e323..ad036218f242ca 100644
--- a/llvm/test/CodeGen/AArch64/neon-perm.ll
+++ b/llvm/test/CodeGen/AArch64/neon-perm.ll
@@ -2838,435 +2838,285 @@ entry:
 }
 
 define <8 x i8> @test_undef_vtrn1_s8(<8 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1_s8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1_s8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.8b, v0.8b, v0.8b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1_s8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   ret <8 x i8> %shuffle.i
 }
 
 define <16 x i8> @test_undef_vtrn1q_s8(<16 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_s8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_s8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.16b, v0.16b, v0.16b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_s8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <16 x i8> %a, <16 x i8> undef, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
   ret <16 x i8> %shuffle.i
 }
 
 define <4 x i16> @test_undef_vtrn1_s16(<4 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1_s16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1_s16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.4h, v0.4h, v0.4h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1_s16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i16> %a, <4 x i16> undef, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   ret <4 x i16> %shuffle.i
 }
 
 define <8 x i16> @test_undef_vtrn1q_s16(<8 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_s16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_s16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.8h, v0.8h, v0.8h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_s16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   ret <8 x i16> %shuffle.i
 }
 
 define <4 x i32> @test_undef_vtrn1q_s32(<4 x i32> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_s32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_s32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.4s, v0.4s, v0.4s
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_s32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   ret <4 x i32> %shuffle.i
 }
 
 define <8 x i8> @test_undef_vtrn1_u8(<8 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1_u8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1_u8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.8b, v0.8b, v0.8b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1_u8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   ret <8 x i8> %shuffle.i
 }
 
 define <16 x i8> @test_undef_vtrn1q_u8(<16 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_u8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_u8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.16b, v0.16b, v0.16b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_u8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <16 x i8> %a, <16 x i8> undef, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
   ret <16 x i8> %shuffle.i
 }
 
 define <4 x i16> @test_undef_vtrn1_u16(<4 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1_u16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1_u16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.4h, v0.4h, v0.4h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1_u16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i16> %a, <4 x i16> undef, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   ret <4 x i16> %shuffle.i
 }
 
 define <8 x i16> @test_undef_vtrn1q_u16(<8 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_u16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_u16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.8h, v0.8h, v0.8h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_u16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   ret <8 x i16> %shuffle.i
 }
 
 define <4 x i32> @test_undef_vtrn1q_u32(<4 x i32> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_u32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_u32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.4s, v0.4s, v0.4s
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_u32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   ret <4 x i32> %shuffle.i
 }
 
 define <4 x float> @test_undef_vtrn1q_f32(<4 x float> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_f32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_f32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.4s, v0.4s, v0.4s
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_f32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   ret <4 x float> %shuffle.i
 }
 
 define <8 x i8> @test_undef_vtrn1_p8(<8 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1_p8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1_p8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.8b, v0.8b, v0.8b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1_p8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   ret <8 x i8> %shuffle.i
 }
 
 define <16 x i8> @test_undef_vtrn1q_p8(<16 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_p8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_p8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.16b, v0.16b, v0.16b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_p8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <16 x i8> %a, <16 x i8> undef, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
   ret <16 x i8> %shuffle.i
 }
 
 define <4 x i16> @test_undef_vtrn1_p16(<4 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1_p16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1_p16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.4h, v0.4h, v0.4h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1_p16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i16> %a, <4 x i16> undef, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
   ret <4 x i16> %shuffle.i
 }
 
 define <8 x i16> @test_undef_vtrn1q_p16(<8 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn1q_p16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn1q_p16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn1 v0.8h, v0.8h, v0.8h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn1q_p16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
   ret <8 x i16> %shuffle.i
 }
 
 define <8 x i8> @test_undef_vtrn2_s8(<8 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn2_s8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    rev16 v0.8b, v0.8b
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn2_s8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn2 v0.8b, v0.8b, v0.8b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn2_s8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    rev16 v0.8b, v0.8b
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
   ret <8 x i8> %shuffle.i
 }
 
 define <16 x i8> @test_undef_vtrn2q_s8(<16 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn2q_s8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    rev16 v0.16b, v0.16b
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn2q_s8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn2 v0.16b, v0.16b, v0.16b
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn2q_s8:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    rev16 v0.16b, v0.16b
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <16 x i8> %a, <16 x i8> undef, <16 x i32> <i32 1, i32 17, i32 3, i32 19, i32 5, i32 21, i32 7, i32 23, i32 9, i32 25, i32 11, i32 27, i32 13, i32 29, i32 15, i32 31>
   ret <16 x i8> %shuffle.i
 }
 
 define <4 x i16> @test_undef_vtrn2_s16(<4 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn2_s16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    rev32 v0.4h, v0.4h
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn2_s16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn2 v0.4h, v0.4h, v0.4h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn2_s16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    rev32 v0.4h, v0.4h
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i16> %a, <4 x i16> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
   ret <4 x i16> %shuffle.i
 }
 
 define <8 x i16> @test_undef_vtrn2q_s16(<8 x i16> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn2q_s16:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    rev32 v0.8h, v0.8h
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn2q_s16:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn2 v0.8h, v0.8h, v0.8h
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn2q_s16:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    rev32 v0.8h, v0.8h
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
   ret <8 x i16> %shuffle.i
 }
 
 define <4 x i32> @test_undef_vtrn2q_s32(<4 x i32> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn2q_s32:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    rev64 v0.4s, v0.4s
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn2q_s32:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn2 v0.4s, v0.4s, v0.4s
-; CHECK-GI-NEXT:    ret
+; CHECK-LABEL: test_undef_vtrn2q_s32:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    rev64 v0.4s, v0.4s
+; CHECK-NEXT:    ret
 entry:
   %shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
   ret <4 x i32> %shuffle.i
 }
 
 define <8 x i8> @test_undef_vtrn2_u8(<8 x i8> %a) {
-; CHECK-SD-LABEL: test_undef_vtrn2_u8:
-; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    rev16 v0.8b, v0.8b
-; CHECK-SD-NEXT:    ret
-;
-; CHECK-GI-LABEL: test_undef_vtrn2_u8:
-; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    trn2 v0.8b, v0....
[truncated]
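
The mask rewrite performed by `matchShuffleUndefRHS` can be modeled outside of LLVM. The following is a minimal standalone sketch, not the code from the patch: the function name `rewriteMaskForUndefRHS` and the use of `std::vector` and `std::optional` are assumptions made for illustration (C++17 is assumed). It follows the `G_SHUFFLE_VECTOR` mask convention, where an index below the number of source elements selects from the first source, a larger index selects from the second source, and `-1` marks an undef lane.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical standalone model of the mask rewrite; this is not LLVM API.
// Every mask index that selects a lane of the (undef) second source is
// replaced by -1. Indices that select from the first source are kept as-is.
// Returns std::nullopt when nothing changed, mirroring the early
// "return false" in the combine's match function.
std::optional<std::vector<int>>
rewriteMaskForUndefRHS(const std::vector<int> &origMask, int numSrcElems) {
  std::vector<int> newMask;
  newMask.reserve(origMask.size());
  bool changed = false;
  for (int idx : origMask) {
    if (idx >= numSrcElems) {
      idx = -1; // lane came from the undef operand
      changed = true;
    }
    newMask.push_back(idx);
  }
  if (!changed)
    return std::nullopt;
  return newMask;
}
```

With two source elements, the mask `(0, 2, 1, 3)` from the new MIR test becomes `(0, -1, 1, -1)`, which matches the `shufflemask(0, undef, 1, undef)` in the CHECK lines. A mask that never references the second source yields no rewrite, so the combine does not fire.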

konstantinschwarz force-pushed the kschwarz.upstream.shufflevector.undef.rhs branch from 0dba1c1 to 00a1ec2 on November 5, 2024 22:58

aemerson left a comment

LGTM, some minor nits.

BuildFnTy &MatchInfo) {

bool Changed = false;
ArrayRef<int> OrigMask = MI.getOperand(3).getShuffleMask();

You can do:

auto &Shuffle = cast<GShuffleVector>(MI);
ArrayRef<int> OrigMask = Shuffle.getMask();

bool Changed = false;
ArrayRef<int> OrigMask = MI.getOperand(3).getShuffleMask();
SmallVector<int, 8> NewMask;
const LLT SrcTy = MRI.getType(MI.getOperand(1).getReg());

And likewise: MRI.getType(Shuffle.getSrc1Reg())


bool Changed = false;
ArrayRef<int> OrigMask = MI.getOperand(3).getShuffleMask();
SmallVector<int, 8> NewMask;

16-wide byte vectors are common so let's bump this to 16.

@@ -1948,7 +1956,7 @@ def all_combines : GICombineGroup<[integer_reassoc_combines, trivial_combines,
     fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
     combine_concat_vector, match_addos,
     sext_trunc, zext_trunc, prefer_sign_combines, combine_shuffle_concat,
-    combine_use_vector_truncate, merge_combines]>;
+    combine_use_vector_truncate, merge_combines, combine_shuffle_undef_rhs]>;

Can you create a new combine group, maybe shuffle_combines, and put this and the combine_shuffle_concat combine into it? Can throw in anything else you think fits the scope there too.

konstantinschwarz force-pushed the kschwarz.upstream.shufflevector.undef.rhs branch from 00a1ec2 to 3534945 on November 7, 2024 00:12
konstantinschwarz merged commit cbfe87c into llvm:main on Nov 7, 2024
5 of 7 checks passed
konstantinschwarz deleted the kschwarz.upstream.shufflevector.undef.rhs branch on November 7, 2024 00:36