Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DAG][X86]added shrd in combineor for bzhiq+shlq+or #125734

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

shalini-nik
Copy link

@shalini-nik shalini-nik commented Feb 4, 2025

Created pull request for [DAG][X86] Use shrdq to insert into a bitfield

Fixes #112488 issue

Copy link

github-actions bot commented Feb 4, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Feb 4, 2025

@llvm/pr-subscribers-backend-x86

Author: shalininikhil (shalini-nik)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/125734.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+29)
  • (added) llvm/test/CodeGen/X86/shrdq-to-insert-into-bitfield.ll (+18)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a956074e50d86f7..395c0f5504d1b78 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -51887,6 +51887,35 @@ static SDValue combineOr(SDNode *N, SelectionDAG &DAG,
     }
   }
 
+  if (N0.getOpcode() == ISD::SHL || N1.getOpcode() == ISD::SHL){
+    SDValue SHL = (N0.getOpcode() == ISD::SHL) ? N0 : N1;
+    SDValue OtherOp = (N0.getOpcode() == ISD::SHL) ? N1 : N0;
+    
+    if (OtherOp.getOpcode() == ISD::AND) {
+      SDValue andop = OtherOp;
+      
+      if(andop.getOperand(0).getOpcode()==ISD::Constant||andop.getOperand(1).getOpcode()==ISD::Constant){
+              
+              SDValue constOp = andop.getOperand(0).getOpcode()==ISD::Constant ? andop.getOperand(0): andop.getOperand(1);
+              SDValue valueOp = andop.getOperand(0).getOpcode()==ISD::Constant ? andop.getOperand(1): andop.getOperand(0);
+              auto *ConstRHS = dyn_cast<ConstantSDNode>(constOp);
+              uint64_t maskValue = ConstRHS->getZExtValue();
+              auto *ConstSHL = dyn_cast<ConstantSDNode>(SHL.getOperand(1));
+              uint64_t shiftValue = ConstSHL->getZExtValue();
+              
+              if((((uint64_t)1<<shiftValue)-1)==maskValue){
+                      unsigned numbits = SHL.getScalarValueSizeInBits();
+                      unsigned newshift=numbits-shiftValue;
+                      
+                      SDValue newSHL = DAG.getNode(ISD::SHL,dl,VT,valueOp,DAG.getConstant(newshift, dl, MVT::i8));
+                      SDValue R = DAG.getNode(ISD::FSHR,dl,VT,
+                                    SHL.getOperand(0),newSHL,DAG.getConstant(newshift, dl, MVT::i8));
+                      return R;
+            }
+          }
+      }
+  }
+  
   if (SDValue SetCC = combineAndOrForCcmpCtest(N, DAG, DCI, Subtarget))
     return SetCC;
 
diff --git a/llvm/test/CodeGen/X86/shrdq-to-insert-into-bitfield.ll b/llvm/test/CodeGen/X86/shrdq-to-insert-into-bitfield.ll
new file mode 100644
index 000000000000000..cc205ee145d88c8
--- /dev/null
+++ b/llvm/test/CodeGen/X86/shrdq-to-insert-into-bitfield.ll
@@ -0,0 +1,18 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -O3 < %s | FileCheck %s
+
+define dso_local i64 @updateTop10Bits(i64 noundef %A, i64 noundef %B) local_unnamed_addr #0 {
+; CHECK-LABEL: updateTop10Bits:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    movq	%rdi, %rax
+; CHECK-NEXT:    shlq	$10, %rax
+; CHECK-NEXT:    shrdq	$10, %rsi, %rax
+; CHECK-NEXT:    retq
+entry:
+  %and = and i64 %A, 18014398509481983
+  %shl = shl i64 %B, 54
+  %or = or disjoint i64 %shl, %and
+  ret i64 %or
+}
+
+attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="skylake" "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sgx,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
\ No newline at end of file

@RKSimon RKSimon requested review from RKSimon and phoebewang February 4, 2025 18:14
@@ -0,0 +1,18 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O3 < %s | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for -O3

You should add a second RUN line with "-mattr=+slow-shld" as well to account for AMD CPUs with slow funnel shifts:

; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s --check-prefixes=CHECK,FAST
; RUN: llc < %s -mtriple=x86_64-- -mattr=+slow-shld | FileCheck %s --check-prefixes=CHECK,SLOW

ret i64 %or
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="skylake" "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sgx,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this - attributes shouldn't be necessary

@@ -0,0 +1,18 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename shrdq-to-insert-into-bitfield.ll to something more general "insert-bitfield.ll"?

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O3 < %s | FileCheck %s

define dso_local i64 @updateTop10Bits(i64 noundef %A, i64 noundef %B) local_unnamed_addr #0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove dso_local / noundef / local_unnamed_addr #0 noise

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 5, 2025

Are you working with @dipeshs809 on this?

@shalini-nik
Copy link
Author

shalini-nik commented Feb 5, 2025

Are you working with @dipeshs809 on this?

No, i am working individually and this is my first issue

Comment on lines 51893 to 51902
if(sd_match(N,m_Or(m_Shl(m_Value(B),m_ConstInt(ShlConst)),m_And(m_Value(A),m_ConstInt(MaskConst))))){
uint64_t shiftValue = ShlConst.getZExtValue();
if(MaskConst.isMask(shiftValue)){
unsigned numbits = B.getScalarValueSizeInBits();
unsigned newshift=numbits-shiftValue;
SDValue newSHL = DAG.getNode(ISD::SHL,dl,VT,A,DAG.getConstant(newshift, dl, MVT::i8));
SDValue R = DAG.getNode(ISD::FSHR,dl,VT,B,newSHL,DAG.getConstant(newshift, dl, MVT::i8));
return R;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please format the code with clang-format. I'm surprised GH didn't complain this time.

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 5, 2025

@shalini-nik - @dipeshs809 is OK with you taking over their assigned issue, but in the future please check an issue hasn't been assigned (or ask on the issue thread if they are still working on it).

@shalini-nik
Copy link
Author

@shalini-nik - @dipeshs809 is OK with you taking over their assigned issue, but in the future please check an issue hasn't been assigned (or ask on the issue thread if they are still working on it).

@RKSimon sure i will check and ask for assignment.

Copy link

github-actions bot commented Feb 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please can you fix the clang-format warnings?

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - the next step should be to add additional test coverage - different types, maybe some negative cases etc.

m_And(m_Value(A), m_ConstInt(MaskConst))))) {
uint64_t ShiftValue = ShlConst.getZExtValue();
if (MaskConst.isMask(ShiftValue) && (A.getOpcode() == ISD::CopyFromReg) &&
(B.getOpcode() == ISD::CopyFromReg)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to check CopyFromReg?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using CopyFromReg feels like you're trying to prevent regressions on something?

; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s --check-prefixes=X64,X64-FAST
; RUN: llc < %s -mtriple=x86_64-- -mattr=+slow-shld | FileCheck %s --check-prefixes=X64,X64-SLOW

define i64 @updateTop10Bits_64bits(i64 %A, i64 %B) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace 2 spaces with one.

; X64-SLOW-NEXT: shlq $54, %rsi
; X64-SLOW-NEXT: orq %rsi, %rax
; X64-SLOW-NEXT: retq
entry:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove entry: lines

@RKSimon
Copy link
Collaborator

RKSimon commented Mar 21, 2025

@shalini-nik have you been able to create an alive2 proof for your solution yet please?

unsigned NumBits = B.getScalarValueSizeInBits();
unsigned NewShift = NumBits - ShiftValue;
if (ShiftValue > 4 && ShiftValue != 8 && ShiftValue != 16 &&
ShiftValue != 32 && ShiftValue != 64) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these specific shift values omitted? 64 in particular seems weird.

@@ -51887,6 +51887,47 @@ static SDValue combineOr(SDNode *N, SelectionDAG &DAG,
}
}

if (!Subtarget.isSHLDSlow()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not a single comment in this entire patch :(

m_And(m_Value(A), m_ConstInt(MaskConst))))) {
uint64_t ShiftValue = ShlConst.getZExtValue();
if (MaskConst.isMask(ShiftValue) && (A.getOpcode() == ISD::CopyFromReg) &&
(B.getOpcode() == ISD::CopyFromReg)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using CopyFromReg feels like you're trying to prevent regressions on something?

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 1, 2025

@shalini-nik reverse-ping

@shalini-nik
Copy link
Author

Hey @RKSimon, just saw your ping!

@shalini-nik shalini-nik requested a review from RKSimon April 1, 2025 12:59
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to remove the unnecessary CopyFromReg checks and cleanup the ShiftVal limits (it feels like you're trying to hide regressions)

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 9, 2025

please can you regenerate the failing test files?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DAG][X86] Use shrdq to insert into a bitfield
4 participants