-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable #130577
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-transforms Author: None (Shoreshen) ChangesFor Add, Sub, Mul with Int64 type, if profitable, then do:
Full diff: https://github.com/llvm/llvm-project/pull/130577.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
index 6b0f568864fd5..73bd75f37cc71 100644
--- a/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
+++ b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
@@ -1224,6 +1224,49 @@ static bool foldLibCalls(Instruction &I, TargetTransformInfo &TTI,
return false;
}
+static bool tryNarrowMathIfNoOverflow(Instruction &I,
+ TargetTransformInfo &TTI) {
+ unsigned opc = I.getOpcode();
+ if (opc != Instruction::Add && opc != Instruction::Sub &&
+ opc != Instruction::Mul) {
+ return false;
+ }
+ LLVMContext &ctx = I.getContext();
+ Type *i64type = Type::getInt64Ty(ctx);
+ Type *i32type = Type::getInt32Ty(ctx);
+
+ if (I.getType() != i64type || !TTI.isTruncateFree(i64type, i32type)) {
+ return false;
+ }
+ InstructionCost costOp64 =
+ TTI.getArithmeticInstrCost(opc, i64type, TTI::TCK_RecipThroughput);
+ InstructionCost costOp32 =
+ TTI.getArithmeticInstrCost(opc, i32type, TTI::TCK_RecipThroughput);
+ InstructionCost costZext64 = TTI.getCastInstrCost(
+ Instruction::ZExt, i64type, i32type, TTI.getCastContextHint(&I),
+ TTI::TCK_RecipThroughput);
+ if ((costOp64 - costOp32) <= costZext64) {
+ return false;
+ }
+ uint64_t AndConst0, AndConst1;
+ Value *X;
+ if ((match(I.getOperand(0), m_And(m_Value(X), m_ConstantInt(AndConst0))) ||
+ match(I.getOperand(0), m_And(m_ConstantInt(AndConst0), m_Value(X)))) &&
+ AndConst0 <= 2147483647 &&
+ (match(I.getOperand(1), m_And(m_Value(X), m_ConstantInt(AndConst1))) ||
+ match(I.getOperand(1), m_And(m_ConstantInt(AndConst1), m_Value(X)))) &&
+ AndConst1 <= 2147483647) {
+ IRBuilder<> Builder(&I);
+ Value *trun0 = Builder.CreateTrunc(I.getOperand(0), i32type);
+ Value *trun1 = Builder.CreateTrunc(I.getOperand(1), i32type);
+ Value *arith32 = Builder.CreateAdd(trun0, trun1);
+ Value *zext64 = Builder.CreateZExt(arith32, i64type);
+ I.replaceAllUsesWith(zext64);
+ I.eraseFromParent();
+ }
+ return false;
+}
+
/// This is the entry point for folds that could be implemented in regular
/// InstCombine, but they are separated because they are not expected to
/// occur frequently and/or have more than a constant-length pattern match.
@@ -1256,6 +1299,7 @@ static bool foldUnusualPatterns(Function &F, DominatorTree &DT,
// needs to be called at the end of this sequence, otherwise we may make
// bugs.
MadeChange |= foldLibCalls(I, TTI, TLI, AC, DT, DL, MadeCFGChange);
+ MadeChange |= tryNarrowMathIfNoOverflow(I, TTI);
}
}
|
could you please add some test cases. |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about sext operations with sign bit is 1?
%zext1 = and i64 %b, 2 | ||
%mul = mul i64 %zext0, %zext1 | ||
ret i64 %mul | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test vector cases
llvm/test/Transforms/AggressiveInstCombine/narrow_math_for_and.ll
Outdated
Show resolved
Hide resolved
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
Outdated
Show resolved
Hide resolved
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
Outdated
Show resolved
Hide resolved
Hi @shiltian , there may be problem with sext if I'm not wrong, using the following example:
So When If I truncate both The 31'th bit is 1, with sext this will extend to 0xFFFFFFFF80000000, which is not equals to |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
Outdated
Show resolved
Hide resolved
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
Outdated
Show resolved
Hide resolved
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
Outdated
Show resolved
Hide resolved
You should check whether both LHS and RHS have more than 33 sign bits. |
Hi @dtcxzyw I think because the const that is getting check is the second operand of and, so even if I have the following:
It could also be true that |
https://alive2.llvm.org/ce/z/Bom92i Having 34 sign bits should work. |
Hi @dtcxzyw , I'm a little bit confusing about the link posted, are you saying that function |
No. I mean |
We don't need trunc. These trunc instructions in |
I'd add test cases to make sure invalid cases are not combined. |
Hi @dtcxzyw , by trunc + sext, the original value can be changed. If What I'm trying to say is that if I'm not wrong, the case
|
Hi @shiltian , some of the no_narrow cases were added, I'll fix the comments and add some more cases. Thanks~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM but a AMDGPU guru should approve
// Old cost | ||
InstructionCost OldCost = | ||
TTI.getArithmeticInstrCost(Opc, OldType, TTI::TCK_RecipThroughput); | ||
// New cost of new op | ||
InstructionCost NewCost = | ||
TTI.getArithmeticInstrCost(Opc, NewType, TTI::TCK_RecipThroughput); | ||
// New cost of narrowing 2 operands (use trunc) | ||
NewCost += 2 * TTI.getCastInstrCost(Instruction::Trunc, NewType, OldType, | ||
TTI.getCastContextHint(I), | ||
TTI::TCK_RecipThroughput); | ||
// New cost of zext narrowed result to original type | ||
NewCost += | ||
TTI.getCastInstrCost(Instruction::ZExt, OldType, NewType, | ||
TTI.getCastContextHint(I), TTI::TCK_RecipThroughput); | ||
if (NewCost >= OldCost) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this cost makes the transformation too conservative. Usually the truncs will be removed in the final code. Also, it does not include the benefit of using fewer registers with the narrower operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @LU-JOHN , maybe yes, but currently I do have other cost strategy in my mind........ BTW, the cost of trunc is 0 from AMD backend, the narrowed arith is proportionally to the amount of bit narrowed. I'm not sure that if this took count in the saving of registers....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM though I'm not an AMDGPU guru.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/6973 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/55/builds/9233 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/6890 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/7236 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/164/builds/8612 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/94/builds/5800 Here is the relevant piece of the build log for the reference
|
…bit if profitable" (#133880) Reverts llvm/llvm-project#130577
For Add, Sub, Mul with Int64 type, if profitable, then do: