Optimize Vector<T>.ConditionalSelect for constant masks #104092
Conversation
This adds a check in the JIT for constant masks (`GT_CNS_VEC`; everything else gets lowered to it) and enables the optimization to `BlendVariable` (the `(v)pblendvb` instruction). This currently does not work for masks loaded from an array in a field/variable. The optimization is also not triggered on platforms supporting AVX512F(/VL?), since the select gets optimized earlier to the `vpternlogd` instruction.
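For context (this snippet is not part of the PR), a small standalone C++ illustration of the equivalence the rewrite relies on: when the condition is a per-element mask, the `(v)pblendvb` blend and the usual and/andnot/or select sequence produce the same result.

// Standalone illustration (not JIT code): for a per-element mask, a variable
// byte blend matches the bitwise select (a & mask) | (b & ~mask).
// Compile with SSE4.1 support, e.g. -msse4.1.
#include <immintrin.h>
#include <cstdio>
#include <cstring>

int main()
{
    __m128i mask = _mm_setr_epi32(-1, 0, -1, 0); // constant per-element mask
    __m128i a    = _mm_setr_epi32(1, 2, 3, 4);
    __m128i b    = _mm_setr_epi32(5, 6, 7, 8);

    // Fallback lowering of ConditionalSelect: (a & mask) | (b & ~mask)
    __m128i bitwise = _mm_or_si128(_mm_and_si128(a, mask), _mm_andnot_si128(mask, b));

    // Optimized form: pblendvb takes 'a' where the mask byte's high bit is set, else 'b'
    __m128i blended = _mm_blendv_epi8(b, a, mask);

    printf("equal: %d\n", memcmp(&bitwise, &blended, sizeof(__m128i)) == 0);
    return 0;
}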
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@dotnet-policy-service agree
src/coreclr/jit/gentree.cpp
Outdated
if (IsCnsVec())
{
    const GenTreeVecCon* vecCon = AsVecCon();
    if (vecCon->IsAllBitsSet() || vecCon->IsZero())
do we need these checks? I doubt they save any measurable TP
👍, these functions just loop over the memory checking whether all elements are a given value
@tannergooding suggested that these checks should be in place here: https://discord.com/channels/143867839282020352/312132327348240384/1255568916151930990
However, ConditionalSelect with an all-bits-set or all-zero mask is pretty useless, so we can get rid of these.
The consideration is rather just that the main loop below already handles this, and these checks aren't "cheap": they are just doing a loop and comparison themselves.
So by removing these explicit checks, we can loop once rather than 3 times and still handle all the scenarios.
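For context, a minimal standalone sketch (not the JIT's actual helper) of the single-pass check being described, where each element must be either all bits set or all zero:

// Standalone sketch: one pass classifies the constant's elements, so the
// all-bits-set and all-zero cases fall out without separate pre-checks.
#include <cstddef>
#include <cstdint>

template <typename TElement>
bool ElementsAreAllBitsSetOrZero(const TElement* elements, size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        TElement allBitsSet = static_cast<TElement>(~static_cast<TElement>(0));
        if ((elements[i] != static_cast<TElement>(0)) && (elements[i] != allBitsSet))
        {
            return false;
        }
    }
    return true;
}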
// Returns:
//    True if this node is a vector constant per-element mask
//
bool GenTree::IsVectorPerElementMask(var_types simdBaseType, unsigned simdSize) const
Perhaps we can name it ElementsAreAllBitsSetOrZero as well?
We've explicitly used the term PerElementMask elsewhere throughout the JIT to encompass the general concept necessary here.
src/coreclr/jit/gentree.cpp
Outdated
@@ -29442,6 +29442,57 @@ bool GenTree::IsInvariant() const
    return OperIsConst() || OperIs(GT_LCL_ADDR) || OperIs(GT_FTN_ADDR);
}

//-------------------------------------------------------------------
// IsVectorPerElementMask: returns true if this node is a vector constant per-element mask
// (every element has either all bits set or none of them).
nit: comments for parameters are missing
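For reference, the header with parameter comments filled in might look like the following; the parameter descriptions are my guess at intent, following the surrounding JIT comment style:

//-------------------------------------------------------------------
// IsVectorPerElementMask: returns true if this node is a vector constant per-element mask
// (every element has either all bits set or none of them).
//
// Arguments:
//    simdBaseType - the base element type of the SIMD value
//    simdSize     - the size of the SIMD value, in bytes
//
// Returns:
//    True if this node is a vector constant per-element mask
//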
// TODO-XARCH-AVX512 Use VPBLENDM* and take input directly from K registers if cond is from
// MoveMaskToVectorSpecial.
This comment isn't applicable to the general query; it was specific to the CndSel lowering logic.
{
    // TODO-XARCH-AVX512 Use VPBLENDM* and take input directly from K registers if cond is from
    // MoveMaskToVectorSpecial.
    return HWIntrinsicInfo::ReturnsPerElementMask(AsHWIntrinsic()->GetHWIntrinsicId());
If you wanted to, you could also update this just a little bit more, as there are some very simple cases we could cover that would also allow us to optimize further.
In particular, you could have this do:
GenTreeHWIntrinsic* intrinsic = AsHWIntrinsic();
NamedIntrinsic intrinsicId = intrinsic->GetHWIntrinsicId();

if (HWIntrinsicInfo::ReturnsPerElementMask(intrinsicId))
{
    // We directly return a per-element mask
    return true;
}

bool isScalar = false;
genTreeOps oper = intrinsic->HWOperGet(&isScalar);

switch (oper)
{
    case GT_AND:
    case GT_AND_NOT:
    case GT_OR:
    case GT_XOR:
    {
        // We are a binary bitwise operation where both inputs are per-element masks
        return intrinsic->Op(1)->IsVectorPerElementMask(simdBaseType, simdSize) &&
               intrinsic->Op(2)->IsVectorPerElementMask(simdBaseType, simdSize);
    }

    case GT_NOT:
    {
        // We are a unary bitwise operation where the input is a per-element mask
        return intrinsic->Op(1)->IsVectorPerElementMask(simdBaseType, simdSize);
    }

    default:
    {
        assert(!GenTreeHWIntrinsic::OperIsBitwiseHWIntrinsic(oper));
        break;
    }
}

return false;
There are others that qualify as well, but I expect they're rarer, and the bitwise operations are the most important to consider.
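For illustration (not part of the suggestion above), bitwise combinations of per-element masks are themselves per-element masks, which is what makes recursing over the operands safe; e.g. with SSE2 intrinsics:

// Standalone illustration: each comparison yields all-ones or all-zeros per lane,
// and AND-ing two such masks yields another per-element mask.
#include <immintrin.h>

__m128i InRangeMask(__m128i x, __m128i lo, __m128i hi)
{
    return _mm_and_si128(_mm_cmpgt_epi32(x, lo), _mm_cmpgt_epi32(hi, x));
}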
Resolves #104001.
I’ve also realized that it is not possible to implement this for arrays as I originally intended, since we have no immutability guarantees, so instead I’ve made sure it gets optimized with a `ReadOnlySpan<T>` property and a static `VectorX<T>` field.

Example:
Old codegen:
New codegen: