Optimize VectorT.ConditionalSelect for constant masks #104092

Open · wants to merge 8 commits into main
Conversation


@ezhevita commented Jun 27, 2024

Resolves #104001.
I've also realized that it is not possible to implement this for arrays as I originally intended, since we have no immutability guarantees; instead, I've made sure the optimization applies to a ReadOnlySpan<T> property and a static VectorX<T> field (a field-based sketch follows the codegen listings below).

Example:

[DisassemblyDiagnoser]
public class TestClass
{
    private ReadOnlySpan<byte> Mask => [255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0];

    [Benchmark]
    public byte Span()
    {
        return Vector128.ConditionalSelect(Vector128.Create(Mask), Vector128<byte>.Zero, Vector128<byte>.One)[0];
    }
}

Old codegen:

; Method TestNamespace.TestClass:Span():ubyte:this (FullOpts)
G_M000_IG01:

G_M000_IG02:
       vmovups  xmm0, xmmword ptr [reloc @RWD00]
       vxorps   xmm1, xmm1, xmm1
       vpand    xmm1, xmm1, xmm0
       vpandn   xmm0, xmm0, xmmword ptr [reloc @RWD16]
       vpor     xmm0, xmm0, xmm1
       vmovd    eax, xmm0
       movzx    rax, al

G_M000_IG03:
       ret      
RWD00  	dq	00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16  	dq	0101010101010101h, 0101010101010101h
; Total bytes of code: 36

New codegen:

; Method TestNamespace.TestClass:Span():ubyte:this (FullOpts)
G_M000_IG01:

G_M000_IG02:
       vmovups  xmm0, xmmword ptr [reloc @RWD00]
       vxorps   xmm1, xmm1, xmm1
       vmovups  xmm2, xmmword ptr [reloc @RWD16]
       vpblendvb xmm0, xmm2, xmm1, xmm0
       vmovd    eax, xmm0
       movzx    rax, al

G_M000_IG03:
       ret      
RWD00  	dq	00FF00FF00FF00FFh, 00FF00FF00FF00FFh
RWD16  	dq	0101010101010101h, 0101010101010101h
; Total bytes of code: 34
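
For the static-field shape mentioned above, a variant along these lines should be recognized the same way. This is a minimal sketch (class and member names are mine, not from the PR's benchmarks), assuming the JIT can treat a static readonly vector field as a constant in optimized code:

public class TestClassField
{
    // Hypothetical static mask field; in optimized (tier-1) code the JIT can
    // treat static readonly fields as constants, so this folds to GT_CNS_VEC.
    private static readonly Vector128<byte> MaskField = Vector128.Create(
        (byte)255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0);

    [Benchmark]
    public byte Field()
    {
        return Vector128.ConditionalSelect(MaskField, Vector128<byte>.Zero, Vector128<byte>.One)[0];
    }
}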

This adds a check in the JIT for constant masks (`GT_CNS_VEC`; other constant vector creations get lowered to it) and enables the optimization to `BlendVariable` (the `(v)pblendvb` instruction).
It currently does not work for masks loaded from an array stored in a field or variable.
This optimization is also not triggered on platforms supporting AVX512F(/VL?), since ConditionalSelect gets optimized earlier to the `vpternlogd` instruction there.
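
For reference, the lowering amounts to what one could write by hand with the SSE4.1 intrinsic. A sketch of the equivalence in C# (my example, not the JIT code itself):

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static Vector128<byte> SelectViaBlend(Vector128<byte> mask, Vector128<byte> left, Vector128<byte> right)
{
    // ConditionalSelect(mask, left, right) computes (left & mask) | (right & ~mask).
    // When every mask byte is all-ones or all-zero, a single (v)pblendvb gives the
    // same answer: BlendVariable picks its second operand where the mask byte's high
    // bit is set, so the operands are swapped relative to ConditionalSelect.
    if (Sse41.IsSupported)
    {
        return Sse41.BlendVariable(right, left, mask);
    }

    return Vector128.ConditionalSelect(mask, left, right);
}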
The dotnet-issue-labeler bot added the area-CodeGen-coreclr label on Jun 27, 2024.
The dotnet-policy-service bot added the community-contribution label on Jun 27, 2024.
Contributor:
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@ezhevita (Author):
@dotnet-policy-service agree

if (IsCnsVec())
{
    const GenTreeVecCon* vecCon = AsVecCon();
    if (vecCon->IsAllBitsSet() || vecCon->IsZero())
Member:
Do we need these checks? I doubt they save any measurable throughput (TP).

Member:
👍, these functions just loop over the memory checking whether every element is a given value.

Author:
@tannergooding suggested that these checks should be in place here: https://discord.com/channels/143867839282020352/312132327348240384/1255568916151930990
However, ConditionalSelect with an all-bits-set or all-zero mask is pretty useless, so we can get rid of these checks.

Member:
The consideration is rather that the main loop below already handles this, and these checks aren't "cheap": they each do a loop and comparison themselves.

So by removing these explicit checks, we can loop once rather than three times and still handle all the scenarios.

// Returns:
// True if this node is a vector constant per-element mask
//
bool GenTree::IsVectorPerElementMask(var_types simdBaseType, unsigned simdSize) const
Member:
Perhaps we can name it ElementsAreAllBitsSetOrZero instead?

Member:
We've explicitly used the term PerElementMask elsewhere throughout the JIT to encompass the general concept necessary here.

@@ -29442,6 +29442,57 @@ bool GenTree::IsInvariant() const
    return OperIsConst() || OperIs(GT_LCL_ADDR) || OperIs(GT_FTN_ADDR);
}

//-------------------------------------------------------------------
// IsVectorPerElementMask: returns true if this node is a vector constant per-element mask
// (every element has either all bits set or none of them).
Member:
nit: comments for parameters are missing

Comment on lines +29487 to +29488
// TODO-XARCH-AVX512 Use VPBLENDM* and take input directly from K registers if cond is from
// MoveMaskToVectorSpecial.
Member:
This comment isn't applicable to the general query; it was specific to the CndSel lowering logic.

{
    // TODO-XARCH-AVX512 Use VPBLENDM* and take input directly from K registers if cond is from
    // MoveMaskToVectorSpecial.
    return HWIntrinsicInfo::ReturnsPerElementMask(AsHWIntrinsic()->GetHWIntrinsicId());
Member:
If you wanted to, you could also update this just a little bit more, as there are some very simple cases we could cover that would allow us to optimize further.

In particular, you could have this do:

GenTreeHWIntrinsic* intrinsic   = AsHWIntrinsic();
NamedIntrinsic      intrinsicId = intrinsic->GetHWIntrinsicId();

if (HWIntrinsicInfo::ReturnsPerElementMask(intrinsicId))
{
    // We directly return a per-element mask
    return true;
}

bool       isScalar = false;
genTreeOps oper     = intrinsic->HWOperGet(&isScalar);

switch (oper)
{
    case GT_AND:
    case GT_AND_NOT:
    case GT_OR:
    case GT_XOR:
    {
        // We are a binary bitwise operation where both inputs are per-element masks
        return intrinsic->Op(1)->IsVectorPerElementMask(simdBaseType, simdSize)
            && intrinsic->Op(2)->IsVectorPerElementMask(simdBaseType, simdSize);
    }

    case GT_NOT:
    {
        // We are a unary bitwise operation where the input is a per-element mask
        return intrinsic->Op(1)->IsVectorPerElementMask(simdBaseType, simdSize);
    }

    default:
    {
        assert(!GenTreeHWIntrinsic::OperIsBitwiseHWIntrinsic(oper));
        break;
    }
}

return false;

There are others that qualify as well, but I expect they're rarer, and the bitwise operations are the most important to consider.
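
As a user-level illustration of what that recursion buys (my own example, not from the PR), masks built by combining comparison results through bitwise operators would also be recognized as per-element masks:

using System.Runtime.Intrinsics;

static Vector128<int> ZeroOutsideRange(Vector128<int> value, Vector128<int> lo, Vector128<int> hi)
{
    // Each comparison produces a per-element mask (all-ones or all-zero per lane);
    // the & of two such masks is still a per-element mask, which the suggested
    // GT_AND case would prove, enabling the blend optimization.
    Vector128<int> mask = Vector128.GreaterThanOrEqual(value, lo)
                        & Vector128.LessThanOrEqual(value, hi);

    return Vector128.ConditionalSelect(mask, value, Vector128<int>.Zero);
}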

Successfully merging this pull request may close these issues.

VectorT.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms