JIT: CMOV aka CMOVcc not generated when method is inlined. #96593

hexawyz · 2024-01-07T16:40:56Z

Description

When playing with the now infamous "One Billion Row Challenge", I noticed that while the JIT properly emits CMOVG and CMOVL for Min/Max operations, those sometimes get replaced by more classical branches when in the context of a more complex method.
I've put the relevant code here to look at the disassembly: Code on SharpLab

To confirm that what I observed in SharpLab corresponded to what would be a Tier1 codegen, I manually observed a Release build in Visual Studio. I could see the assembly for the method changing between two occurences of a manual breakpoint (using Debugger.Break())

As far as I can tell, the code generation difference is not directly related to JIT inlining, as I would see the conditional jump opcodes before splitting this specific code in a separate method.

Relevant C# Code for the method:

struct ItemData
{
	private int _min;
	private int _max;
	private int _sum;
	private int _count;

	public void Aggregate(int value)
	{
		_min = Math.Min(_min, value);
		_max = Math.Max(_max, value);
		_sum += value;
		_count++;
	}
}

Code of the method before inlining:

ItemData.Aggregate(Int32)
    L0000: mov eax, [rcx]
    L0002: cmp eax, edx
    L0004: cmovg eax, edx
    L0007: mov [rcx], eax
    L0009: mov eax, [rcx+4]
    L000c: cmp eax, edx
    L000e: cmovl eax, edx
    L0011: mov [rcx+4], eax
    L0014: add [rcx+8], edx
    L0017: inc dword ptr [rcx+0xc]
    L001a: ret

Relevant code of the method when inlined:

Parser.Parse(System.Collections.Generic.Dictionary`2<Utf8StringRef,ItemData>, Byte ByRef, Byte ByRef)
…
    L00cc: mov ecx, [r13]
    L00d0: cmp ecx, eax
    L00d2: jle short L00e6 ; Corresponding to cmovg above
    L00d4: mov edx, eax
    L00d6: mov [r13], edx
    L00da: mov ecx, [r13+4]
    L00de: cmp ecx, eax
    L00e0: jge short L010b  ; Corresponding to cmovl above
    L00e2: mov edx, eax
    L00e4: jmp short L010d
    L00e6: mov edx, ecx
    L00e8: jmp short L00d6
…
    L010b: mov edx, ecx
    L010d: mov [r13+4], edx
    L0111: add [r13+8], eax
    L0115: inc dword ptr [r13+0xc]
    L0119: jmp short L0100
    L011b: cmp rdi, rsi
…

I initially thought that the problem with CMOVcc was that it can not have a memory destination, but as can be seen in the non inlined method, the JIT chooses to use it despite that fact, which matches with my intuition that it is the correct thing to do here.
The jumps that I see re-introduced in the inlined version (possibly because the JIT failed to detect the pattern there ?) are at least increasing the size of the assembly, but I'd assume they would possibly degrade the performance of the code, depending on the exact data processed.

I tried to trick the JIT in generating CMOV-based code in multiple different ways there (introducing intermediate variables, etc), but I could not get it to not do it.
IIRC, the worst of the cases I managed to produce was one where the conditional jumps would directly point to JMP opcodes without doing anything else uselful.

Code can also be found here: https://github.com/hexawyz/OneBillionRows/blob/adf45647de4d4d9d49c6002e29bd483cee608af3/OneBillionRows/Program.cs#L328

I apologize for not providing a more minimal repro, but hopefully, this one is still small enough so that you can understand what is happening here.

Configuration

Local: .NET 8.0.100, x64
SharpLab: Core CLR 8.0.23.53103 on x64

The text was updated successfully, but these errors were encountered:

ghost · 2024-01-07T16:41:05Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

When playing with the now infamous "One Billion Row Challenge", I noticed that while the JIT properly emits CMOVG and CMOVL for Min/Max operations, those sometimes get replaced by more classical branches when in the context of a more complex method.
I've put the relevant code here to look at the disassembly: Code on SharpLab

To confirm that what I observed in SharpLab corresponded to what would be a Tier1 codegen, I manually observed a Release build in Visual Studio. I could see the assembly for the method changing between two occurences of a manual breakpoint (using Debugger.Break())

As far as I can tell, the code generation difference is not directly related to JIT inlining, as I would see the conditional jump opcodes before splitting this specific code in a separate method.

Relevant C# Code for the method:

struct ItemData
{
	private int _min;
	private int _max;
	private int _sum;
	private int _count;

	public void Aggregate(int value)
	{
		_min = Math.Min(_min, value);
		_max = Math.Max(_max, value);
		_sum += value;
		_count++;
	}
}

Code of the method before inlining:

ItemData.Aggregate(Int32)
    L0000: mov eax, [rcx]
    L0002: cmp eax, edx
    L0004: cmovg eax, edx
    L0007: mov [rcx], eax
    L0009: mov eax, [rcx+4]
    L000c: cmp eax, edx
    L000e: cmovl eax, edx
    L0011: mov [rcx+4], eax
    L0014: add [rcx+8], edx
    L0017: inc dword ptr [rcx+0xc]
    L001a: ret

Relevant code of the method when inlined:

Parser.Parse(System.Collections.Generic.Dictionary`2<Utf8StringRef,ItemData>, Byte ByRef, Byte ByRef)
…
    L00cc: mov ecx, [r13]
    L00d0: cmp ecx, eax
    L00d2: jle short L00e6 ; Corresponding to cmovg above
    L00d4: mov edx, eax
    L00d6: mov [r13], edx
    L00da: mov ecx, [r13+4]
    L00de: cmp ecx, eax
    L00e0: jge short L010b  ; Corresponding to cmovl above
    L00e2: mov edx, eax
    L00e4: jmp short L010d
    L00e6: mov edx, ecx
    L00e8: jmp short L00d6
…
    L010b: mov edx, ecx
    L010d: mov [r13+4], edx
    L0111: add [r13+8], eax
    L0115: inc dword ptr [r13+0xc]
    L0119: jmp short L0100
    L011b: cmp rdi, rsi
…

I initially thought that the problem with CMOVcc was that it can not have a memory destination, but as can be seen in the non inlined method, the JIT chooses to use it despite that fact, which matches with my intuition that it is the correct thing to do here.
The jumps that I see re-introduced in the inlined version (possibly because the JIT failed to detect the pattern there ?) are at least increasing the size of the assembly, but I'd assume they would possibly degrade the performance of the code, depending on the exact data processed.

I tried to trick the JIT in generating CMOV-based code in multiple different ways there (introducing intermediate variables, etc), but I could not get it to not do it.
IIRC, the worst of the cases I managed to produce was one where the conditional jumps would directly point to JMP opcodes without doing anything else uselful.

Code can also be found here: https://github.com/hexawyz/OneBillionRows/blob/adf45647de4d4d9d49c6002e29bd483cee608af3/OneBillionRows/Program.cs#L328

I apologize for not providing a more minimal repro, but hopefully, this one is still small enough so that you can understand what is happening here.

Configuration

Local: .NET 8.0.100
SharpLab: Core CLR 8.0.23.53103 on x64

Author:	hexawyz
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`, `untriaged`
Milestone:	-

neon-sunset · 2024-01-07T16:54:11Z

It's probably this #96380

jakobbotsch · 2024-01-07T18:15:29Z

Thanks for the report. Indeed, this is a duplicate of #96380. We do not emit conditional instructions inside loops today because it can cause significant performance regressions, and the right analysis to predict the profitability has not yet been implemented.
I'm going to close this, but we will keep the example you have shared in mind for when we get around to working on this.

hexawyz added the tenet-performance Performance related issue label Jan 7, 2024

ghost added the untriaged New issue has not been triaged by the area owner label Jan 7, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 7, 2024

jakobbotsch closed this as not planned Won't fix, can't repro, duplicate, stale Jan 7, 2024

ghost removed the untriaged New issue has not been triaged by the area owner label Jan 7, 2024

github-actions bot locked and limited conversation to collaborators Feb 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: CMOV aka CMOVcc not generated when method is inlined. #96593

JIT: CMOV aka CMOVcc not generated when method is inlined. #96593

hexawyz commented Jan 7, 2024 •

edited

Loading

ghost commented Jan 7, 2024

Description

Configuration

neon-sunset commented Jan 7, 2024

jakobbotsch commented Jan 7, 2024

JIT: CMOV aka CMOVcc not generated when method is inlined. #96593

JIT: CMOV aka CMOVcc not generated when method is inlined. #96593

Comments

hexawyz commented Jan 7, 2024 • edited Loading

Description

Configuration

ghost commented Jan 7, 2024

Description

Configuration

neon-sunset commented Jan 7, 2024

jakobbotsch commented Jan 7, 2024

hexawyz commented Jan 7, 2024 •

edited

Loading